Re: [ADSM-L] Unicode on UNIX

2010-05-18 19:42:11
Subject: Re: [ADSM-L] Unicode on UNIX
From: km <km AT GROGG DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 19 May 2010 01:20:25 +0200
On 19/05, Michael Green wrote:
> On Tue, May 18, 2010 at 10:49 PM, km <km AT grogg DOT org> wrote:
> > I would advise against overriding the default settings in a script and
> > instead set the correct locale for the system. Most system settings in
> > RHEL-based distros are made in the sysconfig directory:
> >
> > http://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-sysconfig-i18n.html
> >
>
> Please let me disagree with you. I think it's the wrong approach to
> change the locale for the entire OS for the sake of backups only.
> Besides, I'm not fully aware of the consequences of changing the locale
> system-wide.
> Are you?

You should not change it for the sole purpose of backups; rather, the
system locale should (YMMV) match what is being used on the system. This is
very common in non-English-speaking countries and has been fully supported
with UTF-8 since at least the release of RHEL 4. So yes, I am.

> > In this case, if the locale does not exist, just install it. Since the en_US
> > locale is included in the glibc-common RPM try to reinstall or update that
> > RPM.
>
> I didn't say the en_US locale doesn't exist. On the contrary, it does.
> What I said is that the Linux TSM client will not back up files with
> funny characters in the filename after dsmcad is started from the init
> script on _bootup_ with LC_CTYPE and LANG set to en_US in RHEL and SLES.

Wasn't that what the OP was saying, that the locale did not exist? At least
that is how I interpret this:

"However, a user running CentOS thinks that en_US does not exist in that flavor 
of LINUX, so he misses 1000s of files each night."

This looks to me like the en_US locale, which is part of glibc-common, is
borked. Maybe it was a hypothetical question. My bad in that case.
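
To rule out a missing locale, one could compare the wanted name against the
output of `locale -a`. The following is only a minimal sketch: the helper
names (normalize, locale_installed, installed_locales) are made up for this
example, and it assumes that the only spelling difference to tolerate is
"UTF-8" versus "utf8" in the codeset part.

```python
import subprocess


def normalize(name: str) -> str:
    """Lowercase the codeset and drop '-' so 'en_US.UTF-8' matches 'en_US.utf8'."""
    base, dot, codeset = name.partition(".")
    return base + dot + codeset.lower().replace("-", "")


def locale_installed(wanted: str, locale_a_output: str) -> bool:
    """Return True if `wanted` appears in the text printed by `locale -a`."""
    available = {
        normalize(line.strip())
        for line in locale_a_output.splitlines()
        if line.strip()
    }
    return normalize(wanted) in available


def installed_locales() -> str:
    """Run `locale -a` and return its output (hypothetical helper)."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # some locale names may not be valid UTF-8
        check=True,
    )
    return result.stdout
```

On the affected machine, `locale_installed("en_US", installed_locales())`
returning False would point at a broken or missing glibc-common install.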

> I challenge anyone to show that it works for him/her in any version of
> RHEL or SLES.

The system i18n settings are sourced by rc.sysinit before either inittab or
any of the runlevel scripts are run, so in theory everything should inherit
them correctly. I will check this tomorrow.
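
One way to check what a running daemon actually inherited is to read
/proc/<pid>/environ on Linux, which holds the NUL-separated environment the
process was started with. A minimal sketch, assuming root (or same-user)
access to that file; the function names are hypothetical and the PID of
dsmcad has to be looked up separately:

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry and malformed ones
        key, _, value = entry.partition(b"=")
        env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def locale_settings(env: dict) -> dict:
    """Keep only the variables that influence the locale."""
    return {
        key: value
        for key, value in env.items()
        if key in ("LANG", "LANGUAGE") or key.startswith("LC_")
    }


def daemon_locale(pid: int) -> dict:
    """Locale variables a process was started with (Linux only)."""
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return locale_settings(parse_environ(raw))
```

An empty result from `daemon_locale(pid)` would mean the process was started
without any LANG or LC_* variables, i.e. it runs in the default "C" locale.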

> >> However, a user running CentOS thinks that en_US does not exist in that 
> >> flavor of LINUX, so he misses 1000s of files each night.
> >>
> >> Anyone have any thoughts on this?
>
> Fred has touched here on a major problem that has plagued the TSM
> product line for ages and continues to go unresolved. It is absolutely
> unacceptable that the TSM client skips files with filenames that do not
> conform to a specific locale. In my view, every file that can be
> registered in a file system (ext3/reiser/xfs) supported by major
> commercial Linux distributions (RHEL/SLES) must be backed up no matter
> what.  As long as the file system itself is consistent and the underlying
> physical media is not damaged, everything should just work.
> Around 2008 IBM published a paper called "Tivoli Storage Manager
> and Localization". The paper contains explanations of why it doesn't
> work and describes at length how to deal with files named in
> various barbarian languages. It's fascinating reading, but doesn't
> help much in my situation. And besides, with all due respect, IMO
> that's not something I, as an administrator, should be dealing with. If
> GNU tar can swallow and restore these files without messing with
> locale or anything else, why can't TSM?
> --
> Warm regards,
> Michael Green

I totally agree with this. A very common problem is a single server that
holds filenames in several different locales, for example software
repositories or file shares for multiple countries/languages.
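
To illustrate why mixed encodings are a problem: a filename is just a byte
string on disk, and a name written under one encoding may not be valid under
another. The sketch below lists the names in a directory that fail to decode
under a given encoding. It is only an illustration with made-up function
names, not a description of how the TSM client decides what to skip.

```python
import os


def undecodable(names, encoding):
    """Return the byte-string names that are not valid in `encoding`."""
    bad = []
    for name in names:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def scan(directory: bytes, encoding: str = "utf-8"):
    """List entries of `directory` whose names do not decode as `encoding`.

    Passing a bytes path makes os.listdir return the raw bytes names.
    """
    return undecodable(os.listdir(directory), encoding)
```

For example, `scan(b"/srv/share")` (a hypothetical path) would list the names
on that share that are not valid UTF-8, such as ones created by Latin-1
clients.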

-km