Debian Bug report logs - #531721
bash: glob pattern "[A-Z]" matches "b", "c" .. and "z" on LC_ALL=en_US.UTF-8

Package: bash; Maintainer for bash is Matthias Klose <doko@debian.org>; Source for bash is src:bash (PTS, buildd, popcon).

Reported by: SATOH Fumiyasu <fumiyas@osstech.jp>

Date: Wed, 3 Jun 2009 14:06:01 UTC

Severity: grave

Found in version bash/3.2-5

Done: Matthias Klose <doko@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Matthias Klose <doko@debian.org>:
Bug#531721; Package bash. (Wed, 03 Jun 2009 14:06:03 GMT).


Acknowledgement sent to SATOH Fumiyasu <fumiyas@osstech.jp>:
New Bug report received and forwarded. Copy sent to Matthias Klose <doko@debian.org>. (Wed, 03 Jun 2009 14:06:03 GMT).


Message #5 received at submit@bugs.debian.org:

From: SATOH Fumiyasu <fumiyas@osstech.jp>
To: submit@bugs.debian.org
Subject: bash: glob pattern "[A-Z]" matches "b", "c" .. and "z" on LC_ALL=en_US.UTF-8
Date: Wed, 03 Jun 2009 23:04:56 +0900
Package: bash
Version: 3.2-5
Severity: important

$ dpkg -l libc6 bash
...
ii  bash             3.2-5            The GNU Bourne Again SHell
ii  libc6            2.9-13           GNU C Library: Shared libraries
$ mkdir tmp
$ cd tmp
$ touch a b c x y z A B C X Y Z
$ LC_ALL=C /bin/bash --noprofile --norc -c 'echo [A-Z]'
A B C X Y Z
$ LC_ALL=ja_JP.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
A B C X Y Z
$ LC_ALL=en_US.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
A b B c C x X y Y z Z

-- 
-- Name: SATOH Fumiyasu (fumiyas @ osstech co jp)
-- Business Home: http://www.OSSTech.co.jp/
-- Personal Home: http://www.SFO.jp/blog/




Information forwarded to debian-bugs-dist@lists.debian.org, Matthias Klose <doko@debian.org>:
Bug#531721; Package bash. (Tue, 14 Jul 2009 11:21:05 GMT).


Acknowledgement sent to Michael Schutte <michi@uiae.at>:
Extra info received and forwarded to list. Copy sent to Matthias Klose <doko@debian.org>. (Tue, 14 Jul 2009 11:21:05 GMT).


Message #10 received at 531721@bugs.debian.org:

From: Michael Schutte <michi@uiae.at>
To: SATOH Fumiyasu <fumiyas@osstech.jp>, 531721@bugs.debian.org
Subject: Re: Bug#531721: bash: glob pattern "[A-Z]" matches "b", "c" .. and "z" on LC_ALL=en_US.UTF-8
Date: Tue, 14 Jul 2009 12:42:31 +0200
Hey Fumiyasu,

On Wed, Jun 03, 2009 at 11:04:56PM +0900, SATOH Fumiyasu wrote:
> $ dpkg -l libc6 bash
> ...
> ii  bash             3.2-5            The GNU Bourne Again SHell
> ii  libc6            2.9-13           GNU C Library: Shared libraries
> $ mkdir tmp
> $ cd tmp
> $ touch a b c x y z A B C X Y Z
> $ LC_ALL=C /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A B C X Y Z
> $ LC_ALL=ja_JP.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A B C X Y Z
> $ LC_ALL=en_US.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A b B c C x X y Y z Z

This behavior seems quite dangerous to me: the command “rm [A-Z]*” could
remove more than just files starting with an uppercase letter, which
most people probably would not expect.
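
A way to sidestep the range entirely is the POSIX character class
[[:upper:]], which is defined by character type rather than by collation
order. The following is only a minimal sketch; the helper name
list_uppercase_names is hypothetical and not part of bash or of this
report:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the report):
# print the entries of directory $1 whose names start with an uppercase
# letter, using the character class [[:upper:]] instead of the range [A-Z].
list_uppercase_names() {
    (
        cd "$1" || exit 1
        for name in [[:upper:]]*; do
            # When nothing matches, the pattern is left as a literal word;
            # skip it so the function prints nothing in that case.
            [ -e "$name" ] || continue
            printf '%s\n' "$name"
        done
    )
}
```

Called as list_uppercase_names tmp on the directory from the original
report, this should print only A, B, C, X, Y and Z, one per line, whichever
locale is active. The same class could replace the range in a command such
as “rm [[:upper:]]*”.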

The root of this issue is in bash-3.2/lib/glob/smatch.c:

    static int rangecmp (c1, c2)
         int c1, c2;
    {
      static char s1[2] = { ' ', '\0' };
      static char s2[2] = { ' ', '\0' };
      int ret;

      /* Eight bits only.  Period. */
      c1 &= 0xFF;
      c2 &= 0xFF;

      if (c1 == c2)
        return (0);

      s1[0] = c1;
      s2[0] = c2;

      if ((ret = strcoll (s1, s2)) != 0)
        return ret;
      return (c1 - c2);
    }

This function calls strcoll(), which is similar to strcmp() but
“compares two strings using the current locale”.  This allows things
like

    $ touch ö
    $ echo [o-p]
    ö

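to work, and it also causes the problem you described: judging by your
output, en_US.UTF-8 collates letters as a A b B ... z Z, so “b” through
“z” fall between “A” and “Z” while “a” does not.  Interestingly, the
POSIX specification permits this: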

    7. In the POSIX locale, a range expression represents the set of
    collating elements that fall between two elements in the collation
    sequence, inclusive. In other locales, a range expression has
    unspecified behavior: strictly conforming applications shall not
    rely on whether the range expression is valid, or on the set of
    collating elements matched.

     – http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

So, the bash maintainers may decide to use -DUSE_POSIX_GLOB_LIBRARY
(which is deprecated according to
http://lists.gnu.org/archive/html/bug-bash/2001-02/msg00032.html), patch
away the usage of strcoll, or leave everything as it is.

All the best,
-- 
Michael Schutte <michi@uiae.at>

Severity set to 'grave' from 'important'. Request was from Josh Triplett <josh@joshtriplett.org> to control@bugs.debian.org. (Sun, 06 Sep 2009 23:15:17 GMT).


Reply sent to Matthias Klose <doko@debian.org>:
You have taken responsibility. (Sun, 13 Sep 2009 12:00:13 GMT).


Notification sent to SATOH Fumiyasu <fumiyas@osstech.jp>:
Bug acknowledged by developer. (Sun, 13 Sep 2009 12:00:14 GMT).


Message #17 received at 531721-done@bugs.debian.org:

From: Matthias Klose <doko@debian.org>
To: 531721-done@bugs.debian.org
Subject: Re: bash: glob pattern "[A-Z]" matches "b", "c" .. and "z" on LC_ALL=en_US.UTF-8
Date: Sun, 13 Sep 2009 13:34:29 +0200
This is not a bug but intended behaviour. Use the C locale for the desired 
behaviour.
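
For a script that has to keep the [A-Z] range, the advice above could be
applied to a single expansion rather than to the whole session. This is a
minimal sketch under that assumption; the helper name list_az_in_c_locale
is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the report):
# expand the glob [A-Z]* inside directory $1 under the C locale, leaving
# the caller's locale untouched because everything runs in a subshell.
list_az_in_c_locale() {
    (
        # A standalone assignment, so it is in effect before the pattern
        # in the loop below is expanded.
        LC_ALL=C
        export LC_ALL
        cd "$1" || exit 1
        for name in [A-Z]*; do
            # Skip the literal pattern that remains when nothing matches.
            [ -e "$name" ] || continue
            printf '%s\n' "$name"
        done
    )
}
```

The assignment is deliberately a command of its own. As far as I know, a
prefix form such as “LC_ALL=C echo [A-Z]*” would not help, because the
shell expands the pattern before the temporary assignment is applied; that
is presumably why the original report starts a fresh bash with LC_ALL
already set in its environment.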




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 12 Oct 2009 07:30:47 GMT).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Jun 4 23:42:51 2023; Machine Name: bembo
