Debian Bug report logs - #387704
grep: -i breaks \W in some locales (perhaps UTF-8 locales only)
Reported by: Christoph Biedl <cbiedl@gmx.de>
Date: Sat, 16 Sep 2006 08:48:35 UTC
Severity: normal
Tags: confirmed
Found in versions grep/2.5.1.ds2-5, grep/2.5.3~dfsg-6
Fixed in version 2.6.3-1
Done: santiago@debian.org
Bug is archived. No further changes may be made.
Forwarded to bug-grep@gnu.org
Report forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#387704; Package grep.
Acknowledgement sent to Christoph Biedl <cbiedl@gmx.de>:
New Bug report received and forwarded. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
Message #5 received at submit@bugs.debian.org:
Package: grep
Version: 2.5.1.ds2-5
Severity: normal
I noticed that enabling --ignore-case suddenly caused certain patterns
not to match any longer although they should:
$ echo 'foo bar' | grep '^foo\W'
foo bar
$ echo 'foo bar' | grep -i '^foo\W'
$
Digging further reveals that the locale has an influence, since
$ echo 'foo bar' | LANG=C grep -i '^foo\W'
foo bar
$
matches again. After a check using all my generated locales:
MATCH:
- de_DE
- de_DE@euro
- en_US
FAIL:
- de_DE.UTF-8
- de_DE.UTF-8@euro
- en_US.UTF-8
there's a strong impression that UTF-8 locales somehow disturb \W when
using -i.
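The locale sweep described above can be scripted. What follows is a minimal sketch, not the reporter's actual procedure: `check_locale` is a hypothetical helper name, and the loop assumes a `locale` command whose `-a` option lists the installed locales. `LC_ALL` is used because it overrides `LANG` and every `LC_*` variable for that one grep call.

```shell
#!/bin/sh
# Sketch: repeat the -i/\W check for every locale known to this system.
# check_locale is a hypothetical helper name, not part of grep.
check_locale() {
    # LC_ALL overrides LANG and all LC_* settings for this single grep call.
    if printf 'foo bar\n' | LC_ALL="$1" grep -i '^foo\W' >/dev/null 2>&1; then
        echo "MATCH: $1"
    else
        echo "FAIL: $1"
    fi
}

# `locale -a` prints the available locales, one per line.
for loc in $(locale -a 2>/dev/null); do
    check_locale "$loc"
done
```

On an affected grep one would expect the UTF-8 entries to print FAIL. A locale name that is not installed falls back to C behaviour, so only names reported by `locale -a` are meaningful here.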
Even more confusing, using the bracket expression instead of the synonym
matches again:
$ echo 'foo bar' | LANG=de_DE.UTF-8 grep -i '^foo[^[:alnum:]]'
foo bar
$
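To see the two spellings side by side, a small sketch along the following lines may help. `compare_patterns` is a hypothetical helper, and the locale it receives is assumed to be installed. Note that current GNU grep documentation describes `\W` as `[^_[:alnum:]]` (the underscore counts as a word character); for the space in `foo bar` the difference does not matter.

```shell
#!/bin/sh
# Sketch: run both spellings with -i under one locale and report each result.
# compare_patterns is a hypothetical helper; the locale defaults to C.
compare_patterns() {
    loc=${1:-C}
    for pat in '^foo\W' '^foo[^[:alnum:]]'; do
        if printf 'foo bar\n' | LC_ALL="$loc" grep -i "$pat" >/dev/null 2>&1; then
            # printf avoids echo's shell-dependent backslash handling.
            printf '%s: match\n' "$pat"
        else
            printf '%s: no match\n' "$pat"
        fi
    done
}

compare_patterns "${1:-C}"
```

Called with a UTF-8 locale on an affected version, the first line would be expected to report `no match` while the second still matches, which is the discrepancy described above.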
For the record, this sounds somewhat similar to #209194 and #218873, but
those bugs are fixed in this version (2.5.1.ds2-5); I've checked.
By the way, there's a typo in the manpage:
 and
 .B \eW
 is a synonym for
- .BR [^[:alnum]] .
+ .BR [^[:alnum:]] .
 .PP
-- System Information:
Debian Release: testing/unstable
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.13
Locale: LANG=de_DE.UTF-8@euro, LC_CTYPE=de_DE.UTF-8@euro (charmap=UTF-8)
Versions of packages grep depends on:
ii libc6 2.3.6.ds1-4 GNU C Library: Shared libraries
grep recommends no packages.
-- no debconf information
Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#387704; Package grep.
(Sun, 11 Jan 2009 16:15:16 GMT)
Acknowledgement sent to Ruben Molina <rmolina@udea.edu.co>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(Sun, 11 Jan 2009 16:15:16 GMT)
Message #10 received at 387704@bugs.debian.org:
tags 387704 + confirmed
found 387704 2.5.3~dfsg-6
thanks
$ locale
LANG=es_CO.UTF-8
LC_CTYPE="es_CO.UTF-8"
LC_NUMERIC="es_CO.UTF-8"
LC_TIME="es_CO.UTF-8"
LC_COLLATE="es_CO.UTF-8"
LC_MONETARY="es_CO.UTF-8"
LC_MESSAGES="es_CO.UTF-8"
LC_PAPER="es_CO.UTF-8"
LC_NAME="es_CO.UTF-8"
LC_ADDRESS="es_CO.UTF-8"
LC_TELEPHONE="es_CO.UTF-8"
LC_MEASUREMENT="es_CO.UTF-8"
LC_IDENTIFICATION="es_CO.UTF-8"
LC_ALL=
$ echo 'foo bar' | grep '^foo\W'
foo bar
$
$ echo 'foo bar' | grep -i '^foo\W'
$
$ echo 'foo bar' | LANG=C grep -i '^foo\W'
foo bar
$
Tags added: confirmed
Request was from Ruben Molina <rmolina@udea.edu.co>
to control@bugs.debian.org.
(Sun, 11 Jan 2009 16:15:17 GMT)
Bug marked as found in version 2.5.3~dfsg-6.
Request was from Ruben Molina <rmolina@udea.edu.co>
to control@bugs.debian.org.
(Sun, 11 Jan 2009 16:15:17 GMT)
Information forwarded to debian-bugs-dist@lists.debian.org, Anibal Monsalve Salazar <anibal@debian.org>:
Bug#387704; Package grep.
(Sun, 29 Mar 2009 05:51:02 GMT)
Acknowledgement sent to Aníbal Monsalve Salazar <anibal@debian.org>:
Extra info received and forwarded to list. Copy sent to Anibal Monsalve Salazar <anibal@debian.org>.
(Sun, 29 Mar 2009 05:51:02 GMT)
Message #19 received at 387704@bugs.debian.org:
forwarded 387704 bug-grep@gnu.org
thanks
I can reproduce this bug with 2.5.4
grep -V
GNU grep 2.5.4
echo 'foo bar' | grep '^foo\W'; echo $?
foo bar
0
echo 'foo bar' | grep -i '^foo\W'; echo $?
foo bar
0
echo 'foo bar' | LANG=C grep -i '^foo\W'; echo $?
foo bar
0
echo 'foo bar' | LANG=en_AU grep -i '^foo\W'; echo $?
foo bar
0
echo 'foo bar' | LANG=en_AU.UTF-8 grep -i '^foo\W'; echo $?
1
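The `echo $?` lines above print grep's exit status: 0 when a line was selected and 1 when none was (grep also documents 2 for errors). As a sketch under those assumptions, with `grep_status` as a hypothetical helper name, the status can be turned into a label so that a failing locale is visible without inspecting the output:

```shell
#!/bin/sh
# Sketch: classify a grep run by its exit status instead of by its output.
# grep_status is a hypothetical helper; arguments are a locale and a pattern.
grep_status() {
    # -q suppresses output; the pipeline's status is that of grep.
    printf 'foo bar\n' | LC_ALL="$1" grep -i -q "$2" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}

grep_status C '^foo\W'
grep_status C '^baz\W'
```

The two calls use the C locale, where the report says the pattern behaves correctly; passing a UTF-8 locale as the first argument would exercise the failing case on an affected version.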
Noted your statement that Bug has been forwarded to bug-grep@gnu.org.
Request was from Aníbal Monsalve Salazar <anibal@debian.org>
to control@bugs.debian.org.
(Sun, 29 Mar 2009 05:51:03 GMT)
Reply sent to santiago@debian.org:
You have taken responsibility.
(Tue, 24 Jun 2014 17:51:16 GMT)
Notification sent to Christoph Biedl <cbiedl@gmx.de>:
Bug acknowledged by developer.
(Tue, 24 Jun 2014 17:51:16 GMT)
Message #26 received at 387704-done@bugs.debian.org:
Version: 2.6.3-1
Hi,
I'm closing this bug since the issues with character classes and
case-insensitive matching in multi-byte locales were fixed in grep 2.6.
$ echo 'foo bar' | LANG=C grep '^foo\W'; echo $?
foo bar
$ echo 'foo bar' | LANG=es_CO.UTF-8 grep '^foo\W'; echo $?
foo bar
0
Regards,
Santiago
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Wed, 23 Jul 2014 07:25:58 GMT)
Last modified: Thu Jan 11 17:48:52 2018