Debian Bug report logs - #127293
mawk does not understand unescaped '/' in character classes

Package: mawk; Maintainer for mawk is Steve Langasek <vorlon@debian.org>; Source for mawk is src:mawk.

Reported by: "Andrew T. Young" <aty@sciences.sdsu.edu>

Date: Tue, 1 Jan 2002 06:03:01 UTC

Severity: normal

Tags: fixed-upstream

Found in version 1.3.3-5


Report forwarded to debian-bugs-dist@lists.debian.org, James Troup <james@nocrew.org>:
Bug#127293; Package mawk.

Acknowledgement sent to "Andrew T. Young" <aty@sciences.sdsu.edu>:
New Bug report received and forwarded. Copy sent to James Troup <james@nocrew.org>.

Message #5 received at submit@bugs.debian.org:

From: "Andrew T. Young" <aty@sciences.sdsu.edu>
To: submit@bugs.debian.org
Subject: mawk: regular expression compile failed
Date: Mon, 31 Dec 2001 21:51:24 -0800

Package: mawk
Version: 1.3.3-5
Severity: normal

A script that worked correctly with gawk failed with mawk.  I have isolated
the error in the following minimal script, put into a file named "pgm":

{if ($0 ~/^[-+()0-9.,$%/'"]*$/)
        {
	print "Found"
	}
}

So far as I can see, this is legal awk code.  But it crashes mawk.
Suppose "datfil" is the file of data to be processed.  Then:


Running mawk -f pgm datfil  gives the error message:

mawk: pgm: line 3: regular expression compile failed (bad class -- [], [^] or [)
^[-+()0-9.,$%
mawk: 3: unexpected character '''
mawk: pgm: line 3: runaway string constant "]*$/) ...


but running  gawk -f pgm datfil  runs with no errors.

Perhaps the problem is the slash in the [...] class; mawk seems to be
taking it to denote the end of the regular expression.  I believe the
standard permits a slash inside character classes.  I emphasize that
this works OK under gawk, and I think has also worked on other systems
with other awks.
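
The hypothesis above can be illustrated with a minimal sketch. This is Python written for illustration, not mawk's actual C code, and the function name naive_regex_end is hypothetical. It assumes a scanner that ends a regex literal at the first '/' not preceded by a backslash, and shows that such a scanner cuts the reported pattern off at the same point as the error message, while a backslash-escaped slash survives intact.

```python
def naive_regex_end(src: str, start: int) -> int:
    """Return the index of the '/' that a naive scanner treats as the end
    of the regex literal whose opening '/' is at src[start].

    Assumption: the scanner knows about backslash escapes but has no
    notion of bracket expressions."""
    i = start + 1
    while i < len(src):
        if src[i] == "\\":   # a backslash protects the next character
            i += 2
            continue
        if src[i] == "/":    # first unescaped slash ends the literal
            return i
        i += 1
    raise ValueError("unterminated regex literal")


# The line from the report, with the slash unescaped inside the class.
unescaped_line = r"""{if ($0 ~/^[-+()0-9.,$%/'"]*$/)"""
start = unescaped_line.index("/")
truncated = unescaped_line[start + 1:naive_regex_end(unescaped_line, start)]
print(truncated)   # ^[-+()0-9.,$%

# The same line with the slash escaped: the scanner reaches the real end.
escaped_line = r"""{if ($0 ~/^[-+()0-9.,$%\/'"]*$/)"""
start = escaped_line.index("/")
escaped = escaped_line[start + 1:naive_regex_end(escaped_line, start)]
print(escaped)     # ^[-+()0-9.,$%\/'"]*$
```

The truncated text is an unterminated bracket expression, which would account for the "bad class" complaint, and the leftover '"]*$/) would then be read as ordinary program text.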

As an aside, I've had other problems with mawk, though they are documented
(e.g., the funny business with multi-line records).  It seems to me that
mawk is peculiar enough that Debian would be better off with gawk as the
default "awk", rather than using mawk for the default.

		-- A. T. Young   (aty@sciences.sdsu.edu)

-- System Information
Debian Release: 2.2
Kernel Version: Linux aty486 2.2.19 #1 Thu Nov 1 19:52:06 EST 2001 i486 unknown

Versions of the packages mawk depends on:
ii  libc6          2.1.3-19       GNU C Library: Shared libraries and Timezone



Information forwarded to debian-bugs-dist@lists.debian.org, "Andrew T. Young" <aty@sciences.sdsu.edu>, James Troup <james@nocrew.org>, mawk@packages.qa.debian.org:
Bug#127293; Package mawk.

Acknowledgement sent to "H. S. Teoh" <hsteoh@quickfur.ath.cx>:
Extra info received and forwarded to list. Copy sent to "Andrew T. Young" <aty@sciences.sdsu.edu>, James Troup <james@nocrew.org>, mawk@packages.qa.debian.org.

Message #10 received at 127293@bugs.debian.org:

From: "H. S. Teoh" <hsteoh@quickfur.ath.cx>
To: 127293@bugs.debian.org, control@bugs.debian.org
Subject: Problem is with slash in character class
Date: Wed, 11 Dec 2002 11:44:48 -0500

retitle 127293 mawk does not understand unescaped '/' in character classes
thanks

I have verified that mawk is complaining because of the '/' inside the
character class, whereas gawk doesn't. However, escaping the '/' works
with *both* mawk and gawk. The following awk program works properly (note
that the slash is now escaped):

{if ($0 ~/^[-+()0-9.,$%\/'"]*$/)
	{
	print "Found"
	}
}

For both mawk and gawk, an input line containing '\' is *not* matched, so
both programs correctly treat '\/' as an escape sequence for a slash rather
than as a literal backslash followed by a slash.

I don't know what the POSIX recommendation for awk says; but if it doesn't
specify that slashes are allowed to be un-escaped inside a character
class, I submit that this is an implementation-dependent behaviour, and is
therefore not a bug.

HTH.


T

-- 
It's bad luck to be superstitious. -- YHL



Changed Bug title. Request was from "H. S. Teoh" <hsteoh@quickfur.ath.cx> to control@bugs.debian.org.

Message sent on to "Andrew T. Young" <aty@sciences.sdsu.edu>:
Bug#127293. (Sun, 12 Jul 2009 20:21:02 GMT)

Message #15 received at 127293-submitter@bugs.debian.org:

From: Thomas Dickey <dickey@his.com>
To: 127293-submitter@bugs.debian.org
Subject: re: #127293 mawk does not understand unescaped '/' in character classes
Date: Sun, 12 Jul 2009 16:15:52 -0400

I'm inclined to agree since 

http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_01

do not state that awk is handling a subset of the BREs, etc.

However, I constructed a testcase for my fix, and found that gawk itself
has a bug.  Here's the testcase:

{
        if ($0 ~/^[-+()0-9.,$%/'"]*$/)
        {
                print "Found 1"
        }
        if ($0 ~/^[]-+()0-9.,$%/'"]*$/)
        {
                print "Found 2"
        }
        if ($0 ~/^[^]-+()0-9.,$%/'"]*$/)
        {
                print "Found 3"
        }
}
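
A scanner that accepts all three patterns has to track bracket expressions while it looks for the closing slash. The following is a minimal Python sketch of that idea, not the code that went into mawk. The names regex_end and extract are hypothetical, the backslash handling inside brackets follows awk's convention as an assumption, and [:class:] forms are deliberately left out.

```python
def regex_end(src: str, start: int) -> int:
    """Return the index of the '/' closing the regex literal that opens at
    src[start], treating any '/' inside a bracket expression as literal."""
    i, n = start + 1, len(src)
    while i < n:
        c = src[i]
        if c == "\\":                      # escaped character outside brackets
            i += 2
        elif c == "[":                     # start of a bracket expression
            i += 1
            if i < n and src[i] == "^":    # optional negation
                i += 1
            if i < n and src[i] == "]":    # a leading ']' is a literal
                i += 1
            while i < n and src[i] != "]":
                # assumption: awk-style backslash escapes inside brackets
                i += 2 if src[i] == "\\" else 1
            if i >= n:
                raise ValueError("unterminated bracket expression")
            i += 1                         # step past the closing ']'
        elif c == "/":                     # unbracketed, unescaped slash
            return i
        else:
            i += 1
    raise ValueError("unterminated regex literal")


def extract(line: str) -> str:
    """Return the regex text between the first '/' on the line and its end."""
    start = line.index("/")
    return line[start + 1:regex_end(line, start)]


# The three conditions from the testcase above.
tests = [
    r"""if ($0 ~/^[-+()0-9.,$%/'"]*$/)""",
    r"""if ($0 ~/^[]-+()0-9.,$%/'"]*$/)""",
    r"""if ($0 ~/^[^]-+()0-9.,$%/'"]*$/)""",
]
patterns = [extract(t) for t in tests]
for p in patterns:
    print(p)
```

Each extracted pattern runs to the final slash of its line, so the slash inside the class and the leading ']' in the second and third patterns are both kept as part of the bracket expression.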

The fixed mawk will be in today's updates for

	ftp://invisible-island.net/mawk/

-- 
Thomas E. Dickey <dickey@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net

Added tag(s) fixed-upstream. Request was from Thomas Dickey <dickey@his.com> to control@bugs.debian.org. (Tue, 28 Jul 2009 08:51:24 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Steve Langasek <vorlon@debian.org>:
Bug#127293; Package mawk. (Fri, 18 Dec 2009 17:45:03 GMT)

Acknowledgement sent to "David Rebatto" <David.Rebatto@mi.infn.it>:
Extra info received and forwarded to list. Copy sent to Steve Langasek <vorlon@debian.org>. (Fri, 18 Dec 2009 17:45:03 GMT)

Message #22 received at 127293@bugs.debian.org:

From: "David Rebatto" <David.Rebatto@mi.infn.it>
To: 127293@bugs.debian.org
Subject: Problem still present in 1.3.3-11
Date: Fri, 18 Dec 2009 17:35:48 +0100 (CET)

Hi,
the described bug is still present in 1.3.3-11 (at least in Debian
distribution).
This is annoying as it prevents e.g. glibc from compiling (unless you
modify the .awk scripts).

David Rebatto





Information forwarded to debian-bugs-dist@lists.debian.org, Steve Langasek <vorlon@debian.org>:
Bug#127293; Package mawk. (Mon, 01 Mar 2010 18:42:07 GMT)

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Steve Langasek <vorlon@debian.org>. (Mon, 01 Mar 2010 18:42:07 GMT)

Message #27 received at 127293@bugs.debian.org:

From: Jonathan Nieder <jrnieder@gmail.com>
To: David Rebatto <David.Rebatto@mi.infn.it>
Cc: 127293@bugs.debian.org
Subject: Re: mawk does not understand unescaped '/' in character classes
Date: Mon, 1 Mar 2010 12:39:16 -0600

Hi David,

David Rebatto wrote:

> the described bug is still present in 1.3.3-11 (at least in Debian
> distribution).

True to his word, Thomas Dickey included a fix for this (among other
things) in mawk 1.3.3-20090712.  The CVS id suggests that there were
multiple patches in that update, so you might be able to get a more
targeted patch from him.

The changes can be found in the changes to collect_RE() from commit
http://git.debian.org/?p=collab-maint/mawk.git;a=commit;h=7c435c1f
I’m not linking to the diff because without ignoring whitespace (i.e.,
without ‘git show -w’) there’s too much noise.

That patch breaks regexps such as [a[].  You can find a series of
related fixes here:
http://git.debian.org/?p=collab-maint/mawk.git;a=commitdiff;h=b59d681;hp=770b12a
(Merge branch 'jn/cclass', 2010-01-30, applied upstream in mawk
1.3.4-20100131).  You may need the fix from Bug #65617 for this patch
to apply.
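
For reference, POSIX treats a '[' inside a bracket expression as an ordinary character unless it is followed by ':', '.', or '=', so [a[] is a valid expression matching 'a' or '['. A minimal Python sketch of locating the closing ']' under those rules follows. The name bracket_end is hypothetical, this is not the code from the commits linked above, and awk-style backslash escapes inside brackets are not handled.

```python
def bracket_end(regex: str, start: int) -> int:
    """Return the index of the ']' that closes the bracket expression
    opening at regex[start], following the plain POSIX rules."""
    if regex[start] != "[":
        raise ValueError("not at the start of a bracket expression")
    i, n = start + 1, len(regex)
    if i < n and regex[i] == "^":      # optional negation
        i += 1
    if i < n and regex[i] == "]":      # a leading ']' is a literal
        i += 1
    while i < n:
        c = regex[i]
        if c == "[" and i + 1 < n and regex[i + 1] in ":.=":
            # [:class:], [.symbol.] or [=equiv=]: skip to its own terminator
            close = regex.find(regex[i + 1] + "]", i + 2)
            if close == -1:
                raise ValueError("unterminated class inside brackets")
            i = close + 2
        elif c == "]":
            return i
        else:                          # includes a lone '[', which is literal
            i += 1
    raise ValueError("unterminated bracket expression")


print(bracket_end("[a[]", 0))          # lone '[' is literal
print(bracket_end("[[:digit:]/]", 0))  # ']' of ':]' does not close the set
print(bracket_end("[]a]", 0))          # leading ']' is literal
```

A scanner for regex literals could call a helper like this on every '[' it meets and resume looking for the closing '/' just after the returned index.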

Hope that helps,
Jonathan




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#127293; Package mawk. (Mon, 01 Mar 2010 18:51:06 GMT)

Acknowledgement sent to Steve Langasek <vorlon@debian.org>:
Extra info received and forwarded to list. (Mon, 01 Mar 2010 18:51:06 GMT)

Message #32 received at 127293@bugs.debian.org:

From: Steve Langasek <vorlon@debian.org>
To: Jonathan Nieder <jrnieder@gmail.com>, 127293@bugs.debian.org
Cc: David Rebatto <David.Rebatto@mi.infn.it>
Subject: Re: Bug#127293: mawk does not understand unescaped '/' in character classes
Date: Mon, 1 Mar 2010 10:49:47 -0800

On Mon, Mar 01, 2010 at 12:39:16PM -0600, Jonathan Nieder wrote:

> David Rebatto wrote:

> > the described bug is still present in 1.3.3-11 (at least in Debian
> > distribution).

> True to his word, Thomas Dickey included a fix for this (among other
> things) in mawk 1.3.3-20090712.  The CVS id suggests that there were
> multiple patches in that update, so you might be able to get a more
> targeted patch from him.

Where did you find a CVS repository?  Thomas has only ever pointed me to a
set of tarballs.

> The changes can be found in the changes to collect_RE() from commit
> http://git.debian.org/?p=collab-maint/mawk.git;a=commit;h=7c435c1f
> I’m not linking to the diff because without ignoring whitespace (i.e.,
> without ‘git show -w’) there’s too much noise.

Why are you setting up a mawk repository under collab-maint without talking
to the maintainer?

-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@ubuntu.com                                     vorlon@debian.org

Information forwarded to debian-bugs-dist@lists.debian.org, Steve Langasek <vorlon@debian.org>:
Bug#127293; Package mawk. (Mon, 01 Mar 2010 19:03:03 GMT)

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Steve Langasek <vorlon@debian.org>. (Mon, 01 Mar 2010 19:03:03 GMT)

Message #37 received at 127293@bugs.debian.org:

From: Jonathan Nieder <jrnieder@gmail.com>
To: Steve Langasek <vorlon@debian.org>
Cc: 127293@bugs.debian.org, David Rebatto <David.Rebatto@mi.infn.it>
Subject: Re: Bug#127293: mawk does not understand unescaped '/' in character classes
Date: Mon, 1 Mar 2010 12:58:14 -0600

Hi Steve,

Steve Langasek wrote:
> On Mon, Mar 01, 2010 at 12:39:16PM -0600, Jonathan Nieder wrote:

>> True to his word, Thomas Dickey included a fix for this (among other
>> things) in mawk 1.3.3-20090712.  The CVS id suggests that there were
>> multiple patches in that update, so you might be able to get a more
>> targeted patch from him.
>
> Where did you find a CVS repository?  Thomas has only ever pointed me to a
> set of tarballs.

I don’t have access.  I was just referring to the $MawkId$ string in
the source.  Maybe it’s an RCS id.

>> The changes can be found in the changes to collect_RE() from commit
>> http://git.debian.org/?p=collab-maint/mawk.git;a=commit;h=7c435c1f
>> I’m not linking to the diff because without ignoring whitespace (i.e.,
>> without ‘git show -w’) there’s too much noise.
>
> Why are you setting up a mawk repository under collab-maint without talking
> to the maintainer?

I did talk to you (in not the best way, for which I’m sorry) [1].  I
put it on git.debian.org because that seemed like the easiest place to
work with others (especially to work with you...).

Hope that clarifies a little.
Jonathan

[1] http://bugs.debian.org/554167





Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 17 21:56:47 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.