Debian Bug report logs - #158481
mawk: Crash when using gsub


Package: mawk; Maintainer for mawk is Steve Langasek <vorlon@debian.org>; Source for mawk is src:mawk.

Reported by: "Anders Boström" <anders@as1-5-5.ulr.s.bonet.se>

Date: Tue, 27 Aug 2002 13:18:01 UTC

Severity: important

Tags: upstream

Found in version 1.3.3-8





Report forwarded to debian-bugs-dist@lists.debian.org, James Troup <james@nocrew.org>, mawk@packages.qa.debian.org:
Bug#158481; Package mawk. Full text and rfc822 format available.

Acknowledgement sent to "Anders Boström" <anders@as1-5-5.ulr.s.bonet.se>:
New Bug report received and forwarded. Copy sent to James Troup <james@nocrew.org>, mawk@packages.qa.debian.org. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: "Anders Boström" <anders@as1-5-5.ulr.s.bonet.se>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: mawk: Crash whe using gsub
Date: Tue, 27 Aug 2002 15:17:07 +0200
Package: mawk
Version: 1.3.3-8
Severity: important

The following very short script makes mawk seg-fault:

cat foo | mawk '{ gsub(/../, "0x&,"); print }' > /dev/null

when foo is a large file with only one line (a hex-encoded
binary). gawk works fine. I also tested:

cat foo | mawk '{ gsub(/[0-9a-f][0-9a-f]/, "0x&,"); print }' > /dev/null

With the same result, seg-fault.

This does the same thing as "sed 's/../0x\0,/g'"; I am using it as a
workaround for bug 155751, which was first filed against sed and has
since been moved to glibc.

/ Anders

-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux eckert 2.4.19 #1 mån aug 5 22:21:09 CEST 2002 i686
Locale: LANG=C, LC_CTYPE=sv_SE

Versions of packages mawk depends on:
ii  libc6                         2.2.5-13   GNU C Library: Shared libraries an




Information forwarded to debian-bugs-dist@lists.debian.org, James Troup <james@nocrew.org>, mawk@packages.qa.debian.org:
Bug#158481; Package mawk. Full text and rfc822 format available.

Acknowledgement sent to Morgon Kanter <morgon@surgo.net>:
Extra info received and forwarded to list. Copy sent to James Troup <james@nocrew.org>, mawk@packages.qa.debian.org. Full text and rfc822 format available.

Message #10 received at 158481@bugs.debian.org (full text, mbox):

From: Morgon Kanter <morgon@surgo.net>
To: 158481@bugs.debian.org
Subject: Looks like stack overflow
Date: Tue, 15 Apr 2003 02:54:08 -0400
Just attached GDB to a running mawk with the command you described on a 
file with a fairly long line (not long enough to cause a segfault, but 
long enough to take about 30 seconds to finish). I tried it with both 
a -g compiled mawk and a -g -O compiled mawk, and the backtrace showed 
gsub being called recursively hundreds upon hundreds of times. Too late 
to begin looking for a way to fix this tonight, but I'll look more at 
it tomorrow.

At least we know the problem. Seems like a stack overflow error to me.

-- 
Morgon Kanter <morgon@surgo.net> http://www.surgo.net
GPG key ID: 297CEA5B
Please don't CC me on mailing lists, I read them!

Message sent on to "Anders Boström" <anders@as1-5-5.ulr.s.bonet.se>:
Bug#158481. (Sat, 11 Jul 2009 12:33:06 GMT) Full text and rfc822 format available.

Message #13 received at 158481-submitter@bugs.debian.org (full text, mbox):

From: Thomas Dickey <dickey@his.com>
To: 158481-submitter@bugs.debian.org
Subject: re: #158481 mawk: Crash whe using gsub
Date: Sat, 11 Jul 2009 08:28:28 -0400
>The following very short script makes mawk seg-fault:
>
>cat foo | mawk '{ gsub(/../, "0x&,"); print }' > /dev/null
>
>when foo is a large file with only one line (a hex-encoded
>binary). gawk works fine. I also tested:

How large is the file?  I just tested with a 2Mb file, and mawk works
properly (and runs a little slower than gawk).  Same result for a 32Mb file.

That is with

ii  mawk                                                     1.3.3-14                             a pattern scanning and text processing language

(probably one of the bug-fixes addressed this issue).

-- 
Thomas E. Dickey <dickey@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net

Message sent on to "Anders Boström" <anders@as1-5-5.ulr.s.bonet.se>:
Bug#158481. (Thu, 30 Jul 2009 23:42:03 GMT) Full text and rfc822 format available.

Message #16 received at 158481-submitter@bugs.debian.org (full text, mbox):

From: Thomas Dickey <dickey@his.com>
To: 158481-submitter@bugs.debian.org
Subject: re: #158481 - mawk: Crash whe using gsub
Date: Thu, 30 Jul 2009 19:35:38 -0400
I see the issue is the line length (file size is irrelevant).
It is fixable by rewriting the recursive gsub code to use for-loops.

-- 
Thomas E. Dickey <dickey@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net

Information forwarded to debian-bugs-dist@lists.debian.org, Steve Langasek <vorlon@debian.org>:
Bug#158481; Package mawk. (Mon, 01 Mar 2010 22:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Steve Langasek <vorlon@debian.org>. (Mon, 01 Mar 2010 22:15:03 GMT) Full text and rfc822 format available.

Message #21 received at 158481@bugs.debian.org (full text, mbox):

From: Jonathan Nieder <jrnieder@gmail.com>
To: 158481@bugs.debian.org, control@bugs.debian.org
Cc: Morgon Kanter <morgon.kanter@gmail.com>, Anders Boström <anders@as1-5-5.ulr.s.bonet.se>
Subject: Re: mawk: Crash when using gsub
Date: Mon, 1 Mar 2010 16:12:07 -0600
retitle 158481 mawk: Crash when using gsub
tags 158481 + upstream
thanks 

Anders Boström wrote:

> The following very short script makes mawk seg-fault:
> 
> cat foo | mawk '{ gsub(/../, "0x&,"); print }' > /dev/null
> 
> when foo is a large file with only one line (a hex-encoded
> binary). gawk works fine.

Morgon Kanter wrote:

> At least we know the problem. Seems like a stack overflow error to me.

Yes.  In case someone is interested in working on this:

gsub() was written in a bit of a quick and dirty way as far as I can
tell.  It is recursive, but it does not need to be.

Overview of current gsub():

 1. Look for a match.  No match → hoorah!
 2. Copy the replacement string.
 3. Okay, we found a match.
    IF the match was an empty match at the start of the string
    and such matches are disallowed (they are allowed for the
    first match):
      case 1: The whole string is empty.  Throw away the replacement
              string.  The modified string will be empty, too.

      case 2: The regexp to match was anchored.  Throw away the
              replacement string.  There can be no more matches;
              the modified string is the current string.

      case 3: Unanchored match.  Throw away the replacement string.
              The match was disallowed, so we have to start matching
              with the next character.  So save the first character
              from the source string in the buffer for the replacement
              string and call gsub() to deal with the rest.
    OTHERWISE (i.e., the match is not at the start of the string,
    or empty matches at start are allowed):
      a. Front consists of all characters up to the match.
      b. Figure out the value to substitute (replacing the &s
         with copies of the matched string).
      c. Call gsub() on the rest of the string.
 4. Concatenate the three pieces (front, middle, and back) and
    return the result.
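
For illustration, the recursive shape of the overview above can be
sketched in C.  This is a toy model, not mawk's code:
toy_gsub_recursive and its level/depth parameters are hypothetical
names, the pattern is a literal string located with strstr() rather
than a regular expression, and & expansion and the empty-match cases
are left out.  The point is only that each match adds one nested call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy, not mawk's gsub(): replace every occurrence of the
 * literal `pat` in `src` with `rep`, recursing on the remainder after
 * each match.  *depth records the deepest call level reached.
 * The caller frees the result; NULL means out of memory. */
static char *toy_gsub_recursive(const char *src, const char *pat,
                                const char *rep, size_t level, size_t *depth)
{
    assert(pat[0] != '\0');        /* empty patterns are out of scope here */
    if (level > *depth)
        *depth = level;

    const char *hit = strstr(src, pat);
    if (hit == NULL) {             /* step 1: no match, return a copy */
        char *copy = malloc(strlen(src) + 1);
        if (copy != NULL)
            strcpy(copy, src);
        return copy;
    }

    size_t front_len = (size_t)(hit - src);
    size_t rep_len = strlen(rep);

    /* One nested call per match: this is where the stack grows. */
    char *back = toy_gsub_recursive(hit + strlen(pat), pat, rep,
                                    level + 1, depth);
    if (back == NULL)
        return NULL;

    /* Step 4: concatenate front, middle and back. */
    size_t back_len = strlen(back);
    char *out = malloc(front_len + rep_len + back_len + 1);
    if (out != NULL) {
        memcpy(out, src, front_len);
        memcpy(out + front_len, rep, rep_len);
        memcpy(out + front_len + rep_len, back, back_len + 1);
    }
    free(back);
    return out;
}
```

In this model the call depth equals the number of matches, so a
two-character pattern that matches everywhere on a line of N
characters needs about N/2 nested calls.  That would be consistent
with the crash depending on line length rather than file size.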

So each stack frame holds on to its own front and middle until the
recursion unwinds, but it would be perfectly reasonable to collect
them all in one go, appending to a single output string.  If that
string is allowed to grow quickly enough, it would also avoid a lot
of unnecessary copying.
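
A minimal sketch of the "collect them all in one go" idea, in C and
under the same simplifying assumptions: struct outbuf, outbuf_append
and toy_gsub_iterative are hypothetical names (none of them come from
mawk), the pattern is a literal string found with strstr(), and an
unescaped & in the replacement is the only special character handled.
The buffer doubles its capacity when it fills up, which is one way of
letting it "grow quickly enough".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; not a mawk type. */
struct outbuf {
    char *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes of s, doubling the capacity as needed so the total
 * copying work stays proportional to the final length.
 * Returns 0 on success, -1 if out of memory. */
static int outbuf_append(struct outbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Replace every occurrence of the literal `pat` in `src` with `rep`,
 * where each '&' in `rep` stands for the matched text.  Uses a loop,
 * so stack use does not depend on the number of matches.
 * The caller frees the result; NULL means out of memory. */
static char *toy_gsub_iterative(const char *src, const char *pat,
                                const char *rep)
{
    struct outbuf b = { NULL, 0, 0 };
    size_t pat_len = strlen(pat);
    const char *hit;

    assert(pat_len > 0);               /* empty patterns are out of scope */
    if (outbuf_append(&b, "", 0) != 0) /* start from a valid empty string */
        return NULL;

    while ((hit = strstr(src, pat)) != NULL) {
        /* front: everything before the match */
        if (outbuf_append(&b, src, (size_t)(hit - src)) != 0)
            goto fail;
        /* middle: the replacement, with '&' expanded to the match */
        for (const char *r = rep; *r != '\0'; r++) {
            const char *piece = (*r == '&') ? hit : r;
            size_t n = (*r == '&') ? pat_len : 1;
            if (outbuf_append(&b, piece, n) != 0)
                goto fail;
        }
        src = hit + pat_len;           /* continue after the match */
    }
    /* back: whatever is left after the last match */
    if (outbuf_append(&b, src, strlen(src)) != 0)
        goto fail;
    return b.data;

fail:
    free(b.data);
    return NULL;
}
```

For example, toy_gsub_iterative("abab", "ab", "0x&,") yields
"0xab,0xab," using a single stack frame however many matches the
line contains.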

Hope that helps,
Jonathan




Changed Bug title to 'mawk: Crash when using gsub' from 'mawk: Crash whe using gsub' Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Mon, 01 Mar 2010 22:15:07 GMT) Full text and rfc822 format available.

Added tag(s) upstream. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Mon, 01 Mar 2010 22:15:07 GMT) Full text and rfc822 format available.



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 24 20:08:31 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.