Debian Bug report logs - #445215
logcheck: egrep is soooo slow

Package: logcheck; Maintainer for logcheck is Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>; Source for logcheck is src:logcheck.

Reported by: Frédéric Brière <fbriere@fbriere.net>

Date: Thu, 4 Oct 2007 05:21:01 UTC

Severity: wishlist

Found in version logcheck/1.2.62

Report forwarded to debian-bugs-dist@lists.debian.org, Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>:
Bug#445215; Package logcheck.

Acknowledgement sent to Frédéric Brière <fbriere@fbriere.net>:
New Bug report received and forwarded. Copy sent to Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>.

Message #5 received at submit@bugs.debian.org:

From: Frédéric Brière <fbriere@fbriere.net>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: logcheck: egrep is soooo slow
Date: Thu, 04 Oct 2007 01:17:52 -0400
Package: logcheck
Version: 1.2.62
Severity: wishlist

Yesterday, while running logcheck against all my syslogs for the week, I
started bemoaning how long the whole thing was taking (over 9 minutes
for 4 megs).  I wondered if maybe one bad regex was stalling the whole
thing, but the debug output showed that all rulefiles were taking up
time proportional to their size.  (Besides, egrep being based on a DFA,
it doesn't care much about how a regex is written.)

Out of curiosity, and realizing that an egrep regex should, AFAIK, work
just the same in Perl, I whipped up a one-liner to test out one
rulefile.  egrep took over 20 seconds to match ignore.d.server/spamd
against my logs; perl took less than 2 to produce the same results.

So, I wrote up the attached script as a quick hack to try out perl as a
substitute for egrep.  This brought the run time down to less than a
minute and a half.  As Mr. Brian Norris said: "I'm convinced".  :)
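
The attached minigrep.pl is not reproduced in this log. Purely as an illustration of the idea (drop every log line that matches any pattern in a rulefile, which is what `egrep -v -f RULEFILE` does), a minimal sketch in Python rather than Perl might look like the following. The function names are hypothetical, and joining all rules into one alternation and skipping blank lines and `#` comments are assumptions, not details taken from the report. Note also that Python's `re` module does not understand POSIX classes such as `[[:alnum:]]`, so real logcheck rules would need translating first; the sketch only shows the control flow.

```python
import re


def load_rules(path):
    """Read one regex per line from a rulefile.

    Assumption: blank lines and lines starting with '#' are not rules.
    """
    rules = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for raw in handle:
            rule = raw.rstrip("\n")
            if rule.strip() and not rule.startswith("#"):
                rules.append(rule)
    return rules


def compile_rules(rules):
    """Join all rules into one alternation so each log line is scanned once.

    Returns None when there are no rules, meaning nothing is filtered out.
    """
    if not rules:
        return None
    return re.compile("|".join("(?:%s)" % rule for rule in rules))


def filter_lines(lines, matcher):
    """Yield only the lines that match none of the rules."""
    for line in lines:
        if matcher is None or not matcher.search(line):
            yield line
```

To use it on a real file, one would build the matcher with `compile_rules(load_rules(path))` and pass an open log file to `filter_lines`, writing the surviving lines to standard output.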


Now, I'm not advocating immediate action, as such a switch should
certainly not be taken lightly, especially given the security role of
logcheck.  Nevertheless, I think it's something worth mulling over,
given the speed difference.  What do you think?


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21-2-k7 (SMP w/1 CPU core)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages logcheck depends on:
ii  adduser          3.105                   add and remove users and groups
ii  cron             3.0pl1-100              management of regular background p
ii  lockfile-progs   0.1.11                  Programs for locking and unlocking
ii  logtail          1.2.62                  Print log file lines that have not
ii  mailx            1:8.1.2-0.20070424cvs-1 A simple mail user agent
ii  postfix [mail-tr 2.4.5-4                 High-performance mail transport ag
ii  sysklogd [system 1.5-1                   System Logging Daemon

Versions of packages logcheck recommends:
ii  logcheck-database             1.2.62     database of system log rules for t

-- no debconf information
[minigrep.pl (application/x-perl, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>:
Bug#445215; Package logcheck.

Acknowledgement sent to "Johan Walles" <johan.walles@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>.

Message #10 received at 445215@bugs.debian.org:

From: "Johan Walles" <johan.walles@gmail.com>
To: 445215@bugs.debian.org, fbriere@fbriere.net, 450649@bugs.debian.org
Subject: 445215 and 450649 are related
Date: Sat, 1 Dec 2007 11:42:49 +0100
Bug 445215 is about egrep being a ton slower than doing the same thing in Perl.

Bug 450649 is about egrep spending 50% of its time clearing memory
using a for() loop rather than memset().

There's obviously some kind of relationship between these issues :-).

  Regards //Johan




Information forwarded to debian-bugs-dist@lists.debian.org, Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>:
Bug#445215; Package logcheck. (Mon, 21 Dec 2009 08:27:03 GMT)

Acknowledgement sent to Johan Walles <johan.walles@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>. (Mon, 21 Dec 2009 08:27:03 GMT)

Message #15 received at 445215@bugs.debian.org:

From: Johan Walles <johan.walles@gmail.com>
To: 445215@bugs.debian.org
Subject: Still slow
Date: Mon, 21 Dec 2009 09:22:47 +0100
Even with the egrep memory clearing issue fixed (it was a long time
ago), grep still uses up ungodly amounts of CPU time while logcheck is
running.  If the above perl hack is the way to go, please implement
it!

Or maybe logcheck is becoming big enough that it shouldn't be a
shell script any more?

  //Johan




Information forwarded to debian-bugs-dist@lists.debian.org, Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>:
Bug#445215; Package logcheck. (Mon, 21 Dec 2009 11:33:06 GMT)

Acknowledgement sent to martin f krafft <madduck@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian logcheck Team <logcheck-devel@lists.alioth.debian.org>. (Mon, 21 Dec 2009 11:33:06 GMT)

Message #20 received at 445215@bugs.debian.org:

From: martin f krafft <madduck@debian.org>
To: Johan Walles <johan.walles@gmail.com>, 445215@bugs.debian.org
Subject: Re: Bug#445215: Still slow
Date: Mon, 21 Dec 2009 18:30:38 +0700
Johan Walles <johan.walles@gmail.com> wrote [2009.12.21.1522 +0700]:
> Even with the egrep memory clearing issue fixed (it was a long time
> ago), grep still uses up ungodly amounts of CPU time while logcheck is
> running.  If the above perl hack is the way to go, please implement
> it!
> 
> Or maybe logcheck is becoming big enough that it shouldn't be a
> shell script any more?

It should never have been a shell script. If you are interested in
a rewrite, have a look at http://wiki.logcheck.org/logfilter and
don't hesitate to ask me for input, should you want it.

-- 
 .''`.   martin f. krafft <madduck@d.o>      Related projects:
: :'  :  proud Debian developer               http://debiansystem.info
`. `'`   http://people.debian.org/~madduck    http://vcs-pkg.org
  `-  Debian - when you have better things to do than fixing systems





Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Apr 20 01:25:13 2014; Machine Name: beach.debian.org
