Debian Bug report logs - #638256
race condition causes many syslog messages: File a[[:xdigit:]]{13} is in wrong format - aborting

version graph

Package: at; Maintainer for at is Jose M Calhariz <calhariz@debian.org>; Source for at is src:at (PTS, buildd, popcon).

Reported by: Martin Dorey <mdorey@bluearc.com>

Date: Thu, 18 Aug 2011 00:15:05 UTC

Severity: normal

Tags: patch

Found in version at/3.1.10.2

Reply or subscribe to this bug.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to debian-bugs-dist@lists.debian.org, Ansgar Burchardt <ansgar@debian.org>:
Bug#638256; Package at. (Thu, 18 Aug 2011 00:15:08 GMT) (full text, mbox, link).


Acknowledgement sent to Martin Dorey <mdorey@bluearc.com>:
New Bug report received and forwarded. Copy sent to Ansgar Burchardt <ansgar@debian.org>. (Thu, 18 Aug 2011 00:15:08 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Martin Dorey <mdorey@bluearc.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: race condition causes many syslog messages: File a[[:xdigit:]]{13} is in wrong format - aborting
Date: Wed, 17 Aug 2011 17:12:34 -0700
[Message part 1 (text/plain, inline)]
Package: at
Version: 3.1.10.2
Severity: normal
Tags: patch

In my experience, when a file system fills up, it often seems that there's room
in the inode table when there's no room for file bodies.  Unintentionally empty
files become common.  If /var/spool/cron/atjobs ends up containing a zero-length
job file for a job due over an hour ago, then atd sometimes goes nuts, logging
an unbounded number of messages like:

2011-08-09T16:11:12-07:00 merc55rm atd[4194]: File a00afd014dc579 is in wrong format - aborting

Even when the original reason for the full file system is removed, atd can go
on to fill it again with gigabytes of syslog entries.

Google finds many similar complaints over the years:

http://goo.gl/rvdP6
http://ubuntuforums.org/archive/index.php/t-1575261.html
https://bugzilla.redhat.com/show_bug.cgi?id=718422
http://forums.fedoraforum.org/showthread.php?t=252412

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=101919 may have been the cause
of an apparently corrupted at job.  What I'm focusing on here is the manifold
repetition of the error message, assuming a corrupted at job.

On one particular machine, I was able to reproduce this, most of the time, with
this test:

sudo su -
cd /var/spool/cron/atjobs
/etc/init.d/atd stop
touch a0000000000000
chmod +x a0000000000000
grep -w atd /var/log/syslog | wc -l
/etc/init.d/atd start
sleep 1
grep -w atd /var/log/syslog | wc -l
sleep 1
grep -w atd /var/log/syslog | wc -l

The number of lines of atd message rises for a while, then stabilizes.

I wasn't able to reliably reproduce the issue on a couple of other machines,
including the one from which I'm reporting the bug, unless I started atd from
within valgrind:

valgrind --trace-children=yes -q /etc/init.d/atd start

An strace extract, available on request, from the system where I could easily
reproduce the problem, showed that atd stabilized when the child that logs
the "aborting" message exits before the main atd process goes to sleep.
If the parent wins the race, then the sleep is interrupted by SIGCHLD.  The
parent then calls run_loop again and the cycle repeats.

My suggested fix sleeps again if we're woken prematurely.

-- System Information:
Debian Release: 5.0.8
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages at depends on:
ii  exim4                     4.69-9+lenny4  metapackage to ease Exim MTA (v4) 
ii  exim4-daemon-heavy [mail- 4.69-9+lenny4  Exim MTA (v4) daemon with extended
ii  libc6                     2.7-18lenny7   GNU C Library: Shared libraries
ii  libpam0g                  1.0.1-5+lenny1 Pluggable Authentication Modules l
ii  lsb-base                  3.2-20         Linux Standard Base 3.2 init scrip

at recommends no packages.

at suggests no packages.

-- no debconf information
[atd.c.patch (text/plain, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Ansgar Burchardt <ansgar@debian.org>:
Bug#638256; Package at. (Wed, 14 Oct 2015 10:06:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kristýna Streitová <kstreitova@suse.com>:
Extra info received and forwarded to list. Copy sent to Ansgar Burchardt <ansgar@debian.org>. (Wed, 14 Oct 2015 10:06:03 GMT) (full text, mbox, link).


Message #10 received at 638256@bugs.debian.org (full text, mbox, reply):

From: Kristýna Streitová <kstreitova@suse.com>
To: 638256@bugs.debian.org
Subject: race condition causes many syslog messages: File a[[:xdigit:]]{13} is in wrong format - aborting
Date: Wed, 14 Oct 2015 11:47:12 +0200
[Message part 1 (text/plain, inline)]
This still is an issue in at-3.1.16

"Abrupt reboot of a server, from a crash for example, can leave
malformed at job files, which can cause atd to go into high cpu
utilization after boot.

Logs can be flooded with such errors as:

# grep atd /var/log/messages|tail
Aug 26 11:24:29 isp2264 atd[30648]: Job    62696 (a0f4e8016e2e2b) -
userid 250
does not match file uid 0
Aug 26 11:24:30 isp2264 atd[30649]: Job    62696 (a0f4e8016e2e2b) -
userid 250
does not match file uid 0
Aug 26 11:24:30 isp2264 atd[30652]: Job    62696 (a0f4e8016e2e2b) -
userid 250
does not match file uid 0
Aug 26 11:24:30 isp2264 atd[30654]: Job    62696 (a0f4e8016e2e2b) -
userid 250
does not match file uid 0
(...)"


I'm attaching a patch that reschedules failed jobs with a pause in order
to avoid error messages flood. It also tries to prevent creation of the
corrupted files by using 'O_SYNC' flag with open() system call.


Best regards
Kristyna Streitova
[at-3.1.16-handle_malformed_jobs.patch (text/x-patch, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Jose M Calhariz <calhariz@debian.org>:
Bug#638256; Package at. (Tue, 23 Jun 2020 16:15:05 GMT) (full text, mbox, link).


Acknowledgement sent to Dimedrol Vitaminovich <dimedrol-vitaminovich@yandex.ru>:
Extra info received and forwarded to list. Copy sent to Jose M Calhariz <calhariz@debian.org>.

Your message did not contain a Subject field. They are recommended and useful because the title of a Bug is determined using this field. Please remember to include a Subject field in your messages in future.

(Tue, 23 Jun 2020 16:15:05 GMT) (full text, mbox, link).


Message #15 received at 638256@bugs.debian.org (full text, mbox, reply):

From: Dimedrol Vitaminovich <dimedrol-vitaminovich@yandex.ru>
To: 638256@bugs.debian.org
Date: Tue, 23 Jun 2020 19:04:23 +0300
This still is an issue in at-3.1.23-1


root@karpigor:/var/log# dpkg -l at
...
+++-==============-============-============-==========================================
ii  at             3.1.23-1     amd64        Delayed job execution and 
batch processing
root@karpigor:/var/log#


/var/log/syslog   filled with

...
Jun 23 11:30:42 karpigor atd[5282]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5283]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5284]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5285]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5286]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5287]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5288]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5289]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5290]: File a000ad0195130c is in wrong 
format - aborting
Jun 23 11:30:42 karpigor atd[5291]: File a000ad0195130c is in wrong 
format - aborting
...


root@karpigor:/var/log# ls -l syslog
-rw-r----- 1 root adm 26730783 июн 23 18:44 syslog

root@karpigor:/var/log# grep 'is in wrong format - aborting' syslog | wc -l
303841




Information forwarded to debian-bugs-dist@lists.debian.org, Jose M Calhariz <calhariz@debian.org>:
Bug#638256; Package at. (Tue, 06 Dec 2022 04:15:03 GMT) (full text, mbox, link).


Acknowledgement sent to Dave <bug-638256@wsfd.ath.cx>:
Extra info received and forwarded to list. Copy sent to Jose M Calhariz <calhariz@debian.org>. (Tue, 06 Dec 2022 04:15:03 GMT) (full text, mbox, link).


Message #20 received at 638256@bugs.debian.org (full text, mbox, reply):

From: Dave <bug-638256@wsfd.ath.cx>
To: 638256@bugs.debian.org
Subject: Bug 638256 Update - Still Exists
Date: Mon, 5 Dec 2022 23:05:52 -0500
TO: 638256@bugs.debian.org


And... Still a bug.  Since at least 2010...

As of December 2022 this is still an issue in 3.1.23.1-1

ii  at             3.1.23-1.1   arm64        Delayed job execution and 
batch processing

ls -l /usr/sbin/atd
-rwxr-xr-x 1 root root 26568 Sep 25  2020 /usr/sbin/atd


/var/log/syslog (and/or /var/log/cron) filled with:
...
Dec  5 21:11:36 mysys atd[17889]: File a006e001a8bf99 is in wrong format 
- aborting
Dec  5 21:11:36 mysys atd[17890]: File a006e001a8bf99 is in wrong format 
- aborting
Dec  5 21:11:36 mysys atd[17891]: File a006e001a8bf99 is in wrong format 
- aborting
Dec  5 21:11:36 mysys atd[17892]: File a006e001a8bf99 is in wrong format 
- aborting
...

/var/spool/cron/atjobs contained:
drwxrwx--T 2 daemon daemon 4096 Dec  5 21:11  .
drwxr-xr-x 5 root   root   4096 Dec  4 11:30  ..
-rw------- 1 daemon daemon    6 Dec  4 13:18  .SEQ
-rwx------ 2 dab    daemon    0 Dec  4 13:18 '=006e001a8bf99'
-rwx------ 2 dab    daemon    0 Dec  4 13:18  a006e001a8bf99

Deleting these zero-sized files/jobs and restarting atd resolved the issue.

I'm not a developer, but a possible quick fix would be to add:
  find /var/spool/cron/atjobs/ -type f -empty -print -delete
to the /etc/init.d/atd start stanza

Although this would only cover instances when/where atd is (re)started.





Send a report that this bug log contains spam.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Aug 23 14:35:23 2023; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.