Debian Bug report logs - #897266
systemd: journalctl assertion failure

version graph

Package: systemd; Maintainer for systemd is Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>; Source for systemd is src:systemd (PTS, buildd, popcon).

Reported by: Marc Lehmann <debian-reportbug@plan9.de>

Date: Tue, 1 May 2018 03:33:01 UTC

Severity: normal

Tags: moreinfo

Found in version systemd/232-25+deb9u2

Done: Michael Biebl <biebl@debian.org>

Bug is archived. No further changes may be made.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Tue, 01 May 2018 03:33:04 GMT) (full text, mbox, link).


Acknowledgement sent to Marc Lehmann <debian-reportbug@plan9.de>:
New Bug report received and forwarded. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Tue, 01 May 2018 03:33:04 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Marc Lehmann <debian-reportbug@plan9.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: systemd: journalctl assertion failure
Date: Tue, 01 May 2018 05:29:49 +0200
Package: systemd
Version: 232-25+deb9u2
Severity: normal

Dear Maintainer,

I tried to see cron messages using:

   journalctl -fu cron

but journalctl aborted with an assertion failure:

   Assertion 'p > 0' failed at ../src/journal/journal-file.c:2280, function test_object_offset(). Aborting.
   Aborted
   [Exit 134 (SIGABRT)]

-- Package-specific info:

-- System Information:
Debian Release: 9.4
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'stable-updates'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.37-041437-generic (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=C (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/bash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser         3.115
ii  libacl1         2.2.52-3+b1
ii  libapparmor1    2.11.0-3+deb9u2
ii  libaudit1       1:2.6.7-2
ii  libblkid1       2.29.2-1+deb9u1
ii  libc6           2.24-11+deb9u3
ii  libcap2         1:2.25-1
ii  libcryptsetup4  2:1.7.3-4
ii  libgcrypt20     1.7.6-2+deb9u2
ii  libgpg-error0   1.26-2
ii  libidn11        1.33-1
ii  libip4tc0       1.6.0+snapshot20161117-6
ii  libkmod2        23-2
ii  liblz4-1        0.0~r131-2+b1
ii  liblzma5        5.2.2-1.2+b1
ii  libmount1       2.29.2-1+deb9u1
ii  libpam0g        1.1.8-3.6
ii  libseccomp2     2.3.1-2.1
ii  libselinux1     2.6-3+b3
ii  libsystemd0     232-25+deb9u2
ii  mount           2.29.2-1+deb9u1
ii  procps          2:3.3.12-3
ii  util-linux      2.29.2-1+deb9u1

Versions of packages systemd recommends:
ii  dbus            1.10.26-0+deb9u1
ii  libpam-systemd  232-25+deb9u2

Versions of packages systemd suggests:
ii  policykit-1        0.105-18
pn  systemd-container  <none>
ii  systemd-ui         3-4+b1

Versions of packages systemd is related to:
pn  dracut           <none>
ii  initramfs-tools  0.130
ii  udev             232-25+deb9u2

-- Configuration Files:
/etc/systemd/logind.conf changed:
[Login]
UserTasksMax=50%


-- no debconf information



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sat, 05 May 2018 01:27:03 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Biebl <biebl@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sat, 05 May 2018 01:27:03 GMT) (full text, mbox, link).


Message #10 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Michael Biebl <biebl@debian.org>
To: 897266@bugs.debian.org, Marc Lehmann <debian-reportbug@plan9.de>
Subject: Re: systemd: journalctl assertion failure
Date: Sat, 5 May 2018 03:23:35 +0200
[Message part 1 (text/plain, inline)]
Control: tags -1 + moreinfo

On Tue, 01 May 2018 05:29:49 +0200 Marc Lehmann
<debian-reportbug@plan9.de> wrote:
> Package: systemd
> Version: 232-25+deb9u2
> Severity: normal
> 
> Dear Maintainer,
> 
> I tried to see cron messages using:
> 
>    journalctl -fu cron
> 
> but journalctl aborted with an assertion failure:
> 
>    Assertion 'p > 0' failed at ../src/journal/journal-file.c:2280, function test_object_offset(). Aborting.
>    Aborted
>    [Exit 134 (SIGABRT)]

Is this problem reproducible?
If so, can you please install the dbgsym package for systemd/libsystemd0
and get a backtrace from the crash [1].

Do you have persistent journal enabled?
What does journalctl --verify say?


[1] https://wiki.debian.org/HowToGetABacktrace
-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

[signature.asc (application/pgp-signature, attachment)]

Added tag(s) moreinfo. Request was from Michael Biebl <biebl@debian.org> to 897266-submit@bugs.debian.org. (Sat, 05 May 2018 01:27:03 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sat, 05 May 2018 03:45:06 GMT) (full text, mbox, link).


Acknowledgement sent to Marc Lehmann <schmorp@schmorp.de>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sat, 05 May 2018 03:45:06 GMT) (full text, mbox, link).


Message #17 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Marc Lehmann <schmorp@schmorp.de>
To: Michael Biebl <biebl@debian.org>
Cc: 897266@bugs.debian.org, Marc Lehmann <debian-reportbug@plan9.de>
Subject: Re: systemd: journalctl assertion failure
Date: Sat, 5 May 2018 05:27:48 +0200
On Sat, May 05, 2018 at 03:23:35AM +0200, Michael Biebl <biebl@debian.org> wrote:
> Is this problem reproducible?

Yes.

> If so, can you please install the dbgsym package for systemd/libsystemd0
> and get a backtrace from the crash [1].

Cool, I didn't know about the debug symbols repository. See below.

> Do you have persistent journal enabled?

Yes.

> What does journalctl --verify say?

A lot of pass, but also 27 FAIL and "File corruption" and similar messages
(as usual with systemd).

Example:

PASS: /var/log/journal/a1002e4c6eb50c7b5cdd5e004698a90b/user-65533@1614fbd80d4f42e8a21cc0a4825e4de1-0000000000795bfa-00056a6baca9f49a.journal
12dddc8: Invalid object                                                         
File corruption detected at /var/log/journal/a1002e4c6eb50c7b5cdd5e004698a90b/system@7ea033a201334367b130dd1201659ee1-000000000079ca96-00056a74a6a87814.journal:12dddc8 (of 33554432 bytes, 58%).

Backtrace as follows:

(gdb) r -fu cron
Starting program: /usr/bin/journalctl -fu cron
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Assertion 'p > 0' failed at ../src/journal/journal-file.c:2280, function test_object_offset(). Aborting.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffefb3242a in __GI_abort () at abort.c:89
#2  0x00007fffeff81a42 in log_assert_failed (text=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at ../src/basic/log.c:795
#3  0x00007fffeffbe17f in test_object_offset (f=<optimized out>, p=<optimized out>, needle=<optimized out>) at ../src/journal/journal-file.c:2280
#4  0x00007fffeffbfe13 in generic_array_bisect (f=f@entry=0x55555557ad10, first=<optimized out>, first@entry=3739384, n=n@entry=2343, needle=11772088, test_object=0x7fffeffbe100 <test_object_offset>, direction=direction@entry=DIRECTION_UP, ret=0x0, offset=0x7fffffffdbd0, idx=0x0) at ../src/journal/journal-file.c:2019
#5  0x00007fffeffc025c in generic_array_bisect_plus_one (f=<optimized out>, extra=<optimized out>, first=<optimized out>, n=<optimized out>, needle=<optimized out>, test_object=<optimized out>, direction=<optimized out>, ret=<optimized out>, offset=<optimized out>, idx=<optimized out>, idx=<optimized out>, offset=<optimized out>, ret=<optimized out>, direction=<optimized out>, test_object=<optimized out>, needle=<optimized out>, n=<optimized out>, first=<optimized out>, extra=<optimized out>, f=<optimized out>) at ../src/journal/journal-file.c:2251
#6  0x00007fffeffc056e in journal_file_move_to_entry_by_offset_for_data (f=<optimized out>, data_offset=<optimized out>, p=<optimized out>, direction=<optimized out>, ret=<optimized out>, offset=<optimized out>) at ../src/journal/journal-file.c:2688
#7  0x00007fffefee310c in next_for_match (j=j@entry=0x555555567050, m=m@entry=0x555555569380, f=f@entry=0x55555557ad10, after_offset=after_offset@entry=11772088, direction=direction@entry=DIRECTION_UP, ret=ret@entry=0x0, offset=0x7fffffffdbd0) at ../src/journal/sd-journal.c:524
#8  0x00007fffefee3184 in next_for_match (j=0x555555567050, m=<optimized out>, f=0x55555557ad10, after_offset=11772088, direction=DIRECTION_UP, ret=ret@entry=0x0, offset=0x7fffffffdc58) at ../src/journal/sd-journal.c:534
#9  0x00007fffefee31d4 in next_for_match (j=j@entry=0x555555567050, m=m@entry=0x5555555692e0, f=f@entry=0x55555557ad10, after_offset=after_offset@entry=11772088, direction=direction@entry=DIRECTION_UP, ret=ret@entry=0x0, offset=0x7fffffffdd88) at ../src/journal/sd-journal.c:556
#10 0x00007fffefee361a in find_location_for_match (j=j@entry=0x555555567050, m=m@entry=0x5555555692e0, f=f@entry=0x55555557ad10, direction=direction@entry=DIRECTION_UP, ret=ret@entry=0x0, offset=offset@entry=0x7fffffffdd88) at ../src/journal/sd-journal.c:687
#11 0x00007fffefee3758 in find_location_for_match (j=j@entry=0x555555567050, m=m@entry=0x555555569290, f=f@entry=0x55555557ad10, direction=direction@entry=DIRECTION_UP, ret=ret@entry=0x0, offset=offset@entry=0x7fffffffde28) at ../src/journal/sd-journal.c:641
#12 0x00007fffefee35d4 in find_location_for_match (j=j@entry=0x555555567050, m=0x555555569240, f=f@entry=0x55555557ad10, direction=direction@entry=DIRECTION_UP, ret=ret@entry=0x7fffffffdf00, offset=offset@entry=0x7fffffffdf08) at ../src/journal/sd-journal.c:679
#13 0x00007fffefee3a2a in find_location_with_matches (offset=0x7fffffffdf08, ret=0x7fffffffdf00, direction=DIRECTION_UP, f=0x55555557ad10, j=0x555555567050) at ../src/journal/sd-journal.c:724
#14 next_beyond_location (direction=DIRECTION_UP, f=0x55555557ad10, j=0x555555567050) at ../src/journal/sd-journal.c:784
#15 real_journal_next (j=j@entry=0x555555567050, direction=direction@entry=DIRECTION_UP) at ../src/journal/sd-journal.c:832
#16 0x00007fffefee42f6 in real_journal_next_skip (j=0x555555567050, direction=DIRECTION_UP, skip=10) at ../src/journal/sd-journal.c:892
#17 0x000055555555ba25 in main (argc=<optimized out>, argv=<optimized out>) at ../src/journal/journalctl.c:2458

Additional info:

#4  0x00007fffeffbfe13 in generic_array_bisect (f=f@entry=0x55555557ad10, first=<optimized out>, first@entry=3739384, n=n@entry=2343, needle=11772088, test_object=0x7fffeffbe100 <test_object_offset>, direction=direction@entry=DIRECTION_UP, ret=0x0, offset=0x7fffffffdbd0, idx=0x0) at ../src/journal/journal-file.c:2019
2019    in ../src/journal/journal-file.c
(gdb) p *f
$1 = {
  fd = 13, 
  mode = 0, 
  flags = 0, 
  prot = 1, 
  writable = false, 
  compress_xz = false, 
  compress_lz4 = true, 
  seal = false, 
  defrag_on_close = false, 
  close_fd = true, 
  archive = false, 
  tail_entry_monotonic_valid = false, 
  last_direction = DIRECTION_UP, 
  location_type = LOCATION_HEAD, 
  last_n_entries = 8977, 
  path = 0x55555557aef0 "/var/log/journal/a1002e4c6eb50c7b5cdd5e004698a90b/system@7ea033a201334367b130dd1201659ee1-00000000007a33f3-00056a7eb6600ac0.journal", 
  last_stat = {
    st_dev = 25, 
    st_ino = 788380, 
    st_nlink = 1, 
    st_mode = 33184, 
    st_uid = 0, 
    st_gid = 137, 
    __pad0 = 0, 
    st_rdev = 0, 
    st_size = 16777216, 
    st_blksize = 4096, 
    st_blocks = 32768, 
    st_atim = {
      tv_sec = 1524467341, 
      tv_nsec = 170198568
    }, 
    st_mtim = {
      tv_sec = 1524512161, 
      tv_nsec = 130419538
    }, 
    st_ctim = {
      tv_sec = 1524512161, 
      tv_nsec = 278419226
    }, 
    __glibc_reserved = {0, 0, 0}
  }, 
  last_stat_usec = 2182963690030, 
  header = 0x7fffe9039000, 
  data_hash_table = 0x7fffe903a5e0, 
  field_hash_table = 0x0, 
  current_offset = 0, 
  current_seqnum = 0, 
  current_realtime = 0, 
  current_monotonic = 0, 
  current_boot_id = {
    bytes = '\000' <repeats 15 times>, 
    qwords = {0, 0}
  }, 
  current_xor_hash = 0, 
  metrics = {
    max_size = 0, 
    min_size = 0, 
    max_use = 0, 
    min_use = 0, 
    keep_free = 0, 
    n_max_files = 0
  }, 
  mmap = 0x555555569190, 
  post_change_timer = 0x0, 
  post_change_timer_period = 0, 
  chain_cache = 0x555555567378, 
  offline_thread = 0, 
  offline_state = OFFLINE_JOINED, 
  compress_buffer = 0x0, 
  compress_buffer_size = 0, 
  hmac = 0x0, 
  hmac_running = false, 
  fss_file = 0x0, 
  fss_file_size = 0, 
  fss_start_usec = 0, 
  fss_interval_usec = 0, 
  fsprg_state = 0x0, 
  fsprg_state_size = 0, 
  fsprg_seed = 0x0, 
  fsprg_seed_size = 0
}


-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sat, 05 May 2018 10:54:05 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Biebl <biebl@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sat, 05 May 2018 10:54:05 GMT) (full text, mbox, link).


Message #22 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Michael Biebl <biebl@debian.org>
To: Marc Lehmann <schmorp@schmorp.de>, 897266@bugs.debian.org
Cc: Marc Lehmann <debian-reportbug@plan9.de>
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Sat, 5 May 2018 12:13:35 +0200
[Message part 1 (text/plain, inline)]
Thanks

Am 05.05.2018 um 05:27 schrieb Marc Lehmann:

> #4  0x00007fffeffbfe13 in generic_array_bisect (f=f@entry=0x55555557ad10, first=<optimized out>, first@entry=3739384, n=n@entry=2343, needle=11772088, test_object=0x7fffeffbe100 <test_object_offset>, direction=direction@entry=DIRECTION_UP, ret=0x0, offset=0x7fffffffdbd0, idx=0x0) at ../src/journal/journal-file.c:2019
> 2019    in ../src/journal/journal-file.c
> (gdb) p *f
> $1 = {
>   fd = 13, 
>   mode = 0, 
>   flags = 0, 
>   prot = 1, 
>   writable = false, 
>   compress_xz = false, 
>   compress_lz4 = true, 
>   seal = false, 
>   defrag_on_close = false, 
>   close_fd = true, 
>   archive = false, 
>   tail_entry_monotonic_valid = false, 
>   last_direction = DIRECTION_UP, 
>   location_type = LOCATION_HEAD, 
>   last_n_entries = 8977, 
>   path = 0x55555557aef0 "/var/log/journal/a1002e4c6eb50c7b5cdd5e004698a90b/system@7ea033a201334367b130dd1201659ee1-00000000007a33f3-00056a7eb6600ac0.journal", 

What does journalctl --verify say about this particular file?
Could you create a /var/log/journal-broken directory and move the broken
journal files away into this directory one by one and test which of the
files is triggering the abort?

Can you reproduce the problem with systemd v238 from sid/buster?
In case you don't want to upgrade your system to buster for this:
it should be possible to point journalctl from a v238 installation at
the journal files of your stetch system via journalctl --directory.

Are you able able/willing to share the journal file(s)?
At some point we will probably have to involve upstream, at least if the
problem is still reproducible with v238.

Regards,
Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sun, 20 May 2018 07:03:03 GMT) (full text, mbox, link).


Acknowledgement sent to Marc Lehmann <schmorp@schmorp.de>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sun, 20 May 2018 07:03:03 GMT) (full text, mbox, link).


Message #27 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Marc Lehmann <schmorp@schmorp.de>
To: Michael Biebl <biebl@debian.org>
Cc: 897266@bugs.debian.org, Marc Lehmann <debian-reportbug@plan9.de>
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Sun, 20 May 2018 08:59:39 +0200
Sorry for the late reply, I got a bit swamped with work.

On Sat, May 05, 2018 at 12:13:35PM +0200, Michael Biebl <biebl@debian.org> wrote:
> What does journalctl --verify say about this particular file?

AFAICR, it complained about file corruption for that file.

> Could you create a /var/log/journal-broken directory and move the broken
> journal files away into this directory one by one and test which of the
> files is triggering the abort?

I could when I originally read your reply, but since I was busy at the
time and thought I read somewhere that systemd would leave corrupted
journal files alone, I moved my investigation to later. When I looked for
it now, it was gone.

What I found interesting, though, is that jorunalctl --verify reports file
corruptiopn for a lot of files in /var/log/journal, although I didn't have
a crash (all files are newer than 14 days, and the machine has an uptime
of 40 days and even systemd-journald is running uninterrupted).

> Can you reproduce the problem with systemd v238 from sid/buster?

Wasn't able to test it before it was deleted, sorry. This was unexpected.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sun, 20 May 2018 20:03:02 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Biebl <biebl@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sun, 20 May 2018 20:03:02 GMT) (full text, mbox, link).


Message #32 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Michael Biebl <biebl@debian.org>
To: Marc Lehmann <schmorp@schmorp.de>
Cc: 897266@bugs.debian.org
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Sun, 20 May 2018 21:58:10 +0200
[Message part 1 (text/plain, inline)]
Am 20.05.2018 um 08:59 schrieb Marc Lehmann:
> Sorry for the late reply, I got a bit swamped with work.
> 
> On Sat, May 05, 2018 at 12:13:35PM +0200, Michael Biebl <biebl@debian.org> wrote:
>> What does journalctl --verify say about this particular file?
> 
> AFAICR, it complained about file corruption for that file.

We'd need a more detailed error message, but I guess this is moot now.

>> Could you create a /var/log/journal-broken directory and move the broken
>> journal files away into this directory one by one and test which of the
>> files is triggering the abort?
> 
> I could when I originally read your reply, but since I was busy at the
> time and thought I read somewhere that systemd would leave corrupted
> journal files alone, I moved my investigation to later. When I looked for
> it now, it was gone.

Hm, too bad. Not sure what to do about this bug report now that we no
longer have the offending journal file. I'm tempted to close it and
reopen again, in case you run into it again and we can further debug this.

Old journal files are removed when you hit certain limits. The default
retention policy is size based. As long as you have a certain amount of
free space, old journal files are kept. If those limits are exceeded,
the oldest journal files are removed first.

> What I found interesting, though, is that jorunalctl --verify reports file
> corruptiopn for a lot of files in /var/log/journal, although I didn't have
> a crash (all files are newer than 14 days, and the machine has an uptime
> of 40 days and even systemd-journald is running uninterrupted).

That seems very strange. The only case where I personally ran into
journal file corruption is when I had to power cycle the machine.
But you said that journald ran uninterrupted for 40 days.
Would it be possible that this is a hardware or file system issue?

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Sat, 02 Jun 2018 13:57:03 GMT) (full text, mbox, link).


Acknowledgement sent to Marc Lehmann <schmorp@schmorp.de>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sat, 02 Jun 2018 13:57:03 GMT) (full text, mbox, link).


Message #37 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Marc Lehmann <schmorp@schmorp.de>
To: Michael Biebl <biebl@debian.org>
Cc: 897266@bugs.debian.org
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Sat, 2 Jun 2018 15:53:39 +0200
On Sun, May 20, 2018 at 09:58:10PM +0200, Michael Biebl <biebl@debian.org> wrote:
> Hm, too bad. Not sure what to do about this bug report now that we no
> longer have the offending journal file. I'm tempted to close it and
> reopen again, in case you run into it again and we can further debug this.

Would make sense, IMHO.

> > What I found interesting, though, is that jorunalctl --verify reports file
> > corruptiopn for a lot of files in /var/log/journal, although I didn't have
> > a crash (all files are newer than 14 days, and the machine has an uptime
> > of 40 days and even systemd-journald is running uninterrupted).
> 
> That seems very strange. The only case where I personally ran into
> journal file corruption is when I had to power cycle the machine.
> But you said that journald ran uninterrupted for 40 days.
> Would it be possible that this is a hardware or file system issue?

The journal files themselves on btrfs, but that in itself shouldn't cause
corruption, at least not in the kernel versions we now use (and btrfs
turned out to be an excellent problem detector in the past, blowing
up at the tiniest issues). The server is indeed the only one showing
the problem, but on the other hand, it is also by far the busiest and
most-watched one. I also ran standalone memtest for half a night a few
weeks ago to rule out a memory problem, and it otherwise works fine,
despite crunching through a terabyte of data per day.

So, I don't think its a hardware issue, unless it's a hardware issue that
would affect only systemd, which is possible, but not that likely.

In any case, closing the bug report is the right thing to do if this
cnanot be further debugged. When I stumble over another such case I can
reopen it or create a new one.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Tue, 05 Jun 2018 17:39:04 GMT) (full text, mbox, link).


Acknowledgement sent to Marc Lehmann <schmorp@schmorp.de>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Tue, 05 Jun 2018 17:39:04 GMT) (full text, mbox, link).


Message #42 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Marc Lehmann <schmorp@schmorp.de>
To: Michael Biebl <biebl@debian.org>
Cc: 897266@bugs.debian.org
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Tue, 5 Jun 2018 19:37:37 +0200
On Sun, May 20, 2018 at 09:58:10PM +0200, Michael Biebl <biebl@debian.org> wrote:
> That seems very strange. The only case where I personally ran into
> journal file corruption is when I had to power cycle the machine.
> But you said that journald ran uninterrupted for 40 days.
> Would it be possible that this is a hardware or file system issue?

I have an update to this, and can reproduce this: systemd is likely off the
hook for the corruption itself. Clearly it shouldn't crash, but I can
reproduce the corruption now, and it's a almost certainly a linux 4.14 bug.

As for background: linux 4.4 was the last kernel which worked on our
servers.  At some point in 4.6, we started getting frequent OOM kills a
few hours after booting, despite many gigabytes of memory "available"
(e.g. used as cache) (you might remember me complaining about missing 4.4
compatibility for this reason - we couldn't switch to 4.9). The first
kernel that kind of worked for us again was 4.14, but only with this
hourly cronjob:

   echo 3 >/proc/sys/vm/drop_caches

Without it, mysql still gets killed once per week or so. This doesn't work
with debians 4.9 LTS kernel, which is why we use the 4.14 LTS kernel from
the ubuntu mainline ppa.

And the above command causes corruption of the systemd journal. I have
reproduced this multiple times now, by deleting the journal and restarting
the journald, following by waiting for a day, and then doing this:

   # journalctl --verify
   [everything fine at this point]
   # echo 3 >/proc/sys/vm/drop_caches
   # journalctl --verify
   [journcal now reporting corruption problems]

We are in the lucky position to have "expected" md5 checksums for practically
all files on the servers this happens on (and debian packages usually have
md5sum files as well) and luckily, neither the fs itself now any other file
seems corrupted, including some write-heavy mysql databases and over 53TB of
data we verified.

Only one other program also suffers from corruption: rtorrent, which doesn't
run on many servers :) which is why I found out about it only by accident.
There, the same patterns happens: downloading a torrent is fine, downloading
a torrent while dropping the caches frequently causes file corruption.

I also have cmp -l output from a corrupted file vs. a correct file, and it
seems the corruption manifests itself as (non-aligned to anything obvious,
such as 512 or 4k borders) streaks of zero bytes instead of the real data
that should be there.

I will pursue this with the linux upstream. It's possible that systemd
(like rtorrent is known to) does something to increase the chance of
corruption, as it luckily only seems to affect those tow programs, but it's
unlikely to be a bug in systemd itself (other than it probably shouldn't
crash), as drop_caches is supposed to be safe.

Greetings,

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#897266; Package systemd. (Tue, 05 Jun 2018 17:57:03 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Biebl <biebl@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Tue, 05 Jun 2018 17:57:03 GMT) (full text, mbox, link).


Message #47 received at 897266@bugs.debian.org (full text, mbox, reply):

From: Michael Biebl <biebl@debian.org>
To: Marc Lehmann <schmorp@schmorp.de>
Cc: 897266@bugs.debian.org
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Tue, 5 Jun 2018 19:55:44 +0200
[Message part 1 (text/plain, inline)]
Am 05.06.2018 um 19:37 schrieb Marc Lehmann:
> I have an update to this, and can reproduce this: systemd is likely off the
> hook for the corruption itself. Clearly it shouldn't crash, but I can
> reproduce the corruption now, and it's a almost certainly a linux 4.14 bug.

[snip]

Thanks for the detailed reply!
-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

[signature.asc (application/pgp-signature, attachment)]

Reply sent to Michael Biebl <biebl@debian.org>:
You have taken responsibility. (Wed, 27 Jun 2018 21:27:06 GMT) (full text, mbox, link).


Notification sent to Marc Lehmann <debian-reportbug@plan9.de>:
Bug acknowledged by developer. (Wed, 27 Jun 2018 21:27:06 GMT) (full text, mbox, link).


Message #52 received at 897266-done@bugs.debian.org (full text, mbox, reply):

From: Michael Biebl <biebl@debian.org>
To: 897266-done@bugs.debian.org, Marc Lehmann <schmorp@schmorp.de>
Subject: Re: Bug#897266: systemd: journalctl assertion failure
Date: Wed, 27 Jun 2018 23:22:40 +0200
[Message part 1 (text/plain, inline)]
On Tue, 5 Jun 2018 19:55:44 +0200 Michael Biebl <biebl@debian.org> wrote:
> Am 05.06.2018 um 19:37 schrieb Marc Lehmann:
> > I have an update to this, and can reproduce this: systemd is likely off the
> > hook for the corruption itself. Clearly it shouldn't crash, but I can
> > reproduce the corruption now, and it's a almost certainly a linux 4.14 bug.
> 
> [snip]
> 
> Thanks for the detailed reply!


Closing this bug report. If there is anything left to do on the systemd
side, feel free to reopen.

Regards,
Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

[signature.asc (application/pgp-signature, attachment)]

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 26 Jul 2018 07:28:40 GMT) (full text, mbox, link).


Send a report that this bug log contains spam.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Jul 23 23:40:46 2020; Machine Name: bembo

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.