Debian Bug report logs - #473812
libc6: calloc returns non-zero memory areas when mlockall is being used


Package: libc6; Maintainer for libc6 is GNU Libc Maintainers <debian-glibc@lists.debian.org>; Source for libc6 is src:eglibc.

Reported by: Marc Lehmann <debian-reportbug@plan9.de>

Date: Tue, 1 Apr 2008 19:39:02 UTC

Severity: normal

Found in version glibc/2.7-5


Report forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#473812; Package libc6.

Acknowledgement sent to Marc Lehmann <debian-reportbug@plan9.de>:
New Bug report received and forwarded. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>.

Message #5 received at submit@bugs.debian.org:

From: Marc Lehmann <debian-reportbug@plan9.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: libc6: calloc returns non-zero memory areas when mlockall is being used
Date: Tue, 01 Apr 2008 21:32:18 +0200
Package: libc6
Version: 2.7-5
Severity: normal


Hi!

The bug I found (if it is a bug) is very hard for me to reproduce, so
bear with me if the explanation is a bit sketchy (a glibc-malloc expert
would need to look at this in more detail). Please also note that I have
stitched together the examples from multiple debugging runs, so the
addresses do not necessarily match.

Findings of fact:

   1. calloc returns memory areas that contain data from previous allocations
      (typical example:

         0x2aaab01c6fc0: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fc8: -56 'È' -20 'ì' 26 '\032'       5 '\005'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fd0: 13 '\r' 0 '\0'  0 '\0'  0 '\0'  4 '\004'        0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fd8: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fe0: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6fe8: -80 '°' -82 '®' -81 '¯' 2 '\002'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c6ff0: -16 'ð' 108 'l' 28 '\034'       -80 '°' -86 'ª' 42 '*'  0 '\0'  0 '\0'
         0x2aaab01c6ff8: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7000: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7008: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7010: 0 '\0'  0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7018: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7020: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7028: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7030: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7038: -48 'Ð' -22 'ê' 126 '~' 2 '\002'        0 '\0'  0 '\0'  0 '\0'  0 '\0'
         0x2aaab01c7040: 112 'p' 90 'Z'  28 '\034'       -80 '°' -86 'ª' 42 '*'  0 '\0'  0 '\0'
         0x2aaab01c7048: 28 '\034'       0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7050: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7058: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7060: 64 '@'  0 '\0'  0 '\0'  0 '\0'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7068: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7070: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7078: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'
         0x2aaab01c7080: 85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'  85 'U'

      the 0x55's in there result from this code, executed earlier:

        if (text) memset (SvPVX(text),0x55,SvLEN(text));//D
        if (text) SvREFCNT_dec (text);

      the second line causes the memory filled with 0x55 to be freed.

      Note that the 0x55's start near a 4k boundary.

   2. mallopt (M_PERTURB, <nonzero>) makes the program work
   3. NOT using mlockall (MCL_CURRENT | MCL_FUTURE) makes the program work
   4. using valgrind makes the program work
   5. using dmalloc makes the program work

   So this problem only happens with the glibc malloc, when mlockall is
   active and the perturb-debugging-code is NOT active. I will show why these
   conditions are necessary.

How this likely happens:

   From looking through the glibc source code, I can see that calloc
   sometimes does not clear the memory block, or only clears part of it,
   as an optimisation:

        /* Two optional cases in which clearing not necessary */
      #if HAVE_MMAP
        if (chunk_is_mmapped (p))
          {
            if (__builtin_expect (perturb_byte, 0))
              MALLOC_ZERO (mem, sz);
            return mem;
          }
      #endif

        csz = chunksize(p);

      #if MORECORE_CLEARS
        if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize)) {
          /* clear only the bytes from non-freshly-sbrked memory */
          csz = oldtopsize;
        }
      #endif

   The memory block above is not an mmapped chunk (the word before it
   in memory is "0xb5", which means it is not from brk-managed memory, has
   a size of 0xb0 bytes, has no valid previous-size prefix and is not an
   mmapped chunk).

   However, the second part checks for the case when an allocation
   has been extended which happens when there was a call to sbrk,
   extending the heap, or, for mmap-managed heaps, when there was a
   call to mprotect. In both cases, calloc will only clear up to the
   newly-allocated segment.

   This is apparently the condition that gets triggered, and here is how:

   Again, from reading the sources, it seems that glibc has the ability
   to manage multiple heap arenas, one with brk/sbrk, and multiple
   ones with mmap(PROT_NONE) which get "physically allocated" with
   mprotect(PROT_READ|PROT_WRITE) and "physically freed" with madvise
   (MADV_DONTNEED).

   In an strace (intermingled with debugging output), I see this:

      mprotect(0x2aaab0135000, 155648, PROT_READ|PROT_WRITE) = 0
         (a) 0x2aaab010ec00 [0x2aaab0134d60 0x2aaab015aeb6]
      madvise(0x2aaab012f000, 180224, 0x4 /* MADV_??? */) = -1 EINVAL (Invalid argument)
         (b) 0x2aaab013afc0 0x5555555555555555 (0 135)

   Explanation: 

   The first mprotect "allocates" the memory used for the "text"
   above (the piece of memory that later gets memset to 0x55).

   The line (a) is debugging output from my program showing that
   [0x2aaab0134d60..0x2aaab015aeb6] was allocated.

   It is subsequently memset to 0x55 and then freed, resulting in the
   madvise (from malloc/arena.c), where glibc tries to get rid of the
   memory. The expectation from madvise is that the memory is cleared to
   zero by the kernel. Note how the madvise call (0x4 == MADV_DONTNEED
   btw.) fails, and also note that glibc completely ignores errors from
   madvise (see malloc/arena.c).

   In line (b) we see the address returned by calloc, and a pointer
   inside the calloc'ed memory areas, which should be 0, but isn't. This
   is because glibc thinks madvise cleared the memory, and the calloc
   optimisation kicks in where glibc assumes that the memory is now zero,
   when in fact it isn't cleared at all.

   EINVAL from madvise is documented as:

      EINVAL The value len is negative, start is not page-aligned, advice
      is not a valid value, or the application is attempting to release
      locked or shared pages (with MADV_DONTNEED).

   which explains why it fails only when mlockall is being used.

Result:

   mlockall is incompatible with the glibc memory allocator. This should
   either be fixed or clearly documented (preferably fixed, as most
   programs using mlockall are rather mission-critical, which is why they
   use mlockall in the first place :)

Again, my test program is rather big, and I didn't instrument my glibc, so
the above could also be wrong, which is why a glibc expert needs to look
at it. In any case, I think the problem is relatively obvious, and not
checking the madvise return code was a bad thing in the first place.

(as a related note, I think this could also explain some of the memleaks
I experience where mallinfo shows much _less_ memory used than ps
(i.e. 400mb vs. 1.5gb), which isn't explainable by mere internal
fragmentation. this would fit into the above, as glibc might assume
the additional memory has been madvised into oblivion when the kernel
disagrees).

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.23-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                       1:4.3.0-1  GCC support library

libc6 recommends no packages.

-- debconf information:
  glibc/restart-services:
  glibc/restart-failed:




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#473812; Package libc6. (Fri, 19 Feb 2010 10:27:03 GMT)

Acknowledgement sent to Julien Pommier <pommier@pianoteq.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 19 Feb 2010 10:27:03 GMT)

Message #10 received at 473812@bugs.debian.org:

From: Julien Pommier <pommier@pianoteq.com>
To: 473812@bugs.debian.org
Subject: libc6: calloc returns non-zero memory areas when mlockall is being used
Date: Fri, 19 Feb 2010 11:19:02 +0100

Hi,

I just want to point out that the bug is still there in Debian 5.0.4. It is
a bit annoying because any application using libjack is affected, since
jack calls mlockall().

Here is an ugly but small test program that triggers the bug:

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <assert.h>
#include <pthread.h>
int amnt = 0;

void test(int n) {
  printf("test , n=%d\n", n);
  int i, j;
  char *chunk[n]; 
  int sz[n];

  /* alloc */
  for (i=0; i < n; ++i) {
    sz[i] = 3000;
    chunk[i] = calloc(sz[i], 1);
    for (j=0; j< sz[i]; ++j) {
      assert(chunk[i][j]==0);
      chunk[i][j] = 0x88;
    }
    amnt += sz[i];
  }

  /* lock */
  int ret = mlockall(MCL_CURRENT|MCL_FUTURE);
  if (ret != 0) {
     fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
     exit(1);
  }
  printf("Memory locked\n");

  /* free */
  for (i=n-1; i>=0; --i) {
    for (j=0; j < sz[i]; ++j) chunk[i][j] = 0xee; 
    free(chunk[i]); chunk[i] = 0; 
  }

  /* alloc again */
  for (i=0; i < n; ++i) {
    sz[i] = 3000;
    chunk[i] = calloc(sz[i], 1);
    for (j=0; j< sz[i]; ++j) {
      assert(chunk[i][j]==0); // or calloc bug..
      chunk[i][j] = 0x88;
    }
    amnt += sz[i];
  }
}

void *do_test(void *arg) {
  test((int)(long)arg);
  printf("test finished\n");
  return 0;
}

int main(int argc, char **argv) {
  printf("test the calloc bug..\n");
  int n = (argc > 1 ? atoi(argv[1]) : 1000);
  int i;
  pthread_t t;
  for (i=0; i < 10; ++i) { 
    pthread_create(&t, 0, &do_test, (void*)(long)n);
  }
  pthread_join(t, 0);
  printf("allocated %d bytes\n", amnt);
  return 0;
}



Output with libc6 2.7-18lenny2 on an x86_64 installation:
> gcc ./clbug.c -pthread && ./a.out 1000

test the calloc bug..
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
Memory locked
Memory locked
Memory locked
Memory locked
Memory locked
a.out: ./clbug.c:46: test: Assertion `chunk[i][j]==0' failed.
Aborted

I believe it has been fixed in newer glibc releases ( see
https://bugzilla.redhat.com/show_bug.cgi?id=405781 )





Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#473812; Package libc6. (Sat, 20 Feb 2010 23:48:03 GMT)

Acknowledgement sent to Adrien Kunysz <adrien@kunysz.be>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Sat, 20 Feb 2010 23:48:03 GMT)

Message #15 received at 473812@bugs.debian.org:

From: Adrien Kunysz <adrien@kunysz.be>
To: 473812@bugs.debian.org
Subject: libc6: calloc returns non-zero memory areas when mlockall is being used
Date: Sat, 20 Feb 2010 23:45:08 +0000

Upstream bug: http://sources.redhat.com/bugzilla/show_bug.cgi?id=6958
This is the commit that was used to fix Red Hat bug 405781:
http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=4cd4c5d6a28c4fbdc86651c4578f4c4f24efce08

Using your test case, I can confirm that the issue reproduces with glibc 2.7-18lenny2 x86_64.
With the same test case, I cannot reproduce the issue with Red Hat Enterprise Linux
glibc-2.5-24.x86_64, which includes the above patch:

# ./debbug473812 
test the calloc bug..
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
test , n=1000
Memory locked
Memory locked
Memory locked
Memory locked
Memory locked
test finished
allocated 32502000 bytes

This suggests the above patch indeed fixes this issue, although the upstream bug
is still open and the madvise() return value is still not checked.








Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Apr 19 12:22:18 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.