Debian Bug report logs - #357140
coreutils: ln -sf is not atomic

version graph

Package: coreutils; Maintainer for coreutils is Michael Stone <mstone@debian.org>; Source for coreutils is src:coreutils (PTS, buildd, popcon).

Reported by: Roger Leigh <rleigh@debian.org>

Date: Wed, 15 Mar 2006 21:33:07 UTC

Severity: minor

Tags: wontfix

Found in version coreutils/5.94-1

Done: Michael Stone <mstone@debian.org>

Bug is archived. No further changes may be made.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Roger Leigh <rleigh@debian.org>:
New Bug report received and forwarded. Copy sent to Michael Stone <mstone@debian.org>. (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Roger Leigh <rleigh@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: coreutils: ln -sf has two races which cause breakage
Date: Wed, 15 Mar 2006 21:23:08 +0000
Package: coreutils
Version: 5.94-1
Severity: normal

ln -sf can fail due to two unhandled race conditions:

To reproduce:
  while true; do /bin/ln -sf foo bar; done &
  (repeat a few times)
  while true; do rm -f bar ; sleep 1; done &

You will see intermittent errors:

  /bin/ln: cannot remove `bar': No such file or directory
  /bin/ln: creating symbolic link `bar' to `foo': File exists

The latter error is much more frequent than the former.

This is due to the sequence of events:
  stat file
  unlink file
  symlink file

Normally, this is fine, but it does not cope with these factors:

1) The file can be unlinked by another process in between the stat and
   the unlink, causing the unlink to fail.  ln should be aware this
   can happen, and not treat this as a fatal error.

2) Another process can create file between unlink and symlink, leading
   to symlink creation failing.  Again, ln should be aware creation
   may fail.  If the error is EEXIST, it should repeat the unlink and
   try again.


Regards,
Roger

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.15.4
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages coreutils depends on:
ii  libacl1                       2.2.35-1   Access control list shared library
ii  libc6                         2.3.6-3    GNU C Library: Shared libraries an
ii  libselinux1                   1.28-4     SELinux shared libraries

coreutils recommends no packages.

-- no debconf information



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Michael Stone <mstone@debian.org>:
Extra info received and forwarded to list. (full text, mbox, link).


Message #10 received at 357140@bugs.debian.org (full text, mbox, reply):

From: Michael Stone <mstone@debian.org>
To: Roger Leigh <rleigh@debian.org>, 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Wed, 15 Mar 2006 16:37:58 -0500
On Wed, Mar 15, 2006 at 09:23:08PM +0000, Roger Leigh wrote:
>1) The file can be unlinked by another process in between the stat and
>   the unlink, causing the unlink to fail.  ln should be aware this
>   can happen, and not treat this as a fatal error.

Why? 

>2) Another process can create file between unlink and symlink, leading
>   to symlink creation failing.  Again, ln should be aware creation
>   may fail.  If the error is EEXIST, it should repeat the unlink and
>   try again.

Again, why? 

Is there any practical purpose to changing any of this? I'm not sure I 
care too much about what happens if you're running a tight endless loop 
like that, because the outcome is going to be random (dependent on 
execute order) anyway.

Mike Stone



Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Roger Leigh <rleigh@whinlatter.ukfsn.org>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (full text, mbox, link).


Message #15 received at 357140@bugs.debian.org (full text, mbox, reply):

From: Roger Leigh <rleigh@whinlatter.ukfsn.org>
To: Michael Stone <mstone@debian.org>
Cc: 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Wed, 15 Mar 2006 23:53:54 +0000
[Message part 1 (text/plain, inline)]
Michael Stone <mstone@debian.org> writes:

> On Wed, Mar 15, 2006 at 09:23:08PM +0000, Roger Leigh wrote:
>>1) The file can be unlinked by another process in between the stat and
>>   the unlink, causing the unlink to fail.  ln should be aware this
>>   can happen, and not treat this as a fatal error.
>
> Why?

Because otherwise the command fails when it should not:

process1  process2

stat      stat
unlink
          unlink
symlink
          FAIL

>>2) Another process can create file between unlink and symlink, leading
>>   to symlink creation failing.  Again, ln should be aware creation
>>   may fail.  If the error is EEXIST, it should repeat the unlink and
>>   try again.
>
> Again, why?

Because otherwise the command fails when it should not:

process1  process2

          stat
          unlink
stat
symlink
          symlink
          FAIL

In both these cases, the -f switch tells ln to unlink and replace any
preexisting file, if it exists.  If the file appears or disappears
during the execution of ln, this should not affect its behaviour.  As
it is, ln does not behave deterministically.

> Is there any practical purpose to changing any of this? I'm not sure I
> care too much about what happens if you're running a tight endless
> loop like that, because the outcome is going to be random (dependent
> on execute order) anyway.

This example was an artificial test case.  Its only purpose is to
demonstrate the problem in an efficient manner.

This *is* a real problem, which we are seeing with the sbuild part of
buildd.  It has (perl):

  system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current-$main::distribution";
  system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current";

When buildd schedules builds concurrently, sbuild is showing the
problem intermittently.


Regards,
Roger

-- 
Roger Leigh
                Printing on GNU/Linux?  http://gutenprint.sourceforge.net/
                Debian GNU/Linux        http://www.debian.org/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Michael Stone <mstone@debian.org>:
Extra info received and forwarded to list. (full text, mbox, link).


Message #20 received at 357140@bugs.debian.org (full text, mbox, reply):

From: Michael Stone <mstone@debian.org>
To: Roger Leigh <rleigh@whinlatter.ukfsn.org>
Cc: 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Wed, 15 Mar 2006 21:04:05 -0500
On Wed, Mar 15, 2006 at 11:53:54PM +0000, Roger Leigh wrote:
>In both these cases, the -f switch tells ln to unlink and replace any
>preexisting file, if it exists.  If the file appears or disappears
>during the execution of ln, this should not affect its behaviour.

I'm not convinced that's true; I don't think there's anything in the 
documentation that suggests that the command is atomic.

>This *is* a real problem, which we are seeing with the sbuild part of
>buildd.  It has (perl):
>
>  system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current-$main::distribution";
>  system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current";
>
>When buildd schedules builds concurrently, sbuild is showing the
>problem intermittently.

Well, calling ln from perl via system is fugly anyway. I'd suggest using 
something like:

unlink($conf::build_dir/current-$main::distribution);
if (!symlink($main::pkg_logfile, $conf::build_dir/current-$main::distribution))
{
  handle_error() if (not $! =~ m/File exists/);
}

I'm not sure what you expect the end result to be if two processes try 
and operate at the same time. It may be that you need a locking 
mechanism for this to make sense.

Mike Stone




Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to bob@proulx.com (Bob Proulx):
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (full text, mbox, link).


Message #25 received at 357140@bugs.debian.org (full text, mbox, reply):

From: bob@proulx.com (Bob Proulx)
To: Roger Leigh <rleigh@whinlatter.ukfsn.org>, 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Thu, 16 Mar 2006 01:29:57 -0700
Roger Leigh wrote:
> Michael Stone <mstone@debian.org> writes:
> > Roger Leigh wrote:
> >>1) The file can be unlinked by another process in between the stat and
> >>   the unlink, causing the unlink to fail.  ln should be aware this
> >>   can happen, and not treat this as a fatal error.
> >
> > Why?
> 
> Because otherwise the command fails when it should not:

I don't think there ever has been a promise that these commands have
atomic behavior.

  ln -sf SOURCE DESTINATION

Is basically the same as the following, only more conveniently
packaged for use as one shell command.

  rm -f DESTINATION
  ln -s SOURCE DESTINATION

Here is a reference to a similar discussion from upstream:

  http://lists.gnu.org/archive/html/bug-coreutils/2005-05/msg00033.html

If you have suggestions as to how to improve the behavior I am sure
that would be appreciated.  But I don't see how this can be made
atomic.  And this does not seem like something to be fixed simply.

> This *is* a real problem, which we are seeing with the sbuild part of
> buildd.  It has (perl):
> 
>   system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current-$main::distribution";
>   system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current";

Ew, that is ugly.  And why the hard coded paths?

> When buildd schedules builds concurrently, sbuild is showing the
> problem intermittently.

If you need an atomic operation then you will need to use a command
that is atomic such as rename(2) accessable from within mv.  Here is
an example in shell off the top of my head but expecting to use a
better temporary name than the one I just produced for this example.

  ln -sf $pkg_logfile $build_dir/current-$distribution.$$
  mv $build_dir/current-$distribution.$$ $build_dir/current-$distribution

But of course you are in perl and can access rename from perl even
more directly.

Michael Stone wrote:
> Well, calling ln from perl via system is fugly anyway. I'd suggest using 
> something like:
> 
> unlink($conf::build_dir/current-$main::distribution);
> if (!symlink($main::pkg_logfile, 
> $conf::build_dir/current-$main::distribution))
> {
>   handle_error() if (not $! =~ m/File exists/);
> }

Let me suggest the following instead:

  unlink($conf::build_dir/current-$main::distribution);
  symlink($main::pkg_logfile,$conf::build_dir/current-$main::distribution.$$) or die;
  rename($conf::build_dir/current-$main::distribution.$$,$conf::build_dir/current-$main::distribution) or die;

I tested the principle using this script.

  #!/usr/bin/perl
  unlink($ARGV[1].$$);
  symlink($ARGV[0],$ARGV[1].$$) || die;
  rename($ARGV[1].$$,$ARGV[1]) || die;

> I'm not sure what you expect the end result to be if two processes try 
> and operate at the same time. It may be that you need a locking 
> mechanism for this to make sense.

Agreed.  But avoiding needing locking when possible is better.

Bob



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Michael Stone <mstone@debian.org>:
Extra info received and forwarded to list. (full text, mbox, link).


Message #30 received at 357140@bugs.debian.org (full text, mbox, reply):

From: Michael Stone <mstone@debian.org>
To: Bob Proulx <bob@proulx.com>, 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Thu, 16 Mar 2006 06:04:55 -0500
On Thu, Mar 16, 2006 at 01:29:57AM -0700, Bob Proulx wrote:
>Let me suggest the following instead:
>
>  unlink($conf::build_dir/current-$main::distribution);
>  symlink($main::pkg_logfile,$conf::build_dir/current-$main::distribution.$$) or die;
>  rename($conf::build_dir/current-$main::distribution.$$,$conf::build_dir/current-$main::distribution) or die;

I considered something like that...

>> I'm not sure what you expect the end result to be if two processes try 
>> and operate at the same time. It may be that you need a locking 
>> mechanism for this to make sense.
>
>Agreed.  But avoiding needing locking when possible is better.

...but without knowing what the end result is, I didn't see it as an 
improvement. That is, if two processes are writing at the same time, 
one's output is going to get clobbered--if one gets clobbered atomically 
I don't see an obvious advantage. If you have an actual locking scheme 
you can determine the order of what happens; otherwise you're just 
guessing, especially since it looks like there are *two* files involved 
(the rename scheme can't atomically handle both of them). If you are 
going to do the symlink/rename scheme I'd also use a better random 
filename than $$. It's also not necessary to do the unlink in that 
case--actually it's better not to, since the rename will clobber the 
destination anyway.

Mike Stone



Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to bob@proulx.com (Bob Proulx):
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (full text, mbox, link).


Message #35 received at 357140@bugs.debian.org (full text, mbox, reply):

From: bob@proulx.com (Bob Proulx)
To: Michael Stone <mstone@debian.org>, 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Thu, 16 Mar 2006 09:14:55 -0700
Michael Stone wrote:
> ...but without knowing what the end result is, I didn't see it as an 
> improvement. That is, if two processes are writing at the same time, 
> one's output is going to get clobbered--if one gets clobbered atomically 
> I don't see an obvious advantage.

Sometimes it does not matter which file is in place as long as the
file in place is always syntactically correct.

But you are right, not knowing the particular case here makes it hard
to suggest an alternative.  I was transferring from cases I have
typically run into in my own experience and was pursuing an
alternative which avoided spurious noise from commands.  However, that
noise may have been an early warning indicator that something was
generally wrong.  So making it quiet may not be a good thing after
all because it removes the early warning of other problems.

> Bob Proulx wrote:
> > unlink($conf::build_dir/current-$main::distribution);
> > symlink($main::pkg_logfile,$conf::build_dir/current-$main::distribution.$$) or die;
> > rename($conf::build_dir/current-$main::distribution.$$,$conf::build_dir/current-$main::distribution) or die;

> It's also not necessary to do the unlink in that case--actually it's
> better not to, since the rename will clobber the destination anyway.

Drat!  You are correct.  My test script had it right, unlinking the
symlink target.  But I goofed translating that to the *real* case and
unlinked the wrong target.  That should have been:

  unlink($conf::build_dir/current-$main::distribution.$$);
  symlink($main::pkg_logfile,$conf::build_dir/current-$main::distribution.$$) or die;
  rename($conf::build_dir/current-$main::distribution.$$,$conf::build_dir/current-$main::distribution) or die;

With an unlink of the symlink target.  Otherwise the symlink call may
fail if it tries to symlink to an existing target.

> I'd also use a better random filename than $$.

That would be fine.  But this is a temporary file being used by
cooperating processes in a local tmp directory.  It does not need to
be random.  It just needs to be unique among all of the processes at
that moment in time.  I don't think extreme measures are needed for
this type of simple case.  But obviously it won't hurt and if the case
becomes more complicated later then the infrastructure is in place for
it.  But I was trying to keep the example simple.  (And as you saw I
still made a typo.  :-)

Bob



Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#357140; Package coreutils. (full text, mbox, link).


Acknowledgement sent to Roger Leigh <rleigh@whinlatter.ukfsn.org>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (full text, mbox, link).


Message #40 received at 357140@bugs.debian.org (full text, mbox, reply):

From: Roger Leigh <rleigh@whinlatter.ukfsn.org>
To: bob@proulx.com (Bob Proulx)
Cc: 357140@bugs.debian.org
Subject: Re: Bug#357140: coreutils: ln -sf has two races which cause breakage
Date: Fri, 17 Mar 2006 12:07:26 +0000
[Message part 1 (text/plain, inline)]
bob@proulx.com (Bob Proulx) writes:

> Roger Leigh wrote:
>> Michael Stone <mstone@debian.org> writes:
>> > Roger Leigh wrote:
>> >>1) The file can be unlinked by another process in between the stat and
>> >>   the unlink, causing the unlink to fail.  ln should be aware this
>> >>   can happen, and not treat this as a fatal error.
>> >
>> > Why?
>> 
>> Because otherwise the command fails when it should not:
>
> I don't think there ever has been a promise that these commands have
> atomic behavior.

OK.

> If you have suggestions as to how to improve the behavior I am sure
> that would be appreciated.  But I don't see how this can be made
> atomic.  And this does not seem like something to be fixed simply.

I'm not sure if it should or not be fixed, and if fixed, whether it
should or should not be atomic.  I just wanted to report the existence
of the problem.

I'm not sure that (filesystem) atomicity is necessarily required.  The
important thing (for me) is that it succeeds without error unless
there's a real failure (such as permissions issues).  If that means
repeated unlinking until symlink succeeds, that's OK for me: that's
why I used the "-f" switch, because it's OK for the file to be
clobbered.

If this is undesirable, feel free to close or downgrade the severity.
I'm not sure how this would fit in with POSIX/SUS; if it doesn't fit,
it's obviously a broken idea.

>> This *is* a real problem, which we are seeing with the sbuild part of
>> buildd.  It has (perl):
>> 
>>   system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current-$main::distribution";
>>   system "/bin/ln -sf $main::pkg_logfile $conf::build_dir/current";
>
> Ew, that is ugly.  And why the hard coded paths?

The hard-coded paths are just for testing (in case ln was a sh
builtin).  I know it's really ugly: I didn't write it myself, and I
will be replacing it shortly with proper perl.

The reason this is done is that the "current" symlink points to the
currently running build log, so you could e.g. tail it to monitor the
current build.  When more than one build is running, it points to the
most recently started one, but when they start at almost exactly the
same time, this is when the race is seen.

> If you need an atomic operation then you will need to use a command
> that is atomic such as rename(2) accessable from within mv.  Here is
> an example in shell off the top of my head but expecting to use a
> better temporary name than the one I just produced for this example.
>
>   ln -sf $pkg_logfile $build_dir/current-$distribution.$$
>   mv $build_dir/current-$distribution.$$ $build_dir/current-$distribution
>
> But of course you are in perl and can access rename from perl even
> more directly.

Absolutely.  I'll be doing just that.  Or even just unlink and symlink
and continuing on error: in this case, it's not so important that it
succeeds, just that the current symlink exists and points to the
latest build.  If it fails, it's because it got preempted by another
build, so that's OK.

> Let me suggest the following instead:
>
>   unlink($conf::build_dir/current-$main::distribution);
>   symlink($main::pkg_logfile,$conf::build_dir/current-$main::distribution.$$) or die;
>   rename($conf::build_dir/current-$main::distribution.$$,$conf::build_dir/current-$main::distribution) or die;
>
> I tested the principle using this script.
>
>   #!/usr/bin/perl
>   unlink($ARGV[1].$$);
>   symlink($ARGV[0],$ARGV[1].$$) || die;
>   rename($ARGV[1].$$,$ARGV[1]) || die;

I'll probably just use || return.

Thanks for the suggestions, guys.


Regards,
Roger

-- 
Roger Leigh
                Printing on GNU/Linux?  http://gutenprint.sourceforge.net/
                Debian GNU/Linux        http://www.debian.org/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
[Message part 2 (application/pgp-signature, inline)]

Changed Bug title to `coreutils: ln -sf is not atomic' from `coreutils: ln -sf has two races which cause breakage'. Request was from Lucas Nussbaum <lucas@lucas-nussbaum.net> to control@bugs.debian.org. (Wed, 23 Jan 2008 13:09:05 GMT) (full text, mbox, link).


Tags added: wontfix Request was from Lucas Nussbaum <lucas@lucas-nussbaum.net> to control@bugs.debian.org. (Wed, 23 Jan 2008 13:09:05 GMT) (full text, mbox, link).


Severity set to `minor' from `normal' Request was from Lucas Nussbaum <lucas@lucas-nussbaum.net> to control@bugs.debian.org. (Wed, 23 Jan 2008 13:09:06 GMT) (full text, mbox, link).


Reply sent to Michael Stone <mstone@debian.org>:
You have taken responsibility. (full text, mbox, link).


Notification sent to Roger Leigh <rleigh@debian.org>:
Bug acknowledged by developer. (full text, mbox, link).


Message #51 received at 357140-done@bugs.debian.org (full text, mbox, reply):

From: Michael Stone <mstone@debian.org>
To: 357140-done@bugs.debian.org
Subject: Re: #357140: coreutils: ln -sf is not atomic
Date: Wed, 02 Apr 2008 19:19:45 -0400
I'm going to close this since I still question the general utility of 
attempting a sylink after an unlink fails, and since some alternate 
approaches were suggested and seem to have been effective.

Mike Stone




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 01 May 2008 07:33:06 GMT) (full text, mbox, link).


Send a report that this bug log contains spam.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Aug 26 13:27:07 2023; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.