Debian Bug report logs - #641166
coreutils: 'man sort': '--random-sort' misleading

Package: coreutils; Maintainer for coreutils is Michael Stone <mstone@debian.org>; Source for coreutils is src:coreutils.

Reported by: gwern <gwern0@gmail.com>

Date: Sat, 10 Sep 2011 23:48:02 UTC

Severity: minor

Found in version coreutils/8.5-1


Report forwarded to debian-bugs-dist@lists.debian.org, gwern0@gmail.com, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Sat, 10 Sep 2011 23:48:05 GMT)

Acknowledgement sent to gwern <gwern0@gmail.com>:
New Bug report received and forwarded. Copy sent to gwern0@gmail.com, Michael Stone <mstone@debian.org>. (Sat, 10 Sep 2011 23:48:05 GMT)

Message #5 received at submit@bugs.debian.org:

From: gwern <gwern0@gmail.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: coreutils: 'man sort': '--random-sort' misleading
Date: Sat, 10 Sep 2011 19:44:21 -0400
Package: coreutils
Version: 8.5-1
Severity: minor

The existing documentation for the option:

       -R, --random-sort
              sort by random hash of keys

This is not wrong, strictly speaking, but it is misleading: sorting by random hash *sounds* like a perfect shuffle,
which is what 99% of users want, and sorting by hash is equivalent if and only if there are no duplicate entries.
If there *are* duplicate entries, then the 'random' sort will put all duplicates in consecutive runs.
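
To make the duplicate behaviour concrete, here is a small sketch. It assumes GNU coreutils (sort, shuf, uniq, wc) is installed, and the file names are made up for illustration. It counts how many runs of identical lines remain after each kind of "shuffle":

```shell
# Hypothetical playlist in which a.ogg is deliberately listed three times.
playlist=$(printf '%s\n' a.ogg b.ogg c.ogg d.ogg a.ogg a.ogg)

# sort -R orders lines by a hash of their content, so identical lines get
# identical hashes and end up adjacent. Collapsing adjacent repeats with
# uniq therefore always leaves exactly the 4 distinct names.
grouped=$(printf '%s\n' "$playlist" | sort -R | uniq | wc -l)
echo "runs after sort -R: $grouped"

# shuf permutes lines without looking at their content, so the three copies
# of a.ogg are usually scattered and uniq leaves more than 4 runs.
scattered=$(printf '%s\n' "$playlist" | shuf | uniq | wc -l)
echo "runs after shuf:    $scattered"
```

The first count is the same on every run; the second varies, which is the difference between the two tools in the presence of duplicates.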

I suggest amending the line to read more like

              sort by random hash of keys; equivalent to perfect shuffle on unique keys

or maybe just say

              sort by random hash of keys; not the same as a perfect shuffle

Or at least warn in some fashion that 'random' is not quite what 'random' usually means on lists.

(I do random shuffle with mplayer using 'sort -R', and once, to 'bias' the selection to a particular set of songs, I put that directory in
3 or 4 times; I thought I was going crazy when the first such song came up 3 times, which I calculated at billions to one against.
I checked everything until I began to wonder what exactly 'random hash of keys' meant, and then saw how it treated duplicate entries.)

-- System Information:
Debian Release: wheezy/sid
  APT prefers proposed-updates
  APT policy: (500, 'proposed-updates'), (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.0.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash

Versions of packages coreutils depends on:
ii  libacl1                       2.2.51-3   Access control list shared library
ii  libattr1                      1:2.4.46-3 Extended attribute shared library
ii  libc6                         2.13-18    Embedded GNU C Library: Shared lib
ii  libselinux1                   2.0.98-1.1 SELinux runtime shared libraries

coreutils recommends no packages.

coreutils suggests no packages.

-- no debconf information




Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Mon, 12 Sep 2011 14:03:04 GMT)

Acknowledgement sent to Jim Meyering <jim@meyering.net>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 14:03:04 GMT)

Message #10 received at 641166@bugs.debian.org:

From: Jim Meyering <jim@meyering.net>
To: gwern <gwern0@gmail.com>
Cc: 641166@bugs.debian.org
Subject: Re: Bug#641166: coreutils: 'man sort': '--random-sort' misleading
Date: Mon, 12 Sep 2011 16:01:07 +0200
gwern wrote:

> Package: coreutils
> Version: 8.5-1
> Severity: minor
>
> The existing documentation for the option:
>
>        -R, --random-sort
>               sort by random hash of keys
>
> This is not wrong, strictly speaking, but it is misleading: sorting by random hash *sounds* like a perfect shuffle,
> which is what 99% of users want, and sorting by hash is equivalent if and only if there are no duplicate entries.
> If there *are* duplicate entries, then the 'random' sort will put all duplicates in consecutive runs.
>
> I suggest amending the line to read more like
>
>               sort by random hash of keys; equivalent to perfect shuffle on unique keys
>
> or maybe just say
>
>               sort by random hash of keys; not the same as a perfect shuffle
>
> Or at least warn in some fashion that 'random' is not quite what 'random' usually means on lists.
>
> (I do random shuffle with mplayer using 'sort -R', and once, to 'bias' the selection to a particular set of songs, I put that directory in
> 3 or 4 times; I thought I was going crazy when the first such song came up 3 times, which I calculated at billions to one against.
> I checked everything until I began to wonder what exactly 'random hash of keys' meant, and then saw how it treated duplicate entries.)

Thanks for the suggestion, but we try to keep the man page pretty terse,
since it's automatically derived from sort --help output.

Did you see the "real" documentation?

  `-R'
  `--random-sort'
  `--sort=random'
       Sort by hashing the input keys and then sorting the hash values.
       Choose the hash function at random, ensuring that it is free of
       collisions so that differing keys have differing hash values.
       This is like a random permutation of the inputs (*note shuf
       invocation::), except that keys with the same value sort together.

       If multiple random sort fields are specified, the same random hash
       function is used for all fields.  To use different random hash
       functions for different fields, you can invoke `sort' more than
       once.

       The choice of hash function is affected by the `--random-source'
       option.
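
The description above can be modelled roughly as follows. This is only a sketch: hash_sort is a hypothetical helper, and salting md5sum with bytes from /dev/urandom merely stands in for "choose the hash function at random". It is not meant to reproduce how sort itself is implemented.

```shell
# hash_sort: hypothetical helper that imitates "sort by random hash of keys".
# Reads lines on stdin and writes them ordered by a salted hash.
hash_sort() {
    # A fresh random salt per invocation plays the role of the random hash function.
    salt=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    while IFS= read -r line; do
        # Identical lines produce identical digests under the same salt.
        digest=$(printf '%s%s' "$salt" "$line" | md5sum | cut -d' ' -f1)
        printf '%s\t%s\n' "$digest" "$line"
    done | sort | cut -f2-   # order by digest, then drop the digest column
}

# The three copies of x always come out next to each other,
# while the relative order of x, y and z changes from run to run.
printf '%s\n' x y z x x | hash_sort
```

Because equal keys hash to equal values, they necessarily sort together, which is the exception to "random permutation" that the manual text calls out.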

There should be a note like this at the end of the man page:

 SEE ALSO
       The  full documentation for sort is maintained as a Texinfo manual.  If
       the info and sort programs are properly installed  at  your  site,  the
       command

              info sort

       should give you access to the complete manual.




Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Mon, 12 Sep 2011 15:18:02 GMT)

Acknowledgement sent to Gwern Branwen <gwern0@gmail.com>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 15:18:03 GMT)

Message #15 received at 641166@bugs.debian.org:

From: Gwern Branwen <gwern0@gmail.com>
To: 641166@bugs.debian.org
Subject: Re: Bug#641166: coreutils: 'man sort': '--random-sort' misleading
Date: Mon, 12 Sep 2011 11:15:35 -0400
On Mon, Sep 12, 2011 at 10:01 AM, Jim Meyering <jim@meyering.net> wrote:
> Did you see the "real" documentation?

No; when I was younger, I sometimes looked at the info page for
commands, but invariably they seemed to be useless or copies of the
man page, and I wrote them off completely as a strange GNU waste of
time akin to Guile or other GNU quirks. If length is the problem, how
about adding '; not perfect shuffle'? 3 words in the places most
people will look for documentation.

-- 
gwern
http://www.gwern.net




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#641166; Package coreutils. (Mon, 12 Sep 2011 16:00:07 GMT)

Acknowledgement sent to Michael Stone <mstone@debian.org>:
Extra info received and forwarded to list. (Mon, 12 Sep 2011 16:00:08 GMT)

Message #20 received at 641166@bugs.debian.org:

From: Michael Stone <mstone@debian.org>
To: Gwern Branwen <gwern0@gmail.com>, 641166@bugs.debian.org
Subject: Re: Bug#641166: coreutils: 'man sort': '--random-sort' misleading
Date: Mon, 12 Sep 2011 11:55:38 -0400
On Mon, Sep 12, 2011 at 11:15:35AM -0400, you wrote:
>time akin to Guile or other GNU quirks. If length is the problem, how
>about adding '; not perfect shuffle'? 3 words in the places most
>people will look for documentation.

I can almost guarantee that would lead to a bug report asking for that 
to be explained.

Mike Stone




Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Mon, 12 Sep 2011 16:09:09 GMT)

Acknowledgement sent to Jim Meyering <jim@meyering.net>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 16:09:09 GMT)

Message #25 received at 641166@bugs.debian.org:

From: Jim Meyering <jim@meyering.net>
To: Gwern Branwen <gwern0@gmail.com>
Cc: 641166@bugs.debian.org
Subject: Re: Bug#641166: coreutils: 'man sort': '--random-sort' misleading
Date: Mon, 12 Sep 2011 18:06:40 +0200
Gwern Branwen wrote:
> On Mon, Sep 12, 2011 at 10:01 AM, Jim Meyering <jim@meyering.net> wrote:
>> Did you see the "real" documentation?
>
> No; when I was younger, I sometimes looked at the info page for
> commands, but invariably they seemed to be useless or copies of the

Please try to reset your misconception, at least for the coreutils.
Though, in general the info documentation for GNU programs is far
superior to the man pages.

> man page, and I wrote them off completely as a strange GNU waste of
> time akin to Guile or other GNU quirks. If length is the problem, how
> about adding '; not perfect shuffle'? 3 words in the places most
> people will look for documentation.

Sorry, but that would be inaccurate, because sometimes (no duplicates),
it does give you a perfect shuffle.




Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Wed, 07 Oct 2015 08:51:04 GMT)

Acknowledgement sent to Philip Hands <phil@hands.com>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Wed, 07 Oct 2015 08:51:04 GMT)

Message #30 received at 641166@bugs.debian.org:

From: Philip Hands <phil@hands.com>
To: 641166@bugs.debian.org
Subject: Please at least mention shuf in sort's man page
Date: Wed, 07 Oct 2015 09:43:16 +0100
Hi Michael,

A conversation on #debconf-team this morning which mentioned the use of
sort -R revealed that people that have been bitten by the quirks of
sort -R have a folkloric understanding that there's something not quite
right about it compared with what they want, but that even people that
understand that are not necessarily aware of shuf.

Since shuf does the thing that people are most often wanting, how about
just adding a note to the -R option to say something like:

  you probably want shuf(1) instead
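
For illustration, a minimal sketch of shuf as a replacement for sort -R in a pipeline. It assumes GNU shuf is installed, and the track names are made up. Unlike sort -R, it treats each repeated line as a separate item:

```shell
# Hypothetical playlist in which a.ogg is listed three times to weight it.
playlist=$(printf '%s\n' a.ogg b.ogg c.ogg a.ogg a.ogg)

# Random permutation of all five lines; each copy of a.ogg is placed independently.
shuffled=$(printf '%s\n' "$playlist" | shuf)
printf '%s\n' "$shuffled"

# Pick one line at random; a.ogg is chosen with probability 3/5.
pick=$(printf '%s\n' "$playlist" | shuf -n 1)
echo "picked: $pick"
```

Listing an entry several times biases the selection under shuf, which is what the original reporter expected from sort -R.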

As a data point, I've been aware that GNU commands are documented in
info for over two decades, but having uselessly invoked info only to be
looking at a strangely formatted version of the man page again (because
the real info has been kicked into non-free due to GFDL problems) I
don't think it would occur to me to consult the info if looking at the
sort man page.  Even if I did, I might well miss the importance of the note
reference to shuf.

Given that there is not even a mention of info in the man page, the fact
that the documentation in info is better than in man seems to be a poor
reason not to improve the documentation that most people consult.

Cheers, Phil.
-- 
|)|  Philip Hands  [+44 (0)20 8530 9560]  HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY



Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>:
Bug#641166; Package coreutils. (Mon, 19 Oct 2015 16:30:03 GMT)

Acknowledgement sent to Pádraig Brady <P@draigBrady.com>:
Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 19 Oct 2015 16:30:03 GMT)

Message #35 received at 641166@bugs.debian.org:

From: Pádraig Brady <P@draigBrady.com>
To: Philip Hands <phil@hands.com>, 641166@bugs.debian.org
Subject: Re: Bug#641166: Please at least mention shuf in sort's man page
Date: Mon, 19 Oct 2015 17:20:50 +0100
On 07/10/15 09:43, Philip Hands wrote:
> Hi Michael,
> 
> A conversation on #debconf-team this morning which mentioned the use of
> sort -R revealed that people that have been bitten by the quirks of
> sort -R have a folkloric understanding that there's something not quite
> right about it compared with what they want, but that even people that
> understand that are not necessarily aware of shuf.
> 
> Since shuf does the thing that people are most often wanting, how about
> just adding a note to the -R option to say something like:
> 
>   you probably want shuf(1) instead

Done upstream at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.24-52-ge50f527

> As a data point, I've been aware that GNU commands are documented in
> info for over two decades, but having uselessly invoked info only to be
> looking at a strangely formatted version of the man page again (because
> the real info has been kicked into non-free due to GFDL problems) I
> don't think it would occur to me to consult the info if looking at the
> sort man page.

Note that since 8.24, the sort man page points to
http://www.gnu.org/software/coreutils/sort

cheers,
Pádraig.




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Oct 21 02:04:25 2016; Machine Name: beach

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.