Debian Bug report logs - #641166
coreutils: 'man sort': '--random-sort' misleading
Reported by: gwern <gwern0@gmail.com>
Date: Sat, 10 Sep 2011 23:48:02 UTC
Severity: minor
Found in version coreutils/8.5-1
Report forwarded to debian-bugs-dist@lists.debian.org, gwern0@gmail.com, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Sat, 10 Sep 2011 23:48:05 GMT)
Acknowledgement sent to gwern <gwern0@gmail.com>: New Bug report received and forwarded. Copy sent to gwern0@gmail.com, Michael Stone <mstone@debian.org>. (Sat, 10 Sep 2011 23:48:05 GMT)
Message #5 received at submit@bugs.debian.org:
Package: coreutils
Version: 8.5-1
Severity: minor
The existing documentation for the option:

  -R, --random-sort
         sort by random hash of keys

This is not wrong, strictly speaking, but it is misleading: sorting by random hash *sounds* like a perfect shuffle,
which is what 99% of users want, and sorting by hash is equivalent if and only if there are no duplicate entries.
If there *are* duplicate entries, then the 'random' sort will put all duplicates in consecutive runs.

I suggest amending the line to read more like

  sort by random hash of keys; equivalent to perfect shuffle on unique keys

or maybe just say

  sort by random hash of keys; not the same as a perfect shuffle

Or at least warn in some fashion that 'random' is not quite what 'random' usually means on lists.
(I do random shuffle with mplayer using 'sort -R', and once, to 'bias' the selection to a particular set of songs, I put that directory in
3 or 4 times; I thought I was going crazy when the first such song came up 3 times, which I calculated at billions to one against.
I checked everything until I began to wonder what exactly 'random hash of keys' meant, and then saw how it treated duplicate entries.)
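The grouping behaviour described above can be reproduced with a few lines of shell. This is only a minimal sketch, assuming GNU coreutils `sort` is on the PATH; the six-line input is made up for illustration. The order of the groups changes from run to run, but the three "a" lines should always come out next to each other.

```shell
# Made-up input: three copies of "a" interleaved with unique lines.
input=$(printf '%s\n' a b a c a d)

# sort -R orders lines by a random hash of the key, so equal lines
# get equal hashes and end up adjacent.
shuffled=$(printf '%s\n' "$input" | sort -R)
printf '%s\n' "$shuffled"

# uniq collapses adjacent duplicates only. Four remaining lines means the
# three "a" lines were emitted as one consecutive run.
runs=$(printf '%s\n' "$shuffled" | uniq | wc -l | tr -d ' ')
echo "distinct runs: $runs"
```

With a true shuffle of the same input, the count of distinct runs would usually be higher than 4, because the copies of "a" would often be separated.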
-- System Information:
Debian Release: wheezy/sid
APT prefers proposed-updates
APT policy: (500, 'proposed-updates'), (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.0.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash
Versions of packages coreutils depends on:
ii libacl1 2.2.51-3 Access control list shared library
ii libattr1 1:2.4.46-3 Extended attribute shared library
ii libc6 2.13-18 Embedded GNU C Library: Shared lib
ii libselinux1 2.0.98-1.1 SELinux runtime shared libraries
coreutils recommends no packages.
coreutils suggests no packages.
-- no debconf information
Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Mon, 12 Sep 2011 14:03:04 GMT)
Acknowledgement sent to Jim Meyering <jim@meyering.net>: Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 14:03:04 GMT)
Message #10 received at 641166@bugs.debian.org:
gwern wrote:
> Package: coreutils
> Version: 8.5-1
> Severity: minor
>
> The existing documentation for the option:
>
> -R, --random-sort
> sort by random hash of keys
>
> This is not wrong, strictly speaking, but it is misleading: sorting by random hash *sounds* like a perfect shuffle,
> which is what 99% of users want, and sorting by hash is equivalent if and only if there are no duplicate entries.
> If there *are* duplicate entries, then the 'random' sort will put all duplicates in consecutive runs.
>
> I suggest amending the line to read more like
>
> sort by random hash of keys; equivalent to perfect shuffle on unique keys
>
> or maybe just say
>
> sort by random hash of keys; not the same as a perfect shuffle
>
> Or at least warn in some fashion that 'random' is not quite what 'random' usually means on lists.
>
> (I do random shuffle with mplayer using 'sort -R', and once, to 'bias' the selection to a particular set of songs, I put that directory in
> 3 or 4 times; I thought I was going crazy when the first such song came up 3 times, which I calculated at billions to one against.
> I checked everything until I began to wonder what exactly 'random hash of keys' meant, and then saw how it treated duplicate entries.)
Thanks for the suggestion, but we try to keep the man page pretty terse,
since it's automatically derived from sort --help output.
Did you see the "real" documentation?
`-R'
`--random-sort'
`--sort=random'
     Sort by hashing the input keys and then sorting the hash values.
     Choose the hash function at random, ensuring that it is free of
     collisions so that differing keys have differing hash values.
     This is like a random permutation of the inputs (*note shuf
     invocation::), except that keys with the same value sort together.

     If multiple random sort fields are specified, the same random hash
     function is used for all fields.  To use different random hash
     functions for different fields, you can invoke `sort' more than
     once.

     The choice of hash function is affected by the `--random-source'
     option.
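As a hedged illustration of the last sentence of that excerpt: giving `sort -R` the same `--random-source` file twice should select the same hash function and therefore the same output order. The sketch below assumes GNU coreutils; the temporary file and its contents (the output of `seq 1 1000`) are arbitrary stand-ins for a byte source and are not taken from the manual.

```shell
# Arbitrary fixed byte stream used as the random source (an assumption
# for this demo; any sufficiently long file should work).
seed=$(mktemp)
seq 1 1000 > "$seed"

# Same input and same random source on both runs.
first=$(printf '%s\n' a b c d e | sort -R --random-source="$seed")
second=$(printf '%s\n' a b c d e | sort -R --random-source="$seed")

# The two orderings are expected to match line for line.
if [ "$first" = "$second" ]; then
    echo "same order both times"
fi

rm -f "$seed"
```

Without `--random-source`, each invocation draws fresh random bytes, so the order normally differs between runs.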
There should be a note like this at the end of the man page:

  SEE ALSO
       The full documentation for sort is maintained as a Texinfo manual.  If
       the info and sort programs are properly installed at your site, the
       command

              info sort

       should give you access to the complete manual.
Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Mon, 12 Sep 2011 15:18:02 GMT)
Acknowledgement sent to Gwern Branwen <gwern0@gmail.com>: Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 15:18:03 GMT)
Message #15 received at 641166@bugs.debian.org:
On Mon, Sep 12, 2011 at 10:01 AM, Jim Meyering <jim@meyering.net> wrote:
> Did you see the "real" documentation?
No; when I was younger, I sometimes looked at the info page for
commands, but invariably they seemed to be useless or copies of the
man page, and I wrote them off completely as a strange GNU waste of
time akin to Guile or other GNU quirks. If length is the problem, how
about adding '; not perfect shuffle'? 3 words in the places most
people will look for documentation.
--
gwern
http://www.gwern.net
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#641166; Package coreutils. (Mon, 12 Sep 2011 16:00:07 GMT)
Acknowledgement sent to Michael Stone <mstone@debian.org>: Extra info received and forwarded to list. (Mon, 12 Sep 2011 16:00:08 GMT)
Message #20 received at 641166@bugs.debian.org:
On Mon, Sep 12, 2011 at 11:15:35AM -0400, you wrote:
>time akin to Guile or other GNU quirks. If length is the problem, how
>about adding '; not perfect shuffle'? 3 words in the places most
>people will look for documentation.
I can almost guarantee that would lead to a bug report asking for that
to be explained.
Mike Stone
Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Mon, 12 Sep 2011 16:09:09 GMT)
Acknowledgement sent to Jim Meyering <jim@meyering.net>: Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 12 Sep 2011 16:09:09 GMT)
Message #25 received at 641166@bugs.debian.org:
Gwern Branwen wrote:
> On Mon, Sep 12, 2011 at 10:01 AM, Jim Meyering <jim@meyering.net> wrote:
>> Did you see the "real" documentation?
>
> No; when I was younger, I sometimes looked at the info page for
> commands, but invariably they seemed to be useless or copies of the
Please try to reset your misconception, at least for the coreutils.
Though, in general the info documentation for GNU programs is far
superior to the man pages.
> man page, and I wrote them off completely as a strange GNU waste of
> time akin to Guile or other GNU quirks. If length is the problem, how
> about adding '; not perfect shuffle'? 3 words in the places most
> people will look for documentation.
Sorry, but that would be inaccurate, because sometimes (no duplicates),
it does give you a perfect shuffle.
Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Wed, 07 Oct 2015 08:51:04 GMT)
Acknowledgement sent to Philip Hands <phil@hands.com>: Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Wed, 07 Oct 2015 08:51:04 GMT)
Message #30 received at 641166@bugs.debian.org:
Hi Michael,
A conversation on #debconf-team this morning which mentioned the use of
sort -R revealed that people that have been bitten by the quirks of
sort -R have a folkloric understanding that there's something not quite
right about it compared with what they want, but that even people that
understand that are not necessarily aware of shuf.
Since shuf does the thing that people are most often wanting, how about
just adding a note to the -R option to say something like:
you probably want shuf(1) instead
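For comparison, a minimal sketch of the shuf(1) alternative, assuming GNU coreutils `shuf` is installed. The playlist and its file names are hypothetical. shuf permutes lines by position, so repeated entries are treated as independent lines and are not forced into one consecutive run.

```shell
# Hypothetical playlist: one track is listed three times to weight it.
playlist=$(printf '%s\n' favourite.ogg other1.ogg favourite.ogg \
    other2.ogg favourite.ogg other3.ogg)

# shuf outputs a random permutation of its input lines; duplicates are
# shuffled like any other line.
shuffled=$(printf '%s\n' "$playlist" | shuf)
printf '%s\n' "$shuffled"

# Sanity check: the output holds exactly the same lines as the input,
# only the order may differ.
if [ "$(printf '%s\n' "$shuffled" | sort)" = "$(printf '%s\n' "$playlist" | sort)" ]; then
    echo "same lines as the input"
fi
```

The repeated track can still land in adjacent positions by chance, but that is no longer guaranteed the way it is with `sort -R`.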
As a data point, I've been aware that GNU commands are documented in
info for over two decades, but having uselessly invoked info only to be
looking at a strangely formatted version of the man page again (because
the real info has been kicked into non-free due to GFDL problems) I
don't think it would occur to me to consult the info if looking at the
sort man page. Even if I did, I might well miss the importance of the note
reference to shuf.
Given that there is not even a mention of info in the man page, the fact
that the documentation in info is better than in man seems to be a poor
reason not to improve the documentation that most people consult.
Cheers, Phil.
--
|)| Philip Hands [+44 (0)20 8530 9560] HANDS.COM Ltd.
|-| http://www.hands.com/ http://ftp.uk.debian.org/
|(| Hugo-Klemm-Strasse 34, 21075 Hamburg, GERMANY
Information forwarded to debian-bugs-dist@lists.debian.org, Michael Stone <mstone@debian.org>: Bug#641166; Package coreutils. (Mon, 19 Oct 2015 16:30:03 GMT)
Acknowledgement sent to Pádraig Brady <P@draigBrady.com>: Extra info received and forwarded to list. Copy sent to Michael Stone <mstone@debian.org>. (Mon, 19 Oct 2015 16:30:03 GMT)
Message #35 received at 641166@bugs.debian.org:
On 07/10/15 09:43, Philip Hands wrote:
> Hi Michael,
>
> A conversation on #debconf-team this morning which mentioned the use of
> sort -R revealed that people that have been bitten by the quirks of
> sort -R have a folkloric understanding that there's something not quite
> right about it compared with what they want, but that even people that
> understand that are not necessarily aware of shuf.
>
> Since shuf does the thing that people are most often wanting, how about
> just adding a note to the -R option to say something like:
>
> you probably want shuf(1) instead
Done upstream at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.24-52-ge50f527
> As a data point, I've been aware that GNU commands are documented in
> info for over two decades, but having uselessly invoked info only to be
> looking at a strangely formatted version of the man page again (because
> the real info has been kicked into non-free due to GFDL problems) I
> don't think it would occur to me to consult the info if looking at the
> sort man page.
Note that since 8.24, the sort man page points to
http://www.gnu.org/software/coreutils/sort
cheers,
Pádraig.
Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Oct 21 02:04:25 2016.
Debian Bug tracking system. Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.