Debian Bug report logs - #632450
ITP: pmatch -- Duplicate finder and removal tool.

Package: wnpp; Maintainer for wnpp is wnpp@debian.org;

Reported by: Tomasz Muras <nexor1984@gmail.com>

Date: Sat, 2 Jul 2011 10:30:02 UTC

Owned by: Tomasz Muras <nexor1984@gmail.com>

Severity: wishlist

Done: Tomasz Muras <nexor1984@gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, nexor1984@gmail.com, debian-devel@lists.debian.org, wnpp@debian.org:
Bug#632450; Package wnpp. (Sat, 02 Jul 2011 10:30:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tomasz Muras <nexor1984@gmail.com>:
New Bug report received and forwarded. Copy sent to nexor1984@gmail.com, debian-devel@lists.debian.org, wnpp@debian.org. (Sat, 02 Jul 2011 10:30:10 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Tomasz Muras <nexor1984@gmail.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sat, 02 Jul 2011 11:27:36 +0100
Package: wnpp
Severity: wishlist
Owner: Tomasz Muras <nexor1984@gmail.com>


* Package name    : pmatch
  Version         : 0.4.0
  Upstream Author : Tomasz Muras <nexor1984@gmail.com>
* URL             : http://pmatch.rubyforge.org
* License         : GPL v3
  Programming Lang: Ruby
  Description     : Duplicate finder and removal tool.

Perfect Match (pmatch) is a command-line utility for finding duplicate files.
It can perform some logic for choosing which duplicate to act on (delete, create
a link or perform any other action).




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sat, 02 Jul 2011 11:15:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lars Wirzenius <liw@liw.fi>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sat, 02 Jul 2011 11:15:06 GMT) Full text and rfc822 format available.

Message #10 received at submit@bugs.debian.org (full text, mbox):

From: Lars Wirzenius <liw@liw.fi>
To: Tomasz Muras <nexor1984@gmail.com>, 632450@bugs.debian.org
Cc: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sat, 2 Jul 2011 12:12:31 +0100
On Sat, Jul 02, 2011 at 11:27:36AM +0100, Tomasz Muras wrote:
> * Package name    : pmatch
>   Version         : 0.4.0
>   Upstream Author : Tomasz Muras <nexor1984@gmail.com>
> * URL             : http://pmatch.rubyforge.org
> * License         : GPL v3
>   Programming Lang: Ruby
>   Description     : Duplicate finder and removal tool.
> 
> Perfect Match (pmatch) is a command-line utility for finding duplicate files.
> It can perform some logic for choosing which duplicate to act on (delete, create
> a link or perform any other action).

We have a bunch of these in Debian already. For example:

* finddup
* hardlink

How does pmatch compare to the others?

I wrote one myself (http://liw.fi/dupfiles/), but it turned out
hardlink is way faster, so decided not to add mine to Debian, since
it would be unnecessary duplication.

-- 
Freedom-based blog/wiki/web hosting: http://www.branchable.com/




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sat, 02 Jul 2011 11:24:14 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lars Wirzenius <liw@liw.fi>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sat, 02 Jul 2011 11:24:15 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#632450; Package wnpp. (Sat, 02 Jul 2011 12:27:17 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tomasz Muras <nexor1984@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Sat, 02 Jul 2011 12:27:39 GMT) Full text and rfc822 format available.

Message #20 received at 632450@bugs.debian.org (full text, mbox):

From: Tomasz Muras <nexor1984@gmail.com>
To: Lars Wirzenius <liw@liw.fi>
Cc: 632450@bugs.debian.org
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sat, 2 Jul 2011 13:16:36 +0100
Lars,

On Sat, Jul 2, 2011 at 1:03 PM, Lars Wirzenius <liw@liw.fi> wrote:
> It seems you didn't Cc the bug, or debian-devel. Just in case that
> was intentional, I'm not doing it either.

My mistake, thanks for pointing it out.

> On Sat, Jul 02, 2011 at 12:36:51PM +0100, Tomasz Muras wrote:
>> It is different as it (tries to) solve the problem of not just
>> finding the duplicates but also deciding what should be done with them
>> once they are found (e.g. which file should be considered the original
>> and which the duplicate). My original motivation behind first looking
>> for and then creating this utility was cleaning up my photos: imagine
>> thousands of files in hundreds of directories that needed to be cleaned
>> up. I preferred to leave some files in sorted directories, while
>> removing the duplicates from all those "dump", "backup", etc.
>> directories in an automated fashion. And my top priority: I could not allow
>> for any mistakes, so I've put significant effort into testing the
>> tool.
>>
>> The second problem it solves is finding and acting on files that are
>> partial copies of some other, presumably complete file (e.g. an
>> incomplete FTP download).
>>
>> Before I started working on it I looked for similar utilities and
>> documented them [1]. Also see [2] for other uses.
>>
>> [1] http://pmatch.rubyforge.org/competition.html
>> [2] http://pmatch.rubyforge.org/usage.html
>>
>> I welcome any comments and criticism.
>> Tomek
>
> That does make pmatch seem like a very useful tool! You should add
> some summary of that information from the usage page to your long
> package description.

Agreed. I guess I did a poor job at "advertising" the package.

> Your description said you use a hash to compare files. Is that
> a hash of the complete file? I found, when developing my tool,
> that it's much faster to compare just a little bit of data from
> the beginning of the file, and since my data set had several quite
> large files, this had a big impact. (Obviously, check file size first.)
>
> I quite like your approach of writing out shell commands instead of
> doing any changes directly.
>
> Looking forward to seeing pmatch in Debian.

Agreed again, I'm planning to do more work on pmatch soon - at the
moment getting it into Debian is my priority.
Comparing the initial size may be a very good idea, especially for my
use case (photos) as most of the files are of similar size.

Thank you for your review, Lars,
Tomek




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sat, 02 Jul 2011 14:06:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Olivier Berger <olivier.berger@it-sudparis.eu>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sat, 02 Jul 2011 14:06:03 GMT) Full text and rfc822 format available.

Message #25 received at 632450@bugs.debian.org (full text, mbox):

From: Olivier Berger <olivier.berger@it-sudparis.eu>
To: Tomasz Muras <nexor1984@gmail.com>
Cc: 632450@bugs.debian.org
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sat, 02 Jul 2011 16:03:27 +0200
Hi.

On Saturday, 2 July 2011 at 11:27 +0100, Tomasz Muras wrote:
> Package: wnpp
> Severity: wishlist
> Owner: Tomasz Muras <nexor1984@gmail.com>
> 
> 
> * Package name    : pmatch
>   Version         : 0.4.0
>   Upstream Author : Tomasz Muras <nexor1984@gmail.com>
> * URL             : http://pmatch.rubyforge.org
> * License         : GPL v3
>   Programming Lang: Ruby
>   Description     : Duplicate finder and removal tool.
> 
> Perfect Match (pmatch) is a command-line utility for finding duplicate files.
> It can perform some logic for choosing which duplicate to act on (delete, create
> a link or perform any other action).


For the record, how does it compare to fslint (already packaged)?

Best regards,

-- 
Olivier BERGER <olivier.berger@it-sudparis.eu>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 2048R/5819D7E8
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)





Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sun, 03 Jul 2011 06:39:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to thomas@koch.ro:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sun, 03 Jul 2011 06:39:07 GMT) Full text and rfc822 format available.

Message #30 received at submit@bugs.debian.org (full text, mbox):

From: Thomas Koch <thomas@koch.ro>
To: debian-devel@lists.debian.org, Tomasz Muras <nexor1984@gmail.com>, 632450@bugs.debian.org
Cc: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sun, 3 Jul 2011 08:35:54 +0200
Tomasz Muras:
> Package: wnpp
> Severity: wishlist
> Owner: Tomasz Muras <nexor1984@gmail.com>
> 
> 
> * Package name    : pmatch
>   Version         : 0.4.0
>   Upstream Author : Tomasz Muras <nexor1984@gmail.com>
> * URL             : http://pmatch.rubyforge.org
> * License         : GPL v3
>   Programming Lang: Ruby
>   Description     : Duplicate finder and removal tool.
> 
> Perfect Match (pmatch) is a command-line utility for finding duplicate
> files. It can perform some logic for choosing which duplicate to act on
> (delete, create a link or perform any other action).

Hi,

We already have the packages fdupes, rdfind, fslint, and perforate for this 
purpose.[1] And there's a tool called duff[2], not yet packaged for Debian. I 
don't think we need another tool for this. And I don't think such basic tools 
should be written in a scripting language.

[1] found by axi-cache search duplicate files
[2] http://duff.sourceforge.net/

Thank you however for your initiative to contribute to Debian! If you have any 
questions, please just ask.

Best regards,

Thomas Koch, http://www.koch.ro




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sun, 03 Jul 2011 07:18:09 GMT) Full text and rfc822 format available.

Acknowledgement sent to thomas@koch.ro:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sun, 03 Jul 2011 07:18:09 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#632450; Package wnpp. (Sun, 03 Jul 2011 10:30:13 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tomasz Muras <nexor1984@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Sun, 03 Jul 2011 10:30:25 GMT) Full text and rfc822 format available.

Message #40 received at 632450@bugs.debian.org (full text, mbox):

From: Tomasz Muras <nexor1984@gmail.com>
To: thomas@koch.ro
Cc: debian-devel@lists.debian.org, 632450@bugs.debian.org, olivier.berger@it-sudparis.eu, Lars Wirzenius <liw@liw.fi>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sun, 3 Jul 2011 11:26:37 +0100
>> Package: wnpp
>> Severity: wishlist
>> Owner: Tomasz Muras <nexor1984@gmail.com>
>>
>>
>> * Package name    : pmatch
>>   Version         : 0.4.0
>>   Upstream Author : Tomasz Muras <nexor1984@gmail.com>
>> * URL             : http://pmatch.rubyforge.org
>> * License         : GPL v3
>>   Programming Lang: Ruby
>>   Description     : Duplicate finder and removal tool.
>>
>> Perfect Match (pmatch) is a command-line utility for finding duplicate
>> files. It can perform some logic for choosing which duplicate to act on
>> (delete, create a link or perform any other action).
>
> Hi,
>
> We already have the packages fdupes, rdfind, fslint, and perforate for this
> purpose.[1] And there's a tool called duff[2], not yet packaged for Debian. I
> don't think we need another tool for this. And I don't think such basic tools
> should be written in a scripting language.
>
> [1] found by axi-cache search duplicate files
> [2] http://duff.sourceforge.net/

To answer this and the comment from Olivier Berger:
fslint is a GUI program; pmatch focuses on CLI functionality.
I already covered the difference from fdupes (and others) in my previous
mail, see [1]. The same applies to duff and perforate.
rdfind has similar functionality, but it only allows for
differentiating between originals and duplicates via the order of the
arguments. pmatch will allow for "secondary choices". pmatch also
searches for "partial" files. If you're interested in even more
similar applications (not necessarily packaged for Debian), see [2].

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632450#20
[2] http://pmatch.rubyforge.org/competition.html

Thanks for the comments,
Tomasz Muras
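
As a rough illustration of the ideas discussed in this thread (preferring files in certain directories as originals, falling back to a secondary rule on ties, and writing out shell commands rather than changing anything directly), a sketch in Python might look like this. It is not pmatch's implementation: the function names, the tie-break rule (shorter path, then alphabetical order), and the `rm --` output format are assumptions made for the example.

```python
import shlex
from pathlib import PurePosixPath


def pick_original(group, preferred_dirs):
    """Pick the file to keep from a group of identical files.

    Primary rule: a file under an earlier entry of `preferred_dirs` wins.
    Secondary rule (hypothetical tie-break): shorter path, then alphabetical.
    """
    def rank(path):
        parents = PurePosixPath(path).parents
        for i, d in enumerate(preferred_dirs):
            if PurePosixPath(d) in parents:
                return i
        # Files outside every preferred directory rank last.
        return len(preferred_dirs)

    return min(group, key=lambda p: (rank(p), len(p), p))


def removal_commands(group, preferred_dirs):
    """Return shell commands that would delete every file except the original."""
    original = pick_original(group, preferred_dirs)
    lines = ["# keep " + shlex.quote(original)]
    # shlex.quote protects paths containing spaces or shell metacharacters.
    lines += ["rm -- " + shlex.quote(p) for p in sorted(group) if p != original]
    return lines


if __name__ == "__main__":
    group = ["dump/img 1.jpg", "photos/sorted/img1.jpg", "backup/img1.jpg"]
    print("\n".join(removal_commands(group, ["photos/sorted"])))
```

Because the result is a list of commands instead of an action, the output can be reviewed (or edited) before it is run, which matters when no mistakes are acceptable.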




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Sun, 03 Jul 2011 11:06:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to thomas@koch.ro:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Sun, 03 Jul 2011 11:06:13 GMT) Full text and rfc822 format available.

Message #45 received at 632450@bugs.debian.org (full text, mbox):

From: Thomas Koch <thomas@koch.ro>
To: debian-devel@lists.debian.org
Cc: Tomasz Muras <nexor1984@gmail.com>, 632450@bugs.debian.org, olivier.berger@it-sudparis.eu, Lars Wirzenius <liw@liw.fi>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Sun, 3 Jul 2011 13:02:48 +0200
OK, I didn't see that this ITP already had a discussion. However, there are 
three minor and subjective issues that I have with this ITP:

- It's not really in the spirit of Open Source or Free Software to start a new 
project if there already are a dozen projects in the same area. If none of 
them fits your needs perfectly, there should be at least one that could be 
extended to your needs instead of starting from scratch.

- It's always better if somebody besides the upstream author also 
wants a package to be in Debian. In that case there's proof that the package 
is at least important for a second person.

- I hate ruby with a passion ... :-)

Best regards,

Thomas Koch, http://www.koch.ro




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Mon, 04 Jul 2011 01:51:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tshepang Lekhonkhobe <tshepang@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Mon, 04 Jul 2011 01:51:06 GMT) Full text and rfc822 format available.

Message #50 received at 632450@bugs.debian.org (full text, mbox):

From: Tshepang Lekhonkhobe <tshepang@gmail.com>
To: thomas@koch.ro
Cc: debian-devel@lists.debian.org, Tomasz Muras <nexor1984@gmail.com>, 632450@bugs.debian.org, olivier.berger@it-sudparis.eu, Lars Wirzenius <liw@liw.fi>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Mon, 04 Jul 2011 03:45:14 +0200
On Sun, 2011-07-03 at 13:02 +0200, Thomas Koch wrote:
> OK, I didn't see that this ITP already had a discussion. However, there are 
> three minor and subjective issues that I have with this ITP:
> 
> - It's not really in the spirit of Open Source or Free Software to start a new 
> project if there already are a dozen projects in the same area. If none of 
> them fits your needs perfectly, there should be at least one that could be 
> extended to your needs instead of starting from scratch.

Many tools (perhaps most FLOSS) start off as toy projects, with the
author not expecting to go far with them. And then they get to the point
where they actually start being useful, even being better than existing
ones. In such a case, you don't really want it thrown away or kept
private, right? Imagine that there isn't much motivation (e.g. wrong
language, ugly code, not enough time, too boring) to port back the
features to the older tool.

> - It's always better if somebody besides the upstream author also 
> wants a package to be in Debian. In that case there's proof that the package 
> is at least important for a second person.

What of cases where the package would get wider usage simply for being
available in Debian? I'm not saying all trash should be accepted, I'm
just saying that if, for example, one's own project is their first real
exposure to technical Debian work, why not have them start with what
they want the most for now... package their project for Debian?






Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Mon, 04 Jul 2011 01:57:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tshepang Lekhonkhobe <tshepang@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Mon, 04 Jul 2011 01:57:03 GMT) Full text and rfc822 format available.

Message #55 received at submit@bugs.debian.org (full text, mbox):

From: Tshepang Lekhonkhobe <tshepang@gmail.com>
To: thomas@koch.ro
Cc: debian-devel@lists.debian.org, Tomasz Muras <nexor1984@gmail.com>, 632450@bugs.debian.org, Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Re: Bug#632450: ITP: pmatch -- Duplicate finder and removal tool.
Date: Mon, 04 Jul 2011 03:52:31 +0200
On Sun, 2011-07-03 at 08:35 +0200, Thomas Koch wrote:
> I don't think we need another tool for this.
 
Maybe it's better than them in some way (not necessarily overall), which
in my book would make it good enough, meaning *we need another tool for
this*.

> And I don't think such basic tools should be written in a scripting
> language.

If this isn't a joke, it's a real bad reason.





Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>:
Bug#632450; Package wnpp. (Mon, 04 Jul 2011 01:57:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tshepang Lekhonkhobe <tshepang@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Tomasz Muras <nexor1984@gmail.com>. (Mon, 04 Jul 2011 01:57:05 GMT) Full text and rfc822 format available.

Reply sent to Tomasz Muras <nexor1984@gmail.com>:
You have taken responsibility. (Wed, 07 Dec 2011 19:00:48 GMT) Full text and rfc822 format available.

Notification sent to Tomasz Muras <nexor1984@gmail.com>:
Bug acknowledged by developer. (Wed, 07 Dec 2011 19:00:48 GMT) Full text and rfc822 format available.

Message #65 received at 632450-done@bugs.debian.org (full text, mbox):

From: Tomasz Muras <nexor1984@gmail.com>
To: 632450-done@bugs.debian.org
Subject: Closing
Date: Wed, 07 Dec 2011 18:53:36 +0000
Looks like including pmatch package in Debian is too controversial - so
let's drop it, closing bug.

Tomasz (Tomek) Muras




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 05 Jan 2012 07:36:03 GMT) Full text and rfc822 format available.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Apr 20 19:26:21 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.