Debian Bug report logs - #499328
new magic: Haskell program files

Package: file; Maintainer for file is Christoph Biedl <debian.axhn@manchmal.in-ulm.de>; Source for file is src:file.

Reported by: Julian Gilbey <jdg@debian.org>

Date: Wed, 17 Sep 2008 20:15:01 UTC

Severity: wishlist

Tags: confirmed, help, upstream

Found in version file/4.25-1


Report forwarded to debian-bugs-dist@lists.debian.org, Daniel Baumann <daniel@debian.org>:
Bug#499328; Package file.

Acknowledgement sent to Julian Gilbey <jdg@debian.org>:
New Bug report received and forwarded. Copy sent to Daniel Baumann <daniel@debian.org>.

Message #5 received at submit@bugs.debian.org:

From: Julian Gilbey <jdg@debian.org>
To: Debian bug reports <submit@bugs.debian.org>
Subject: file: recognise Haskell program files
Date: Wed, 17 Sep 2008 21:13:15 +0100
Package: file
Version: 4.25-1
Severity: wishlist

Currently, file reports that a Haskell source file is actually a Java
program text file.  Oops.

   Julian




Information forwarded to debian-bugs-dist@lists.debian.org, Daniel Baumann <daniel@lists.debian-maintainers.org>:
Bug#499328; Package file. (Mon, 26 Apr 2010 16:33:06 GMT)

Acknowledgement sent to Gwern Branwen <gwern0@gmail.com>:
Extra info received and forwarded to list. Copy sent to Daniel Baumann <daniel@lists.debian-maintainers.org>. (Mon, 26 Apr 2010 16:33:06 GMT)

Message #10 received at 499328@bugs.debian.org:

From: Gwern Branwen <gwern0@gmail.com>
To: 499328 <499328@bugs.debian.org>
Subject: Heuristics
Date: Mon, 26 Apr 2010 12:29:55 -0400

There are a couple heuristics that 'file' could use to detect Haskell files:

- Java files always have {}s in them*. Haskell files may have {}s, but
mostly don't.
- Java 'import' statements end in ';'. Haskell 'import' statements are
allowed to end in ';' as part of the alternate syntax, but in
practice very rarely do. I have hundreds** of Haskell source
repositories, and grepping through them all, I found 12 repos*** using
semicolons with import statements.
- Haskell has some unusual syntax that Java doesn't; '::' in type
signatures, for example. Or operators like '>>='.
- Haskell modules usually start with either Haskell comments, -- or {-
-}, which differ significantly from Java comments, // or /* */
- or they start with the 'module' keyword, where Java would start
with a visibility modifier, public/private/protected, module kinds,
abstract/interface, and the 'class' keyword.
- And of course, Haskell files are usually suffixed .hs or .lhs, while
Java files are more usually .java

Between these 6 differences, I think 'file' could quite reliably
distinguish Java from Haskell files.

* #java on Freenode tells me that a Java source file which consists
only of 'import' statements can get away with no {}s, but such a file
will do nothing useful. And 'package-info.java' files - some sort of
autogenerated file - can also apparently be {}-free. But these are
very much edge-cases and I think can be disregarded.
** roughly 970
*** hlint, ddc, jhc, open-witnesses, nobench, buddha, cabal,
language-c, bnfc, protocol-buffers, lhc, hera

-- 
gwern
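
The content-based heuristics from the message above (everything except
the filename suffix) could be prototyped roughly as follows. This is only
a sketch in Python, not how file(1) or its magic patterns work: the
function name guess_language, the regular expressions, and the
one-vote-per-heuristic scoring are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, chosen for illustration only.
# A top-level Haskell type signature, e.g. "main :: IO ()".
HS_SIGNATURE = re.compile(r"^[a-z_][\w']*\s*::", re.MULTILINE)
# A line that opens a Java declaration with a modifier or 'class'/'interface'.
JAVA_DECL = re.compile(r"^(public|private|protected|abstract|final|class|interface)\b")


def guess_language(source):
    """Guess 'haskell', 'java', or 'unknown' from file contents alone."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    haskell = java = 0

    # Braces: Java nearly always has them. Braces that belong to
    # Haskell's {- -} comments and {-# #-} pragmas are not counted.
    if re.search(r"\{(?!-)", source) and re.search(r"(?<!-)\}", source):
        java += 1
    else:
        haskell += 1

    # Imports: Java imports end in ';', Haskell imports rarely do.
    imports = [ln for ln in lines if ln.startswith("import ")]
    if imports:
        with_semicolon = sum(ln.endswith(";") for ln in imports)
        if with_semicolon * 2 > len(imports):
            java += 1
        else:
            haskell += 1

    # '::' in a type signature is a strong Haskell hint.
    if HS_SIGNATURE.search(source):
        haskell += 1

    # Comment style of the first non-blank line.
    if lines[0].startswith(("--", "{-")):
        haskell += 1
    elif lines[0].startswith(("//", "/*")):
        java += 1

    # First declaration: 'module' versus a Java modifier or 'class'.
    for ln in lines:
        if ln.startswith("module "):
            haskell += 1
            break
        if JAVA_DECL.match(ln):
            java += 1
            break

    if haskell > java:
        return "haskell"
    if java > haskell:
        return "java"
    return "unknown"
```

Each heuristic casts a single vote so that no one signal (for example a
long import list) can outweigh the others; a real detector would need
its weights tuned against actual source trees.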




Changed Bug title to 'new magic: Haskell program files' from 'file: recognise Haskell program files' Request was from Daniel Baumann <daniel.baumann@progress-technologies.net> to control@bugs.debian.org. (Wed, 13 Mar 2013 12:03:25 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Christoph Biedl <debian.axhn@manchmal.in-ulm.de>:
Bug#499328; Package file. (Fri, 07 Mar 2014 11:57:04 GMT)

Acknowledgement sent to Christoph Biedl <christoph@biedl.ath.cx>:
Extra info received and forwarded to list. Copy sent to Christoph Biedl <debian.axhn@manchmal.in-ulm.de>. (Fri, 07 Mar 2014 11:57:04 GMT)

Message #17 received at 499328@bugs.debian.org:

From: Christoph Biedl <christoph@biedl.ath.cx>
To: Julian Gilbey <jdg@debian.org>, Gwern Branwen <gwern0@gmail.com>, 499328@bugs.debian.org
Subject: Re: Bug#499328: Heuristics
Date: Fri, 7 Mar 2014 12:55:25 +0100

tags 499328 confirmed upstream moreinfo
thanks

Gwern Branwen wrote...

> There are a couple heuristics that 'file' could use to detect Haskell files:
(...)

I like that list, it should provide enough stuff to create good
patterns. Since my knowledge of Haskell is rather limited (but I wrote
some programs in Miranda many years ago), can you recommend a package
of very typical Haskell sources (there is no such thing as typical
sources, I know) so I can do tests?

Just one thing:

> - And of course, Haskell files are usually suffixed .hs or .lhs, while
> Java files are more usually .java

That one will not work, file(1) is completely filename agnostic.

    Christoph

Added tag(s) upstream, confirmed, and moreinfo. Request was from Christoph Biedl <christoph@biedl.ath.cx> to control@bugs.debian.org. (Fri, 07 Mar 2014 11:57:07 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Christoph Biedl <debian.axhn@manchmal.in-ulm.de>:
Bug#499328; Package file. (Fri, 07 Mar 2014 16:48:21 GMT)

Acknowledgement sent to Gwern Branwen <gwern0@gmail.com>:
Extra info received and forwarded to list. Copy sent to Christoph Biedl <debian.axhn@manchmal.in-ulm.de>. (Fri, 07 Mar 2014 16:48:21 GMT)

Message #24 received at 499328@bugs.debian.org:

From: Gwern Branwen <gwern0@gmail.com>
To: Christoph Biedl <christoph@biedl.ath.cx>
Cc: Julian Gilbey <jdg@debian.org>, 499328 <499328@bugs.debian.org>
Subject: Re: Bug#499328: Heuristics
Date: Fri, 7 Mar 2014 11:46:08 -0500

On Fri, Mar 7, 2014 at 6:55 AM, Christoph Biedl <christoph@biedl.ath.cx> wrote:
> Since my knowledge of Haskell is rather limited (but I wrote
> some programs in Miranda many years ago), can you recommend a package
> of very typical Haskell sources (there is no such thing as typical
> sources, I know) so I can do tests?

You can find all the Haskell sources you could desire on
http://hackage.haskell.org/packages/ ; I'm not sure about *typical*,
but as far as 'clean' goes, the XMonad source is pretty good, and if
you want to keep accuracy on popular codebases, I'd suggest Darcs,
Pandoc, & hledger. (GHC is probably way too big and gnarly to start
on.) The CSV libraries are generally small, if that's helpful.

-- 
gwern
http://www.gwern.net



Removed tag(s) moreinfo. Request was from Christoph Biedl <debian.axhn@manchmal.in-ulm.de> to control@bugs.debian.org. (Sun, 09 Mar 2014 12:21:20 GMT)

Added tag(s) help. Request was from Christoph Biedl <debian.axhn@manchmal.in-ulm.de> to control@bugs.debian.org. (Sun, 09 Mar 2014 12:21:20 GMT)

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 24 04:22:20 2014; Machine Name: buxtehude.debian.org
