Debian Bug report logs - #395439
[uscan] Option for parsing source code (not webpage) for version number

Package: devscripts; Maintainer for devscripts is Devscripts Devel Team <devscripts-devel@lists.alioth.debian.org>; Source for devscripts is src:devscripts.

Reported by: Axel Beckert <abe@debian.org>

Date: Fri, 27 Oct 2006 00:18:08 UTC

Severity: wishlist

Found in version devscripts/2.9.22


Report forwarded to debian-bugs-dist@lists.debian.org, Axel Beckert <abe@deuxchevaux.org>, Julian Gilbey <jdg@debian.org>:
Bug#395439; Package devscripts. Full text and rfc822 format available.

Acknowledgement sent to Axel Beckert <abe@deuxchevaux.org>:
New Bug report received and forwarded. Copy sent to Axel Beckert <abe@deuxchevaux.org>, Julian Gilbey <jdg@debian.org>. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Axel Beckert <abe@deuxchevaux.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: devscripts: uscan: Option for parsing source code (not webpage) for version number
Date: Fri, 27 Oct 2006 02:14:46 +0200
Package: devscripts
Version: 2.9.22
Severity: wishlist

I would like to have an option for uscan which stops restricting
pattern matching to href="..." attributes, so that the source code
itself can be parsed for version numbers. (And if a newer version is
found, the file that was parsed gets saved, since it already had to
be downloaded for parsing. :-)

Example scenario: The packaged program is a simple but useful shell
script. The maintainer always keeps only the newest version online under
the same URL. uscan should scan the given URL for version numbers.

Real Life Example:

The upstream URL for the package wikipedia2text is
http://www.256bit.org/~chrisbra/wiki -- In this file there is a line
"VERSION=0.05"

A future watch file for wikipedia2text could look like this:

---snip---
opts=nohref
http://www.256bit.org/~chrisbra/wiki VERSION=([\d\.]+) debian uupdate
---snap---
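
The idea behind the proposed watch line can be sketched as follows. This is a minimal Python illustration, not uscan code: the function name newest_version and the sample script text are assumptions made for the example, and the numeric comparison is a simplification of how Debian actually orders versions.

```python
import re

def newest_version(content, pattern):
    """Return the highest version captured by `pattern` anywhere in `content`.

    Hypothetical helper: it scans the raw text of the downloaded file
    instead of looking only inside href="..." attributes.
    """
    versions = [m.group(1) for m in re.finditer(pattern, content)]
    if not versions:
        return None
    # Compare dotted versions numerically so that 0.10 sorts after 0.9.
    # Simplification: real Debian version ordering has more rules.
    return max(versions, key=lambda v: tuple(int(p) for p in v.split(".") if p))

# Made-up stand-in for the upstream shell script.
script = "#!/bin/sh\n# wiki: query Wikipedia from the shell\nVERSION=0.05\n"
print(newest_version(script, r"VERSION=([\d.]+)"))
```

Because the file has already been fetched to find the version, the same bytes could then be saved as the new upstream source.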

TIA.

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.4.33.2-1-dphys-k8-smp-64gb
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages devscripts depends on:
ii  debianutils                  2.17.3      Miscellaneous utilities specific t
ii  dpkg-dev                     1.13.24     package building tools for Debian
ii  libc6                        2.3.6.ds1-7 GNU C Library: Shared libraries
ii  perl                         5.8.8-6.1   Larry Wall's Practical Extraction 
ii  sed                          4.1.5-1     The GNU sed stream editor

Versions of packages devscripts recommends:
ii  fakeroot                      1.5.10     Gives a fake root environment

-- no debconf information



Changed Bug title. Request was from Mohammed Adnène Trojette <adn+deb@diwi.org> to control@bugs.debian.org. Full text and rfc822 format available.

Changed Bug submitter to 'Axel Beckert <abe@debian.org>' from 'Axel Beckert <abe@deuxchevaux.org>' Request was from Axel Beckert <abe@debian.org> to control@bugs.debian.org. (Wed, 17 Feb 2010 19:15:05 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Devscripts Devel Team <pkg-devscripts@teams.debian.net>:
Bug#395439; Package devscripts. (Sun, 18 Jul 2010 16:21:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Nick Andrik <nick.andrik@gmail.com>:
Extra info received and forwarded to list. Copy sent to Devscripts Devel Team <pkg-devscripts@teams.debian.net>. (Sun, 18 Jul 2010 16:21:07 GMT) Full text and rfc822 format available.

Message #14 received at 395439@bugs.debian.org (full text, mbox):

From: Nick Andrik <nick.andrik@gmail.com>
To: 395439@bugs.debian.org
Subject: [uscan] Option for parsing source code (not webpage) for version number
Date: Sun, 18 Jul 2010 18:18:40 +0200
[Message part 1 (text/plain, inline)]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tags: patch

I have created a patch in order to support this request.

I've introduced an option called fullsourcemangle (any recommendations
for something more meaningful are more than welcome). When present, it
tells the program to search the full source instead of just the
href="..." URLs, and the mangle rule itself is used to generate the
href link from the matched text.

For Axel's request, the watch file would be:
- ---snip---
version=3
opts=fullsourcemangle=s/.*/wiki/ \
http://www.256bit.org/~chrisbra/wiki VERSION=([\d\.]+) debian uupdate
- ---snap---

I checked and it works for two of the packages I maintain that have
no version information in the link (guifications and myspell-el-gr),
and for the wikipedia2text package.
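
How the mangle rule turns matched text into a download link can be sketched roughly as below. This is a Python approximation, not the Perl code from the patch: apply_mangle is a hypothetical helper that handles only the simple s/regex/replacement/ form with no flags and no "/" inside the rule parts.

```python
import re
from urllib.parse import urljoin

def apply_mangle(text, rule):
    """Apply one sed-style 's/regex/replacement/' rule to text, replacing once."""
    # Simplification: '/' is the only delimiter and must not occur in the parts.
    # A rule with a different number of parts fails here with ValueError.
    op, regex, replacement, flags = rule.split("/")
    if op != "s" or flags:
        raise ValueError("unsupported rule: " + rule)
    # Perl writes backreferences as $1; Python's re.sub expects \1.
    replacement = re.sub(r"\$(\d)", r"\\\1", replacement)
    # count=1 mirrors Perl's s/// without the /g flag.
    return re.sub(regex, replacement, text, count=1)

base = "http://www.256bit.org/~chrisbra/wiki"
href = apply_mangle("VERSION=0.05", "s/.*/wiki/")
print(href, urljoin(base, href))
```

With the rule from the watch file above, the matched text is replaced wholesale by "wiki", which resolves against the base URL to the same file that was just scanned.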


I am not a perl expert, so please let me know of any
problems/suggestions for the code.


Now the ideal would be to have a way to specify the upstream version in
the second option of filenamemangle so that we save the downloaded file
in a name containing the actual version, but this is another story :)

- --
=Do-
N.AND


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkxDKWAACgkQrdZ2oYS0I7J7GwCdHxvDKThIS5upppLjX4pJrwRs
mCsAnjbAv1KSDtRGE+A/7p2fjAlThnPF
=s5z/
-----END PGP SIGNATURE-----
[uscan.patch (text/x-patch, inline)]
--- usr/bin/uscan	2009-10-08 23:26:42.000000000 +0200
+++ home/andrikos/temp/uscan	2010-07-18 17:59:25.000000000 +0200
@@ -6,6 +6,7 @@
 # Originally written by Christoph Lameter <clameter@debian.org> (I believe)
 # Modified by Julian Gilbey <jdg@debian.org>
 # HTTP support added by Piotr Roszatycki <dexter@debian.org>
+# Full source search support by Nick Andrik <nick.andrik@gmail.com>
 # Rewritten in Perl, Copyright 2002-2006, Julian Gilbey
 #
 # This program is free software; you can redistribute it and/or modify
@@ -675,6 +676,13 @@
 # opts=downloadurlmangle=s/prdownload/download/ \
 #   http://developer.berlios.de/project/showfiles.php?group_id=2051 \
 #   http://prdownload.berlios.de/softdevice/vdr-softdevice-(.*).tgz
+#
+# The option fullsourcemangle can be used to search in the full text of the page
+# (not just in the href="..." URLs). The substitution has to compute the URL of
+# the file from the matching text. For example:
+# opts=fullsourcemangle=s/.*"(.*)".*/$1/ \
+#   http://foo.bar.org/trac/downloads \
+#   <a\s+href="/trac/downloads/[\d]+">\s*foo-([\d\.]+)\.tar\.bz2\s*</a>
 
 
 sub process_watchline ($$$$$$)
@@ -764,6 +772,9 @@
 		elsif ($opt =~ /^downloadurlmangle\s*=\s*(.+)/) {
 		    @{$options{'downloadurlmangle'}} = split /;/, $1;
 		}
+		elsif ($opt =~ /^fullsourcemangle\s*=\s*(.+)/) {
+		    @{$options{'fullsourcemangle'}} = split /;/, $1;
+		}
 		else {
 		    warn "$progname warning: unrecognised option $opt\n";
 		}
@@ -893,33 +904,63 @@
 
 	print STDERR "$progname debug: matching pattern(s) @patterns\n" if $debug;
 	my @hrefs;
-	while ($content =~ m/<\s*a\s+[^>]*href\s*=\s*([\"\'])(.*?)\1/sgi) {
-	    my $href = $2;
-	    $href =~ s/\n//g;
-	    foreach my $_pattern (@patterns) {
-		if ($href =~ m&^$_pattern$&) {
-		    if ($watch_version == 2) {
-			# watch_version 2 only recognised one group; the code
-			# below will break version 2 watchfiles with a construction
-			# such as file-([\d\.]+(-\d+)?) (bug #327258)
-			push @hrefs, [$1, $href];
-		    } else {
-			# need the map { ... } here to handle cases of (...)?
-			# which may match but then return undef values
-			my $mangled_version =
-			    join(".", map { $_ if defined($_) }
-			 	$href =~ m&^$_pattern$&);
-			foreach my $pat (@{$options{'uversionmangle'}}) {
-			    if (! safe_replace(\$mangled_version, $pat)) {
-				warn "$progname: In $watchfile, potentially"
-				  . " unsafe or malformed uversionmangle"
-				  . " pattern:\n  '$pat'"
-				  . " found. Skipping watchline\n"
-				  . "  $line\n";
-				return 1;
+	if ( exists $options{'fullsourcemangle'} ) {
+	    # We are searching in the whole text of the file, not just the links
+	    while ($content =~ m/($filepattern)/sgi){
+	    	my $href = $1;
+		my $mangled_version = $2;
+		$href =~ s/[\s\n]+/ /g;
+		foreach my $pat (@{$options{'fullsourcemangle'}}) {
+		    if (! safe_replace(\$href, $pat)) {
+		        warn "$progname: In $watchfile, potentially"
+		          . " unsafe or malformed fullsourcemangle"
+		          . " pattern:\n  '$pat'"
+		          . " found. Skipping watchline\n"
+		          . "  $line\n";
+		        return 1;
+		    }
+		}
+		foreach my $pat (@{$options{'uversionmangle'}}) {
+		    if (! safe_replace(\$mangled_version, $pat)) {
+		        warn "$progname: In $watchfile, potentially"
+		          . " unsafe or malformed uversionmangle"
+		          . " pattern:\n  '$pat'"
+		          . " found. Skipping watchline\n"
+		          . "  $line\n";
+		        return 1;
+		    }
+		}
+		push @hrefs, [$mangled_version, $href];
+	    }
+	} else {
+	    while ($content =~ m/<\s*a\s+[^>]*href\s*=\s*([\"\'])(.*?)\1/sgi) {
+		my $href = $2;
+		$href =~ s/\n//g;
+		foreach my $_pattern (@patterns) {
+		    if ($href =~ m&^$_pattern$&) {
+			if ($watch_version == 2) {
+			    # watch_version 2 only recognised one group; the code
+			    # below will break version 2 watchfiles with a construction
+			    # such as file-([\d\.]+(-\d+)?) (bug #327258)
+			    push @hrefs, [$1, $href];
+			} else {
+			    # need the map { ... } here to handle cases of (...)?
+			    # which may match but then return undef values
+			    my $mangled_version =
+				join(".", map { $_ if defined($_) }
+				    $href =~ m&^$_pattern$&);
+			    foreach my $pat (@{$options{'uversionmangle'}}) {
+				if (! safe_replace(\$mangled_version, $pat)) {
+				    warn "$progname: In $watchfile, potentially"
+				      . " unsafe or malformed uversionmangle"
+				      . " pattern:\n  '$pat'"
+				      . " found. Skipping watchline\n"
+				      . "  $line\n";
+				    return 1;
+				}
 			    }
+			    push @hrefs, [$mangled_version, $href];
 			}
-			push @hrefs, [$mangled_version, $href];
 		    }
 		}
 	    }
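
The full-source branch of the patch can be illustrated with the trac example from its documentation comment. The following is a Python sketch of the same flow, not a port of uscan: the page content is invented, scan_full_source is a hypothetical name, and the mangle rule s/.*"(.*)".*/$1/ is passed as a separate regex and replacement for brevity.

```python
import re

# Invented page content in the shape the patch comment describes.
page = '''<ul>
<li><a href="/trac/downloads/11"> foo-1.1.tar.bz2 </a></li>
<li><a href="/trac/downloads/12">
    foo-1.2.tar.bz2 </a></li>
</ul>'''

filepattern = r'<a\s+href="/trac/downloads/[\d]+">\s*foo-([\d\.]+)\.tar\.bz2\s*</a>'

def scan_full_source(content, filepattern, mangle_regex, mangle_repl):
    """Collect (version, href) pairs by matching the whole page text."""
    hrefs = []
    # The added outer group captures the whole match; the pattern's own
    # version group therefore becomes group 2, as in the patch.
    for m in re.finditer("(" + filepattern + ")", content, re.S | re.I):
        matched, version = m.group(1), m.group(2)
        # Collapse whitespace so the mangle rule sees a single line.
        matched = re.sub(r"\s+", " ", matched)
        href = re.sub(mangle_regex, mangle_repl, matched, count=1)
        hrefs.append((version, href))
    return hrefs

print(scan_full_source(page, filepattern, r'.*"(.*)".*', r"\1"))
```

Note that this scheme relies on the file pattern containing exactly one capture group for the version; a pattern with several groups would need the joining logic that the href branch already has.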

Information forwarded to debian-bugs-dist@lists.debian.org, Devscripts Devel Team <pkg-devscripts@teams.debian.net>:
Bug#395439; Package devscripts. (Tue, 27 Jul 2010 04:12:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Raphael Geissert <geissert@debian.org>:
Extra info received and forwarded to list. Copy sent to Devscripts Devel Team <pkg-devscripts@teams.debian.net>. (Tue, 27 Jul 2010 04:12:03 GMT) Full text and rfc822 format available.

Message #19 received at 395439@bugs.debian.org (full text, mbox):

From: Raphael Geissert <geissert@debian.org>
To: Nick Andrik <nick.andrik@gmail.com>, 395439@bugs.debian.org
Subject: Re: Bug#395439: [uscan] Option for parsing source code (not webpage) for version number
Date: Tue, 27 Jul 2010 00:08:52 -0400
Hi,

I haven't looked at how it was implemented or anything, but for the record 
(and to prevent surprises:) this new feature should only be introduced in a 
new watch file version, so as not to break compatibility.

Cheers,
-- 
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net




Information forwarded to debian-bugs-dist@lists.debian.org, Devscripts Devel Team <pkg-devscripts@teams.debian.net>:
Bug#395439; Package devscripts. (Tue, 27 Jul 2010 12:48:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Nick Andrik <nick.andrik@gmail.com>:
Extra info received and forwarded to list. Copy sent to Devscripts Devel Team <pkg-devscripts@teams.debian.net>. (Tue, 27 Jul 2010 12:48:03 GMT) Full text and rfc822 format available.

Message #24 received at 395439@bugs.debian.org (full text, mbox):

From: Nick Andrik <nick.andrik@gmail.com>
To: Raphael Geissert <geissert@debian.org>
Cc: 395439@bugs.debian.org
Subject: Re: Bug#395439: [uscan] Option for parsing source code (not webpage) for version number
Date: Tue, 27 Jul 2010 13:45:15 +0200
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Raphael,

Since I am only introducing a new option (fullsourcemangle), the new version
is backwards compatible with the old one. The old watch files still work
exactly the same if they don't have this option.

Of course we would have to update uscan (and the other tools that read
the watch file). If we need to increase the version for that, it is not
a problem, I can update the patch to check for version 4.
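
Gating the option on the watch file version could look roughly like this. It is a Python sketch under stated assumptions: the minimum version 4 for fullsourcemangle is only the value floated in this thread, a missing version line is assumed to mean version 1, and both function names are hypothetical.

```python
import re

# Assumption taken from this discussion, not from any released uscan.
MIN_WATCH_VERSION = {"fullsourcemangle": 4}

def watch_version(watch_text):
    """Read the 'version=N' line; assume version 1 when it is absent."""
    m = re.search(r"^version=(\d+)\s*$", watch_text, re.MULTILINE)
    return int(m.group(1)) if m else 1

def unsupported_options(watch_text, option_names):
    """Return the options that need a newer watch file version."""
    version = watch_version(watch_text)
    return [n for n in option_names if version < MIN_WATCH_VERSION.get(n, 1)]

old = "version=3\nopts=fullsourcemangle=s/.*/wiki/ \\\n"
print(unsupported_options(old, ["fullsourcemangle", "uversionmangle"]))
```

Existing version 3 files that never use the option are unaffected either way; the check only matters for files that name it.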

Nick



Raphael Geissert wrote:
> Hi,
> 
> I haven't looked at how it was implemented or anything, but for the record 
> (and to prevent surprises:) this new feature should only be introduced in a 
> new watch file version, so as not to break compatibility.
> 
> Cheers,
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkxOxscACgkQrdZ2oYS0I7KMngCg3Na1fn4ht69B58aKUIMSCLZx
2zMAoJ8lBgYIfBlX/jfW+gV6X1hLRvdI
=yKkQ
-----END PGP SIGNATURE-----




Information forwarded to debian-bugs-dist@lists.debian.org, Devscripts Devel Team <pkg-devscripts@teams.debian.net>:
Bug#395439; Package devscripts. (Wed, 27 Jul 2011 20:57:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to barak@cs.nuim.ie:
Extra info received and forwarded to list. Copy sent to Devscripts Devel Team <pkg-devscripts@teams.debian.net>. (Wed, 27 Jul 2011 20:57:03 GMT) Full text and rfc822 format available.

Message #29 received at 395439@bugs.debian.org (full text, mbox):

From: "Barak A. Pearlmutter" <barak@cs.nuim.ie>
To: 395439@bugs.debian.org
Subject: fullsourcemangle vs tar
Date: Wed, 27 Jul 2011 21:53:55 +0100
I have a package whose upstream distributes a tarball, always under
the same URL.  To get the version number out, I need to look in a
particular file in the tarball and find it there.  This is easy to do
with a fragment of shell code: use tar to extract the file to stdout,
pipe to something like awk to find and cut the version out.

My suggestion is that the mechanism here should allow an arbitrary
fragment of shell code to run on the retrieved file, printing the
found version to its stdout.
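
The tar-and-awk step described above can be sketched as follows, written in Python rather than shell. The member name foo/Makefile, the VERSION line format, and both function names are invented for the illustration; the tarball is built in memory so the example runs without network access.

```python
import io
import re
import tarfile

def version_from_tarball(data, member_name, pattern):
    """Read one member of an in-memory tarball and return the captured version."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        # extractfile raises KeyError for a missing name and returns
        # None when the member is not a regular file.
        member = tar.extractfile(member_name)
        if member is None:
            return None
        text = member.read().decode("utf-8", errors="replace")
    m = re.search(pattern, text)
    return m.group(1) if m else None

def make_demo_tarball():
    """Build a tiny gzip-compressed tarball standing in for the upstream one."""
    payload = b"NAME = foo\nVERSION = 2.7.1\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("foo/Makefile")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

print(version_from_tarball(make_demo_tarball(), "foo/Makefile",
                           r"VERSION\s*=\s*([\d.]+)"))
```

A fixed extractor like this is narrower than running an arbitrary shell fragment, which is the more general mechanism the message asks for.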

					--Barak.
--
Barak A. Pearlmutter
 Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
 http://www.bcl.hamilton.ie/~barak/




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Apr 23 13:35:42 2014; Machine Name: buxtehude.debian.org