Debian Bug report logs - #163194
most debian lists should use SA's FARAWAY tags

Package: lists.debian.org; Maintainer for lists.debian.org is Debian Listmaster Team <listmaster@lists.debian.org>;

Reported by: Santiago Vila <sanvila@unex.es>

Date: Thu, 3 Oct 2002 11:18:02 UTC

Severity: wishlist

Merged with 188604, 196208


Report forwarded to debian-bugs-dist@lists.debian.org, sanvila@unex.es, Martin Schulze and others <listmaster@lists.debian.org>, lists.debian.org@packages.qa.debian.org:
Bug#163194; Package lists.debian.org. Full text and rfc822 format available.

Acknowledgement sent to Santiago Vila <sanvila@unex.es>:
New Bug report received and forwarded. Copy sent to sanvila@unex.es, Martin Schulze and others <listmaster@lists.debian.org>, lists.debian.org@packages.qa.debian.org. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Santiago Vila <sanvila@unex.es>
To: submit@bugs.debian.org
Subject: most debian lists should use SA's FARAWAY tags
Date: Thu, 3 Oct 2002 13:14:27 +0200 (CEST)
Package: lists.debian.org

Please use SpamAssassin's CHARSET_FARAWAY tag and related ones when it's
useful to do so (i.e. on most lists).

You don't need different spamd daemons running in parallel. Just
modify /etc/spamassassin/65_debian.cf so that it reads like this:

[...]
# score CHARSET_FARAWAY 0
# score CHARSET_FARAWAY_BODY 0
# score CHARSET_FARAWAY_HEADERS 0
# score SUBJ_FULL_OF_8BITS 0

then create a special user (let's call it "faraway") and create a file
called .spamassassin/user_prefs in faraway's home dir containing this:

score CHARSET_FARAWAY 0
score CHARSET_FARAWAY_BODY 0
score CHARSET_FARAWAY_HEADERS 0
score SUBJ_FULL_OF_8BITS 0

Then you can just call "spamc" for normal lists and "spamc -u faraway"
for the others.
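
A minimal sketch of the per-list dispatch described above. Only `spamc` and its `-u` option come from the message; the helper name `spamc_cmd_for` and the list names in the `case` pattern are hypothetical and would have to be replaced by the real set of lists that accept far-away charsets.

```shell
#!/bin/sh
# Hypothetical helper: print the spamc command line to use for a list.
# The list names below are illustrative assumptions, not the real set.
spamc_cmd_for() {
    case "$1" in
        debian-chinese-gb|debian-japanese)
            # Lists where far-away charsets are legitimate: run as the
            # "faraway" user, whose user_prefs zeroes the FARAWAY scores.
            echo "spamc -u faraway" ;;
        *)
            # Every other list keeps the default (stricter) scoring.
            echo "spamc" ;;
    esac
}
```

For example, `spamc_cmd_for debian-devel` prints `spamc`, while `spamc_cmd_for debian-japanese` prints `spamc -u faraway`.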

Thanks.




Message sent on to Santiago Vila <sanvila@unex.es>:
Bug#163194. Full text and rfc822 format available.

Message #8 received at 163194-submitter@bugs.debian.org (full text, mbox):

From: listmaster@lists.debian.org
To: 163194-submitter@bugs.debian.org
Subject: FARAWAY tags
Date: Sat, 4 Jan 2003 18:29:08 +1100
I don't understand what the value of doing this would be. Care to
elaborate?

Thanks,
Anand

-- 
 `` We are shaped by our thoughts, we become what we think.
 When the mind is pure, joy follows like a shadow that never
 leaves. '' -- Buddha, The Dhammapada



Message #13 received at 163194@bugs.debian.org (full text, mbox):

From: Santiago Vila <sanvila@unex.es>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: FARAWAY tags
Date: Sat, 4 Jan 2003 14:31:42 +0100 (CET)
On Sat, 4 Jan 2003 listmaster@lists.debian.org wrote:

> I don't understand what the value of doing this would be. Care to
> elaborate?

Many spam messages have subject lines like this one:

Subject: [GB2312] Ò°ÂùµÄ ÍòÍø£¨www.net.cn) £¡£¡£¡£¡

This could be, in principle, acceptable for a Korean or Japanese
Debian mailing list, but it is not for debian-devel or any other
English-speaking list.

Different lists may have different filtering rules.



Message #18 received at 163194@bugs.debian.org (full text, mbox):

From: Josip Rodin <joy@gkvk.hr>
To: Santiago Vila <sanvila@unex.es>, 163194@bugs.debian.org
Subject: Re: Bug#163194: FARAWAY tags
Date: Sat, 4 Jan 2003 11:04:32 -0600
On Sat, Jan 04, 2003 at 02:31:42PM +0100, Santiago Vila wrote:
> > I don't understand what the value of doing this would be. Care to
> > elaborate?
> 
> Many spam have subject lines like this one:
> 
> Subject: [GB2312] ?????? ??????www.net.cn) ????????
> 
> This could be, in principle, acceptable for a korean or japanese
> debian mailing list, but it is not for debian-devel or any other
> english-speaking list.
> 
> Different lists may have different filtering rules.

Just for the record, I knew exactly what you meant, but never got around to
finding the right procmailrc where I could put the filters. The existence of
hardlinks makes this even more confusing for me. <sigh>

-- 
Joy



Message #23 received at 163194@bugs.debian.org (full text, mbox):

From: Santiago Vila <sanvila@unex.es>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: FARAWAY tags
Date: Tue, 7 Jan 2003 19:12:08 +0100 (CET)
Josip Rodin wrote:
> Santiago Vila wrote:
> > Many spam have subject lines like this one:
> >
> > Subject: [GB2312] ?????? ??????www.net.cn) ????????
> >
> > This could be, in principle, acceptable for a korean or japanese
> > debian mailing list, but it is not for debian-devel or any other
> > english-speaking list.
> >
> > Different lists may have different filtering rules.
>
> Just for the record, I knew exactly what you meant, but never got around to
> finding the right procmailrc where I could put the filters. The existence of
> hardlinks makes this even more confusing for me. <sigh>

If having different spamassassin configurations is not an easy thing
to do, here is a possible way to do it using only procmail rules:


In .etc, create two files:

* A file called "notsofaraway" containing the simple names of the lists for
which we don't want to receive any GB2312 spam, e.g.

debian-devel
debian-user
[...]

* A file called rc.gb2312 containing this:

:0:
* ^Content-Type:.*charset="GB2312"
* $? grep -q ^${list}\$ ${HOME}/.etc/notsofaraway
* !? formail -X"From " -xFrom: -xReply-To: -xSender: -xResent-From: \
    -xResent-Reply-To: -xResent-Sender: -xReturn-Path: | \
    multigram -b1 -m -l$submit_threshold -L$domain \
    -x$listaddr -x$listreq accept accept2
../gb2312-junk.`date +%Y-%m-%d`

This rule:

a) checks that the mail is written using the GB2312 charset.

b) checks that the list for which the mail is destined is in the list
of lists (.etc/notsofaraway) for which we don't want GB2312 spam.

c) checks that the person sending the mail is not subscribed to the list,
or not registered in the accept2 file (this is the same check which is
done in crossassassin).

Assuming the "notsofaraway" file contains the right lists, this will hardly
produce false positives. If all three conditions are met, the mail is
saved to the mbox folder ../gb2312-junk.`date +%Y-%m-%d` and is not
delivered.
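
Condition (b) can be tried outside procmail with a small shell sketch. The helper name `list_is_notsofaraway` is hypothetical, and it uses `grep -Fx` (fixed string, whole-line match) as a variation on the anchored regex in the recipe, so that list names are never interpreted as patterns.

```shell
#!/bin/sh
# Hypothetical helper mirroring condition (b): succeed only if the list
# name ($1) appears as a complete line in the given file ($2).
# -F: treat the name as a fixed string, not a regular expression
# -x: the whole line must match; -q: no output, exit status only
list_is_notsofaraway() {
    grep -Fxq -- "$1" "$2"
}
```

With a file containing `debian-devel` and `debian-user`, the call `list_is_notsofaraway debian-devel FILE` succeeds, while a partial name such as `debian-dev` fails.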


To enable this, make a symlink from rc.gb2312 in every list directory
to ../.etc/rc.gb2312 and add this line:

INCLUDERC=rc.gb2312

to the rc.local.s10 file (*after* the line saying INCLUDERC=rc.crossassassin,
if there is one).

That's all.

Note: Looking at the email I have received from Debian lists in the past
three months, I have not received a *single* legitimate message using the
GB2312 charset, so I consider that doing this using procmail is safe enough.


Thanks.



Message #28 received at 163194@bugs.debian.org (full text, mbox):

From: Jarno Elonen <elonen@iki.fi>
To: 163194@bugs.debian.org
Subject: A possible way without procmail
Date: Fri, 11 Apr 2003 20:56:53 +0300
How about applying the faraway rule to all lists but neutralizing the score for
the few Chinese lists? I have never built SA rules, but I guess it would look
something like this:

body NEUTRALIZE_FARAWAY_CH_LISTS        eval:check_for_faraway_charset()
header   NEUTRALIZE_FARAWAY_CH_LISTS    /* CODE TO DETECT A CHINESE LIST */
score NEUTRALIZE_FARAWAY_CH_LISTS       -3.0
describe NEUTRALIZE_FARAWAY_CH_LISTS    Character set indicates a foreign lang.
tflags NEUTRALIZE_FARAWAY_CH_LISTS      userconf




Message #33 received at 163194@bugs.debian.org (full text, mbox):

From: Santiago Vila <sanvila@unex.es>
To: Jarno Elonen <elonen@iki.fi>, 163194@bugs.debian.org
Subject: Re: Bug#163194: A possible way without procmail
Date: Fri, 11 Apr 2003 23:28:29 +0200 (CEST)
Jarno Elonen wrote:
> How about applying the faraway rule to all but neutralizing the
> score for the few chinese lists? I have never built SA rules, but I
> guess it would look something like this:
>
>
> body NEUTRALIZE_FARAWAY_CH_LISTS        eval:check_for_faraway_charset()
> header   NEUTRALIZE_FARAWAY_CH_LISTS    /* CODE TO DETECT A CHINESE LIST */
> score NEUTRALIZE_FARAWAY_CH_LISTS       -3.0
> describe NEUTRALIZE_FARAWAY_CH_LISTS    Character set indicates a foreign
> lang.
> tflags NEUTRALIZE_FARAWAY_CH_LISTS      userconf

Just because a mail is sent to a Chinese list does not mean it must be
in Chinese (English is also accepted). What you propose would subtract
3.0 points from all English spam on Chinese lists, which would let much
more spam through on those lists.


What would be really nice is to have a spamc program which accepts an
option to read a different user_prefs file for every list.



Message #38 received at 163194@bugs.debian.org (full text, mbox):

From: Jarno Elonen <elonen@iki.fi>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: A possible way without procmail
Date: Sat, 12 Apr 2003 10:47:45 +0300
> What you propose would subtract 3.0 points from all English spam
> on Chinese lists, which would let much more spam through on those lists.

Hmm, maybe I didn't get the syntax right - I thought there was an implicit "and"
operator between the lines. Can you do something like this:

 ??? NEUTRALIZE_FARAWAY_CH_LISTS  (eval:check_for_faraway_charset() &&
                                   /* CODE TO DETECT A CHINESE LIST */)

= add spam points to all non-English messages and subtract them if the list
was supposed to accept them. This shouldn't affect English messages at all.

Of course, different rule sets for different lists would be cleaner but if 
that won't work for some reason, this might.

- Jarno



Message #43 received at 163194@bugs.debian.org (full text, mbox):

From: Santiago Vila <sanvila@unex.es>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: most debian lists should use SA's FARAWAY tags
Date: Thu, 17 Apr 2003 14:13:21 +0200 (CEST)
Here is a possible way of fixing this bug by having per-list
spamassassin preferences:

Call spamd with the following options:

-x --virtual-config=/var/list/SA

[ If you are using the Debian package this should be done by changing
/etc/default/spamassassin and restarting /etc/init.d/spamassassin ].

Then you can add "-u $list" to the procmail line where you call spamc.

Then the SA configuration file that will be used will be
/var/list/SA/${list}.prefs. If such file does not exist,
/var/list/SA/default.prefs will be used instead.

So I suggest that you create /var/list/SA/asiatic.prefs or something
similar, identify the lists that should allow FARAWAY tags and then
symlink $list.prefs to /var/list/SA/asiatic.prefs, leaving all the
other lists with the default "default.prefs" file.

[ I've tested this with SA 2.43, maybe there are even more ways to do
this using SA 2.53 ].
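
The symlink step could be scripted roughly as follows. This is only a sketch: the helper name `link_shared_prefs` is hypothetical, the list names passed to it are up to the listmasters, and it creates relative symlinks inside the prefs directory rather than the absolute target mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: point each named list's prefs at one shared file.
# $1 = prefs directory (the message uses /var/list/SA)
# $2 = shared prefs file name inside that directory (e.g. asiatic.prefs)
# remaining arguments = list names that should use the shared file
link_shared_prefs() {
    dir=$1
    shared=$2
    shift 2
    for list in "$@"; do
        # Relative symlink: $dir/$list.prefs -> $shared
        ln -sf "$shared" "$dir/$list.prefs"
    done
}
```

A call such as `link_shared_prefs /var/list/SA asiatic.prefs debian-japanese` would leave every list not named on the command line falling back to default.prefs, as described above.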

Thanks.



Message #48 received at 163194@bugs.debian.org (full text, mbox):

From: Duncan Findlay <duncf@debian.org>
To: 163194@bugs.debian.org
Subject: Use ok_locales/ok_languages instead
Date: Tue, 3 Jun 2003 21:30:09 -0400
Recent versions of spamassassin no longer have these tests commented
out in 65_debian.cf. Instead, these tests depend on the ok_locales
option.

Therefore, the ok_languages and ok_locales options should be used.
This may be slightly difficult to set up, but all English lists should
have "ok_languages en" and "ok_locales en".

Lists in other languages should have that language set in
ok_languages.

Lists in Japanese, Korean, Russian, etc. should have ok_locales set
accordingly.

This is all listed in the Mail::SpamAssassin::Conf man page if you
need more info.
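
As a rough sketch of what the per-list settings might look like, the shell helper below prints the two options for a given list. The helper name `print_lang_prefs` is hypothetical, and the "ja en" values in the usage note are an assumption about what a Japanese list that also accepts English might want, not a setting taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: print the language-related user_prefs lines.
# $1 = value for ok_languages, $2 = value for ok_locales
# (both options take a space-separated list of codes)
print_lang_prefs() {
    printf 'ok_languages %s\n' "$1"
    printf 'ok_locales %s\n' "$2"
}
```

For an English list, `print_lang_prefs en en` prints the two lines quoted above; `print_lang_prefs "ja en" "ja en"` would be one possible choice for a Japanese list.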

As I have often said, I am happy to help with any configuration issues
if you have questions.

-- 
Duncan Findlay

Merged 163194 196208. Request was from "Artur R. Czechowski" <arturcz@hell.pl> to control@bugs.debian.org. Full text and rfc822 format available.

Merged 163194 188604 196208. Request was from Josip Rodin <joy@srce.hr> to control@bugs.debian.org. Full text and rfc822 format available.

Message #57 received at 163194@bugs.debian.org (full text, mbox):

From: Josip Rodin <joy@srce.hr>
To: 163194@bugs.debian.org
Subject: language/charset SA checks
Date: Fri, 5 Sep 2003 13:40:30 +0200
Hi,

As an experiment I added this to the generic user_prefs:

ok_locales              en
ok_languages            en
score CHARSET_FARAWAY           0.03
score CHARSET_FARAWAY_HEADERS   0.02
score HTML_CHARSET_FARAWAY      0.005
score MIME_CHARSET_FARAWAY      0.02
score UNDESIRED_LANGUAGE_BODY   0.03

That way we can get some statistics for this bug report.
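
One way such statistics might be gathered afterwards is to count how often each rule name shows up in the archived mail. The sketch below assumes the delivered messages carry an X-Spam-Status header listing the matched tests, with each header on one line; the helper name `count_rule_hits` is hypothetical. It counts matching lines anywhere in the file, so a rule name quoted in a message body would be counted too.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a file ($2) that mention a rule ($1).
# -w keeps CHARSET_FARAWAY from also matching CHARSET_FARAWAY_HEADERS or
# HTML_CHARSET_FARAWAY, because "_" counts as a word character.
# -c prints the number of matching lines instead of the lines themselves.
count_rule_hits() {
    grep -cw -- "$1" "$2"
}
```

Running `count_rule_hits UNDESIRED_LANGUAGE_BODY some.mbox` for each of the rules above would give a rough per-rule hit count.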

-- 
     2. That which causes joy or happiness.



Message #62 received at 163194@bugs.debian.org (full text, mbox):

From: Anand Kumria <wildfire@progsoc.uts.edu.au>
To: 163194@bugs.debian.org
Subject: using SA faraway tag
Date: Sat, 13 Mar 2004 15:44:17 +1100
I've spoken with Pascal as he has done an implementation of this and I
think using his spamassassin virtual setup is appropriate.

Apparently Josip has some concerns - what are they?

/var/list/.spamassassin/languages has the basic implementation and it
looks good to me.

Anand

-- 
 `` We are shaped by our thoughts, we become what we think.
 When the mind is pure, joy follows like a shadow that never
 leaves. '' -- Buddha, The Dhammapada



Message #67 received at 163194@bugs.debian.org (full text, mbox):

From: Josip Rodin <joy@srce.hr>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: using SA faraway tag
Date: Sat, 13 Mar 2004 22:41:22 +0100
On Sat, Mar 13, 2004 at 03:44:17PM +1100, Anand Kumria wrote:
> I've spoken with Pascal as he has done an implementation of this and I
> think using his spamassassin virtual setup is appropriate.

I vaguely recall pasc saying he's done something, but not everything...

> Apparently Josip has some concerns - what are they?

How is it apparent, exactly? :)

-- 
     2. That which causes joy or happiness.



Message #72 received at 163194@bugs.debian.org (full text, mbox):

From: Pascal Hakim <pasc@debian.org>
To: Josip Rodin <joy@srce.hr>, 163194@bugs.debian.org
Subject: Re: Bug#163194: using SA faraway tag
Date: Sun, 14 Mar 2004 13:19:02 +1100
On Sat, Mar 13, 2004 at 10:41:22PM +0100, Josip Rodin wrote:
> On Sat, Mar 13, 2004 at 03:44:17PM +1100, Anand Kumria wrote:
> > I've spoken with Pascal as he has done an implementation of this and I
> > think using his spamassassin virtual setup is appropriate.
> 
> I vaguely recall pasc saying he's done something, but not everything...

It still needs someone to change the way spamd is started.

> > Apparently Josip has some concerns - what are they?
> 
> How is it apparent, exactly? :)

You mentioned on IRC that there were a couple of issues with this. I
can't find the logs from that time, but from memory:

* You were concerned about having to maintain around 10 different user_prefs.

I don't think this is that big an issue. It'd be nicer if SpamAssassin
could include config files, as we then wouldn't have this problem, but
it doesn't appear to do that yet. Still, we wouldn't be maintaining
hundreds of user_prefs, just one for each language.

* There was something about bayesian filtering (if we decided to use it)

At the moment, there'd be a database for every list. This is probably a
good idea anyway. At any rate, we'd want to split it at least by
language. We can make that change if people want, but it's academic as
we're not using it now, and probably won't until we either distribute it
to other computers, or get faster hardware.

* There was also something about preferring it to be run from procmail
rather than SA.

I've had a look at some of the SA code; I can't say I looked too
deeply, but I do not think it's worth re-implementing in procmail what
they've already done. I've played with rc.farway on
debian-devel for example, and while it's caught a fair amount of stuff,
I think SpamAssassin would do a better job.

This is what we've caught so far (just on debian-devel, the only list
running rc.farway):

faraway.2004-01:241
faraway.2004-02:1078
faraway.2004-03:498

Putting this scheme in would result in even more stuff being caught, on
more lists, so I believe it to be worth it.
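
For reference, the three monthly counts above can be totalled with a short shell sketch. The helper name `sum_counts` is hypothetical; it assumes input lines of the form `name:count`, as in the figures quoted.

```shell
#!/bin/sh
# Hypothetical helper: read "name:count" lines on stdin and print the
# sum of the counts (the field after the colon).
sum_counts() {
    awk -F: '{ total += $2 } END { print total + 0 }'
}
```

Feeding it the three lines above (241, 1078 and 498) prints 1817.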

	Cheers,

Pasc
-- 
Pascal Hakim                                            +61 4 0341 1672



Message #77 received at 163194@bugs.debian.org (full text, mbox):

From: Josip Rodin <joy@srce.hr>
To: 163194@bugs.debian.org
Subject: Re: Bug#163194: using SA faraway tag
Date: Fri, 19 Mar 2004 22:10:38 +0100
On Sun, Mar 14, 2004 at 01:19:02PM +1100, Pascal Hakim wrote:
> * There was something about bayesian filtering (if we decided to use it)
> 
> At the moment, there'd be a database for every list. This is probably a
> good idea anyway. At any rate, we'd want to split it at least by
> language. We can make that change if people want, but it's academic as
> we're not using it now, and probably won't until we either distribute it
> to other computers, or get faster hardware.

We _might_ want to split by language, but there should still be a fair bit
of overlap between those Bayesian databases. Spammers don't discriminate
between language lists.

If you split them up in tiny bits, you'd get less spam on high-volume lists,
and more spam on low-volume lists, which would be quite unproductive (more
people would notice the latter).

> * There was also something about preferring it to be run from procmail
> rather than SA.

I don't recall this...

-- 
     2. That which causes joy or happiness.



Severity set to 'wishlist' from 'normal' Request was from Cord Beermann <cord@debian.org> to control@bugs.debian.org. (Fri, 11 Jun 2010 21:09:06 GMT) Full text and rfc822 format available.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Apr 20 21:47:22 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.