Debian Bug report logs - #908678
security-tracker - Breaks salsa.d.o

Package: security-tracker; Maintainer for security-tracker is Debian Security Tracker Team <debian-security-tracker@lists.debian.org>;

Reported by: Bastian Blank <waldi@debian.org>

Date: Wed, 12 Sep 2018 13:15:02 UTC

Severity: critical

Tags: bullseye-ignore, buster-ignore, confirmed



Report forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 12 Sep 2018 13:15:13 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <waldi@debian.org>:
New Bug report received and forwarded. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 12 Sep 2018 13:15:13 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <waldi@debian.org>
To: submit@bugs.debian.org
Subject: security-tracker - Breaks salsa.d.o
Date: Wed, 12 Sep 2018 15:10:56 +0200
Package: security-tracker
Severity: critical

The security tracker git repository is in a state which git does not
really like.  git clone takes ages, fsck takes ages, repack is reported
to be impossible.

The GitLab on salsa.d.o also chokes on it sometimes during git
operations.  Some may be attributed to the old diff formatter problem,
which I hope gets fixed soon.  But lately it even caused stalls on git
operations.

As the problems caused by the state of this repo now cause user-visible
outages, this needs to be fixed.

Regards,
Bastian

-- 
I'm a soldier, not a diplomat.  I can only tell the truth.
		-- Kirk, "Errand of Mercy", stardate 3198.9



Added tag(s) confirmed. Request was from Salvatore Bonaccorso <carnil@debian.org> to control@bugs.debian.org. (Thu, 13 Sep 2018 09:03:09 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 13 Sep 2018 11:39:02 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 13 Sep 2018 11:39:02 GMT) (full text, mbox, link).


Message #12 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Bastian Blank <waldi@debian.org>, 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Thu, 13 Sep 2018 13:37:35 +0200
Hi Bastian,

On Wed, Sep 12, 2018 at 03:10:56PM +0200, Bastian Blank wrote:
> Package: security-tracker
> Severity: critical
> 
> The security tracker git repository is in a state which git does not
> really like.  git clone takes ages, fsck takes ages, repack is reported
> to be impossible.
> 
> The GitLab on salsa.d.o also chokes on it sometimes during git
> operations.  Some may be attributed to the old diff formatter problem,
> which I hope gets fixed soon.  But lately it even caused stalls on git
> operations.
> 
> As the problems caused by the state of this repo now cause user-visible
> outages, this needs to be fixed.

Do you have any hints for us on what we could look at to help the
salsa maintainers further?

What actually is this old diff formatter problem you mentioned, which is
going to be solved? Would it help in the meantime to restrict access
to logged-in users only?

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 13 Sep 2018 11:48:03 GMT) (full text, mbox, link).


Acknowledgement sent to Paul Wise <pabs@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 13 Sep 2018 11:48:03 GMT) (full text, mbox, link).


Message #17 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Paul Wise <pabs@debian.org>
To: 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Thu, 13 Sep 2018 19:44:36 +0800
On Thu, Sep 13, 2018 at 7:37 PM, Salvatore Bonaccorso wrote:

> Do you have any hints for us on what we could look at to help the
> salsa maintainers further?

I think I read on IRC that the main thing is that the design of git is
not optimised for having large and growing files that change on every
commit. So splitting them up into one file per CVE/DSA/DLA/etc
might help? Or switching from git to a database or something like
restic or borg.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Sun, 16 Sep 2018 11:33:03 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Sun, 16 Sep 2018 11:33:03 GMT) (full text, mbox, link).


Message #22 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <waldi@debian.org>
To: Salvatore Bonaccorso <carnil@debian.org>
Cc: 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Sun, 16 Sep 2018 13:22:32 +0200
Hi Salvatore

On Thu, Sep 13, 2018 at 01:37:35PM +0200, Salvatore Bonaccorso wrote:
> Do you have any hints for us on what we could look at to help the
> salsa maintainers further?

Please try to fork that repo.  Git will take a long time to resolve
deltas.  This is because Git does not handle well a single file that is
appended to in every revision.  To fix it for good, this file needs to
be split up.  With that change in place, the repo needs to be rewritten.

We even have one fork of this repo where blobs are missing.

> What actually is this old diff formatter problem you mentioned, which is
> going to be solved? Would it help in the meantime to restrict access
> to logged-in users only?

For some requests the diff formatter blocks and runs into the one minute
hard timeout.  This should be fixed with the 11.3 release next week, so
we can ignore that.

Regards,
Bastian

-- 
Those who hate and fight must stop themselves -- otherwise it is not stopped.
		-- Spock, "Day of the Dove", stardate unknown



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Mon, 17 Sep 2018 18:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Mon, 17 Sep 2018 18:39:03 GMT) (full text, mbox, link).


Message #27 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Bastian Blank <waldi@debian.org>, 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Mon, 17 Sep 2018 20:34:57 +0200
Hi Bastian,

On Sun, Sep 16, 2018 at 01:22:32PM +0200, Bastian Blank wrote:
> Hi Salvatore
> 
> On Thu, Sep 13, 2018 at 01:37:35PM +0200, Salvatore Bonaccorso wrote:
> > Do you have any hints for us on what we could look at to help the
> > salsa maintainers further?
> 
> Please try to fork that repo.  Git will take a long time to resolve
> deltas.  This is because Git does not handle well a single file that is
> appended to in every revision.  To fix it for good, this file needs to
> be split up.  With that change in place, the repo needs to be rewritten.

Just to say, we got your reply. I see that we need to try to improve
that situation, as it has an impact on other users as well. A
split-up of the data/CVE/list file would need updates in various other
tasks and workflows that depend on it. I will try to look into that
more closely.

> We even have one fork of this repo where blobs are missing.
> 
> > What actually is this old diff formatter problem you mentioned, which is
> > going to be solved? Would it help in the meantime to restrict access
> > to logged-in users only?
> 
> For some requests the diff formatter blocks and runs into the one minute
> hard timeout.  This should be fixed with the 11.3 release next week, so
> we can ignore that.

Ok!

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 25 Sep 2018 19:03:06 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 25 Sep 2018 19:03:06 GMT) (full text, mbox, link).


Message #32 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Tue, 25 Sep 2018 21:00:49 +0200
One suggestion from IRC discussion:

< DLange> summary: suggestions are along the idea of creating list-$year and combine in list for current tools or amend them?
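
The "combine in list for current tools" idea could look roughly like the following minimal sketch. It assumes yearly files named data/CVE/list.<year> and a newest-year-first ordering in the combined stream; both the naming and the ordering are assumptions for illustration, and the function names are made up.

```python
import re
from pathlib import Path
from typing import Iterator, List

# Assumed naming scheme: list.<4-digit year>, e.g. list.2018.
YEARLY = re.compile(r"^list\.(\d{4})$")


def yearly_files(cve_dir: Path) -> List[Path]:
    """Return the list.<year> files found in cve_dir, newest year first."""
    found = []
    for path in cve_dir.iterdir():
        match = YEARLY.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in found]


def combined(cve_dir: Path) -> Iterator[bytes]:
    """Yield the lines of all yearly files as one continuous stream.

    Files are read as bytes so that no assumption is made about
    the text encoding of individual entries.
    """
    for path in yearly_files(cve_dir):
        with path.open("rb") as handle:
            yield from handle
```

A tool that currently reads data/CVE/list could iterate over `combined(Path("data/CVE"))` instead, so the single big file never needs to exist in the repository.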




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 26 Sep 2018 07:21:03 GMT) (full text, mbox, link).


Acknowledgement sent to Guido Günther <agx@sigxcpu.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 26 Sep 2018 07:21:03 GMT) (full text, mbox, link).


Message #37 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Guido Günther <agx@sigxcpu.org>
To: Salvatore Bonaccorso <carnil@debian.org>, 908678@bugs.debian.org
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Wed, 26 Sep 2018 09:19:09 +0200
Hi,
On Tue, Sep 25, 2018 at 09:00:49PM +0200, Salvatore Bonaccorso wrote:
> One suggestion from IRC discussion:
> 
> < DLange> summary: suggestions are along the idea of creating list-$year and combine in list for current tools or amend them?

I think that makes sense. An alternative would be to use shallow clones
(--depth=1) in all the tools (and to recommend that in the
docs).
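
For tools that clone the repository themselves, a shallow clone could be wrapped as in this minimal sketch. The helper names are hypothetical; only `git clone --depth=<n>` itself is taken from git.

```python
import subprocess
from typing import List


def shallow_clone_command(url: str, dest: str, depth: int = 1) -> List[str]:
    """Build the git command line for a shallow clone of url into dest."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", f"--depth={depth}", url, dest]


def shallow_clone(url: str, dest: str, depth: int = 1) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

Note that a shallow clone only carries the most recent commits, so history-based commands such as log or annotate will see truncated history.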

Did somebody contact git upstream yet? It might be worth showing this
use case.

Cheers,
 -- Guido



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 26 Sep 2018 12:21:10 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 26 Sep 2018 12:21:10 GMT) (full text, mbox, link).


Message #42 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Some more thoughts and some tests on the security-tracker git repo
Date: Wed, 26 Sep 2018 13:56:16 +0200
The main issue is that we need to get clone and diff+render operations
back into normal time frames. The salsa workers (e.g. to render a
diff) time out after 60s. Similar time constraints are put onto other
rendering front-ends. Actually you can easily get Apache to segfault
if you do not time-constrain cgi/fcgi type processes.
But that's out of scope here.

Back on topic:

Just splitting the file will not do. We need to (unfortunately)
somehow "get rid" of the history (delta-resolution) walks in git:

	# test setup limits: Network bw: 200 MBit, client system: 4 core

	$ time git clone https://.../debian_security_security-tracker
	Cloning into 'debian_security_security-tracker'...
	remote: Counting objects: 334274, done.
	remote: Compressing objects: 100% (67288/67288), done.
	remote: Total 334274 (delta 211939), reused 329399 (delta 208905)
	Receiving objects: 100% (334274/334274), 165.46 MiB | 21.93 MiB/s, done.
	Resolving deltas: 100% (211939/211939), done.
	
	real	14m13.159s
	user	27m23.980s
	sys	0m17.068s
	
	# Run the tool already available to split the main CVE/list
	# file into annual files. Thanks Raphael Geissert!
	$ bin/split-by-year
	
	# remove the old big CVE/list file
	$ git rm data/CVE/list
	
	# get the new files into git
	$ git add data/CVE/list.*
	$ git commit --all
	[master a06d3446ca] Remove list and commit bin/split-by-year results
	 21 files changed, 342414 insertions(+), 342414 deletions(-)
	 delete mode 100644 data/CVE/list
	 create mode 100644 data/CVE/list.1999
	 create mode 100644 data/CVE/list.2000
	 create mode 100644 data/CVE/list.2001
	 create mode 100644 data/CVE/list.2002
	 create mode 100644 data/CVE/list.2003
	 create mode 100644 data/CVE/list.2004
	 create mode 100644 data/CVE/list.2005
	 create mode 100644 data/CVE/list.2006
	 create mode 100644 data/CVE/list.2007
	 create mode 100644 data/CVE/list.2008
	 create mode 100644 data/CVE/list.2009
	 create mode 100644 data/CVE/list.2010
	 create mode 100644 data/CVE/list.2011
	 create mode 100644 data/CVE/list.2012
	 create mode 100644 data/CVE/list.2013
	 create mode 100644 data/CVE/list.2014
	 create mode 100644 data/CVE/list.2015
	 create mode 100644 data/CVE/list.2016
	 create mode 100644 data/CVE/list.2017
	 create mode 100644 data/CVE/list.2018
	
	# this one is fast:
	$ git push
	
	# create a new clone
	$ time git clone https://.../debian_security_security-tracker_split_files test-clone
	Cloning into 'test-clone'...
	remote: Counting objects: 334298, done.
	remote: Compressing objects: 100% (67312/67312), done.
	remote: Total 334298 (delta 211943), reused 329399 (delta 208905)
	Receiving objects: 100% (334298/334298), 168.91 MiB | 21.28 MiB/s, done.
	Resolving deltas: 100% (211943/211943), done.
	
	real	14m35.444s
	user	27m45.500s
	sys	0m21.100s

--> So splitting alone doesn't help. Git is not clever enough to skip
the deltas of files that will not be checked out.

Git 2.18's wire protocol v2 could be used with server-side filtering
but that's an awful hack. Telling people to

git clone --depth 1 #(shallow)

like Guido advises is easier and more reliable for the clone use-case.
For the original repo that will take ~1.5s, for a split-by-year repo ~0.2s.

There are tools to split git files and keep the history
e.g. https://github.com/potherca-bash/git-split-file
but we'd need (to create) one that also zaps the old deltas.
So really "rewrite history" as the git folks tend to call this.
git filter-branch can do this. But it would get somewhat complex and murky
with commits that span CVE/list-year and list-year+1 which are at least 21 for
2018+2017, 19 for 2017+2016 and ~10 for previous year combos.
So I wouldn't put too much effort into that path.

In any case, a repo with just the split files but no maintained history clones
in ~12s in the above test setup. It also brings the (bare) repo down from 3.3GB
to 189MB. So the issue is really the data/CVE/list file.

That said, data/DSA/list is 14575 lines. That seems to not bother git too much
yet. Still if things get re-structured, this file may be worth a look, too.

To me the most reasonable path forward unfortunately looks like starting a new repo
for 2019+ and "just" import the split files or single-record files as mentioned
by pabs but not the git/svn/cvs history. The old repo would - of course - stay
around but frozen at a deadline.

Corsac also mentioned on IRC that the repo could be hosted outside of Gitlab.
That would reduce the pressure for some time.
But cgit and other git frontends (as well as backends) we tested also struggle
with the repo (which is why my company, Faster IT GmbH, used the security-tracker
repo as a very welcome test case in the first place).
So that would buy time but not be a solution long(er) term.

Thanks for reading that much!



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 26 Sep 2018 13:18:05 GMT) (full text, mbox, link).


Acknowledgement sent to Guido Günther <agx@sigxcpu.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 26 Sep 2018 13:18:05 GMT) (full text, mbox, link).


Message #47 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Guido Günther <agx@sigxcpu.org>
To: Daniel Lange <DLange@debian.org>
Cc: 908678@bugs.debian.org, Bastian Blank <waldi@debian.org>
Subject: Re: Some more thoughts and some tests on the security-tracker git repo
Date: Wed, 26 Sep 2018 15:15:14 +0200
Hi,
On Wed, Sep 26, 2018 at 01:56:16PM +0200, Daniel Lange wrote:
> The main issue is that we need to get clone and diff+render operations
> back into normal time frames. The salsa workers (e.g. to render a
> diff) time out after 60s. Similar time constraints are put onto other

I wonder why that is since "git diff" is pretty fast on a local
checkout. Did we ask the gitlab folks about it?

[..snip..]
> Just splitting the file will not do. We need to (unfortunately)
> somehow "get rid" of the history (delta-resolution) walks in git:

Not necessarily. Maybe a graft would do:

    https://developer.atlassian.com/blog/2015/08/grafting-earlier-history-with-git/

This is IMHO preferable over history rewrites. I've used this to tie
histories in the past. I've not used "git replace" though but
.git/info/grafts.

Cheers,
 -- Guido



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Fri, 09 Nov 2018 21:09:03 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Fri, 09 Nov 2018 21:09:03 GMT) (full text, mbox, link).


Message #52 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Some more thoughts and some tests on the security-tracker git repo
Date: Fri, 09 Nov 2018 16:05:06 -0500
On 2018-09-26 14:56:16, Daniel Lange wrote:

[...]

> In any case, a repo with just the split files but no maintained history clones
> in ~12s in the above test setup. It also brings the (bare) repo down from 3.3GB
> to 189MB. So the issue is really the data/CVE/list file.

So I've looked into that problem as well, four months ago:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

In there I proposed splitting the data/CVE/list file into "one file per
CVE". In retrospect, that was a rather naive approach and yielded all
sorts of problems: there were so many files that it created problems even
for the shell (argument list too long).

I hadn't thought of splitting things in "one *file* per year". That
could really help! Unfortunately, it's hard to simulate what it would
look like *14 years* from now (yes, that's how old that repo is
already).

I can think of two ways to simulate that:

 1. generate commits to recreate all files from scratch: parse
    data/CVE/list, split it up into chunks, and add each CVE in one
    separate commit. it's not *exactly* how things are done now, but it
    should be a close enough approximation

 2. do a crazy filter-branch to send commits to the right
    files. considering how long an initial clone takes, i can't even
    begin to imagine how long *that* would take. but it would be the
    most accurate simulation.

Short of that, I think it's somewhat dishonest to compare a clean
repository with split files against a repository with history over 14
years and thousands of commits. Intuitively, I think you're right and
that "sharding" the data in yearly packets would help a lot git's
performance. But we won't know until we simulate it, and if we hit that
problem again 5 years from now, all that work will have been for
nothing. (Although it *would* give us 5 years...)

> That said, data/DSA/list is 14575 lines. That seems to not bother git too much
> yet. Still if things get re-structured, this file may be worth a look, too.

Yeah, I haven't had trouble with that one yet either.

> To me the most reasonable path forward unfortunately looks like starting a new repo
> for 2019+ and "just" import the split files or single-record files as mentioned
> by pabs but not the git/svn/cvs history. The old repo would - of course - stay
> around but frozen at a deadline.

In any case, I personally don't think history over those files is that
critical. We rarely dig into that history because it's so
expensive... Any "git annotate" takes forever in this repo, and running
*that* over data/CVE/list takes tens of minutes.

That said, once we pick a solution, we *could* craft a magic
filter-branch that *would* keep history. It might be worth eating that
performance cost then. I'll run some tests to see if I can make sense of
such a filter.

> Corsac also mentioned on IRC that the repo could be hosted outside of Gitlab.
> That would reduce the pressure for some time.
> But cgit and other git frontends (as well as backends) we tested also struggle
> with the repo (which is why my company, Faster IT GmbH, used the security-tracker
> repo as a very welcome test case in the first place).
> So that would buy time but not be a solution long(er) term.

Agreed. I think the benefits of hosting on gitlab outweigh the trouble
in rearchitecting our datastore. As I said, it's not just gitlab
that's struggling with a 17MB text file: git itself has trouble dealing
with it as well, and I am often frustrated by that in my work...

A.

-- 
You are absolutely deluded, if not stupid, if you think that a
worldwide collection of software engineers who can't write operating
systems or applications without security holes, can then turn around
and suddenly write virtualization layers without security holes.
                        - Theo de Raadt



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Fri, 09 Nov 2018 23:09:02 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Fri, 09 Nov 2018 23:09:02 GMT) (full text, mbox, link).


Message #57 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Some more thoughts and some tests on the security-tracker git repo
Date: Fri, 09 Nov 2018 18:05:55 -0500
On 2018-11-09 16:05:06, Antoine Beaupré wrote:
>  2. do a crazy filter-branch to send commits to the right
>     files. considering how long an initial clone takes, i can't even
>     begin to imagine how long *that* would take. but it would be the
>     most accurate simulation.
>
> Short of that, I think it's somewhat dishonest to compare a clean
> repository with split files against a repository with history over 14
> years and thousands of commits. Intuitively, I think you're right and
> that "sharding" the data in yearly packets would help a lot git's
> performance. But we won't know until we simulate it, and if we hit that
> problem again 5 years from now, all that work will have been for
> nothing. (Although it *would* give us 5 years...)

So I've done that craaaazy filter-branch, on a shallow clone (1000
commits). The original clone is about 30MB, but the split repo is only
4MB.

Cloning the original repo takes a solid 30+ seconds:

[1221]anarcat@curie:src130$ time git clone file://$PWD/security-tracker-1000.orig security-tracker-1000.orig-test
Cloning into 'security-tracker-1000.orig-test'...
remote: Enumerating objects: 5291, done.
remote: Counting objects: 100% (5291/5291), done.
remote: Compressing objects: 100% (1264/1264), done.
remote: Total 5291 (delta 3157), reused 5291 (delta 3157)
Receiving objects: 100% (5291/5291), 8.80 MiB | 19.47 MiB/s, done.
Resolving deltas: 100% (3157/3157), done.
64.35user 0.44system 0:34.32elapsed 188%CPU (0avgtext+0avgdata 200056maxresident)k
0inputs+58968outputs (0major+48449minor)pagefaults 0swaps

Cloning the split repo takes less than a second:

[1223]anarcat@curie:src$ time git clone file://$PWD/security-tracker-1000-filtered security-tracker-1000-filtered-test
Cloning into 'security-tracker-1000-filtered-test'...
remote: Enumerating objects: 2214, done.
remote: Counting objects: 100% (2214/2214), done.
remote: Compressing objects: 100% (1190/1190), done.
remote: Total 2214 (delta 936), reused 2214 (delta 936)
Receiving objects: 100% (2214/2214), 1.25 MiB | 22.78 MiB/s, done.
Resolving deltas: 100% (936/936), done.
0.25user 0.04system 0:00.38elapsed 79%CPU (0avgtext+0avgdata 8200maxresident)k
0inputs+8664outputs (0major+3678minor)pagefaults 0swaps

So this is clearly a win, and I think it would be possible to rewrite
the history using the filter-branch command. Commit IDs would change,
but we would keep all commits and so annotate and all that good stuff
would still work.

The split-by-year bash script was too slow for my purposes: it was
taking a solid 15 seconds for each run, which meant it would have taken
9 *days* to process the entire repository.

So I tried to see if this could be optimized, so we could split the file
while keeping history without having to shut down the whole system for
days. I first rewrote it in Python, which processed the 1000 commits in
801 seconds. This gives an estimate of 15 hours for the 68278 commits I
had locally. Concerned about the Python startup time, I then tried
golang, which processed the tree in 262 seconds, giving a final estimate
of 4.8 hours.
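
The estimates above amount to linear scaling from the 1000-commit sample. Here is a minimal sketch of that arithmetic, assuming the runtime per commit stays constant across history. Older revisions of data/CVE/list are smaller, so this probably overestimates. The function name is made up for illustration.

```python
def estimate_seconds(sample_seconds: float, sample_commits: int, total_commits: int) -> float:
    """Scale a runtime measured on a sample of commits to the full history."""
    return sample_seconds * total_commits / sample_commits


TOTAL_COMMITS = 68278  # local commit count quoted in the message

python_hours = estimate_seconds(801, 1000, TOTAL_COMMITS) / 3600
go_hours = estimate_seconds(262, 1000, TOTAL_COMMITS) / 3600
print(f"Python: about {python_hours:.1f} h, Go: about {go_hours:.1f} h")
```

The results land close to the 15-hour and roughly 5-hour figures given in the text.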

Attached are both implementations, for those who want to reproduce my
results. Note that they differ from the original implementation in that
they have to (naturally) remove the data/CVE/list file itself otherwise
it's kept in history.

Here's how to call it:

git -c commit.gpgSign=false filter-branch --tree-filter '/home/anarcat/src/security-tracker/bin/split-by-year.py data/CVE/list' HEAD

Also observe how all gpg commit signatures are (obviously) lost. I have
explicitly disabled signing because signatures actually take a long time
to compute...

I haven't tested if a graft would improve performance, but I suspect it
would not, given the sheer size of the repository that would effectively
need to be carried over anyways.

A.

-- 
Man really attains the state of complete humanity when he produces,
without being forced by physical need to sell himself as a commodity.
                        - Ernesto "Che" Guevara
[split-by-year.go (text/x-golang, inline)]
package main

import (
	"bufio"
	"bytes"
	"io"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	file, err := os.Open("data/CVE/list")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	var (
		line     []byte
		cve      []byte
		year     uint64
		year_str string
		target   *os.File
		header   bool
	)
	fds := make(map[uint64]*os.File, 20)
	scanner := bufio.NewReader(file)
	for {
		line, err = scanner.ReadBytes('\n')

		if bytes.HasPrefix(line, []byte("CVE-")) {

			cve = line
			year_str = strings.Split(string(line), "-")[1]
			year, _ = strconv.ParseUint(year_str, 0, 0)
			header = true
		} else {
			if target, ok := fds[year]; !ok {
				target, err = os.OpenFile("data/CVE/list."+year_str, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
				if err != nil {
					log.Fatal(err)
				}
				fds[year] = target
			}
			if header {
				target.Write(cve)
				header = false
			}
			target.Write(line)
		}
		if err != nil {
			break
		}
	}
	if err != io.EOF {
		log.Fatal(err)
	}
	os.Remove("data/CVE/list")
}
[split-by-year.py (text/x-python, inline)]
#!/usr/bin/python3

import os

data = 'data/CVE/list'

fds = {}

with open(data) as source:
    for line in source:
        if line.startswith('CVE-'):
            cve = line
            year = int(line.split('-')[1])
        else:
            yearly = 'data/CVE/list.{:d}'.format(year)
            target = fds.get(year, None)
            if target is None:
                fds[year] = target = open(yearly, 'a')
            if cve:
                target.write(cve)
                cve = None
            target.write(line)

for year, fd in fds.items():
    fd.close()
os.unlink(data)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Sat, 10 Nov 2018 18:27:05 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Sat, 10 Nov 2018 18:27:05 GMT) (full text, mbox, link).


Message #62 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: Antoine Beaupré <anarcat@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Testing the filter-branch scripts
Date: Sat, 10 Nov 2018 18:56:01 +0100
Antoine,

thank you very much for your filter-branch scripts.

I tested each:

1) the golang version:
It completes after 3h36min:

# git filter-branch --tree-filter '/split-by-year' HEAD
Rewrite a09118bf0a33f3721c0b8f6880c4cbb1e407a39d (68282/68286) (12994 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten

But it doesn't Close() the os.OpenFile handles so ...
all data/CVE/list.yyyy files are 0 bytes long. Sic!

I can reproduce that just running the golang executable
against a current checkout of data/CVE/list.

# go version
go version go1.10.3 linux/amd64
(Stretch backport golang-go 2:1.10~5~bpo9+1)

2.1) the Python version
You claim #!/usr/bin/python3 in the shebang, so I tried that first:

# git filter-branch --tree-filter '/usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc' HEAD
Rewrite 990d3c4bbb49308fb3de1e0e91b9ba5600386f8a (1220/68293) (41 seconds passed, remaining 2254 predicted)
  Traceback (most recent call last):
  File "split-by-year.py", line 13, in <module>
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5463: invalid start byte
tree filter failed: /usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc

The offending commit is:
* 990d3c4bbb - Rename sarge-checks data to something not specific to sarge, since we're working on etch now.
  Sorry for the probable annoyance, but it had to be done. (13 years ago) [Joey Hess]

There will be many more like this, so for Python3
this needs to be made unicode-agnostic.

Note that I compiled the .py to .pyc, which makes it
much faster and thus quite usable.

2.2) Python, when a string was a string .. Python2
Your code is actually Python2, so why not give that a try:

# git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
Rewrite b59da20b82011ffcfa6c4a453de9df58ee036b2c (2516/68293) (113 seconds passed, remaining 2954 predicted)
  Traceback (most recent call last):
  File "split-by-year.py", line 18, in <module>
    yearly = 'data/CVE/list.{:d}'.format(year)
NameError: name 'year' is not defined
tree filter failed: /usr/bin/python2 /split-by-year.pyc

The offending commit is:
* b59da20b82 - claim (13 years ago) [Moritz Muehlenhoff]
| diff --git a/data/CVE/list b/data/CVE/list
| index 7b5d1d21d6..cdf0b74dd0 100644
| --- a/data/CVE/list
| +++ b/data/CVE/list
| @@ -1,3 +1,4 @@
| +begin claimed by jmm
|  CVE-2005-3276 (The sys_get_thread_area function in process.c in Linux 2.6 before ...)
|       TODO: check
|  CVE-2005-3275 (The NAT code (1) ip_nat_proto_tcp.c and (2) ip_nat_proto_udp.c in ...)
| @@ -34,6 +35,7 @@ CVE-2005-3260 (Multiple cross-site scripting (XSS) vulnerabilities in ...)
|       TODO: check
|  CVE-2005-3259 (Multiple SQL injection vulnerabilities in versatileBulletinBoard (vBB) ...)
|       TODO: check
| +end claimed by jmm
|  CVE-2005-XXXX [Insecure caching of user id in mantis]
|       - mantis <unfixed> (bug #330682; unknown)
|  CVE-2005-XXXX [Filter information disclosure in mantis]

As you can see, the line "+begin claimed by jmm" breaks the overly simplistic parser logic.
Unfortunately, such errors do not show up when dry-running against a current version of data/CVE/list.
The "violations" of the file format are transient and buried in history.

Best,
Daniel



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Mon, 12 Nov 2018 17:24:39 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Mon, 12 Nov 2018 17:24:39 GMT) (full text, mbox, link).


Message #67 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Testing the filter-branch scripts
Date: Mon, 12 Nov 2018 12:22:58 -0500
[Message part 1 (text/plain, inline)]
On 2018-11-10 18:56:01, Daniel Lange wrote:
> Antoine,
>
> thank you very much for your filter-branch scripts.

you're welcome! glad it can be of use.

> I tested each:
>
> 1) the golang version:
> It completes after 3h36min:
>
> # git filter-branch --tree-filter '/split-by-year' HEAD
> Rewrite a09118bf0a33f3721c0b8f6880c4cbb1e407a39d (68282/68286) (12994 seconds passed, remaining 0 predicted)
> Ref 'refs/heads/master' was rewritten
>
> But it doesn't Close() the os.OpenFile handles so ...
> all data/CVE/list.yyyy files are 0 bytes long. Sic!

Well. That explains part of the performance difference. ;)

There were multiple problems with the golang source - variable shadowing
and, yes, a missing Close(). Surprisingly, the fixed version is
*slower* than the equivalent Python code, taking about one second per
run or 1102 seconds for the last 1000 commits. I'm at a loss as to how I
managed to make go run slower than Python here (and can't help but think
C would have been easier, again). Probably poor programming on my
part. New version attached.

[...]

> 2.1) the Python version
> You claim #!/usr/bin/python3 in the shebang, so I tried that first:
>
> # git filter-branch --tree-filter '/usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc' HEAD
> Rewrite 990d3c4bbb49308fb3de1e0e91b9ba5600386f8a (1220/68293) (41 seconds passed, remaining 2254 predicted)
>   Traceback (most recent call last):
>   File "split-by-year.py", line 13, in <module>
>   File "/usr/lib/python3.5/codecs.py", line 321, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5463: invalid start byte
> tree filter failed: /usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc

I suspected this would be a problem, but didn't find any occurrence in
the shallow clone so I forgot about it. Note that the golang version
takes great care to treat the data as binary...

> The offending commit is:
> * 990d3c4bbb - Rename sarge-checks data to something not specific to sarge, since we're working on etch now.
>   Sorry for the probable annoyance, but it had to be done. (13 years ago) [Joey Hess]
>
> There will be many more like this, so for Python3
> this needs to be made unicode-agnostic.

... so I rewrote the thing to handle only binary and tested it against
that version of the file. It seems to work fine.
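
For illustration, a minimal sketch of why binary handling avoids the
traceback above. The sample line is made up; only the 0xf6 byte is
taken from the error message. The helper name year_of is hypothetical
and not part of the attached script.

```python
# A made-up list line containing a Latin-1 "ö" (byte 0xf6), which is
# not valid UTF-8. Only the year field needs to be decoded.
raw = b"CVE-2005-0001 (Bug reported by J\xf6rg ...)\n"


def year_of(line):
    """Return the year of a b'CVE-YYYY-...' line without decoding the rest."""
    return int(line.split(b"-")[1].decode("ascii"))


try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
print(year_of(raw))     # 2005
```

Since the year field is plain ASCII, the rest of the line can stay as
bytes and be copied through untouched.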

> Notice I compiled the .py to .pyc which makes it
> much faster and thus well usable.

Interesting. I didn't see much difference in performance in my
benchmarks on average, but the worst-case run did improve by 150ms, so I
guess this is worth the trouble. For those who didn't know (like me)
this means running:

    python -m compileall bin/split-by-year.py

Whenever the .py file changes (right?).
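
A rough equivalent from within Python, as a sketch only. It compiles a
placeholder file in a temporary directory (the file content is made up)
and looks up where the .pyc ends up. As far as I know, a .pyc passed
directly to the interpreter is not refreshed when the .py changes, so
recompiling after each edit would indeed be needed.

```python
import compileall
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Placeholder standing in for bin/split-by-year.py.
    source = os.path.join(workdir, "split-by-year.py")
    with open(source, "w") as fh:
        fh.write("print('split-by-year placeholder')\n")

    # force=True recompiles even if an up-to-date .pyc seems to exist.
    compiled_ok = bool(compileall.compile_file(source, force=True, quiet=1))

    # Path of the byte-compiled file, e.g. __pycache__/split-by-year.cpython-XY.pyc
    pyc_path = importlib.util.cache_from_source(source)
    pyc_exists = os.path.exists(pyc_path)

print(compiled_ok, pyc_exists)
```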

> 2.2) Python, when a string was a string .. Python2
> Your code is actually Python2, so why not give that a try:
>
> # git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
> Rewrite b59da20b82011ffcfa6c4a453de9df58ee036b2c (2516/68293) (113 seconds passed, remaining 2954 predicted)
>   Traceback (most recent call last):
>   File "split-by-year.py", line 18, in <module>
>     yearly = 'data/CVE/list.{:d}'.format(year)
> NameError: name 'year' is not defined
> tree filter failed: /usr/bin/python2 /split-by-year.pyc
>
> The offending commit is:
> * b59da20b82 - claim (13 years ago) [Moritz Muehlenhoff]
> | diff --git a/data/CVE/list b/data/CVE/list
> | index 7b5d1d21d6..cdf0b74dd0 100644
> | --- a/data/CVE/list
> | +++ b/data/CVE/list
> | @@ -1,3 +1,4 @@
> | +begin claimed by jmm
> |  CVE-2005-3276 (The sys_get_thread_area function in process.c in Linux 2.6 before ...)
> |       TODO: check
> |  CVE-2005-3275 (The NAT code (1) ip_nat_proto_tcp.c and (2) ip_nat_proto_udp.c in ...)
> | @@ -34,6 +35,7 @@ CVE-2005-3260 (Multiple cross-site scripting (XSS) vulnerabilities in ...)
> |       TODO: check
> |  CVE-2005-3259 (Multiple SQL injection vulnerabilities in versatileBulletinBoard (vBB) ...)
> |       TODO: check
> | +end claimed by jmm
> |  CVE-2005-XXXX [Insecure caching of user id in mantis]
> |       - mantis <unfixed> (bug #330682; unknown)
> |  CVE-2005-XXXX [Filter information disclosure in mantis]
>
> As you see the line "+begin claimed by jmm" breaks the too simplistic parser logic.
> Unfortunately dry-running against a current version of data/CVE/list such errors do not show up.
> The "violations" of the file format are transient and buried in history.

Hmm... That's a trickier one. I guess we could just pretend that line
doesn't exist and drop it from history... But I chose to buffer it and
treat it like the CVE line so it gets attached to the right file. See if
it does what you expect.

   git cat-file -p b59da20b82:data/CVE/list > data/CVE/list.b59da20b82
   split-by-year.py data/CVE/list.b59da20b82
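
The buffering can also be tried in memory. The following is a sketch
that mirrors the logic of the attached script but returns a dict
instead of writing files. The function name split_by_year and the
second sample entry are made up for illustration.

```python
def split_by_year(lines):
    """Group raw byte lines by CVE year and return {year: [lines]}.

    CVE header lines, and any lines seen before the first CVE, are
    buffered and flushed in front of the next entry body line.
    """
    out = {}
    year = None
    buffer = []
    for line in lines:
        if line.startswith(b"CVE-"):
            buffer.append(line)
            year = int(line.split(b"-")[1].decode("ascii"))
        elif year:
            out.setdefault(year, []).extend(buffer)
            buffer = []
            out[year].append(line)
        else:
            buffer.append(line)
    return out


sample = [
    b"begin claimed by jmm\n",
    b"CVE-2005-3276 (The sys_get_thread_area function ...)\n",
    b"\tTODO: check\n",
    b"end claimed by jmm\n",
    b"CVE-2004-0001 (hypothetical older entry)\n",
    b"\tTODO: check\n",
]
result = split_by_year(sample)
print(sorted(result))   # [2004, 2005]
print(result[2005][0])  # b'begin claimed by jmm\n'
```

Note that a header line without any body line stays in the buffer until
the next body line arrives, so this sketch assumes every entry has at
least one.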

Performance-wise, I shaved off a surprising 60ms by enclosing all the
code in a function (yes, it's crazy), but the buffering to deal with the
above issue added another 40ms so performance should be similar.

I'll start a run on the whole history to see if I can find any problems,
as soon as a first clone finishes resolving those damn deltas. ;)

Thanks for the review!

A.

-- 
Premature optimization is the root of all evil
                        - Donald Knuth
[split-by-year.py (text/x-python, inline)]
#!/usr/bin/python3

import os
import sys


def main(path):
    fds = {}

    year = None
    buffer = b''
    with open(path, 'rb') as source:
        for line in source:
            if line.startswith(b'CVE-'):
                buffer += line
                year = line.split(b'-')[1]
                year = int(year.decode('ascii', errors='surrogateescape'))
            elif year:
                yearly = 'data/CVE/list.{:d}'.format(year)
                target = fds.get(year, None)
                if target is None:
                    fds[year] = target = open(yearly, 'ab')
                if buffer:
                    target.write(buffer)
                    buffer = b''
                target.write(line)
            else:
                buffer += line

    for year, fd in fds.items():
        fd.close()
    os.unlink(path)


if __name__ == '__main__':
    path = 'data/CVE/list'
    if len(sys.argv) > 1:
        path = sys.argv[1]
    main(path)
[split-by-year.go (text/x-golang, inline)]
package main

import (
	"bufio"
	"bytes"
	"io"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	file, err := os.Open("data/CVE/list")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	var (
		line     []byte
		cve      []byte
		year     uint64
		year_str string
		target   *os.File
		header   bool
		ok       bool
	)
	fds := make(map[uint64]*os.File, 20)
	scanner := bufio.NewReader(file)
	for {
		line, err = scanner.ReadBytes('\n')
		if err != nil {
			break
		}
		if bytes.HasPrefix(line, []byte("CVE-")) {
			cve = line
			year_str = strings.Split(string(line), "-")[1]
			year, _ = strconv.ParseUint(year_str, 0, 0)
			header = true
		} else {
			if target, ok = fds[year]; !ok {
				target, err = os.Create("data/CVE/list." + year_str)
				if err != nil {
					log.Fatal(err)
				}
				fds[year] = target
			}
			if header {
				_, err := target.Write(cve)
				if err != nil {
					log.Println("error writing", string(cve), target, err)
					break
				}
				header = false
			}
			_, err := target.Write(line)
			if err != nil {
				log.Println("error writing", string(line), target, err)
				break
			}
		}
	}
	if err != io.EOF {
		log.Fatal(err)
	}
	os.Remove("data/CVE/list")
}

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 13 Nov 2018 16:00:13 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 13 Nov 2018 16:00:14 GMT) (full text, mbox, link).


Message #72 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Testing the filter-branch scripts
Date: Tue, 13 Nov 2018 10:56:24 -0500
On 2018-11-12 12:22:58, Antoine Beaupré wrote:
> I'll start a run on the whole history to see if I can find any problems,
> as soon as a first clone finishes resolving those damn deltas. ;)

The Python job finished successfully here after 10 hours.

I did some tests on the new git repository. Cloning the repository from
scratch takes around 2 minutes (the original repo: 21 minutes). It is
145MB while the original repo is 1.6GB.

Running git annotate on data/CVE/list.2018 takes about 26 seconds, while
it takes basically forever to annotate the original data/CVE/list. (It's
been running for 10 minutes here already.)

So that's about it. I have not done a thorough job at checking the
actual *integrity* of the results. It's difficult, considering CVE
identifiers are not sequential in the data/CVE/list file, so a naive
diff like this will fail:

$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} ) data/CVE/list | diffstat
 list |106562 +++++++++++++++++++++++++++++++++----------------------------------
 1 file changed, 53281 insertions(+), 53281 deletions(-)

But at least the numbers add up: it looks like no line is lost. And
indeed, it looks like all CVEs add up:

$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} | grep ^CVE | sort -n ) <( grep ^CVE data/CVE/list | sort -n  ) | diffstat
 0 files changed

A cursory look at the diff seems to indicate it is clean, however.
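
The "no line is lost" part can also be checked without relying on
ordering, by comparing the lines as multisets. A minimal sketch, with
made-up sample data and a hypothetical helper name same_lines:

```python
from collections import Counter


def same_lines(original, split_files):
    """True if the split files together hold exactly the original lines.

    Order is ignored; each distinct line must occur equally often.
    """
    combined = Counter()
    for lines in split_files:
        combined.update(lines)
    return Counter(original) == combined


original = [
    b"CVE-2018-0002 (b)\n", b"\tTODO: check\n",
    b"CVE-2017-0001 (a)\n", b"\tTODO: check\n",
]
split = [
    [b"CVE-2017-0001 (a)\n", b"\tTODO: check\n"],
    [b"CVE-2018-0002 (b)\n", b"\tTODO: check\n"],
]
print(same_lines(original, split))      # True
print(same_lines(original, split[:1]))  # False
```

This only shows that nothing was dropped or duplicated. It does not
show that each body line is still attached to the right CVE.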

I looked at splitting that file per CVE. That did not scale and just
created new problems. But splitting by *year* seems like a very
efficient switch, and I think it would be worth pursuing that idea
further.

A.

-- 
There is no cloud, it's just someone else's computer.
                       - Chris Watterson



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 13 Nov 2018 17:18:02 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 13 Nov 2018 17:18:02 GMT) (full text, mbox, link).


Message #77 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: Antoine Beaupré <anarcat@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Testing the filter-branch scripts
Date: Tue, 13 Nov 2018 18:14:54 +0100
> The Python job finished successfully here after 10 hours.
6h40 mins here as I ported your improved logic to the python2 version :).

# git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
Rewrite 1169d256b27eb7244273671582cc08ba88002819 (68356/68357) (24226 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten

The tree-filter blows up the .git/objects store to 13G though.
But nothing a git gc can't fix.

> 
> I did some tests on the new git repository. Cloning the repository from
> scratch takes around 2 minutes (the original repo: 21 minutes).
Confirmed.

> So that's about it. I have not done a thorough job at checking the
> actual *integrity* of the results. It's difficult, considering CVE
> identifiers are not sequential in the data/CVE/list file, so a naive
> diff like this will fail:
> 
> $ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} ) data/CVE/list | diffstat
>  list |106562 +++++++++++++++++++++++++++++++++----------------------------------
>  1 file changed, 53281 insertions(+), 53281 deletions(-)
> 
> But at least the numbers add up: it looks like no line is lost. And
> indeed, it looks like all CVEs add up:
> 
> $ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} | grep ^CVE | sort -n ) <( grep ^CVE data/CVE/list | sort -n  ) | diffstat
>  0 files changed
> 
> A cursory look at the diff seems to indicate it is clean, however.

I uploaded "my" version to https://people.debian.org/~dlange/
so people can poke the log and diffs and see whether there are any
issues left.

> I looked at splitting that file per CVE. That did not scale and just
> created new problems. But splitting by *year* seems like a very
> efficient switch, and I think it would be worth pursuing that idea
> forward.

The tools in bin/ would need a brush through. I.e. throw away the
unused ones and amend the ones that are used on data/CVE/* to learn
about the split files.



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 13 Nov 2018 17:24:03 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 13 Nov 2018 17:24:03 GMT) (full text, mbox, link).


Message #82 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Testing the filter-branch scripts
Date: Tue, 13 Nov 2018 12:22:54 -0500
On 2018-11-13 18:14:54, Daniel Lange wrote:
>> The Python job finished successfully here after 10 hours.
> 6h40 mins here as I ported your improved logic to the python2 version :).
>
> # git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
> Rewrite 1169d256b27eb7244273671582cc08ba88002819 (68356/68357) (24226 seconds passed, remaining 0 predicted)
> Ref 'refs/heads/master' was rewritten
>
> The tree-filter blows up the .git/objects store to 13G though.
> But nothing a git gc can't fix.

Ah but that's because the old repository is still in there. You need to
clone the repo in a clean copy:

git clone file://$PWD/security-tracker security-tracker-filtered

To get the minimal version, I even did that twice, although I'm not sure
that's necessary.

[...]

>> I looked at splitting that file per CVE. That did not scale and just
>> created new problems. But splitting by *year* seems like a very
>> efficient switch, and I think it would be worth pursuing that idea
>> forward.
>
> The tools in bin/ would need a brush through. I.e. throw away the
> unused ones and amend the ones that are used on data/CVE/* to learn
> about the split files.

Oh yes, lots of work remains, whether we keep the history or not. That's
probably the *most* work we need to do.

But before going through that trouble, I think we'd need to get approval
from the security team first, as that's quite a lot of work. I figured
we would make a feasibility study first...

a.
-- 
The greatness and worth of a nation can be judged by the way it
treats its animals.
                        - Mahatma Gandhi



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 13 Nov 2018 22:12:02 GMT) (full text, mbox, link).


Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 13 Nov 2018 22:12:02 GMT) (full text, mbox, link).


Message #87 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Moritz Muehlenhoff <jmm@inutil.org>
To: Antoine Beaupré <anarcat@debian.org>, 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Tue, 13 Nov 2018 23:09:41 +0100
On Tue, Nov 13, 2018 at 12:22:54PM -0500, Antoine Beaupré wrote:
> But before going through that trouble, I think we'd need to get approval
> from the security team first, as that's quite a lot of work. I figured
> we would make a feasibility study first...

The current data structure works very well for us and splitting the files
has many downsides.

If we can't get the repository to run on salsa in a manner that doesn't
impact other repositories (e.g. by disabling the repository browser or
similar), then moving the security tracker repository out of Salsa is
the more likely solution.

Did anyone follow Guido's suggestion to report this upstream to
get their assessment on possible optimisations?

Cheers,
        Moritz




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 14 Nov 2018 06:36:03 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 14 Nov 2018 06:36:03 GMT) (full text, mbox, link).


Message #92 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: Moritz Muehlenhoff <jmm@inutil.org>, 908678@bugs.debian.org
Cc: Antoine Beaupré <anarcat@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Wed, 14 Nov 2018 07:34:03 +0100
On 13.11.18 at 23:09, Moritz Muehlenhoff wrote:
> The current data structure works very well for us and splitting the files
> has many downsides.

Could you detail what those many downsides are besides the scripts that
need to be amended?



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 14 Nov 2018 08:30:06 GMT) (full text, mbox, link).


Acknowledgement sent to Guido Günther <agx@sigxcpu.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 14 Nov 2018 08:30:06 GMT) (full text, mbox, link).


Message #97 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Guido Günther <agx@sigxcpu.org>
To: Moritz Muehlenhoff <jmm@inutil.org>, 908678@bugs.debian.org
Cc: Antoine Beaupré <anarcat@debian.org>, Daniel Lange <DLange@debian.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Wed, 14 Nov 2018 09:28:10 +0100
Hi,
On Tue, Nov 13, 2018 at 11:09:41PM +0100, Moritz Muehlenhoff wrote:
> On Tue, Nov 13, 2018 at 12:22:54PM -0500, Antoine Beaupré wrote:
> > But before going through that trouble, I think we'd need to get approval
> > from the security team first, as that's quite a lot of work. I figured
> > we would make a feasibility study first...
> 
> The current data structure works very well for us and splitting the files
> has many downsides.
> 
> If we can't get the repository to run on salsa in a manner that doesn't
> impact other repositories (e.g. by disabling the repository browser or
> similar), then moving the security tracker repository out of Salsa is
> the more likely solution.
> 
> Did anyone follow Guido's suggestion to report this upstream to
> get their assessment on possible optimisations?

Just in case someone takes this upstream. I've filed

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=913124

against git a couple of days ago.
Cheers,
 -- Guido



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 14 Nov 2018 18:48:02 GMT) (full text, mbox, link).


Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 14 Nov 2018 18:48:02 GMT) (full text, mbox, link).


Message #102 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Moritz Muehlenhoff <jmm@inutil.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Antoine Beaupré <anarcat@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Wed, 14 Nov 2018 19:45:59 +0100
On Wed, Nov 14, 2018 at 07:34:03AM +0100, Daniel Lange wrote:
> On 13.11.18 at 23:09, Moritz Muehlenhoff wrote:
> > The current data structure works very well for us and splitting the files
> > has many downsides.
> 
> Could you detail what those many downsides are besides the scripts that
> need to be amended?

Nearly all the tasks of actually editing the data require a look at the complete
data, e.g. to check whether something was tracked before, whether there's an ITP
for something, whether something was tracked as NFU in the past and lots more.

Cheers,
        Moritz



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 14 Nov 2018 19:36:03 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 14 Nov 2018 19:36:03 GMT) (full text, mbox, link).


Message #107 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Holger Levsen <holger@layer-acht.org>
To: Moritz Muehlenhoff <jmm@inutil.org>, 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Antoine Beaupré <anarcat@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Wed, 14 Nov 2018 19:32:02 +0000
[Message part 1 (text/plain, inline)]
On Wed, Nov 14, 2018 at 07:45:59PM +0100, Moritz Muehlenhoff wrote:
> Nearly all the tasks of actually editing the data require a look at the complete
> data, e.g. to check whether something was tracked before, whether there's an ITP
> for something, whether something was tracked as NFU in the past and lots more.

according to git log, the data goes back to 2004. Do you really need all
those 15 years of history or could we maybe make a yearly split for
(now) the first 10 years and have the last 5 years in "one"?

And then when we move into 2019 we would move 2014 to the then 11 first
years and so on... same in 2020 with 2015 then...

IMHO we should do something; otherwise dealing with security-tracker.git
will be even more cumbersome in 5 or 10 years.
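
A sketch of that rolling scheme, to make the boundaries concrete. The
function name target_file and the file names are assumptions (the
combined file is assumed to keep the name data/CVE/list, the yearly
ones to be called data/CVE/list.YYYY):

```python
def target_file(cve_year, current_year, recent_years=5):
    """Return the file a CVE of the given year would live in.

    The most recent `recent_years` years share one file; everything
    older goes into a per-year file.
    """
    if cve_year > current_year - recent_years:
        return "data/CVE/list"
    return "data/CVE/list.{:d}".format(cve_year)


print(target_file(2014, 2018))  # data/CVE/list
print(target_file(2013, 2018))  # data/CVE/list.2013
print(target_file(2014, 2019))  # data/CVE/list.2014
```

With these defaults, 2014 is in the combined file during 2018 and
moves to its own file once the current year becomes 2019.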


-- 
cheers,
	Holger

-------------------------------------------------------------------------------
               holger@(debian|reproducible-builds|layer-acht).org
       PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Wed, 14 Nov 2018 20:51:06 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Wed, 14 Nov 2018 20:51:06 GMT) (full text, mbox, link).


Message #112 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Moritz Muehlenhoff <jmm@inutil.org>, 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Antoine Beaupré <anarcat@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>
Subject: Re: Bug#908678: Testing the filter-branch scripts
Date: Wed, 14 Nov 2018 21:48:17 +0100
Hi,

On Wed, Nov 14, 2018 at 07:45:59PM +0100, Moritz Muehlenhoff wrote:
> On Wed, Nov 14, 2018 at 07:34:03AM +0100, Daniel Lange wrote:
> > On 13.11.18 at 23:09, Moritz Muehlenhoff wrote:
> > > The current data structure works very well for us and splitting the files
> > > has many downsides.
> > 
> > Could you detail what those many downsides are besides the scripts that
> > need to be amended?
> 
> Nearly all the tasks of actually editing the data require a look at the complete
> data, e.g. to check whether something was tracked before, whether there's an ITP
> for something, whether something was tracked as NFU in the past and lots more.

Agreed from my point of view as well: the history is, and contains,
valuable data, and we do not want to lose it, even if researching older
items and past changes takes time. You will even see that over time
people started to put more information into their commits, giving
rationales, notes, and additional information.

And if all that is going to be too much hassle for the salsa
infrastructure, we would need to (or could) move the repository
somewhere else, with the unfortunate downside for contributors from the
wider community. But admittedly the number of people contributing
regularly is small.

On the other hand, I fully agree that initial clones of the repo
are a problem. It would also be interesting to see what git
upstream thinks about that use case and about #913124, raised by Guido.

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 24 Jan 2019 12:21:07 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 24 Jan 2019 12:21:07 GMT) (full text, mbox, link).


Message #117 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>
Subject: Update on the security-tracker git discussion
Date: Thu, 24 Jan 2019 12:23:31 +0100
Zobel brought up the security-tracker git discussion in the 
#debian-security irc channel again and I'd like to record a few of the 
items touched there for others that were not present:

DLange has been running a mirror of the git repo with split files for 
three months. This is based on anarcat's scripts published previously in 
this bug. The rewriting mirror repo works flawlessly. All history is 
retained sans gpg commit signatures.

Corsac noted that "redoing the tooling is a pain" and anarcat and DLange 
reiterated that we are willing to help fix the tools. But we need a 
commitment from the security-team that the migration to a split file 
repo is wanted. And we need a prioritized list of tools that need to be 
split-files enabled.

The discussion reiterated that "moving elsewhere" doesn't really fix the 
underlying git-usage issue. So while this would take load off salsa, it 
would not improve clone times and would hamper collaboration with Debian 
people outside the security team.

Still - to gain some data - DLange tried to push the security-tracker 
repo to GitHub. This bails out as the history contains a file > 100MB 
(hard limit for GitHub):

remote: error: GH001: Large files detected. You may want to try Git 
Large File Storage - https://git-lfs.github.com.
[..]
remote: error: File data/CVE/allitems.html is 111.44 MB; this exceeds 
GitHub's file size limit of 100.00 MB

So we would have to re-write history for pushing to GitHub. Commits from 
2017-12-29 that introduce "data/CVE/allitems.html" and drop it again 
would need to be modified. Technically all commits after these have to 
be re-written as well. I have not tested whether GitHub supports 
refs/replace substitutes which would be a work-around.
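
To find such files before pushing, something along these lines might
help. This is only a sketch: the helper name oversized_blobs is made
up, the limit is assumed to be 100 * 1024 * 1024 bytes, and the hashes
and sizes in the sample are invented. The input is assumed to come from

  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'

```python
# Assumed value of the per-file limit reported in the error above.
GITHUB_LIMIT = 100 * 1024 * 1024


def oversized_blobs(batch_check_output, limit=GITHUB_LIMIT):
    """Return (path, size) for every blob larger than `limit` bytes.

    Each input line is expected as "<type> <hash> <size> <path>".
    Commits, trees and malformed lines are skipped.
    """
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size > limit:
            found.append((parts[3], size))
    return found


# Invented sample in the assumed format.
sample = "\n".join([
    "commit 1111111111111111111111111111111111111111 250 ",
    "tree 2222222222222222222222222222222222222222 100 data/CVE",
    "blob 3333333333333333333333333333333333333333 116853000 data/CVE/allitems.html",
    "blob 4444444444444444444444444444444444444444 20000000 data/CVE/list",
])
print(oversized_blobs(sample))
# [('data/CVE/allitems.html', 116853000)]
```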

As noticeable on Salsa and per 
https://gitlab.com/gitlab-com/support-forum/issues/230 GitLab does not 
enforce per-file size limits.
But the pain of hosting and using this repo is not really different for 
any GitLab instance.

So that means self-hosting of a non-split-file repo would probably have 
to be on a security DSA machine or similar.

Again, as said above, discussion participants outside the security team 
would prefer a commitment to split the offending data/CVE/list file into 
annual chunks, enable the tooling and stay on salsa.




Added tag(s) bullseye-ignore and buster-ignore. Request was from Paul Gevers <elbrus@debian.org> to control@bugs.debian.org. (Thu, 04 Apr 2019 08:21:11 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Mon, 13 May 2019 16:21:02 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Mon, 13 May 2019 16:21:03 GMT) (full text, mbox, link).


Message #124 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <waldi@debian.org>
To: 908678@bugs.debian.org
Cc: Salvatore Bonaccorso <carnil@debian.org>
Subject: Re: Bug#908678: security-tracker - Breaks salsa.d.o
Date: Mon, 13 May 2019 18:08:34 +0200
Hi Salvatore

On Thu, Sep 13, 2018 at 01:37:35PM +0200, Salvatore Bonaccorso wrote:
> On Wed, Sep 12, 2018 at 03:10:56PM +0200, Bastian Blank wrote:
> > As the problems caused by the state of this repo now causes user visible
> > outages, this needs to be fixed.

Please provide a plan how and when to fix this before 2019-06-30.

Just for the record: you must drop the complete project before importing
the rewritten repository.  GitLab keeps all the revisions around, as 
they are associated with jobs.

Regards,
Bastian

-- 
Captain's Log, star date 21:34.5...



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 06 Jun 2019 05:33:03 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 06 Jun 2019 05:33:03 GMT) (full text, mbox, link).


Message #129 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Thu, 6 Jun 2019 07:31:56 +0200
Hi Daniel,

On Thu, Jan 24, 2019 at 12:23:31PM +0100, Daniel Lange wrote:
> Zobel brought up the security-tracker git discussion in the #debian-security
> irc channel again and I'd like to record a few of the items touched there
> for others that were not present:
> 
> DLange has a running mirror of the git repo with split files since three
> months. This is based on anarcat's scripts published previously in this bug.
> The rewriting mirror repo works flawlessly. All history is retained sans gpg
> commit signatures.
> 
> Corsac noted that "redoing the tooling is a pain" and anarcat and DLange
> iterated we are willing to help fix the tools. But we need a commitment from
> the security-team that the migration to a split file repo is wanted. And we
> need a prioritized list of tools that need to be split-files enabled.
> 
> The discussion iterated that "moving elsewhere" doesn't really fix the
> underlying git-usage issue. So while this would take load off salsa, it will
> not improve clone times and hamper collaboration with Debian people outside
> the security team.
> 
> Still - to gain some data - DLange tried to push the security-tracker repo
> to github. This bails out as the history contains a file > 100MB (hard limit
> for Github):
> 
> remote: error: GH001: Large files detected. You may want to try Git Large
> File Storage - https://git-lfs.github.com.
> [..]
> remote: error: File data/CVE/allitems.html is 111.44 MB; this exceeds
> GitHub's file size limit of 100.00 MB
> 
> So we would have to re-write history for pushing to GitHub. Commits from
> 2017-12-29 that introduce "data/CVE/allitems.html" and drop it again would
> need to be modified. Technically all commits after these have to be
> re-written as well. I have not tested whether Github supports refs/replace
> substitutes which would be a work-around.
> 
> As noticeable on Salsa and per
> https://gitlab.com/gitlab-com/support-forum/issues/230 Gitlab does not
> enforce per-file size limits.
> But the pain of hosting and using this repo is not really different for any
> Gitlab instance.
> 
> So that means self-hosting of a non-split-file repo would probably have to
> be on a security DSA machine or similar.
> 
> Again, as said above, discussion participants outside the security team
> would prefer a commitment to split the offending data/CVE/list file into
> annual chunks, enable the tooling and stay on salsa.

I am planning to take some time in the next days to re-evaluate your
findings. As this was missing in my previous reply: thanks, Daniel, for
your time so far and for the above summary.

Thanks as well for your effort in finding a solution which involves
retaining the history.

Could you again point me to your mirror of the split-up variant?

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 06 Jun 2019 06:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 06 Jun 2019 06:39:03 GMT) (full text, mbox, link).


Message #134 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: Salvatore Bonaccorso <carnil@debian.org>
Cc: 908678@bugs.debian.org, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Thu, 6 Jun 2019 08:35:47 +0200
Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> Could you again point me to your splitted up variant mirror?

https://git.faster-it.de/debian_security_security-tracker_split_files/




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Thu, 06 Jun 2019 16:15:02 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Thu, 06 Jun 2019 16:15:02 GMT) (full text, mbox, link).


Message #139 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Daniel Lange <DLange@debian.org>, 908678@bugs.debian.org
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Thu, 6 Jun 2019 18:11:53 +0200
Hi Daniel,

On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > Could you again point me to your splitted up variant mirror?
> 
> https://git.faster-it.de/debian_security_security-tracker_split_files/

Thanks!

While starting to look at it, could you change the splitting to
$year.list instead of list.$year? I know this comes from the initial
script which was committed. It is though more intuitive to work with
$year.something than something.$year in this context.

Thanks already!

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Sat, 08 Jun 2019 16:33:03 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Sat, 08 Jun 2019 16:33:03 GMT) (full text, mbox, link).


Message #144 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Sat, 8 Jun 2019 18:29:24 +0200
Hi,

On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> Hi Daniel,
> 
> On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > > Could you again point me to your splitted up variant mirror?
> > 
> > https://git.faster-it.de/debian_security_security-tracker_split_files/
> 
> Thanks!
> 
> While starting to look at it, could you change the splitting to
> $year.list instead of list.$year? I know this comes from the initial
> script which was commited. It is though more intuitive working with
> $work.something than something.$year in this context.

Thanks to Daniel for providing the converted repository, which gets
updated regularly (and with the lists now named the other way around as
$year.list, which is more intuitive and looks saner to me). This is an
extremely good basis.

Below are some thoughts from the last few days; please note that the
list might not yet be complete. Please also try not to push/force us
too much: whilst we understand the issue and see that something needs
to happen, whatever the solution is (split, move somewhere else), we
regularly have more serious issues popping up that we want and need to
look at. But we acknowledge and see the salsa admins' point of view as
well.

That said, here is what I have at the moment; some items are easy, some
will/might be more involved.

Notes on possible CVE/list splits
---------------------------------

- Workflows on the files themselves by the most active users. The list
  is often kept open to cross-check all issues in one file. But this
  will "just" need other ways to deal with the situation by the persons
  working most on it.
- Code of the security-tracker service and the python modules
  themselves, which currently rely on the data/*/list formats (DSA,
  DLA, CVE, ...). This could probably be split up to use
  data/*/*.list.
- Externally called but included in code: the update script which
  fetches the MITRE list and integrates all needed changes (see further
  below).
- bin/bts-update (called from scripts/update-CVE-assignments in the
  cron of the security-tracker-services) operates on data/CVE/list and
  keeps track of the already tagged bugs by comparing with an
  'oldlist'. On each run on soriano.debian.org the oldlist is copied as
  a 'state' file, similar to logrotate's statefile (cron).
- bin/check-new-issues: parsing of TODO and the checks for new issues
  are likewise based on the existence and parsing of 'data/CVE/list'.
  After a split the interactive commands should still be able to
  navigate through the items.
- bin/check-syntax: checks the syntax of the various lists based on the
  security-tracker parser for the lists. make check-syntax from the
  Makefile, the pre-commit hook and the CI tests all use this script
  for the syntax check. It depends on CVEFile as well, from
  python/bugs.py. Relevant here is the check-syntax target from the
  Makefile. In SVN times this actually tested only the syntax of the
  changed files, but now it just runs make check-syntax.
- bin/compare-nvd-cve reads from data/CVE/list and is probably easier
  to adapt. It is basically used in an "experimental" Makefile target,
  update-compare-nvd. AFAICS it just reads the information, so it
  should be easy to adapt to any split setup.
- bin/gen-{DSA,DLA}: use data/CVE/list as a sanity check for the
  presence of the CVE.
- bin/get-todo-items (this script is currently not working correctly
  and is already implemented via the web view, so we need to consider
  whether we actually still need it).
- bin/inject-embedded-code-copies (experimental script, not
  actively used)
- bin/rejected-with-info relies on data/CVE/list directly, but will
  potentially be easily adaptable to a split setup.
- bin/setup-repo: checks for data/CVE/list just to make sure it's the
  right repo.
- bin/report-vuln uses CVEFile (from python/bugs.py).
- bin/update and bin/updatelist: parse the DSA/DTSA/DLA lists and
  data/CVE/list, adding new entries from the MITRE feed and
  cross-references for the DSAs/DLAs to a new data/CVE/list, which is
  then committed by the cronjob on soriano. Processing those files will
  need to continue to work in a split setup.
- bin/update-db (triggered by a Makefile target to update the
  security.db sqlite database).
- bin/update-nvd (possibly depends on the CVE lists via the modules
  used, but not directly).
- data/config.json contains the sources for the CVE, DSA, DLA and
  extended lists. Currently each path is a path component starting from
  data, e.g. for CVE files the path is '/CVE/list'. See as well "Setting
  up an extended instance" in the documentation.
- lib/python/bugs.py contains the classes CVEFile, DSAFile,
  CVEExtendFile.
- lib/python/debian_support.py: defines the getconfig function reading
  data/config.json.
- lib/python/security_db.py: via getSources, gets the configuration of
  where to read the CVE, DSA, DLA and Extends information from, as
  defined in config.json.
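
For the tools that only read the lists, the step from data/*/list to
data/*/*.list might be as small as the following sketch. The helpers
pick_sources and source_paths are hypothetical and not part of the
tracker code; the newest-first ordering is an assumption as well.

```python
import os
import re

# Hypothetical naming scheme after a split: one <year>.list per year.
SPLIT_NAME = re.compile(r"^\d{4}\.list$")


def pick_sources(names):
    """Prefer split <year>.list files (newest first), else the single 'list'."""
    names = list(names)
    split = sorted((n for n in names if SPLIT_NAME.match(n)), reverse=True)
    if split:
        return split
    return ["list"] if "list" in names else []


def source_paths(data_dir, kind):
    """Return the list files to parse for one kind, e.g. 'CVE' or 'DSA'."""
    directory = os.path.join(data_dir, kind)
    return [os.path.join(directory, n)
            for n in pick_sources(os.listdir(directory))]
```

With such a fallback the same code would keep working for DSA/DLA
directories that stay as a single file.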

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Sun, 09 Jun 2019 10:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Guido Günther <agx@sigxcpu.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Sun, 09 Jun 2019 10:51:03 GMT) (full text, mbox, link).


Message #149 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Guido Günther <agx@sigxcpu.org>
To: Salvatore Bonaccorso <carnil@debian.org>, 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Sun, 9 Jun 2019 12:46:32 +0200
Hi Salvatore,
On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> > Hi Daniel,
> > 
> > On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > > Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > > > Could you again point me to your splitted up variant mirror?
> > > 
> > > https://git.faster-it.de/debian_security_security-tracker_split_files/
> > 
> > Thanks!
> > 
> > While starting to look at it, could you change the splitting to
> > $year.list instead of list.$year? I know this comes from the initial
> > script which was commited. It is though more intuitive working with
> > $work.something than something.$year in this context.
> 
> Thanks to Daniel for providing the converted repository (with list
> named as well the other way around as $year.list, which is more
> intuitive, and looks saner (to me)) which get updated regularly, this
> helps as a extremly good basis.
> 
> Below are some thoughs which I started thinking of during the last few
> days, please not it might not yet be complete. Please as well try to
> not push/force us too much -- whilst we understand the issue, and see
> that something whatever the solution is (split, move somewhere else)
> -- we have regularly more serious issues popping up we want and need
> to look at those. But we acknowledge and see als well salsa admin
> point of view.
> 
> That said, here is what I have at the moment, some are easy, some
> will/might be more involving.
> 
> Notes on possible CVE/list splits
> ---------------------------------
> 
> - workflows on files itself by most active users. Often kept open
>   cross-checking issues all issues in one file. But this will "just"
>   need other ways to deal with the situation by the persons working
>   most on it.
> - Code of security-tracker service and python modules itself which
>   currently rely on the data/*/list formats (DSA, DLA, CVE, ...) This
>   could probably be split up and use data/*/*.list
> - Externally called but included in code: update script which fetches
>   MITRE list and integrates all needed changes (see further below).
> - bin/bts-update (called from scripts/update-CVE-assignments in cron of
>   the securiy-tracker-services) operates based on data/CVE/list and
>   keeps track of the already tagged bugs by comparing with an 'oldlist'.
>   The oldlist is copied on a run on soriano.debian.org as 'state' file
>   similar to logroate's statefile (cron).
> - bin/check-new-issues: parsing of TODO and checks for the new issues is
>   as well based on 'data/CVE/list' existence and parsing. After a split
>   up the interactive commands should still be able to navigate trough
>   the items.
> - bin/check-syntax: Check syntax of the various lists based on the security-
>   tracker parser for the lists. make check-syntax from the Makefile, pre-
>   commit hook or C/I tests are all using this script for syntax check.
>   Depends on CVEfile as well from python/bugs.py. Relevant here is the
>   check-syntax target from the Makefile. At SVN times this was actually
>   only testing the syntax of the changed files, but now it just runs
>   make check-syntax.
> - bin/compare-nvd-cve reads from data/CVE/list and this is probably
>   easier to adapt and it's used basically in a "experimental" target in
>   Makefile for update-compare-nvd target. AFAICS this is just reading
>   the information should be easy to adapt to any split up setup.
> - bin/gen-{DSA,DLA}: Used the data/CVE/list for sanity check for
>   presence of the CVE.
> - bin/get-todo-items (this script is currently not working correctly and
>   it's implemented already via the webview, so need to consider if we
>   actually still need it).
> - bin/inject-embedded-code-copies (experimental script, not
>   actively used)
> - bin/rejected-with-info relies on data/CVE/list directly, but will be
>   potentially easily adaptable in a splited setup.
> - bin/setup-repo: checks for data/CVE/list just to make sure it's the
>   right repo.
> - bin/report-vuln uses CVEFile (from python/bugs.py).
> - bin/update and bin/updatelist: Parses DSA/DTSA/DLA list and
>   data/CVE/list adding new entries from MITRE feed and crossreferences
>   for the DSA/DLA's to a new data/CVE/list which then in the cronjob on
>   soriano will be committed. That is one processing those files in a
>   splitted setup this will need continue to work.
> - bin/update-db (Used triggered by Makefile target to update security.db
>   sqlite database).
> - bin/update-nvd (possibly dependency on the CVE lists via the used
>   modules but not directly).
> - data/config.json contains the sources for CVE, DSA, DLA and extended
>   lists. Currently path thus will be a path component starting from
>   data, e.g. for CVE files path is '/CVE/list'. See as well "Setting up
>   an extended instance" in the documentation.
> - lib/python/bugs.py contains the classes CVEFile, DSAFile,
>   CVEExtendFile.
> - lib/python/debian_support.py: defines the getconfig function reading
>   data/config.json.
> - lib/python/security_db.py, via getSources get the configuration from
>   where to read CVE, DSA, DLA, Extends information defined in
>   config.json.

Maybe this helps to cut down on the list of things to tackle:

For things needing the whole history and only requiring r/o access we
could just add a Makefile target that creates data/CVE/list from the
split files and have that in .gitignore. Tools that write usually
only need the latest file, so we could have a
    data/CVE/latest -> data/CVE/2019.list

committed to git that gets moved once a year. Let me know if that makes
sense and I can help with that.
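
As a sketch of what such a target could call (all names here are
hypothetical, and it assumes the combined file lists the newest year
first and that at least one <year>.list exists):

```python
import os


def combine(chunks):
    """Join {'2019.list': text, ...} into one text, newest year first."""
    return "".join(chunks[name] for name in sorted(chunks, reverse=True))


def latest_name(names):
    """Pick the file 'latest' should point to (highest four-digit year)."""
    return max(names)


def rebuild(cve_dir):
    """Regenerate <cve_dir>/list and the <cve_dir>/latest symlink."""
    names = [n for n in os.listdir(cve_dir)
             if n.endswith(".list") and n[:-5].isdigit()]
    chunks = {}
    for name in names:
        with open(os.path.join(cve_dir, name), encoding="utf-8") as fh:
            chunks[name] = fh.read()
    # The generated file is the one that would be listed in .gitignore.
    with open(os.path.join(cve_dir, "list"), "w", encoding="utf-8") as out:
        out.write(combine(chunks))
    # Create the new link under a temporary name, then rename it over
    # the old one so readers never see a missing 'latest'.
    tmp = os.path.join(cve_dir, "latest.tmp")
    os.symlink(latest_name(names), tmp)
    os.replace(tmp, os.path.join(cve_dir, "latest"))
```

The link target is relative (just "2019.list"), so it stays valid
wherever the repository is cloned.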

Cheers,
 -- Guido

> 
> Regards,
> Salvatore
> 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Sun, 09 Jun 2019 11:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Sun, 09 Jun 2019 11:51:03 GMT) (full text, mbox, link).


Message #154 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: 908678@bugs.debian.org, Daniel Lange <DLange@debian.org>
Cc: Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Sun, 9 Jun 2019 13:48:58 +0200
On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Notes on possible CVE/list splits
> ---------------------------------
[...]

After a face-to-face conversation, Daniel suggested creating a priority
list out of that. We will follow up here with a link (ideally to a
GitLab task list) once we have made up our minds on it.

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Mon, 17 Jun 2019 08:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Lange <DLange@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Mon, 17 Jun 2019 08:51:03 GMT) (full text, mbox, link).


Message #159 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Daniel Lange <DLange@debian.org>
To: 908678@bugs.debian.org
Cc: Salvatore Bonaccorso <carnil@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>
Subject: Split file repo v2
Date: Mon, 17 Jun 2019 10:48:22 +0200
As requested in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908678#139
we have created a data/CVE/yyyy.list repo ("v2") during MiniDebConf HH.

It is mirrored at Salsa:
https://salsa.debian.org/dlange/debian_security_security-tracker_split_files_v2



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Mon, 24 Jun 2019 12:00:05 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Mon, 24 Jun 2019 12:00:06 GMT) (full text, mbox, link).


Message #164 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Mon, 24 Jun 2019 13:57:37 +0200
Hi,

On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > Notes on possible CVE/list splits
> > ---------------------------------
> [...]
> 
> After a face-to-face conversation with Daniel, Daniel suggested to
> create a priority list out of that, we will followup with that to that
> (ideally as gitlab task-list) here with a link once we have made our
> minds on it.

The plan was initially to do that in that week. Due to some other
issues (Debian related, and other) this was not possible. The plan
still holds to prioritize these tasks so that people wanting to help
contribute have something to tackle.

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 02 Jul 2019 11:27:06 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 02 Jul 2019 11:27:06 GMT) (full text, mbox, link).


Message #169 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Tue, 2 Jul 2019 13:25:43 +0200
Hi,

On Mon, Jun 24, 2019 at 01:57:36PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> > On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > > Notes on possible CVE/list splits
> > > ---------------------------------
> > [...]
> > 
> > After a face-to-face conversation with Daniel, Daniel suggested to
> > create a priority list out of that, we will followup with that to that
> > (ideally as gitlab task-list) here with a link once we have made our
> > minds on it.
> 
> The plan was initially to do that in that week. Due to some other
> issues (Debian related, and other) this was not possible. The plan
> still holds to prioritize these tasks so that people wanting to help
> contribute have something to tackle.

So I'm starting to track those here, to better and more easily track the
work on them:
https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
(but the items still need to be reshuffled and consolidated). Basically,
before the switch, the two major topics (the security-tracker code base
itself, and the tools involved in the workflow for triaging/updating
CVEs) need to be adapted to a split repo situation, which makes many of
the items go into the first group anyway, but not all.

So: slow, but still work in progress.

On a personal note, it would be nice to have some dedicated time for
this only, but ...

Regards,
Salvatore

p.s.: The question is whether we should split the other supported types
      of files (DSA, DTSA, ...) as well while at it.



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 02 Jul 2019 11:42:03 GMT) (full text, mbox, link).


Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 02 Jul 2019 11:42:03 GMT) (full text, mbox, link).


Message #174 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Moritz Muehlenhoff <jmm@inutil.org>
To: Salvatore Bonaccorso <carnil@debian.org>, 908678@bugs.debian.org
Cc: Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Bastian Blank <waldi@debian.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Tue, 2 Jul 2019 13:38:10 +0200
On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> p.s.: Question is if we should do a split as well for the other types of
>       files which are supported (DSA, TDSA, ...) while at it.

We can axe out DTSA/* while we're at it.

For DSA/list (and DLA/list) we can initially keep it as a single file, it can
still be split later on if necessary.

Cheers,
        Moritz



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 06 Aug 2019 06:15:04 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 06 Aug 2019 06:15:04 GMT) (full text, mbox, link).


Message #179 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <waldi@debian.org>
To: 908678@bugs.debian.org
Cc: Moritz Muehlenhoff <jmm@inutil.org>, Salvatore Bonaccorso <carnil@debian.org>, Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Tue, 6 Aug 2019 08:05:11 +0200
Moin

On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > p.s.: Question is if we should do a split as well for the other types of
> >       files which are supported (DSA, TDSA, ...) while at it.
> We can axe out DTSA/* while we're at it.
> For DSA/list (and DLA/list) we can initially keep it as a single file, it can
> still be split later on if necessary.

Following up on

| Please provide a plan how and when to fix this before 2019-06-30.

We are now more than one month past that date.  Please provide the plan.

Bastian

-- 
We do not colonize.  We conquer.  We rule.  There is no other way for us.
		-- Rojan, "By Any Other Name", stardate 4657.5



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Tue, 06 Aug 2019 06:33:02 GMT) (full text, mbox, link).


Acknowledgement sent to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Tue, 06 Aug 2019 06:33:02 GMT) (full text, mbox, link).


Message #184 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Salvatore Bonaccorso <carnil@debian.org>
To: Bastian Blank <waldi@debian.org>, 908678@bugs.debian.org
Cc: Moritz Muehlenhoff <jmm@inutil.org>, Daniel Lange <DLange@debian.org>, Guido Günther <agx@sigxcpu.org>, Martin Zobel-Helas <zobel@debian.org>, holger@debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Tue, 6 Aug 2019 08:28:43 +0200
Hi Bastian,

Thanks for keeping track and following up.

On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> Moin
> 
> On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > p.s.: Question is if we should do a split as well for the other types of
> > >       files which are supported (DSA, TDSA, ...) while at it.
> > We can axe out DTSA/* while we're at it.
> > For DSA/list (and DLA/list) we can initially keep it as a single file, it can
> > still be split later on if necessary.
> 
> Following up to 
> 
> | Please provide a plan how and when to fix this before 2019-06-30.
> 
> We have now one month later.  Please provide the plan.

The items in
https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
need to be detailed further and then sorted/prioritized. After that, the
actual implementation work on making the split possible on the tracker
and other tooling side needs to happen. We cannot depend on a
non-functional instance for the day to day work, so all of the above
basically will need to be ported in some sensible way.

Progress is slow due to other time limitations in day to day tasks.

Still, if it is going to be too much of a burden for the salsa admins
and needs to be fast, then I only see that we temporarily switch away
from salsa to gitlab or another hosting (GitHub will not work) and then
move back once the split has finally happened.

Regards,
Salvatore



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Security Tracker Team <debian-security-tracker@lists.debian.org>:
Bug#908678; Package security-tracker. (Fri, 02 Oct 2020 20:18:03 GMT) (full text, mbox, link).


Acknowledgement sent to Sylvain Beucler <beuc@beuc.net>:
Extra info received and forwarded to list. Copy sent to Debian Security Tracker Team <debian-security-tracker@lists.debian.org>. (Fri, 02 Oct 2020 20:18:03 GMT) (full text, mbox, link).


Message #189 received at 908678@bugs.debian.org (full text, mbox, reply):

From: Sylvain Beucler <beuc@beuc.net>
To: 908678@bugs.debian.org
Subject: Re: Bug#908678: Update on the security-tracker git discussion
Date: Fri, 2 Oct 2020 22:04:32 +0200
Hi,

On Tue, 6 Aug 2019 08:28:43 +0200 Salvatore Bonaccorso wrote:
> Thanks for keeping track and following up.
> 
> On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> > Moin
> > 
> > On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > > p.s.: Question is if we should do a split as well for the other types of
> > > >       files which are supported (DSA, DTSA, ...) while at it.
> > > We can axe out DTSA/* while we're at it.
> > > For DSA/list (and DLA/list) we can initially keep it as a single file, it can
> > > still be split later on if necessary.
> > 
> > Following up to 
> > 
> > | Please provide a plan how and when to fix this before 2019-06-30.
> > 
> > It is now one month later.  Please provide the plan.
> 
> The items in
> https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
> need to be detailed further and then sorted/prioritized. After that, the
> actual implementation work to make the split possible on the tracker and
> other tooling side needs to happen. We cannot depend on a non-functional
> instance for the day-to-day work, so all of the above will basically
> need to be ported in some sensible way.
> 
> Progress is slow because day-to-day tasks limit the time available.
> 
> Still, if this is too much of a burden for the salsa admins and needs
> to be resolved quickly, the only option I see is to temporarily switch
> away from salsa to gitlab or another hosting service (github will not
> work) and then move back once the split has finally happened.

It seems a bit difficult to make a big switch, probably because it's not
easy to know and test all the various scripts involved.

Considering a more progressive approach, is there something preventing
us from switching to the rewritten repository and splitting/merging the
file, something like:

diff --git a/conf/post-merge b/conf/post-merge
new file mode 100755
index 0000000000..a9991c1cc9
--- /dev/null
+++ b/conf/post-merge
@@ -0,0 +1,3 @@
+#!/bin/sh
+echo "post-merge"
+[ -f data/CVE/1999.list ] && cat data/CVE/*.list > data/CVE/list
diff --git a/conf/pre-commit b/conf/pre-commit
index 767e478e36..12e781e97d 100755
--- a/conf/pre-commit
+++ b/conf/pre-commit
@@ -5,3 +5,4 @@ set -e
 exec 1>&2

 make check-syntax
+bin/split-by-year.py

?
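
For illustration only, here is a minimal sketch of what the split step
could look like. The contents of bin/split-by-year.py are not shown in
this thread, so everything below is an assumption: the function names
split_by_year and merge_years are hypothetical, and the sketch assumes
that each entry in data/CVE/list starts with an unindented header line
beginning with "CVE-<year>-", followed by indented continuation lines.

```python
import re
from collections import defaultdict

# Assumed header shape: an unindented line starting with "CVE-<year>-".
HEADER = re.compile(r"^CVE-(\d{4})-")


def split_by_year(text):
    """Group the entries of a combined list by the year in their CVE id."""
    chunks = defaultdict(list)
    year = None
    for line in text.splitlines(keepends=True):
        match = HEADER.match(line)
        if match:
            # A new entry starts here; later lines belong to this year.
            year = match.group(1)
        elif year is None:
            raise ValueError("content before the first CVE header: %r" % line)
        chunks[year].append(line)
    return {y: "".join(lines) for y, lines in chunks.items()}


def merge_years(chunks):
    """Concatenate per-year chunks in ascending year order.

    This mirrors what `cat data/CVE/*.list` does for four-digit names.
    """
    return "".join(chunks[y] for y in sorted(chunks))
```

Each value returned by split_by_year would be written to
data/CVE/<year>.list. Note that merging in ascending year order does
not necessarily reproduce the ordering of the original combined file;
whether that matters to the existing tooling would need to be checked.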

Cheers!
Sylvain





Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Nov 21 23:37:32 2024; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.