Debian Bug report logs - #675106
RFP: pgbulkload -- A high speed data loading utility for PostgreSQL

Package: wnpp; Maintainer for wnpp is wnpp@debian.org;

Reported by: Alexander Kuznetsov <acca@cpan.org>

Date: Tue, 29 May 2012 22:24:01 UTC

Severity: wishlist


Report forwarded to debian-bugs-dist@lists.debian.org, debian-devel@lists.debian.org, wnpp@debian.org:
Bug#675106; Package wnpp. (Tue, 29 May 2012 22:24:04 GMT)

Acknowledgement sent to Alexander Kuznetsov <acca@cpan.org>:
New Bug report received and forwarded. Copy sent to debian-devel@lists.debian.org, wnpp@debian.org. (Tue, 29 May 2012 22:24:04 GMT)

Message #5 received at submit@bugs.debian.org:

From: Alexander Kuznetsov <acca@cpan.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: ITP: pgbulkload -- A high speed data loading utility for PostgreSQL
Date: Tue, 29 May 2012 18:20:06 -0400
Package: wnpp
Severity: wishlist
Owner: Alexander Kuznetsov <acca@cpan.org>

* Package name    : pgbulkload
  Version         : 3.1.1
  Upstream Author :
		Takahiro Itagaki	itagaki.takahiro @nospam@ gmail.com
		Masao Fujii		masao.fujii @nospam@ gmail.com
		Mitsuru Hasegawa	hasegawa @nospam@ metrosystems.co.jp
		Masahiko Sakamoto	sakamoto_masahiko_b1 @nospam@ lab.ntt.co.jp
		Toru SHIMOGAKI		shimogaki.toru @nospam@ oss.ntt.co.jp
* URL             : http://pgfoundry.org/projects/pgbulkload/
* License         : BSD
  Programming Lang: C, SQL
  Description     : A high speed data loading utility for PostgreSQL
 pg_bulkload is designed to load huge amounts of data into a database.
 You can choose whether database constraints are checked and how many errors are
 ignored during the loading. For example, you can skip integrity checks for
 performance when you copy data from another database to PostgreSQL. On the
 other hand, you can enable constraint checks when loading unclean data.
 .
 The original goal of pg_bulkload was a faster alternative to the COPY command in
 PostgreSQL, but version 3.0 or later has some ETL features like input data
 validation and data transformation with filter functions.
 .
 In version 3.1, pg_bulkload can convert the load data into the binary file
 which can be used as an input file of pg_bulkload. If you check whether
 the load data is valid when converting it into the binary file, you can skip
 the check when loading it from the binary file to a table, which would reduce
 the load time itself. Also in version 3.1, parallel loading works
 more effectively than before.




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Alexander Kuznetsov <acca@cpan.org>:
Bug#675106; Package wnpp. (Mon, 04 Jun 2012 23:15:02 GMT)

Acknowledgement sent to Alexander <acca@cpan.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Alexander Kuznetsov <acca@cpan.org>. (Mon, 04 Jun 2012 23:15:02 GMT)

Message #10 received at 675106@bugs.debian.org:

From: Alexander <acca@cpan.org>
To: 675106@bugs.debian.org
Subject: Re: Bug#675106: ITP: pgbulkload -- A high speed data loading utility for PostgreSQL
Date: Mon, 4 Jun 2012 19:13:02 -0400
Ivan Shmakov <oneingray@gmail.com> writes:
> Alexander Kuznetsov <acca@cpan.org> writes:
[…]
>	(Some wording fixes and suggestions.)

Thanks a lot! For some reason the message got detached from the thread; I
accidentally found it while searching for another. Also, lists.debian.org
cannot find the original post, while GMANE shows it perfectly fine. Is
it supposed to be like that?


[...]
>> ignored during the loading. For example, you can skip integrity checks for
>> performance when you copy data from another database to PostgreSQL. On the
>> other hand, you can enable constraint checks when loading unclean data.
>
>	Are “constraint checks” different to “integrity checks” in the
>	above?  Unless they are, it should rather be, e. g.:

Integrity checking does include constraint checking, but in this case they
are kept separate. The authors emphasize that you can perform
constraint checks with pg_bulkload on unclean data while having the
[expensive] database server integrity checks turned off.


>> PostgreSQL, but version 3.0 or later has some ETL features like input data
>> validation and data transformation with filter functions.
>
>   … but as of version 3.0 some ETL features… were added.
>
>	And what's ETL, BTW?

Extract-Transform-Load, a software development pattern that has since
evolved into an industry. It used to be a nice girl at a keyboard;
nowadays it is implemented with network clusters.


>> In version 3.1, pg_bulkload can convert the load data into the binary file
>> which can be used as an input file of pg_bulkload. If you check whether
>
>	Perhaps:
>
>   As of version 3.1, pg_bulkload can dump the preprocessed data into a
>   binary file, allowing for…

This would not be entirely true. While pg_bulkload does allow converting
the data into a binary file, it requires the assistance of the
server-side components of the package, which one may not consider part
of the pg_bulkload utility itself, and this is certainly not a simple
dump of preprocessed data.


>	(Here, the purpose should be mentioned.  Is this for improving
>	the performance of later multiple “bulkloads”, for instance?)

I would say the reverse. Multiple `bulkload' instances perform
conversion using multiple [satellite] servers, which may populate
[network] storage. Later a "main" server could pick up preprocessed
data chunks and quickly load them.

To make use of the ability of pg_bulkload 3.1+ to convert the data into
binary form, a rather specific setup is currently required. I would
withhold the promises of better performance, as people would expect
"dump binary locally, then upload to the server" functionality. That
may hardly be feasible if, say, the server and the client have
different CPU types.

A single server/single storage case is the worst for the binary
conversion. The process will be constrained by the RAM/storage
bandwidth and run almost twice as slow.


>> the load time itself. Also in version 3.1, parallel loading works
>> more effectively than before.
>	s/effectively/efficiently/.  But the whole sentence makes little
>	sense, as the earlier versions weren't packaged for Debian.

Good point, thanks!

-- 
Sincerely yours, Alexander Kuznetsov




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Alexander Kuznetsov <acca@cpan.org>:
Bug#675106; Package wnpp. (Fri, 16 Aug 2013 17:07:15 GMT)

Acknowledgement sent to Lucas Nussbaum <lucas@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Alexander Kuznetsov <acca@cpan.org>. (Fri, 16 Aug 2013 17:07:16 GMT)

Message #15 received at 675106@bugs.debian.org:

From: Lucas Nussbaum <lucas@debian.org>
To: 675106@bugs.debian.org
Cc: control@bugs.debian.org
Subject: pgbulkload: changing back from ITP to RFP
Date: Fri, 16 Aug 2013 18:58:53 +0200
retitle 675106 RFP: pgbulkload -- A high speed data loading utility for PostgreSQL
noowner 675106
tag 675106 - pending
thanks

Hi,

A long time ago, you expressed interest in packaging pgbulkload. Unfortunately,
it seems that it did not happen. In Debian, we try not to keep ITP bugs open
for too long, as it might cause other prospective maintainers to
refrain from packaging the software.

This is an automatic email to change the status of pgbulkload back from ITP
(Intent to Package) to RFP (Request for Package), because this bug hasn't seen
any activity during the last 14 months.

If you are still interested in packaging pgbulkload, please send a mail to
<control@bugs.debian.org> with:

 retitle 675106 ITP: pgbulkload -- A high speed data loading utility for PostgreSQL
 owner 675106 !
 thanks

It is also a good idea to document your progress on this ITP from time to
time, by mailing <675106@bugs.debian.org>.  If you need guidance on how to
package this software, please reply to this email, and/or contact the
debian-mentors@lists.debian.org mailing list.

Thank you for your interest in Debian,
-- 
Lucas, for the QA team <debian-qa@lists.debian.org>



Changed Bug title to 'RFP: pgbulkload -- A high speed data loading utility for PostgreSQL' from 'ITP: pgbulkload -- A high speed data loading utility for PostgreSQL' Request was from Lucas Nussbaum <lucas@debian.org> to control@bugs.debian.org. (Fri, 16 Aug 2013 17:11:37 GMT)

Removed annotation that Bug was owned by Alexander Kuznetsov <acca@cpan.org>. Request was from Lucas Nussbaum <lucas@debian.org> to control@bugs.debian.org. (Fri, 16 Aug 2013 17:11:38 GMT)


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Apr 19 06:26:10 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.