You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/03/29 07:41:43 UTC

[jira] [Commented] (SQOOP-390) PostgreSQL connector for direct export with pg_bulkload

    [ https://issues.apache.org/jira/browse/SQOOP-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240995#comment-13240995 ] 

jiraposter@reviews.apache.org commented on SQOOP-390:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review6510
-----------------------------------------------------------


Hello Masatake, I have a quick question about your patch.

Since you're adding new tests to test/com/cloudera/sqoop/manager, "ant test -Dthirdparty=true" will fail unless the test requirements are not installed (PostgreSql 9.1, pg_bulkload 3.1, etc).

I think that you can do either:

1) Not run your tests with "ant test -Dthirdparty=true".

or

2) Update the wiki page and user doc for setting up development env.

But since your tests are for a specific connector, I think that #1 makes more sense. Please let me know what you think.

Thanks,
Cheolsoo

- Cheolsoo


On 2012-01-13 05:23:36, Masatake Iwasaki wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2724/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 05:23:36)
bq.  
bq.  
bq.  Review request for Sqoop.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Patch for SQOOP-390
bq.  https://issues.apache.org/jira/browse/SQOOP-390
bq.  
bq.  
bq.  This addresses bug SQOOP-390.
bq.      https://issues.apache.org/jira/browse/SQOOP-390
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/java/com/cloudera/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
bq.    /src/java/com/cloudera/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
bq.    /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
bq.    /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
bq.    /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
bq.    /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerTest.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2724/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  This patch include the test class PGBulkloadManagerTest.
bq.  I've tested "ant test" and passed.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Masatake
bq.  
bq.


                
> PostgreSQL connector for direct export with pg_bulkload
> -------------------------------------------------------
>
>                 Key: SQOOP-390
>                 URL: https://issues.apache.org/jira/browse/SQOOP-390
>             Project: Sqoop
>          Issue Type: New Feature
>            Reporter: Masatake Iwasaki
>         Attachments: SQOOP-390-1.patch
>
>
> h1. Features
> * Fast data export into PostgreSQL database with [pg_bulkload|http://pgbulkload.projects.postgresql.org/index.html].
> * User can get benefit of functionality of pg_bulkload such as
> ** fast export bypassing shared bufferes and WAL,
> ** removing invalid data which cause parse error,
> ** ETL feature with filter functions.
> h1. Implementation
> h2. PGBulkloadExportMapper
> At first, each map tasks create their own staging table with names based on task attempt id because pg_bulkload holds table level lock.
> Arguments of pg_bulkload command can be passed via configuration object.
> The Mapper export data by invoking pg_bulkload command as external process.
> Commnad execution is done in the same way as DirectPostgresqlManager.
> h2. PGBulkloadExportReducer
> Reducer migrates data from staging tables into destination table.
> Reducer gets the names of staging tables as map output values.
> In order to do migration in a transaction, the number of reducers must be 1. (It is set by ConnectionManager internally).
> Migration is done in same way as Sqoop defalt connectors using "INSERT INTO dst ( SELECT * FROM src )".
> In the cleanup method, staging tables are dropped.
> If exception is raised in the reducer, garbage staging tables are left.
> User can delete them by executing this connector with --clear-staging-table option.
> h1. Requirements
> * pg_bulkload must be installed on DB server and all slave nodes.
> ** RPM for RedHat or CentOS is available in [download page|http://pgfoundry.org/frs/?group_id=1000261].
> * [PostgreSQL JDBC| http://jdbc.postgresql.org/download.html] is also required on client node (same as PostgresqlManager).
> * Superuser role of PostgreSQL database is required for execution of pg_bulkload.
> h1. Usage
> Currently there is no Factory class.
> Specify connection manager class name with --connection-manager option to use.
>   
> {noformat}
> $ sqoop export --connect jdbc:postgresql://localhost:5432/test \
>                --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
>                --table test --username postgres --export-dir=/test -m 1
> {noformat}
> You can also specify pg_bulkload configuration with Hadoop configuration properties.
> {noformat}
> $ sqoop export \
>     -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
>     -Dpgbulkload.input.field.delim=$'\t' \
>     -Dpgbulkload.check.constraints="YES" \
>     -Dpgbulkload.parse.errors="INFINITE" \
>     -Dpgbulkload.duplicate.errors="INFINITE" \
>     --connect jdbc:postgresql://localhost:5432/test \
>     --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
>     --table test --username postgres --export-dir=/test -m 1
> {noformat}
> h1. Test
> There is test class named PGBulkloadManagerTest extending TestExport.
> This test expects that
> * The major version of PostgreSQL is 9.0,
> * The version of pg_bulkload is 3.1.1,
> * PostgreSQL server is running on localhost:5432 of testing node,
> * there is database named sqooptest,
> * super user role named sqooptest exists,
> * no password is set for the role, or .pgpass file is created.
> Tests can be invoked as below.
> {noformat}
> # ant -Dtestcase=PGBulkloadManagerTest test
> {noformat}
> h2. Test on single-node cluster built on CentOS 6.0
> Hadoop is expected to be running in pseudo distributed mode.
> {noformat}
> # rpm -ivh jdk-6u29-linux-amd64.rpm
> # yum install ant-junit
> # yum install hadoop-0.20-conf-pseudo
> # sudo -u hdfs hadoop namenode -format
> # /etc/init.d/hadoop-0.20-namenode start
> # /etc/init.d/hadoop-0.20-datanode start
> # /etc/init.d/hadoop-0.20-jobtracker start
> # /etc/init.d/hadoop-0.20-tasktracker start
> {noformat}
> Setup for PostgreSQL:
> {noformat}
> # wget http://ftp.postgresql.org/pub/source/v9.0.7/postgresql-9.0.7.tar.bz2
> # tar jxf postgresql-9.0.7.tar.bz2
> # cd postgresql-9.0.7
> # ./configure --prefix=/usr/local
> # make
> # make install
> # useradd postgres
> # mkdir /var/pgdata
> # chown postgres:postgres /var/pgdata
> # chmod 700 /var/pgdata
> # sudo -u postgres /usr/local/bin/initdb  -D /var/pgdata -E utf-8
> # sudo -u postgres /usr/local/bin/pg_ctl start -D /var/pgdata -l /var/pgdata/postgresql.log
> # createuser -U postgres -s sqooptest
> # createdb -U sqooptest sqooptest
> {noformat}
> Setup for pg_bulkload:
> {noformat}
> # cd ..
> # wget http://pgfoundry.org/frs/download.php/3176/pg_bulkload-3.1.1.tar.gz
> # tar zxf pg_bulkload-3.1.1.tar.gz
> # mv pg_bulkload-3.1.1 postgresql-9.0.7/contrib/
> # cd postgresql-9.0.7/contrib/pg_bulkload-3.1.1
> # make
> # make install
> # psql -U sqooptest -f /usr/local/share/postgresql/contrib/pg_bulkload.sql sqooptest
> # ln -s /usr/local/bin/pg_bulkload /usr/bin/
> {noformat}
> Setup for PostgreSQL JDBC Driver:
> {noformat}
> # wget http://jdbc.postgresql.org/download/postgresql-9.0-802.jdbc4.jar
> # cp postgresql-9.0-802.jdbc4.jar /usr/lib/hadoop/lib/
> {noformat}
> run test.
> {noformat}
> # cd /usr/local/src/sqoop-trunk
> # ant -Dtestcase=PGBulkloadManagerTest test
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira