You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/03/29 07:41:43 UTC
[jira] [Commented] (SQOOP-390) PostgreSQL connector for direct
export with pg_bulkload
[ https://issues.apache.org/jira/browse/SQOOP-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240995#comment-13240995 ]
jiraposter@reviews.apache.org commented on SQOOP-390:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review6510
-----------------------------------------------------------
Hello Masatake, I have a quick question about your patch.
Since you're adding new tests to test/com/cloudera/sqoop/manager, "ant test -Dthirdparty=true" will fail unless the test requirements are not installed (PostgreSql 9.1, pg_bulkload 3.1, etc).
I think that you can do either:
1) Not run your tests with "ant test -Dthirdparty=true".
or
2) Update the wiki page and user doc for setting up development env.
But since your tests are for a specific connector, I think that #1 makes more sense. Please let me know what you think.
Thanks,
Cheolsoo
- Cheolsoo
On 2012-01-13 05:23:36, Masatake Iwasaki wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2724/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-01-13 05:23:36)
bq.
bq.
bq. Review request for Sqoop.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Patch for SQOOP-390
bq. https://issues.apache.org/jira/browse/SQOOP-390
bq.
bq.
bq. This addresses bug SQOOP-390.
bq. https://issues.apache.org/jira/browse/SQOOP-390
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/java/com/cloudera/sqoop/manager/PGBulkloadManager.java PRE-CREATION
bq. /src/java/com/cloudera/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION
bq. /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION
bq. /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION
bq. /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION
bq. /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerTest.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/2724/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. This patch include the test class PGBulkloadManagerTest.
bq. I've tested "ant test" and passed.
bq.
bq.
bq. Thanks,
bq.
bq. Masatake
bq.
bq.
> PostgreSQL connector for direct export with pg_bulkload
> -------------------------------------------------------
>
> Key: SQOOP-390
> URL: https://issues.apache.org/jira/browse/SQOOP-390
> Project: Sqoop
> Issue Type: New Feature
> Reporter: Masatake Iwasaki
> Attachments: SQOOP-390-1.patch
>
>
> h1. Features
> * Fast data export into PostgreSQL database with [pg_bulkload|http://pgbulkload.projects.postgresql.org/index.html].
> * User can get benefit of functionality of pg_bulkload such as
> ** fast export bypassing shared bufferes and WAL,
> ** removing invalid data which cause parse error,
> ** ETL feature with filter functions.
> h1. Implementation
> h2. PGBulkloadExportMapper
> At first, each map tasks create their own staging table with names based on task attempt id because pg_bulkload holds table level lock.
> Arguments of pg_bulkload command can be passed via configuration object.
> The Mapper export data by invoking pg_bulkload command as external process.
> Commnad execution is done in the same way as DirectPostgresqlManager.
> h2. PGBulkloadExportReducer
> Reducer migrates data from staging tables into destination table.
> Reducer gets the names of staging tables as map output values.
> In order to do migration in a transaction, the number of reducers must be 1. (It is set by ConnectionManager internally).
> Migration is done in same way as Sqoop defalt connectors using "INSERT INTO dst ( SELECT * FROM src )".
> In the cleanup method, staging tables are dropped.
> If exception is raised in the reducer, garbage staging tables are left.
> User can delete them by executing this connector with --clear-staging-table option.
> h1. Requirements
> * pg_bulkload must be installed on DB server and all slave nodes.
> ** RPM for RedHat or CentOS is available in [download page|http://pgfoundry.org/frs/?group_id=1000261].
> * [PostgreSQL JDBC| http://jdbc.postgresql.org/download.html] is also required on client node (same as PostgresqlManager).
> * Superuser role of PostgreSQL database is required for execution of pg_bulkload.
> h1. Usage
> Currently there is no Factory class.
> Specify connection manager class name with --connection-manager option to use.
>
> {noformat}
> $ sqoop export --connect jdbc:postgresql://localhost:5432/test \
> --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
> --table test --username postgres --export-dir=/test -m 1
> {noformat}
> You can also specify pg_bulkload configuration with Hadoop configuration properties.
> {noformat}
> $ sqoop export \
> -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
> -Dpgbulkload.input.field.delim=$'\t' \
> -Dpgbulkload.check.constraints="YES" \
> -Dpgbulkload.parse.errors="INFINITE" \
> -Dpgbulkload.duplicate.errors="INFINITE" \
> --connect jdbc:postgresql://localhost:5432/test \
> --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
> --table test --username postgres --export-dir=/test -m 1
> {noformat}
> h1. Test
> There is test class named PGBulkloadManagerTest extending TestExport.
> This test expects that
> * The major version of PostgreSQL is 9.0,
> * The version of pg_bulkload is 3.1.1,
> * PostgreSQL server is running on localhost:5432 of testing node,
> * there is database named sqooptest,
> * super user role named sqooptest exists,
> * no password is set for the role, or .pgpass file is created.
> Tests can be invoked as below.
> {noformat}
> # ant -Dtestcase=PGBulkloadManagerTest test
> {noformat}
> h2. Test on single-node cluster built on CentOS 6.0
> Hadoop is expected to be running in pseudo distributed mode.
> {noformat}
> # rpm -ivh jdk-6u29-linux-amd64.rpm
> # yum install ant-junit
> # yum install hadoop-0.20-conf-pseudo
> # sudo -u hdfs hadoop namenode -format
> # /etc/init.d/hadoop-0.20-namenode start
> # /etc/init.d/hadoop-0.20-datanode start
> # /etc/init.d/hadoop-0.20-jobtracker start
> # /etc/init.d/hadoop-0.20-tasktracker start
> {noformat}
> Setup for PostgreSQL:
> {noformat}
> # wget http://ftp.postgresql.org/pub/source/v9.0.7/postgresql-9.0.7.tar.bz2
> # tar jxf postgresql-9.0.7.tar.bz2
> # cd postgresql-9.0.7
> # ./configure --prefix=/usr/local
> # make
> # make install
> # useradd postgres
> # mkdir /var/pgdata
> # chown postgres:postgres /var/pgdata
> # chmod 700 /var/pgdata
> # sudo -u postgres /usr/local/bin/initdb -D /var/pgdata -E utf-8
> # sudo -u postgres /usr/local/bin/pg_ctl start -D /var/pgdata -l /var/pgdata/postgresql.log
> # createuser -U postgres -s sqooptest
> # createdb -U sqooptest sqooptest
> {noformat}
> Setup for pg_bulkload:
> {noformat}
> # cd ..
> # wget http://pgfoundry.org/frs/download.php/3176/pg_bulkload-3.1.1.tar.gz
> # tar zxf pg_bulkload-3.1.1.tar.gz
> # mv pg_bulkload-3.1.1 postgresql-9.0.7/contrib/
> # cd postgresql-9.0.7/contrib/pg_bulkload-3.1.1
> # make
> # make install
> # psql -U sqooptest -f /usr/local/share/postgresql/contrib/pg_bulkload.sql sqooptest
> # ln -s /usr/local/bin/pg_bulkload /usr/bin/
> {noformat}
> Setup for PostgreSQL JDBC Driver:
> {noformat}
> # wget http://jdbc.postgresql.org/download/postgresql-9.0-802.jdbc4.jar
> # cp postgresql-9.0-802.jdbc4.jar /usr/lib/hadoop/lib/
> {noformat}
> run test.
> {noformat}
> # cd /usr/local/src/sqoop-trunk
> # ant -Dtestcase=PGBulkloadManagerTest test
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira