You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "Masatake Iwasaki (Updated) (JIRA)" <ji...@apache.org> on 2011/11/04 09:58:59 UTC

[jira] [Updated] (SQOOP-390) PostgreSQL connector for direct export with pg_bulkload

     [ https://issues.apache.org/jira/browse/SQOOP-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated SQOOP-390:
-----------------------------------

    Attachment: SQOOP-390-1.patch

I would like review and suggestion for this connector Postgresql.
Thanks in advance.
                
> PostgreSQL connector for direct export with pg_bulkload
> -------------------------------------------------------
>
>                 Key: SQOOP-390
>                 URL: https://issues.apache.org/jira/browse/SQOOP-390
>             Project: Sqoop
>          Issue Type: New Feature
>            Reporter: Masatake Iwasaki
>         Attachments: SQOOP-390-1.patch
>
>
> h1. Features
> * Fast data export with pg_bulkload.
>   ** http://pgbulkload.projects.postgresql.org/index.html
> * User can get benefit of functionality of pg_bulkload such as
> ** bypassing shared bufferes and WAL,
> ** ignoring data causing parse error,
> ** ETL feature with filter functions.
> h1. Implementation
> * Each map tasks create their own staging table with names based on task attempt id. Then export data with pg_bulkload invoked as outer process.
> * Staging tables are erased on
> ** Successful completion of export, 
> ** or Exception in map task.
> * Reduce task migrate data from staging tables into destination table
> ** Number of reduce tasks is internally set to 1 .
> h1. Requirements
> * pg_bulkload must be installed on DB server and all slave nodes.
> * Also PostgreSQL JDBC is required on client. (Same as PostgresqlManager)
> * Superuser role of PostgreSQL database is required for pg_bulkload.
> h1. Usage
> Currently there is no Factory class. Specify connection manager class name with --connection-manager option to use.
>   
> {noformat} 
> sqoop export --connect jdbc:postgresql://localhost:5432/test \
>              --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
>              --table test --username postgres --export-dir=/test -m 1
> {noformat} 
> You can also specify pg_bulkload configuration with Hadoop Configuration properties.
> {noformat} 
> -Dpgbulkload.bin="/usr/bin/pg_bulkload"
> -Dpgbulkload.input.field.delim=$'\t'
> -Dpgbulkload.check.constraints="YES"
> -Dpgbulkload.parse.errors="INFINITE"
> -Dpgbulkload.duplicate.errors="INFINITE"
> {noformat} 
> h1. Test
> There is test class named PGBulkloadManagerTest extending TestExport.
> {noformat} 
>  ant -Dtestcase=PGBulkloadManagerTest test
> {noformat} 
> This test requires 
> * PostgreSQL running on localhost:5432,
> * database named sqooptest,
> * super user role named sqooptest with no password, or .pgpass created

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira