Posted to dev@sqoop.apache.org by Masatake Iwasaki <iw...@nttdata.co.jp> on 2012/08/01 11:46:41 UTC

Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload


> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could you add an option to create those temporary tables in a different database?
> 
> Masatake Iwasaki wrote:
>     As far as PostgreSQL is concerned, staging across databases is inefficient because it causes network data transfer via the client (slave node). This change also requires handling multiple connections and causes a lot of code modifications. I would like to leave this as a future improvement.
>     It may be preferable to handle connecting to multiple databases for staging in an independent JIRA issue about Sqoop global functionality.
>
> 
> Jarek Cecho wrote:
>     I do not have a strong PostgreSQL background, so please excuse me if this is a stupid question. In other connectors that use explicit temporary tables, we use just one connection (to the target database specified on the command line) and an explicit database name when the user wants the data stored in a different database. Something like "create table tmp_database.tmp_table like exported_table" and "insert into exported_table select * from tmp_database.tmp_table". Is something like this possible in PostgreSQL?

In PostgreSQL, users can use a "schema" in the same way, and a "tablespace" enables physical separation of the staging table's data from the destination table's. Although stock PostgreSQL has no problem with schemas and tablespaces, the pg_bulkload connector needs a fix because each map task of PGBulkloadExportJob creates its own staging table on the fly. I am going to try adding an option for it.

References for schema and tablespace:
 http://www.postgresql.org/docs/9.0/interactive/ddl-schemas.html
 http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html
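
To make the schema/tablespace idea concrete, here is a minimal sketch of the statements a single connection could issue. It is not code from the patch: the class name, the method names, and the names "staging", "tmp_task_0" and "fastdisk" are all hypothetical, and it only builds SQL strings rather than talking to a database.

```java
// Hypothetical sketch: builds PostgreSQL statements for a staging table that
// lives in a separate schema (and optionally a separate tablespace) while
// using the same connection as the destination table.
public class StagingSqlSketch {

    // Quote an identifier PostgreSQL-style: wrap it in double quotes and
    // double any embedded double quote.
    static String quoteIdent(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // CREATE TABLE schema.staging (LIKE dest) [TABLESPACE ts]
    // A null or empty tablespace means "use the default tablespace".
    static String createStagingTable(String schema, String stagingTable,
                                     String destTable, String tablespace) {
        StringBuilder sql = new StringBuilder();
        sql.append("CREATE TABLE ")
           .append(quoteIdent(schema)).append('.').append(quoteIdent(stagingTable))
           .append(" (LIKE ").append(quoteIdent(destTable)).append(')');
        if (tablespace != null && !tablespace.isEmpty()) {
            sql.append(" TABLESPACE ").append(quoteIdent(tablespace));
        }
        return sql.toString();
    }

    // INSERT INTO dest SELECT * FROM schema.staging
    static String copyToDestination(String schema, String stagingTable,
                                    String destTable) {
        return "INSERT INTO " + quoteIdent(destTable)
             + " SELECT * FROM " + quoteIdent(schema) + '.' + quoteIdent(stagingTable);
    }

    public static void main(String[] args) {
        // One staging table per map task; the task-specific name is an assumption.
        System.out.println(createStagingTable("staging", "tmp_task_0", "exported_table", "fastdisk"));
        System.out.println(copyToDestination("staging", "tmp_task_0", "exported_table"));
    }
}
```

Because the schema and tablespace are only names inside the statements, this approach would not need a second database connection.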


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch includes the test class PGBulkloadManagerTest.
> I ran "ant test" and it passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>