You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Masatake Iwasaki <iw...@nttdata.co.jp> on 2012/04/02 04:30:05 UTC

Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/
-----------------------------------------------------------

(Updated 2012-04-02 02:30:05.378389)


Review request for Sqoop.


Changes
-------

renamed test case class.


Summary
-------

Patch for SQOOP-390
https://issues.apache.org/jira/browse/SQOOP-390


This addresses bug SQOOP-390.
    https://issues.apache.org/jira/browse/SQOOP-390


Diffs (updated)
-----

  /src/java/com/cloudera/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
  /src/java/com/cloudera/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
  /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
  /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
  /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
  /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2724/diff


Testing
-------

This patch include the test class PGBulkloadManagerTest.
I've tested "ant test" and passed.


Thanks,

Masatake


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

On July 25, 2012, 11:15 p.m., Masatake Iwasaki wrote:
> > Again, please accept my deep apology for not getting back to you for so long.
> > 
> > Jarcec

Hi Jarek,
thanks for your comments. I updated the patch and tested it against current svn trunk.
Though I changed package name from com.cloudera to org.apatch, some dependent classes are still in com.cloudera and I didn't touch them.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9486
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.

On July 25, 2012, 11:15 p.m., Masatake Iwasaki wrote:
> > Again, please accept my deep apology for not getting back to you for so long.
> > 
> > Jarcec
> 
> Masatake Iwasaki wrote:
>     Hi Jarek,
>     thanks for your comments. I updated the patch and tested it against current svn trunk.
>     Though I changed package name from com.cloudera to org.apatch, some dependent classes are still in com.cloudera and I didn't touch them.

Thank you very much sir for your prompt response. I'll look into your new version as soon as I can. However please give me few days as I do want to have good understanding of your code.

Jarcec


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9486
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9486
-----------------------------------------------------------


Hi Masatake,
thank you very much for your effort in creating this connector. I'm extremely sorry that I did not provide feedback sooner.  Please let me know if you're still willing to work on it as I would like to get it committed at some point. I did just quickly read your changes and I do have few high level comments for now:

1) Can you please move all your classes into org.apache.sqoop instead of com.cloudera.sqoop?

2) Could you please change all license files from Cloudera to Apache Software Foundation?


/src/java/com/cloudera/sqoop/manager/PGBulkloadManager.java
<https://reviews.apache.org/r/2724/#comment20299>

    Please move this to the PGBulkloadExportJob. We've recently introduced method propagateOptionsToJob() that you can override and that should serve exactly this purpose.


Again, please accept my deep apology for not getting back to you for so long.

Jarcec

- Jarek Cecho


On April 2, 2012, 2:30 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated April 2, 2012, 2:30 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/com/cloudera/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/com/cloudera/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/com/cloudera/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could add option to create those temporary tables in different database?
> 
> Masatake Iwasaki wrote:
>     As far as PostgreSQL concerned, staging across databases is inefficient because it causes network data transfer via client (slave node). Also this change requires handling of multiple connections and causes a lot of code modifications.  I would like to leave this as a future improvement.
>     It may be more preferable to handle the feature connecting to multiple databases for staging in a independent JIRA issue about Sqoop global functionality.
>
> 
> Jarek Cecho wrote:
>     I do not have strong PostgreSQL background, so please excuse me if this will be stupid question. The way how we're doing it in other connectors for explicit temporary tables is that we're using just one connection (to the target database specified on the command line) and we're  using explicit database name in case that user wants data stored in different database. Something like "create table tmp_database.tmp_table like exported_table" and "insert into exported_table select * from tmp_database.tmp_table". Is something like this possible in PostgreSQL?

In PostgreSQL, users can use "schema" in the same way and using "tablespace" enables physical data separation of staging table and destination table. Though default PostgresSQL has no problem for use of schema and tablespace, pg_bulkload connector needs fix because each map task of PGBulkloadExportJob create their own staging table on the fly. I am going to try adding a option for it.

references for scheam and tablespace:
 http://www.postgresql.org/docs/9.0/interactive/ddl-schemas.html
 http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.

> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could add option to create those temporary tables in different database?
> 
> Masatake Iwasaki wrote:
>     As far as PostgreSQL concerned, staging across databases is inefficient because it causes network data transfer via client (slave node). Also this change requires handling of multiple connections and causes a lot of code modifications.  I would like to leave this as a future improvement.
>     It may be more preferable to handle the feature connecting to multiple databases for staging in a independent JIRA issue about Sqoop global functionality.
>
> 
> Jarek Cecho wrote:
>     I do not have strong PostgreSQL background, so please excuse me if this will be stupid question. The way how we're doing it in other connectors for explicit temporary tables is that we're using just one connection (to the target database specified on the command line) and we're  using explicit database name in case that user wants data stored in different database. Something like "create table tmp_database.tmp_table like exported_table" and "insert into exported_table select * from tmp_database.tmp_table". Is something like this possible in PostgreSQL?
> 
> Masatake Iwasaki wrote:
>     In PostgreSQL, users can use "schema" in the same way and using "tablespace" enables physical data separation of staging table and destination table. Though default PostgresSQL has no problem for use of schema and tablespace, pg_bulkload connector needs fix because each map task of PGBulkloadExportJob create their own staging table on the fly. I am going to try adding a option for it.
>     
>     references for scheam and tablespace:
>      http://www.postgresql.org/docs/9.0/interactive/ddl-schemas.html
>      http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html

Thank you sir.

Jarce


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.

> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could add option to create those temporary tables in different database?
> 
> Masatake Iwasaki wrote:
>     As far as PostgreSQL concerned, staging across databases is inefficient because it causes network data transfer via client (slave node). Also this change requires handling of multiple connections and causes a lot of code modifications.  I would like to leave this as a future improvement.
>     It may be more preferable to handle the feature connecting to multiple databases for staging in a independent JIRA issue about Sqoop global functionality.
>

I do not have strong PostgreSQL background, so please excuse me if this will be stupid question. The way how we're doing it in other connectors for explicit temporary tables is that we're using just one connection (to the target database specified on the command line) and we're  using explicit database name in case that user wants data stored in different database. Something like "create table tmp_database.tmp_table like exported_table" and "insert into exported_table select * from tmp_database.tmp_table". Is something like this possible in PostgreSQL?


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could add option to create those temporary tables in different database?

As far as PostgreSQL concerned, staging across databases is inefficient because it causes network data transfer via client (slave node). Also this change requires handling of multiple connections and causes a lot of code modifications.  I would like to leave this as a future improvement.
It may be more preferable to handle the feature connecting to multiple databases for staging in a independent JIRA issue about Sqoop global functionality.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


Hi Masatake,
I've finished reviewing this version. I was able to move some bits, so thank you very much for this connector!

I've gathered couple others small notes:


/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java
<https://reviews.apache.org/r/2724/#comment20375>

    Could add option to create those temporary tables in different database?



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java
<https://reviews.apache.org/r/2724/#comment20376>

    Could you add between debug output that will print out all used arguments? Something like:
    
    LOG.debug("Starting pg_bulkload with arguments:");
    for (String arg : args) {
      LOG.debug("  " + arg);
    }



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java
<https://reviews.apache.org/r/2724/#comment20377>

    I've noticed that very similar method is already present in DirectPostgresqlManager, do you think you could reuse this one?


Please incorporate my comments from both this and previous review - I believe we're on the right track to get this committed!

Jarcec

- Jarek Cecho


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.

> On July 26, 2012, 9:13 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java, line 83
> > <https://reviews.apache.org/r/2724/diff/3/?file=129307#file129307line83>
> >
> >     I'm thinking whether it would be beneficial here not to die immediately. For example if user is exporting 30 partitions and there will be some issues with the first. He would be forced to manually do all 30 of them. But if we would try all them first and log only the failed ones, user would have to fix only that one. What do you think sir?
> 
> Masatake Iwasaki wrote:
>     The motivation for the current design is keeping destination table clean by staging in a single atomic transaction. I think making a number of reduce tasks configurable is a good solution. Users can choose single atomic staging or parallel staging.

I see your point and do agree that it would be great to offer both options to the users.


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 26, 2012, 9:13 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java, lines 215-216
> > <https://reviews.apache.org/r/2724/diff/3/?file=129305#file129305line215>
> >
> >     I personally do not like this table searching. We might delete some tables that user left there intentionally from previous export. I would more prefer to simply not support --clear-staging-table parameter (properly die if it will be specified and document this feature in the user guide). What do you think?
> 
> Masatake Iwasaki wrote:
>     I agree that deleting table based name pattern is unsafe and unpredictable. I will disable --clear-staging-table option.

I fixed staging table logic and got rid of clearing based on table name pattern.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


On Aug. 5, 2012, 6:56 p.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2012, 6:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 26, 2012, 9:13 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java, line 83
> > <https://reviews.apache.org/r/2724/diff/3/?file=129307#file129307line83>
> >
> >     I'm thinking whether it would be beneficial here not to die immediately. For example if user is exporting 30 partitions and there will be some issues with the first. He would be forced to manually do all 30 of them. But if we would try all them first and log only the failed ones, user would have to fix only that one. What do you think sir?

The motivation for the current design is keeping destination table clean by staging in a single atomic transaction. I think making a number of reduce tasks configurable is a good solution. Users can choose single atomic staging or parallel staging.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 26, 2012, 9:13 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java, lines 34-35
> > <https://reviews.apache.org/r/2724/diff/3/?file=129304#file129304line34>
> >
> >     I've notice that this class is to very hight extent identical to Sqoop's AutoProgressMapper. I'm thinking that rather then keeping similar code base twice, it might be beneficial to abstract shared functionality to separate class and simply reuse it in both AutoProgress[Mapper|Reducer]. What do you think?

I am going to get ProgressThread out of AutoProgressMapper and use it from both of AutoProgress[Mapper|Reducer] to remove duplicate code.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On July 26, 2012, 9:13 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java, lines 215-216
> > <https://reviews.apache.org/r/2724/diff/3/?file=129305#file129305line215>
> >
> >     I personally do not like this table searching. We might delete some tables that user left there intentionally from previous export. I would more prefer to simply not support --clear-staging-table parameter (properly die if it will be specified and document this feature in the user guide). What do you think?

I agree that deleting table based name pattern is unsafe and unpredictable. I will disable --clear-staging-table option.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9514
-----------------------------------------------------------


Hi Masatake,
I've read more carefully this time through your code and I do have few comments. I didn't have time to transfer some bits yet, so this is definitely not my final review. Please feel free to wait with your actions until I'll be done. I mainly wanted to provide you feedback and show you that I'm actually working on it this time :-)


/src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java
<https://reviews.apache.org/r/2724/#comment20335>

    This class seems to be more Reducer than Mapper :-) Would you mind fixing the javadoc?



/src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java
<https://reviews.apache.org/r/2724/#comment20342>

    I've notice that this class is to very hight extent identical to Sqoop's AutoProgressMapper. I'm thinking that rather then keeping similar code base twice, it might be beneficial to abstract shared functionality to separate class and simply reuse it in both AutoProgress[Mapper|Reducer]. What do you think?



/src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java
<https://reviews.apache.org/r/2724/#comment20336>

    Could you add stack trace printing here -- I believe it might be useful. Something like:
    
    Log.warn("..." + ie.toString(), ie);



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java
<https://reviews.apache.org/r/2724/#comment20337>

    You're also setting number of reducers on line 169 of this file. I believe that it's not necessary to do it twice.



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java
<https://reviews.apache.org/r/2724/#comment20338>

    This exactly same handling of two different exceptions seems weird to me. What about putting only one catch class that will catch Exception (i.e everything)?



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java
<https://reviews.apache.org/r/2724/#comment20343>

    I personally do not like this table searching. We might delete some tables that user left there intentionally from previous export. I would more prefer to simply not support --clear-staging-table parameter (properly die if it will be specified and document this feature in the user guide). What do you think?



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java
<https://reviews.apache.org/r/2724/#comment20340>

    I would suggest putting here just one catch class for Exception and effectively catch up everything.



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java
<https://reviews.apache.org/r/2724/#comment20341>

    I'm thinking whether it would be beneficial here not to die immediately. For example if user is exporting 30 partitions and there will be some issues with the first. He would be forced to manually do all 30 of them. But if we would try all them first and log only the failed ones, user would have to fix only that one. What do you think sir?


I also noticed that you've done very good job documenting usage on the JIRA. There is a lot of extra arguments that user needs to set up, so I would prefer if you could also upgrade sqoop user guide in this patch.

Jarcec

- Jarek Cecho


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

> On Aug. 9, 2012, 7:08 a.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 85-90
> > <https://reviews.apache.org/r/2724/diff/4/?file=134267#file134267line85>
> >
> >     I'm thinking here - is there a situation where the table will exists when you're using unique task id as a part of table name? (except the case where user will actually have such weirdly named table)

I think it is rare situation in real clusters but mapreduce job run in local mode creates same task attempt name every time. Clearing staging table is useful at least for ant tests.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review10059
-----------------------------------------------------------


On Aug. 15, 2012, 6:14 p.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2012, 6:14 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/docs/user/SqoopUserGuide.txt 1367974 
>   /src/docs/user/connectors.txt PRE-CREATION 
>   /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1367974 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.

On Aug. 9, 2012, 7:08 a.m., Masatake Iwasaki wrote:
> > Thank you very much for your time and effort.
> > 
> > Jarcec

Thanks for your reviews, too.


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review10059
-----------------------------------------------------------


On Aug. 18, 2012, 8:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2012, 8:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/docs/user/SqoopUserGuide.txt 1374499 
>   /src/docs/user/connectors.txt PRE-CREATION 
>   /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1374499 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1374499 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review10059
-----------------------------------------------------------


Hi Masatake,
thank you very much for your update. I believe that we're almost at the end!

You've done very good job documenting the usage on JIRA, would you mind updating user guide in similar manner?

Also I do have two comments:


/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java
<https://reviews.apache.org/r/2724/#comment21295>

    I'm thinking here - is there a situation where the table will exists when you're using unique task id as a part of table name? (except the case where user will actually have such weirdly named table)



/src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java
<https://reviews.apache.org/r/2724/#comment21303>

    I've noticed that class DirectPostgresqlManager have very similar method. Do you think you can reuse it? Or maybe abstract it somewhere as public static?


Thank you very much for your time and effort.

Jarcec

- Jarek Cecho


On Aug. 5, 2012, 6:56 p.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2012, 6:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review10415
-----------------------------------------------------------


Thank you sir for your updated patch. I believe we're almost at the end. I just have two additional notes:


/src/docs/user/connectors.txt
<https://reviews.apache.org/r/2724/#comment22189>

    Nit: Please remove the white space character at the end of the line



/src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java
<https://reviews.apache.org/r/2724/#comment22190>

    My compiler is complaining here that this method do not override or implement method from a super type.
    
    This method do not seem to be used anywhere, so I'm wondering if we need it?


I believe that I'll be able to commit your patch once you will fix those!

Jarcec

- Jarek Cecho


On Aug. 15, 2012, 6:14 p.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2012, 6:14 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/docs/user/SqoopUserGuide.txt 1367974 
>   /src/docs/user/connectors.txt PRE-CREATION 
>   /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1367974 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review10525
-----------------------------------------------------------

Ship it!


Hi Masatake, 
please attach your patch to JIRA and I'll commit it. Please make sure to check "Grant Apache license" button.

Jarcec

- Jarek Cecho


On Aug. 18, 2012, 8:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2012, 8:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/docs/user/SqoopUserGuide.txt 1374499 
>   /src/docs/user/connectors.txt PRE-CREATION 
>   /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1374499 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1374499 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch include the test class PGBulkloadManagerTest.
> I've tested "ant test" and passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/
-----------------------------------------------------------

(Updated Aug. 18, 2012, 8:41 a.m.)


Review request for Sqoop.


Changes
-------

1. removed trailing spaces.
2. removed obsolete method following changes of the parent class.


Description
-------

Patch for SQOOP-390
https://issues.apache.org/jira/browse/SQOOP-390


This addresses bug SQOOP-390.
    https://issues.apache.org/jira/browse/SQOOP-390


Diffs (updated)
-----

  /src/docs/user/SqoopUserGuide.txt 1374499 
  /src/docs/user/connectors.txt PRE-CREATION 
  /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1374499 
  /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1374499 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
  /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2724/diff/


Testing
-------

This patch include the test class PGBulkloadManagerTest.
I've tested "ant test" and passed.


Thanks,

Masatake Iwasaki


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/
-----------------------------------------------------------

(Updated Aug. 15, 2012, 6:14 p.m.)


Review request for Sqoop.


Changes
-------

1. added PostgreSQLUtils class to store common logic for PostgreSQL connectors.
2. added user documentation.


Description
-------

Patch for SQOOP-390
https://issues.apache.org/jira/browse/SQOOP-390


This addresses bug SQOOP-390.
    https://issues.apache.org/jira/browse/SQOOP-390


Diffs (updated)
-----

  /src/docs/user/SqoopUserGuide.txt 1367974 
  /src/docs/user/connectors.txt PRE-CREATION 
  /src/java/org/apache/sqoop/manager/DirectPostgresqlManager.java 1367974 
  /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/util/PostgreSQLUtils.java PRE-CREATION 
  /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2724/diff/


Testing
-------

This patch include the test class PGBulkloadManagerTest.
I've tested "ant test" and passed.


Thanks,

Masatake Iwasaki


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/
-----------------------------------------------------------

(Updated Aug. 5, 2012, 6:56 p.m.)


Review request for Sqoop.


Changes
-------

1. got ProgressThread out of AutoProgressMapper and use it from both of AutoProgress[Mapper|Reducer] to remove duplicate code.
2. moved clearing of staging table to each map tasks and got rid of deletion based on table name patten matching.
3. made the number of reduce tasks configurable by mapred.reduce.tasks property.
4. made tablespace for staging table configurable by pgbulkload.staging.tablespace property.
5. added testcase for export with multi reduce tasks.
6. fixed some log messages and comments.


Description
-------

Patch for SQOOP-390
https://issues.apache.org/jira/browse/SQOOP-390


This addresses bug SQOOP-390.
    https://issues.apache.org/jira/browse/SQOOP-390


Diffs (updated)
-----

  /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressMapper.java 1367974 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/ProgressThread.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/SqoopReducer.java PRE-CREATION 
  /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2724/diff/


Testing
-------

This patch include the test class PGBulkloadManagerTest.
I've tested "ant test" and passed.


Thanks,

Masatake Iwasaki


Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload

Posted by Masatake Iwasaki <iw...@nttdata.co.jp>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/
-----------------------------------------------------------

(Updated July 26, 2012, 10:41 a.m.)


Review request for Sqoop.


Changes
-------

1. movee all classes except testcase into org.apache.sqoop from com.cloudera.sqoop.
2. updated license header from Cloudera to Apache Software Foundation.
3. moved mapreduce job configuration setting to PGBulkloadExportJob from PGBulkloadManager.
4. fixed PGBulkloadManagerManualTest to pass tests against current trunk.
5. trivial fixes to pass checkstyle.


Description
-------

Patch for SQOOP-390
https://issues.apache.org/jira/browse/SQOOP-390


This addresses bug SQOOP-390.
    https://issues.apache.org/jira/browse/SQOOP-390


Diffs (updated)
-----

  /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
  /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
  /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2724/diff/


Testing
-------

This patch include the test class PGBulkloadManagerTest.
I've tested "ant test" and passed.


Thanks,

Masatake Iwasaki