You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Hari Shreedharan <hs...@cloudera.com> on 2013/07/25 06:18:27 UTC

Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/
-----------------------------------------------------------

Review request for Sqoop.


Bugs: SQOOP-777
    https://issues.apache.org/jira/browse/SQOOP-777


Repository: sqoop-sqoop2


Description
-------

Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 

I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 

I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.


Diffs
-----

  common/pom.xml db11b5b 
  common/src/main/java/org/apache/sqoop/common/SqoopException.java 6caed13 
  common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
  common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
  core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
  core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
  execution/mapreduce/pom.xml f9a2a0e 
  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
  pom.xml 520c107 
  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 

Diff: https://reviews.apache.org/r/12936/diff/


Testing
-------


Thanks,

Hari Shreedharan


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.

> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java, line 78
> > <https://reviews.apache.org/r/12936/diff/4/?file=329175#file329175line78>
> >
> >     Can we throw more meaning full exception here? :-)

I added this to figure out which tests were failing because of lack of schema. This check is not required, since the schema would exist in the conf anyway.


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java, lines 71-74
> > <https://reviews.apache.org/r/12936/diff/4/?file=329160#file329160line71>
> >
> >     I'm afraid that the schema generation for query based export will fail and I'm not sure how to implement that, so I we might need to remove that functionality.

For now, I am going to leave it around. We should work on removing that in a separate jira.


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java, lines 80-100
> > <https://reviews.apache.org/r/12936/diff/4/?file=329160#file329160line80>
> >
> >     Similar code is also in importer, so maybe we could refactore that into some Util/Helper class?

Done.


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java, lines 59-61
> > <https://reviews.apache.org/r/12936/diff/4/?file=329163#file329163line59>
> >
> >     Nit: This property seems to be used mainly within the Execute/Submission engine, so it's better placed in JobConstants instead.

Done


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > pom.xml, lines 326-330
> > <https://reviews.apache.org/r/12936/diff/4/?file=329183#file329183line326>
> >
> >     We already have guava dependency in Hadoop 100 profile. If the goal here is to make it explicit, I would suggest to move it out from there and use the property guava.version to specify the version.

Done


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java, line 19
> > <https://reviews.apache.org/r/12936/diff/4/?file=329153#file329153line19>
> >
> >     I would much rather see such class(es) in connector sdk than in common module (that is for example shared with client).

Done


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataReader.java, lines 24-34
> > <https://reviews.apache.org/r/12936/diff/4/?file=329154#file329154line24>
> >
> >     Nit: Can we add java docs describing purpose of each method?

Done


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java, lines 24-34
> > <https://reviews.apache.org/r/12936/diff/4/?file=329155#file329155line24>
> >
> >     Nit: Can we add java docs describing purpose of each method?

DOne


> On July 31, 2013, 11:25 p.m., Jarek Cecho wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java, lines 119-125
> > <https://reviews.apache.org/r/12936/diff/4/?file=329156#file329156line119>
> >
> >     Nit: Please add the java docs here as well.

Done


- Hari


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24352
-----------------------------------------------------------


On July 27, 2013, 4:11 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated July 27, 2013, 4:11 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   common/src/test/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 520c107 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24352
-----------------------------------------------------------


Hi Hari,
thank you very much for working on this significant feature! I've done high level review and I think that we are in a good shape - good work here! Couple of high level notes:


common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48198>

    I would much rather see such class(es) in connector sdk than in common module (that is for example shared with client).



common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment48200>

    Nit: Can we add java docs describing purpose of each method?



common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment48201>

    Nit: Can we add java docs describing purpose of each method?



common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48202>

    Nit: "the the"



common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48203>

    Nit: Please add the java docs here as well.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment48244>

    I'm afraid that the schema generation for query based export will fail and I'm not sure how to implement that, so I we might need to remove that functionality.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment48247>

    Similar code is also in importer, so maybe we could refactore that into some Util/Helper class?



core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java
<https://reviews.apache.org/r/12936/#comment48249>

    Nit: This property seems to be used mainly within the Execute/Submission engine, so it's better placed in JobConstants instead.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
<https://reviews.apache.org/r/12936/#comment48248>

    Can we throw more meaning full exception here? :-)



pom.xml
<https://reviews.apache.org/r/12936/#comment48197>

    We already have guava dependency in Hadoop 100 profile. If the goal here is to make it explicit, I would suggest to move it out from there and use the property guava.version to specify the version.


Jarcec

- Jarek Cecho


On July 27, 2013, 4:11 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated July 27, 2013, 4:11 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   common/src/test/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 520c107 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review25304
-----------------------------------------------------------


Hi Hari,
thank you again for working on this JIRA and please accept my apologies for the late response. I've deep dived into the patch and I do have couple of comments and questions.


common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49646>

    Extreme nit: Read data from the execution framework...
    
    (not necessary MR)



common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49647>

    Extreme nit: Read data from execution framework...



common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49648>

    Nit: Read data from execution framework as a native format.
    
    This should be independent native format right?



common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49649>

    Extreme nit: Write an array of objects into the execution framework...



common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49650>

    Extreme nit: Write data into execution framework....



common/src/main/java/org/apache/sqoop/schema/type/Column.java
<https://reviews.apache.org/r/12936/#comment49651>

    Can we add descriptive java doc describing what exactly are expecting to validate?



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50066>

    Those imports seems to be unused.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50067>

    We should also take into account exportJobConfiguration.table.schema when building the query.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
<https://reviews.apache.org/r/12936/#comment50068>

    Let's clean up unused imports after this refactoring.



connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
<https://reviews.apache.org/r/12936/#comment49652>

    This method seems to be generated automatically, but it seems to be removing the fail() call, is it intentional?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49656>

    The wiki [1] is also saying that the 0x5C should be substituted.
    
    1: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49658>

    Similarly as above, it seems that we are missing the \\ replacement for 0x5C.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50050>

    Is it wise to consume the exception here without any counters or other reporting except of log message?
    
    It also seems that other parts of the code are not null safe, so this error handling will most likely just cause NPE somewhere else.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49660>

    The setData() method is conditionally running the validations, are they intentionally skipped here in setTextData()?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49661>

    I'm afraid that String.split() won't work correctly for following example input:
    
    1,"Hi, here is Jarcec"
    
    That contains two columns, but three columns will be created when splitting by comma.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50051>

    Nit: Please pass the correct array size instead of the zero.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50052>

    I would suggest to be strict here and accept only upper case variant of NULL.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50053>

    I would suggest to die quickly for unsupported data types rather than just silently pass them as a un-escaped strings.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50055>

    Is there a reason why to do our own join rather than StringUtils.join()?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50056>

    Is expected and desirable that user can't reset schema to NULL?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50035>

    Nit: Would it make more sense to have IntermediateDataFormat (without type here) or CSVIntermediateDataFormat since we are not allowing nothing else?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50036>

    This will convert the binary string to an array of 0 and 1, right? The intermediate data format is suggesting the MySQLdump approach that is serializing the bytes as they are though. The 0 and 1 seems to be quite inefficient.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50037>

    Since the regular expressions are pre-build and therefore the PatternSyntaxException should not be thrown, is there any other exception that we are expecting?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50060>

    The conditional escaped argument seems quite dangerous to me - what if the user has saved the data on disk with escaped = true and is using this class to read the data from disk? The default value is false, so this will remove some actual data, right?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50038>

    Since the regular expressions are pre-build and therefore the PatternSyntaxException should not be thrown, is there any other exception that we are expecting?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50058>

    Since couple of the methods, such as setSchema(), getSchema(), validateData() will have to be implemented by every IDF in a very similar way, I'm wondering if it would make sense to convert the IDF to abstract class and provide the base implementation?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50039>

    Can we javadoc for those two methods?
    
    Also it seems that they are not used, so I'm wondering if we are planning to use them in the future or if they are artifact from previous development?



core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
<https://reviews.apache.org/r/12936/#comment50040>

    Can we make this a Class instead of String?



execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
<https://reviews.apache.org/r/12936/#comment50041>

    Nit: Can we keep the style from previous lines and put this on single line?



execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50063>

    The length variable seems to be used only readFields method, so I'm wondering if local variable wouldn't be better fit?



execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50064>

    Casting the data to UTF might be quite dangerous, especially for binary values as they will be interpreted and possibly corrupted. This won't be an issue with current implementation that stores binary data as a stream of 0s and 1s, but will become an issue when (if) we switch to the format designed on the wiki.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50070>

    This import do not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50042>

    Nit: Please keep the line on a single line.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50043>

    Nit: This change do not seem necessary.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50071>

    This import do not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
<https://reviews.apache.org/r/12936/#comment50044>

    Nit: Can we please put this on single line?



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50072>

    This import do not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50045>

    Nit: Please put the line on a single line.



execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50069>

    This import do not seem relevant any more.



execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50046>

    Nit: Please put this on a single line.



execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java
<https://reviews.apache.org/r/12936/#comment50047>

    It seems that we are building the same schema in multiple test cases. Would it make sense to create a helper method for this rather than copy&pasting the code?



execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
<https://reviews.apache.org/r/12936/#comment50065>

    Hari got 20 experience points for using awesome text messages in the tests!



spi/pom.xml
<https://reviews.apache.org/r/12936/#comment50048>

    I'm not sure whether we want to make all connectors depending on the SDK. The SDK should be light library of code for connectors, We should not force the connector to reuse anything. Is the dependency required?



spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
<https://reviews.apache.org/r/12936/#comment50049>

    Nit: Please put this on a single line.


Jarcec

- Jarek Cecho


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.

> On Aug. 1, 2013, 4:42 a.m., Venkat Ranganathan wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java, line 45
> > <https://reviews.apache.org/r/12936/diff/5/?file=332241#file332241line45>
> >
> >     Do you think this should be writeContent (or conversely the method in DataReader should be changed to readRecord instead of Content?)
> 
> Hari Shreedharan wrote:
>     Venkat:
>     
>     Maybe I can make the javadoc clearer, but the idea of having readContent and writeContent in the DataReader/DataWriter and the IntermediateDataFormat is that if we allow the connector to choose a IntermediateDataFormat - the connector can read/write in the native format and (once we make the serialization part in the OutputFormat pluggable) we would be able to have the serializer also read/write in the connector's native format. In that case, it is possible that the native format might be efficiently able to put in several records in one call itself - which is why I named it as such (so all others will be record oriented while this method is not). Does that make the intent clearer?
> 
> Venkat Ranganathan wrote:
>     Hari:
>     
>     I guessed your intent, but was commenting that one is called readContent and another is called writeRecord.  Do you think they both should be xxxxContent?
>     
>     Thanks

Ah, yes. Will update in the next patch.


- Hari


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24408
-----------------------------------------------------------


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Venkat Ranganathan <n....@live.com>.

> On Aug. 1, 2013, 4:42 a.m., Venkat Ranganathan wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java, line 45
> > <https://reviews.apache.org/r/12936/diff/5/?file=332241#file332241line45>
> >
> >     Do you think this should be writeContent (or conversely the method in DataReader should be changed to readRecord instead of Content?)
> 
> Hari Shreedharan wrote:
>     Venkat:
>     
>     Maybe I can make the javadoc clearer, but the idea of having readContent and writeContent in the DataReader/DataWriter and the IntermediateDataFormat is that if we allow the connector to choose a IntermediateDataFormat - the connector can read/write in the native format and (once we make the serialization part in the OutputFormat pluggable) we would be able to have the serializer also read/write in the connector's native format. In that case, it is possible that the native format might be efficiently able to put in several records in one call itself - which is why I named it as such (so all others will be record oriented while this method is not). Does that make the intent clearer?

Hari:

I guessed your intent, but was commenting that one is called readContent and another is called writeRecord.  Do you think they both should be xxxxContent?

Thanks


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24408
-----------------------------------------------------------


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.

> On Aug. 1, 2013, 4:42 a.m., Venkat Ranganathan wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java, line 45
> > <https://reviews.apache.org/r/12936/diff/5/?file=332241#file332241line45>
> >
> >     Do you think this should be writeContent (or conversely the method in DataReader should be changed to readRecord instead of Content?)

Venkat:

Maybe I can make the javadoc clearer, but the idea of having readContent and writeContent in the DataReader/DataWriter and the IntermediateDataFormat is that if we allow the connector to choose a IntermediateDataFormat - the connector can read/write in the native format and (once we make the serialization part in the OutputFormat pluggable) we would be able to have the serializer also read/write in the connector's native format. In that case, it is possible that the native format might be efficiently able to put in several records in one call itself - which is why I named it as such (so all others will be record oriented while this method is not). Does that make the intent clearer?


- Hari


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24408
-----------------------------------------------------------


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24408
-----------------------------------------------------------


Thanks for working on this and looks good.   The ability to have an intermediate format is a good thing (I am mimicking somewhat similar targeted work for Sqoop 1 for some new changes).


common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment48298>

    Do you think this should be writeContent (or conversely the method in DataReader should be changed to readRecord instead of Content?)


- Venkat Ranganathan


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/
-----------------------------------------------------------

(Updated Aug. 1, 2013, 3:41 a.m.)


Review request for Sqoop.


Changes
-------

Incorporate review feedback


Bugs: SQOOP-777
    https://issues.apache.org/jira/browse/SQOOP-777


Repository: sqoop-sqoop2


Description
-------

Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 

I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 

I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.


Diffs (updated)
-----

  common/pom.xml db11b5b 
  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
  common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
  connector/connector-sdk/pom.xml 4056e14 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION 
  core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
  execution/mapreduce/pom.xml f9a2a0e 
  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a01 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
  pom.xml 5ea0633 
  spi/pom.xml 0b240e8 
  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 

Diff: https://reviews.apache.org/r/12936/diff/


Testing
-------


Thanks,

Hari Shreedharan


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/
-----------------------------------------------------------

(Updated July 27, 2013, 4:11 a.m.)


Review request for Sqoop.


Changes
-------

Sorry, uploaded the wrong patch last time. This is the real patch with the tests included.


Bugs: SQOOP-777
    https://issues.apache.org/jira/browse/SQOOP-777


Repository: sqoop-sqoop2


Description
-------

Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 

I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 

I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.


Diffs (updated)
-----

  common/pom.xml db11b5b 
  common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
  common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
  common/src/test/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormatTest.java PRE-CREATION 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
  core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
  core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
  execution/mapreduce/pom.xml f9a2a0e 
  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
  pom.xml 520c107 
  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 

Diff: https://reviews.apache.org/r/12936/diff/


Testing
-------


Thanks,

Hari Shreedharan


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/
-----------------------------------------------------------

(Updated July 27, 2013, 1:46 a.m.)


Review request for Sqoop.


Changes
-------

Unit tests for CSVIntermediateDataFormat which tests some of the escaping too.


Bugs: SQOOP-777
    https://issues.apache.org/jira/browse/SQOOP-777


Repository: sqoop-sqoop2


Description
-------

Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 

I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 

I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.


Diffs (updated)
-----

  common/pom.xml db11b5b 
  common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
  common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
  core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
  core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
  execution/mapreduce/pom.xml f9a2a0e 
  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
  pom.xml 520c107 
  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 

Diff: https://reviews.apache.org/r/12936/diff/


Testing
-------


Thanks,

Hari Shreedharan


Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/
-----------------------------------------------------------

(Updated July 26, 2013, 12:04 a.m.)


Review request for Sqoop.


Changes
-------

Unit tests for SqoopWritable. Some code cleanup


Bugs: SQOOP-777
    https://issues.apache.org/jira/browse/SQOOP-777


Repository: sqoop-sqoop2


Description
-------

Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop. 

I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch. 

I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.


Diffs (updated)
-----

  common/pom.xml db11b5b 
  common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
  common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java PRE-CREATION 
  common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f 
  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff 
  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba 
  core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java 4293dce 
  core/src/main/java/org/apache/sqoop/framework/JobManager.java 9f09982 
  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d0039 
  execution/mapreduce/pom.xml f9a2a0e 
  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java 767080c 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java 1978ec6 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java a07c511 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java 4621942 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java 356ae8a 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 59cf391 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java 90de6ef 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java 739eb17 
  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java b31161c 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aae 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c6 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION 
  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java bee8ab7 
  pom.xml 520c107 
  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56 
  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java 6fc485b 

Diff: https://reviews.apache.org/r/12936/diff/


Testing
-------


Thanks,

Hari Shreedharan