You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Sandish Kumar HN <sa...@gmail.com> on 2017/08/24 09:51:12 UTC

Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/
-----------------------------------------------------------

Review request for Sqoop and Anna Szonyi.


Bugs: SQOOP-3215
    https://issues.apache.org/jira/browse/SQOOP-3215


Repository: sqoop-trunk


Description
-------

sqoop import and create hive table as formats(avro,parquet)


Diffs
-----

  src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
  testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION 
  testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 


Diff: https://reviews.apache.org/r/61882/diff/1/


Testing
-------

testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.


Thanks,

Sandish Kumar HN


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Sandish Kumar HN <sa...@gmail.com>.

> On Nov. 16, 2017, 1:43 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> > 
> > I have took another look at your patch and I think there is some confusion here.
> > 
> > * The JIRA description says that create-hive-table tool needs to be improved to be able to create Parquet and Avro tables too (schema only). The problem is that there is a --create-hive-table option as well which is a totally different thing. The --create-hive-table option gets mapped to org.apache.sqoop.SqoopOptions#failIfHiveTableExists boolean value (which is a much better description) and it means that Sqoop job gets aborted if the Hive table already exist. I think your patch tries to address the --create-hive-table option so it should be done under a different JIRA.
> > * Based on the code you try to include 2 changes in this patch and I think they should be separated.
> >   * The first thing was the Parquet Hive table creation changes in TableDefWriter. The problem is that I think this code never gets executed since org.apache.sqoop.hive.HiveImport#importTable never gets invoked if we import into Parquet files. See: org/apache/sqoop/tool/ImportTool.java:539
> >   * The other thing is the Avro support for Hive tables. I think this part still needs more testing with possibly all the supported data types and codecs.
> > 
> > Regards,
> > Szabolcs

Hi @vasas,

Thanks for review, I will create two seperate JIRA issues for this patch. One for "Sqoop import parquet support" and "Sqoop import Avro support"


- Sandish Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review191177
-----------------------------------------------------------


On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2017, 12:30 p.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/2/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review191177
-----------------------------------------------------------



Hi Sandish,

I have took another look at your patch and I think there is some confusion here.

* The JIRA description says that create-hive-table tool needs to be improved to be able to create Parquet and Avro tables too (schema only). The problem is that there is a --create-hive-table option as well which is a totally different thing. The --create-hive-table option gets mapped to org.apache.sqoop.SqoopOptions#failIfHiveTableExists boolean value (which is a much better description) and it means that Sqoop job gets aborted if the Hive table already exist. I think your patch tries to address the --create-hive-table option so it should be done under a different JIRA.
* Based on the code you try to include 2 changes in this patch and I think they should be separated.
  * The first thing was the Parquet Hive table creation changes in TableDefWriter. The problem is that I think this code never gets executed since org.apache.sqoop.hive.HiveImport#importTable never gets invoked if we import into Parquet files. See: org/apache/sqoop/tool/ImportTool.java:539
  * The other thing is the Avro support for Hive tables. I think this part still needs more testing with possibly all the supported data types and codecs.

Regards,
Szabolcs

- Szabolcs Vasas


On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2017, 12:30 p.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/2/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Sandish Kumar HN <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/
-----------------------------------------------------------

(Updated Oct. 23, 2017, 12:30 p.m.)


Review request for Sqoop and Anna Szonyi.


Changes
-------

fixs


Bugs: SQOOP-3215
    https://issues.apache.org/jira/browse/SQOOP-3215


Repository: sqoop-trunk


Description
-------

sqoop import and create hive table as formats(avro,parquet)


Diffs (updated)
-----

  src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
  testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 


Diff: https://reviews.apache.org/r/61882/diff/2/

Changes: https://reviews.apache.org/r/61882/diff/1-2/


Testing
-------

testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.


Thanks,

Sandish Kumar HN


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Szabolcs Vasas <va...@gmail.com>.

> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> > 
> > Thank you for your patch! Please find my findings below:
> > 
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> > 
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> > 
> > Regards,
> > Szabolcs
> 
> Sandish Kumar HN wrote:
>     - Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
>     - Yes will add more data types on testcases for avro/parquet

I haven't checked if ParquetJob#createDataset is a perfect fit for your use case here but looks like a good starting point.


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------


On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2017, 9:51 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/1/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Sandish Kumar HN <sa...@gmail.com>.

> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> > 
> > Thank you for your patch! Please find my findings below:
> > 
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> > 
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> > 
> > Regards,
> > Szabolcs

- Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
- Yes will add more data types on testcases for avro/parquet


> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > src/test/com/cloudera/sqoop/hive/TestHiveImport.java
> > Line 302 (original), 308 (patched)
> > <https://reviews.apache.org/r/61882/diff/1/?file=1802862#file1802862line308>
> >
> >     Why did you need to add this change? As far as I understand your Parquet changes should only be effective if the --create-hive-table option is specified.

Yes I can remove this.


- Sandish Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------


On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2017, 9:51 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/1/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Sandish Kumar HN <sa...@gmail.com>.

> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> > 
> > Thank you for your patch! Please find my findings below:
> > 
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> > 
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> > 
> > Regards,
> > Szabolcs
> 
> Sandish Kumar HN wrote:
>     - Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
>     - Yes will add more data types on testcases for avro/parquet
> 
> Szabolcs Vasas wrote:
>     I haven't checked if ParquetJob#createDataset is a perfect fit for your use case here but looks like a good starting point.

I'm not creating any parquet files, I'm directly using hive serde properties which is easy for hive. creating parquet files through kite will be a burden for sqoop and also added string,int data types test's, please let me know if you want me to add more data types and testcases.


- Sandish Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------


On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2017, 12:30 p.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/2/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as formats(avro, parquet)

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------



Hi Sandish,

Thank you for your patch! Please find my findings below:

- Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
- SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?

I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?

Regards,
Szabolcs


src/test/com/cloudera/sqoop/hive/TestHiveImport.java
Line 302 (original), 308 (patched)
<https://reviews.apache.org/r/61882/#comment262735>

    Why did you need to add this change? As far as I understand your Parquet changes should only be effective if the --create-hive-table option is specified.


- Szabolcs Vasas


On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2017, 9:51 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-3215
>     https://issues.apache.org/jira/browse/SQOOP-3215
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> sqoop import and create hive table as formats(avro,parquet)
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b 
>   testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION 
>   testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61882/diff/1/
> 
> 
> Testing
> -------
> 
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>