You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Sandish Kumar HN <sa...@gmail.com> on 2017/08/24 09:51:12 UTC
Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/
-----------------------------------------------------------
Review request for Sqoop and Anna Szonyi.
Bugs: SQOOP-3215
https://issues.apache.org/jira/browse/SQOOP-3215
Repository: sqoop-trunk
Description
-------
sqoop import and create hive table as formats(avro,parquet)
Diffs
-----
src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb
src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION
testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
Diff: https://reviews.apache.org/r/61882/diff/1/
Testing
-------
testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
Thanks,
Sandish Kumar HN
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Sandish Kumar HN <sa...@gmail.com>.
> On Nov. 16, 2017, 1:43 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> >
> > I have took another look at your patch and I think there is some confusion here.
> >
> > * The JIRA description says that create-hive-table tool needs to be improved to be able to create Parquet and Avro tables too (schema only). The problem is that there is a --create-hive-table option as well which is a totally different thing. The --create-hive-table option gets mapped to org.apache.sqoop.SqoopOptions#failIfHiveTableExists boolean value (which is a much better description) and it means that Sqoop job gets aborted if the Hive table already exist. I think your patch tries to address the --create-hive-table option so it should be done under a different JIRA.
> > * Based on the code you try to include 2 changes in this patch and I think they should be separated.
> > * The first thing was the Parquet Hive table creation changes in TableDefWriter. The problem is that I think this code never gets executed since org.apache.sqoop.hive.HiveImport#importTable never gets invoked if we import into Parquet files. See: org/apache/sqoop/tool/ImportTool.java:539
> > * The other thing is the Avro support for Hive tables. I think this part still needs more testing with possibly all the supported data types and codecs.
> >
> > Regards,
> > Szabolcs
Hi @vasas,
Thanks for review, I will create two seperate JIRA issues for this patch. One for "Sqoop import parquet support" and "Sqoop import Avro support"
- Sandish Kumar
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review191177
-----------------------------------------------------------
On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Oct. 23, 2017, 12:30 p.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/2/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review191177
-----------------------------------------------------------
Hi Sandish,
I have took another look at your patch and I think there is some confusion here.
* The JIRA description says that create-hive-table tool needs to be improved to be able to create Parquet and Avro tables too (schema only). The problem is that there is a --create-hive-table option as well which is a totally different thing. The --create-hive-table option gets mapped to org.apache.sqoop.SqoopOptions#failIfHiveTableExists boolean value (which is a much better description) and it means that Sqoop job gets aborted if the Hive table already exist. I think your patch tries to address the --create-hive-table option so it should be done under a different JIRA.
* Based on the code you try to include 2 changes in this patch and I think they should be separated.
* The first thing was the Parquet Hive table creation changes in TableDefWriter. The problem is that I think this code never gets executed since org.apache.sqoop.hive.HiveImport#importTable never gets invoked if we import into Parquet files. See: org/apache/sqoop/tool/ImportTool.java:539
* The other thing is the Avro support for Hive tables. I think this part still needs more testing with possibly all the supported data types and codecs.
Regards,
Szabolcs
- Szabolcs Vasas
On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Oct. 23, 2017, 12:30 p.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/2/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Sandish Kumar HN <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/
-----------------------------------------------------------
(Updated Oct. 23, 2017, 12:30 p.m.)
Review request for Sqoop and Anna Szonyi.
Changes
-------
fixs
Bugs: SQOOP-3215
https://issues.apache.org/jira/browse/SQOOP-3215
Repository: sqoop-trunk
Description
-------
sqoop import and create hive table as formats(avro,parquet)
Diffs (updated)
-----
src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09
src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
Diff: https://reviews.apache.org/r/61882/diff/2/
Changes: https://reviews.apache.org/r/61882/diff/1-2/
Testing
-------
testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
Thanks,
Sandish Kumar HN
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Szabolcs Vasas <va...@gmail.com>.
> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> >
> > Thank you for your patch! Please find my findings below:
> >
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> >
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> >
> > Regards,
> > Szabolcs
>
> Sandish Kumar HN wrote:
> - Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
> - Yes will add more data types on testcases for avro/parquet
I haven't checked if ParquetJob#createDataset is a perfect fit for your use case here but looks like a good starting point.
- Szabolcs
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------
On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Aug. 24, 2017, 9:51 a.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/1/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Sandish Kumar HN <sa...@gmail.com>.
> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> >
> > Thank you for your patch! Please find my findings below:
> >
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> >
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> >
> > Regards,
> > Szabolcs
- Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
- Yes will add more data types on testcases for avro/parquet
> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > src/test/com/cloudera/sqoop/hive/TestHiveImport.java
> > Line 302 (original), 308 (patched)
> > <https://reviews.apache.org/r/61882/diff/1/?file=1802862#file1802862line308>
> >
> > Why did you need to add this change? As far as I understand your Parquet changes should only be effective if the --create-hive-table option is specified.
Yes I can remove this.
- Sandish Kumar
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------
On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Aug. 24, 2017, 9:51 a.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/1/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Sandish Kumar HN <sa...@gmail.com>.
> On Sept. 26, 2017, 1:35 p.m., Szabolcs Vasas wrote:
> > Hi Sandish,
> >
> > Thank you for your patch! Please find my findings below:
> >
> > - Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
> > - SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
> >
> > I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
> >
> > Regards,
> > Szabolcs
>
> Sandish Kumar HN wrote:
> - Do you want me to use org.apache.sqoop.mapreduce.ParquetJob#createDataset in testcases??
> - Yes will add more data types on testcases for avro/parquet
>
> Szabolcs Vasas wrote:
> I haven't checked if ParquetJob#createDataset is a perfect fit for your use case here but looks like a good starting point.
I'm not creating any parquet files, I'm directly using hive serde properties which is easy for hive. creating parquet files through kite will be a burden for sqoop and also added string,int data types test's, please let me know if you want me to add more data types and testcases.
- Sandish Kumar
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------
On Oct. 23, 2017, 12:30 p.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Oct. 23, 2017, 12:30 p.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 6a4dcb09
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/2/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>
Re: Review Request 61882: SQOOP-3215 : sqoop import to hive table as
formats(avro, parquet)
Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61882/#review186246
-----------------------------------------------------------
Hi Sandish,
Thank you for your patch! Please find my findings below:
- Sqoop uses Kite for importing in Parquet format so I think your patch should also utilize the Kite libraries for Hive table generation (see org.apache.sqoop.mapreduce.ParquetJob#createDataset). This would ensure that the table creation behaviour is the same when we import the data and when we only create the table.
- SQLServerHiveImportTest.testNormalHiveImportAsAvro test case fails (SQLServerHiveImportTest is a subclass of TestHiveImport so it has inherited your new test case). Can you please take a look?
I have not had time yet to deeply review the Avro part but my impression is that it needs much more thorough testing(e.g. more data types) since it is a new feature. What do you think?
Regards,
Szabolcs
src/test/com/cloudera/sqoop/hive/TestHiveImport.java
Line 302 (original), 308 (patched)
<https://reviews.apache.org/r/61882/#comment262735>
Why did you need to add this change? As far as I understand your Parquet changes should only be effective if the --create-hive-table option is specified.
- Szabolcs Vasas
On Aug. 24, 2017, 9:51 a.m., Sandish Kumar HN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61882/
> -----------------------------------------------------------
>
> (Updated Aug. 24, 2017, 9:51 a.m.)
>
>
> Review request for Sqoop and Anna Szonyi.
>
>
> Bugs: SQOOP-3215
> https://issues.apache.org/jira/browse/SQOOP-3215
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> sqoop import and create hive table as formats(avro,parquet)
>
>
> Diffs
> -----
>
> src/java/org/apache/sqoop/hive/TableDefWriter.java deec32d6
> src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1564bdcb
> src/test/com/cloudera/sqoop/hive/TestHiveImport.java a624f52b
> testdata/hive/scripts/createhiveImportasparquet.q PRE-CREATION
> testdata/hive/scripts/normalHiveImportAvro.q PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/61882/diff/1/
>
>
> Testing
> -------
>
> testNormalHiveImportAsAvro, testNormalHiveImportAsParquet with table creation checks.
>
>
> Thanks,
>
> Sandish Kumar HN
>
>