Posted to dev@sqoop.apache.org by "Joey Echeverria (JIRA)" <ji...@apache.org> on 2011/08/19 03:14:27 UTC
[jira] [Created] (SQOOP-318) Add support for splittable lzo files
with Hive
Add support for splittable lzo files with Hive
----------------------------------------------
Key: SQOOP-318
URL: https://issues.apache.org/jira/browse/SQOOP-318
Project: Sqoop
Issue Type: Improvement
Components: hive-integration
Reporter: Joey Echeverria
Priority: Minor
When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088076#comment-13088076 ]
jiraposter@reviews.apache.org commented on SQOOP-318:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/#review1563
-----------------------------------------------------------
Great patch, Joey! I do have a high-level suggestion: add a mapping that aliases "lzop" to the codec "com.hadoop.compression.lzo.LzopCodec" in the com.cloudera.sqoop.io.CodecMap implementation. If you do that, the tests you have added in HiveImport and TableDefWriter will likely have to be modified to accommodate the alias.
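The alias idea above can be sketched roughly as follows. This is only an illustration of a short-name lookup, not the actual com.cloudera.sqoop.io.CodecMap code: the class name CodecAliasSketch, the resolve method, and the extra "gzip" entry are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a codec alias table; not Sqoop's real CodecMap. */
public class CodecAliasSketch {
    // Short name -> fully qualified codec class name.
    private static final Map<String, String> ALIASES = new TreeMap<>();

    static {
        // The alias proposed in this review.
        ALIASES.put("lzop", "com.hadoop.compression.lzo.LzopCodec");
        // Second entry added only for illustration (assumption).
        ALIASES.put("gzip", "org.apache.hadoop.io.compress.GzipCodec");
    }

    /** Returns the class name for a known alias, or the input unchanged. */
    public static String resolve(String nameOrClass) {
        String mapped = ALIASES.get(nameOrClass);
        return mapped != null ? mapped : nameOrClass;
    }

    public static void main(String[] args) {
        // Both spellings should resolve to the same codec class name.
        System.out.println(resolve("lzop"));
        System.out.println(resolve("com.hadoop.compression.lzo.LzopCodec"));
    }
}
```

With a lookup like this, callers can compare against the resolved class name, so a check for LzopCodec behaves the same whether the user passed the alias or the full name.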
Also, it would be great to have a blurb about this in the user guide under src/docs/user.
Some minor checkstyle issues noted below.
src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3536>
Indent.
src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3537>
Line longer than 80.
src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3538>
Line longer than 80.
src/java/com/cloudera/sqoop/hive/TableDefWriter.java
<https://reviews.apache.org/r/1597/#comment3539>
Lines longer than 80.
- Arvind
On 2011-08-19 18:49:06, Joey Echeverria wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1597/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-19 18:49:06)
bq.
bq.
bq. Review request for Sqoop.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. I added a check when generating the create table string to see if the LzopCodec is in use. If it is, it outputs
bq.
bq. STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
bq. OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
bq.
bq. at the end of the create table command, otherwise it outputs the standard
bq.
bq. STORED AS TEXTFILE
bq.
bq. I also added a call to the DistributedLzoIndexer before the data is imported into Hive.
bq.
bq.
bq. This addresses bug SQOOP-318.
bq. https://issues.apache.org/jira/browse/SQOOP-318
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba
bq. src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135
bq. src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e
bq.
bq. Diff: https://reviews.apache.org/r/1597/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. It includes a test for the create table syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
bq.
bq.
bq. Thanks,
bq.
bq. Joey
bq.
bq.
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-318:
----------------------------------
Status: Patch Available (was: Open)
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-318:
----------------------------------
Attachment: SQOOP-318-1.patch
Here's my first cut at a patch. It includes a test for the create table syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Reporter: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Arvind Prabhakar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvind Prabhakar updated SQOOP-318:
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
Status: Resolved (was: Patch Available)
Patch committed. Thanks Joey!
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Fix For: 1.4.0
>
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Commented] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089126#comment-13089126 ]
jiraposter@reviews.apache.org commented on SQOOP-318:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/
-----------------------------------------------------------
(Updated 2011-08-22 23:01:36.319406)
Review request for Sqoop.
Changes
-------
I added lzop to the CodecMap and modified the tests to reference the codec with the short name. I added a blurb at the end of the Hive documentation describing the splitting you get with the lzop codec. I also fixed the checkstyle issues.
Summary
-------
I added a check when generating the create table string to see if the LzopCodec is in use. If it is, it outputs
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
at the end of the create table command, otherwise it outputs the standard
STORED AS TEXTFILE
I also added a call to the DistributedLzoIndexer before the data is imported into Hive.
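The clause selection described in this summary could look roughly like the sketch below. It is not the TableDefWriter code from the patch; the class name StorageClauseSketch and the storageClause method are hypothetical, and only the two output strings are taken from the summary.

```java
/** Hypothetical sketch of choosing the Hive storage clause by codec. */
public class StorageClauseSketch {
    static final String LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec";

    /** Returns the tail of the CREATE TABLE statement for the given codec. */
    public static String storageClause(String codecClassName) {
        // Constant-first equals keeps a null codec (no compression) safe.
        if (LZOP_CODEC.equals(codecClassName)) {
            return "STORED AS INPUTFORMAT "
                + "\"com.hadoop.mapred.DeprecatedLzoTextInputFormat\" "
                + "OUTPUTFORMAT "
                + "\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"";
        }
        return "STORED AS TEXTFILE";
    }

    public static void main(String[] args) {
        System.out.println(storageClause(LZOP_CODEC));
        System.out.println(storageClause(null));
    }
}
```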
This addresses bug SQOOP-318.
https://issues.apache.org/jira/browse/SQOOP-318
Diffs (updated)
-----
src/docs/user/hive.txt 059d7cb
src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba
src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135
src/java/com/cloudera/sqoop/io/CodecMap.java 8564164
src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e
Diff: https://reviews.apache.org/r/1597/diff
Testing
-------
It includes a test for the create table syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
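One possible way to keep LZO optional at build time is to look the indexer up by name at run time and skip the step when it is missing. The sketch below shows only that availability check; it is not what the patch does, the class name LzoIndexerCheckSketch and the isAvailable method are hypothetical, and the package com.hadoop.compression.lzo for DistributedLzoIndexer is an assumption based on the hadoop-lzo library.

```java
/** Hypothetical sketch: detect an optional class without a compile-time dependency. */
public class LzoIndexerCheckSketch {
    // Assumed fully qualified name of the indexer in hadoop-lzo.
    static final String INDEXER_CLASS =
        "com.hadoop.compression.lzo.DistributedLzoIndexer";

    /** Returns true if the named class can be loaded from the classpath. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: locate the class without running its static init.
            Class.forName(className, false,
                LzoIndexerCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isAvailable(INDEXER_CLASS)) {
            System.out.println("LZO indexer found; indexing could run here.");
        } else {
            System.out.println("LZO indexer not on classpath; skipping indexing.");
        }
    }
}
```

A test could then assert the skip path on machines without LZO, although that still leaves the indexing call itself covered only by manual testing.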
Thanks,
Joey
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-318:
----------------------------------
Affects Version/s: 1.3.0
Status: Patch Available (was: Open)
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-318:
----------------------------------
Status: Open (was: Patch Available)
Canceling first patch.
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Commented] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087887#comment-13087887 ]
jiraposter@reviews.apache.org commented on SQOOP-318:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/
-----------------------------------------------------------
Review request for Sqoop.
Summary
-------
I added a check when generating the create table string to see if the LzopCodec is in use. If it is, it outputs
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
at the end of the create table command, otherwise it outputs the standard
STORED AS TEXTFILE
I also added a call to the DistributedLzoIndexer before the data is imported into Hive.
This addresses bug SQOOP-318.
https://issues.apache.org/jira/browse/SQOOP-318
Diffs
-----
src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba
src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135
src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e
Diff: https://reviews.apache.org/r/1597/diff
Testing
-------
It includes a test for the create table syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
Thanks,
Joey
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Commented] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089593#comment-13089593 ]
jiraposter@reviews.apache.org commented on SQOOP-318:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/#review1599
-----------------------------------------------------------
Ship it!
+1
- Arvind
On 2011-08-22 23:01:36, Joey Echeverria wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1597/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-22 23:01:36)
bq.
bq.
bq. Review request for Sqoop.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. I added a check when generating the create table string to see if the LzopCodec is in use. If it is, it outputs
bq.
bq. STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
bq. OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
bq.
bq. at the end of the create table command, otherwise it outputs the standard
bq.
bq. STORED AS TEXTFILE
bq.
bq. I also added a call to the DistributedLzoIndexer before the data is imported into Hive.
bq.
bq.
bq. This addresses bug SQOOP-318.
bq. https://issues.apache.org/jira/browse/SQOOP-318
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/docs/user/hive.txt 059d7cb
bq. src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba
bq. src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135
bq. src/java/com/cloudera/sqoop/io/CodecMap.java 8564164
bq. src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e
bq.
bq. Diff: https://reviews.apache.org/r/1597/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. It includes a test for the create table syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
bq.
bq.
bq. Thanks,
bq.
bq. Joey
bq.
bq.
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Updated] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-318:
----------------------------------
Attachment: SQOOP-318-2.patch
Implemented recommendations made on review board.
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Commented] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089630#comment-13089630 ]
Hudson commented on SQOOP-318:
------------------------------
Integrated in Sqoop-jdk-1.6 #16 (See [https://builds.apache.org/job/Sqoop-jdk-1.6/16/])
SQOOP-318. Support splittable LZO files with Hive.
(Joey Echeverria via Arvind Prabhakar)
arvind : http://svn.apache.org/viewvc/?view=rev&rev=1160815
Files :
* /incubator/sqoop/trunk/src/docs/user/hive.txt
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/io/CodecMap.java
* /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/hive/TableDefWriter.java
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/hive/HiveImport.java
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Fix For: 1.4.0
>
> Attachments: SQOOP-318-1.patch, SQOOP-318-2.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
[jira] [Assigned] (SQOOP-318) Add support for splittable lzo files
with Hive
Posted by "Arvind Prabhakar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvind Prabhakar reassigned SQOOP-318:
--------------------------------------
Assignee: Joey Echeverria
> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
> Key: SQOOP-318
> URL: https://issues.apache.org/jira/browse/SQOOP-318
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.3.0
> Reporter: Joey Echeverria
> Assignee: Joey Echeverria
> Priority: Minor
> Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.