Posted to dev@sqoop.apache.org by Joey Echeverria <jo...@cloudera.com> on 2011/08/19 20:49:06 UTC

Review Request: SQOOP-318 Add support for splittable lzo files with Hive

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/
-----------------------------------------------------------

Review request for Sqoop.


Summary
-------

When generating the CREATE TABLE string, I added a check for whether the LzopCodec is in use. If it is, the code outputs

STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

at the end of the CREATE TABLE command. Otherwise, it outputs the standard

STORED AS TEXTFILE
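The check described above can be sketched as a single function that picks the storage clause from the codec's class name. This is a minimal illustration, not the actual TableDefWriter code: the class and method names (StoredAsClauseSketch, storedAsClause) are hypothetical, and it assumes the codec is known by its fully qualified class name.

```java
class StoredAsClauseSketch {

    // Fully qualified name of the hadoop-lzo codec that writes .lzo files.
    static final String LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec";

    // Hypothetical helper: returns the clause appended to the end of the
    // generated CREATE TABLE statement for the given codec class name.
    static String storedAsClause(String codecClassName) {
        if (LZOP_CODEC.equals(codecClassName)) {
            // Splittable LZO input format plus Hive's plain-text output format.
            return "STORED AS INPUTFORMAT "
                + "\"com.hadoop.mapred.DeprecatedLzoTextInputFormat\" "
                + "OUTPUTFORMAT "
                + "\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"";
        }
        // Any other codec, or no compression (null), keeps plain text storage.
        return "STORED AS TEXTFILE";
    }

    public static void main(String[] args) {
        System.out.println(storedAsClause(LZOP_CODEC));
        System.out.println(storedAsClause(null));
    }
}
```

Comparing with LZOP_CODEC.equals(...) rather than codecClassName.equals(...) keeps the check safe when no codec is configured.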

I also added a call to the DistributedLzoIndexer, which builds an index for each .lzo file so that the file can be split, before the data is loaded into Hive.
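Because hadoop-lzo is not a build dependency, one way to reach the indexer is to look its class up reflectively at run time and skip indexing when it is absent. The sketch below assumes that approach; the helper names (OptionalLzoIndexerSketch, findClass, newIndexer) are hypothetical, and the patch may call DistributedLzoIndexer differently. It stops at creating an instance; actually running it (for example as a Hadoop Tool via ToolRunner, with the import directory as its argument) is left out so the sketch compiles without Hadoop on the classpath.

```java
import java.util.Optional;

class OptionalLzoIndexerSketch {

    // Class from the hadoop-lzo project; present only when it is installed.
    static final String INDEXER_CLASS =
        "com.hadoop.compression.lzo.DistributedLzoIndexer";

    // Hypothetical helper: loads a class by name, or returns empty if it
    // (or one of its dependencies) cannot be found or initialized.
    static Optional<Class<?>> findClass(String className) {
        try {
            return Optional.of(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    // Hypothetical helper: instantiates the indexer through its public
    // no-argument constructor, with no compile-time reference to it.
    static Optional<Object> newIndexer() {
        Optional<Class<?>> cls = findClass(INDEXER_CLASS);
        if (!cls.isPresent()) {
            return Optional.empty();
        }
        try {
            return Optional.of(cls.get().getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        if (newIndexer().isPresent()) {
            System.out.println("LZO indexer found; index before the Hive load.");
        } else {
            System.out.println("LZO indexer not found; skipping indexing.");
        }
    }
}
```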


This addresses bug SQOOP-318.
    https://issues.apache.org/jira/browse/SQOOP-318


Diffs
-----

  src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba 
  src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135 
  src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e 

Diff: https://reviews.apache.org/r/1597/diff


Testing
-------

It includes a test for the CREATE TABLE syntax. I tested the indexer call manually; I'm not sure how to automate that test without making LZO a build requirement.


Thanks,

Joey


Re: Review Request: SQOOP-318 Add support for splittable lzo files with Hive

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/#review1599
-----------------------------------------------------------

Ship it!


+1

- Arvind


On 2011-08-22 23:01:36, Joey Echeverria wrote:


Re: Review Request: SQOOP-318 Add support for splittable lzo files with Hive

Posted by Joey Echeverria <jo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/
-----------------------------------------------------------

(Updated 2011-08-22 23:01:36.319406)


Review request for Sqoop.


Changes
-------

I added lzop to the CodecMap and modified the tests to reference the codec by its short name. I added a blurb at the end of the Hive documentation describing the input splitting that the lzop codec enables. I also fixed the checkstyle issues.
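The short-name lookup can be pictured as a map from aliases to codec class names. This is a minimal sketch, not the real com.cloudera.sqoop.io.CodecMap, whose API and alias list may differ: CodecAliasSketch and resolve are hypothetical names, only the lzop entry comes from this thread, and the gzip entry uses Hadoop's standard GzipCodec class purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CodecAliasSketch {

    // Alias -> fully qualified codec class name.
    private static final Map<String, String> ALIASES;

    static {
        Map<String, String> m = new HashMap<>();
        // Illustrative entry for a standard Hadoop codec.
        m.put("gzip", "org.apache.hadoop.io.compress.GzipCodec");
        // The alias discussed in this review.
        m.put("lzop", "com.hadoop.compression.lzo.LzopCodec");
        ALIASES = Collections.unmodifiableMap(m);
    }

    // Returns the class name for a known alias; otherwise treats the
    // argument as an already fully qualified class name and returns it.
    static String resolve(String nameOrAlias) {
        return ALIASES.getOrDefault(nameOrAlias, nameOrAlias);
    }

    public static void main(String[] args) {
        System.out.println(resolve("lzop"));
        System.out.println(resolve("com.hadoop.compression.lzo.LzopCodec"));
    }
}
```

Passing unrecognized names through unchanged is one possible design choice: callers can keep supplying full class names, while tests can refer to the codec simply as "lzop".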


Summary
-------

When generating the CREATE TABLE string, I added a check for whether the LzopCodec is in use. If it is, the code outputs

STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

at the end of the CREATE TABLE command. Otherwise, it outputs the standard

STORED AS TEXTFILE

I also added a call to the DistributedLzoIndexer, which builds an index for each .lzo file so that the file can be split, before the data is loaded into Hive.


This addresses bug SQOOP-318.
    https://issues.apache.org/jira/browse/SQOOP-318


Diffs (updated)
---------------

  src/docs/user/hive.txt 059d7cb 
  src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba 
  src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135 
  src/java/com/cloudera/sqoop/io/CodecMap.java 8564164 
  src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e 

Diff: https://reviews.apache.org/r/1597/diff


Testing
-------

It includes a test for the CREATE TABLE syntax. I tested the indexer call manually; I'm not sure how to automate that test without making LZO a build requirement.


Thanks,

Joey


Re: Review Request: SQOOP-318 Add support for splittable lzo files with Hive

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/#review1563
-----------------------------------------------------------


Great patch, Joey! I do have one high-level suggestion: add a mapping from the alias "lzop" to the codec "com.hadoop.compression.lzo.LzopCodec" in the com.cloudera.sqoop.io.CodecMap implementation. If you do that, the tests you added for HiveImport and TableDefWriter will likely need to be modified to use the alias.

Also, it would be great to have a blurb about this in the user guide under src/docs/user.

Some minor checkstyle issues are noted below.


src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3536>

    Indent.



src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3537>

    Line longer than 80 characters.



src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3538>

    Line longer than 80 characters.



src/java/com/cloudera/sqoop/hive/TableDefWriter.java
<https://reviews.apache.org/r/1597/#comment3539>

    Lines longer than 80 characters.


- Arvind


On 2011-08-19 18:49:06, Joey Echeverria wrote: