Posted to dev@sqoop.apache.org by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org> on 2011/08/19 20:50:30 UTC

[jira] [Commented] (SQOOP-318) Add support for splittable lzo files with Hive

    [ https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087887#comment-13087887 ] 

jiraposter@reviews.apache.org commented on SQOOP-318:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/
-----------------------------------------------------------

Review request for Sqoop.


Summary
-------

When generating the CREATE TABLE string, I added a check to see if the LzopCodec is in use. If it is, it outputs

STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

at the end of the CREATE TABLE command; otherwise, it outputs the standard

STORED AS TEXTFILE

I also added a call to the DistributedLzoIndexer before the data is imported into Hive.
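
The storage-clause check described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class name StoredAsClauseSketch, the method storedAsClause, and the fully qualified codec name com.hadoop.compression.lzo.LzopCodec are assumptions, and the codec is compared by name so the sketch compiles without LZO on the classpath.

```java
// Hypothetical sketch; the actual TableDefWriter change may look different.
final class StoredAsClauseSketch {

    // Assumed fully qualified name of the LZOP codec (from the hadoop-lzo
    // library). Compared as a string so LZO is not needed to compile this.
    static final String LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec";

    // Returns the clause appended to the end of the CREATE TABLE command.
    static String storedAsClause(String codecClassName) {
        if (LZOP_CODEC.equals(codecClassName)) {
            return "STORED AS INPUTFORMAT "
                + "\"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"\n"
                + "OUTPUTFORMAT "
                + "\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"";
        }
        // No codec, or any other codec: fall back to the standard clause.
        return "STORED AS TEXTFILE";
    }

    public static void main(String[] args) {
        System.out.println(storedAsClause(LZOP_CODEC));
        System.out.println(storedAsClause(null));
    }
}
```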


This addresses bug SQOOP-318.
    https://issues.apache.org/jira/browse/SQOOP-318


Diffs
-----

  src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba 
  src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135 
  src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e 

Diff: https://reviews.apache.org/r/1597/diff


Testing
-------

The patch includes a test for the CREATE TABLE syntax. I manually tested calling the indexer. I'm not sure how to automate that without making LZO required to build.
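
One possible way to keep an indexer test optional, sketched here under assumptions and not part of the patch, is to probe for the LZO classes reflectively and skip the test when they are missing. The class name OptionalLzoCheck, the method isClassAvailable, and the fully qualified indexer name com.hadoop.compression.lzo.DistributedLzoIndexer are assumptions made for illustration.

```java
// Hypothetical sketch: detect an optional dependency without compiling against it.
final class OptionalLzoCheck {

    // Assumed fully qualified name of the indexer in the hadoop-lzo library.
    static final String LZO_INDEXER =
        "com.hadoop.compression.lzo.DistributedLzoIndexer";

    // Returns true if the named class can be loaded from the classpath.
    // The class is looked up without being initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                OptionalLzoCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(LZO_INDEXER)) {
            System.out.println("LZO found: the indexer test could run");
        } else {
            System.out.println("LZO not found: the indexer test would be skipped");
        }
    }
}
```

A test guarded this way would run only on machines where LZO is installed, so the build itself would not depend on it.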


Thanks,

Joey



> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
>                 Key: SQOOP-318
>                 URL: https://issues.apache.org/jira/browse/SQOOP-318
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: hive-integration
>    Affects Versions: 1.3.0
>            Reporter: Joey Echeverria
>            Assignee: Joey Echeverria
>            Priority: Minor
>         Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create the Hive table with the com.hadoop.mapred.DeprecatedLzoTextInputFormat. It would also be nice to automatically run the DistributedLzoIndexer so that the LZO files can be split.
