Posted to dev@sqoop.apache.org by Jarek Cecho <ja...@apache.org> on 2015/09/28 20:30:40 UTC

Re: Review Request 37353: Support snappy compression in Sqoop Import with HCatalog.The Jira is SQOOP-2331

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37353/#review100838
-----------------------------------------------------------


Thanks for taking a look, Shashank, and my apologies for the late review on this one.

Would you mind adding tests for the newly added functionality?


src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java (lines 843 - 845)
<https://reviews.apache.org/r/37353/#comment158092>

    Can we also add a similar case for Parquet?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java (lines 845 - 849)
<https://reviews.apache.org/r/37353/#comment158089>

    Shouldn't the FileOutputFormat call be wrapped in an else statement?



src/java/org/apache/sqoop/tool/ImportTool.java (lines 1152 - 1159)
<https://reviews.apache.org/r/37353/#comment158091>

    I'm wondering if this is indeed the case - the code seems to be allowing any codec for text files and only limiting snappy for sequencefile/orc.


Jarcec

- Jarek Cecho


On Aug. 11, 2015, 10:39 a.m., Shashank Tandon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37353/
> -----------------------------------------------------------
> 
> (Updated Aug. 11, 2015, 10:39 a.m.)
> 
> 
> Review request for Sqoop and Venkat Ranganathan.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Apache Sqoop does not compress output when the --compress option is used with --hcatalog-table. It also does not support the option --compression-codec snappy. This change adds Snappy compression support to Apache Sqoop. When a user specifies --compress alone, the default codec (GZIP) is used. If the user specifies --compress --compression-codec snappy, the output is compressed in Snappy format.
> 
> 
> Diffs
> -----
> 
>   src/docs/user/hcatalog.txt 99ae4f5 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java e19c17b 
>   src/java/org/apache/sqoop/io/CodecMap.java cec9358 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 4070c24 
>   src/java/org/apache/sqoop/tool/ImportTool.java 39af42c 
> 
> Diff: https://reviews.apache.org/r/37353/diff/
> 
> 
> Testing
> -------
> 
> yes
> 
> 
> Thanks,
> 
> Shashank Tandon
> 
>


Re: Review Request 37353: Support snappy compression in Sqoop Import with HCatalog.The Jira is SQOOP-2331

Posted by Shashank Tandon <st...@expedia.com>.

> On Sept. 28, 2015, 6:30 p.m., Jarek Cecho wrote:
> > Thanks for taking a look, Shashank, and my apologies for the late review on this one.
> > 
> > Would you mind adding tests for the newly added functionality?
> 
> Venkat Ranganathan wrote:
>     Offline, Shashank asked for help regarding this 
>     
>     ==
>     
>     
>     I am getting the below error while running the snappy compression test 
>     java.lang.RuntimeException: native snappy library not available
>             at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:123)
>             at org.apache.hadoop.hive.ql.io.CodecPool.getCompressor(CodecPool.java:101)
>             at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:1148)
>             at org.apache.hadoop.hive.ql.io.RCFile$Writer.close(RCFile.java:1259)
>             at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$1.close(RCFileOutputFormat.java:92)
>             at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.close(FileRecordWriterContainer.java:150)
>             at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
>             at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>             at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>             at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>     
>     I have also tried copying the native snappy library into the lib folder and running the command shown below:
>     
>     ant clean test -Dhadoopversion=100 -Dtestcase=HCatalogImportTest -Djava.library.path=lib/libsnappy.so
>     
>     But I am still facing the same issue. I have spent a lot of time on this but am unable to run the snappy compression test from my local machine.
>     
>     ==
>     
>     I don't think we need to add unit tests for the actual compression, only for checking that the command options take effect. Integration tests (those that don't start with the prefix Test) can be used to test the compression itself. For example, we have TestHCatalog* and *HCatalogTest test cases.

A new method, testHCatImportWithSnappyCompressionOptions, will be added to the TestHCatalogBasic class as suggested by Venkat.


> On Sept. 28, 2015, 6:30 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 845-849
> > <https://reviews.apache.org/r/37353/diff/1/?file=1037642#file1037642line845>
> >
> >     Shouldn't the FileOutputFormat call be wrapped in an else statement?

This is a common statement that needs to be set up independently of the file format.
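
To illustrate the control flow being discussed, here is a minimal sketch. It is not the code from the patch. It models the job configuration as a plain map; the class and method names are hypothetical, and the ORC branch (including setting `orc.compress` there) is only an assumption about what a format-specific case might look like. The two `mapreduce.output.fileoutputformat.*` keys are the Hadoop 2 property names written by FileOutputFormat's compression setters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the logic under review; not Sqoop code.
class CompressionConfigSketch {

    // Builds the compression-related settings for a given storage format.
    static Map<String, String> configure(String storageFormat, String codecClass) {
        Map<String, String> conf = new LinkedHashMap<>();

        // Format-specific case (assumption): ORC gets its own compression hint.
        if ("orc".equalsIgnoreCase(storageFormat)
                && codecClass.endsWith("SnappyCodec")) {
            conf.put("orc.compress", "SNAPPY");
        }

        // Common statements: applied for every format, so they are deliberately
        // NOT placed in an else branch of the format check above.
        conf.put("mapreduce.output.fileoutputformat.compress", "true");
        conf.put("mapreduce.output.fileoutputformat.compress.codec", codecClass);
        return conf;
    }

    public static void main(String[] args) {
        String snappy = "org.apache.hadoop.io.compress.SnappyCodec";
        System.out.println(configure("orc", snappy));
        System.out.println(configure("rcfile", snappy));
    }
}
```

With an else branch, the common settings would be skipped whenever the format-specific case matched, which is the behavior the reply above argues against.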


> On Sept. 28, 2015, 6:30 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 843-845
> > <https://reviews.apache.org/r/37353/diff/1/?file=1037642#file1037642line843>
> >
> >     Can we also add a similar case for Parquet?

HCatalog does not work with Parquet. There is a Hive JIRA for this: https://issues.apache.org/jira/browse/HIVE-7502


> On Sept. 28, 2015, 6:30 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/tool/ImportTool.java, lines 1157-1164
> > <https://reviews.apache.org/r/37353/diff/1/?file=1037643#file1037643line1157>
> >
> >     I'm wondering if this is indeed the case - the code seems to be allowing any codec for text files and only limiting snappy for sequencefile/orc.

There is no limitation on file format. The only condition is that snappy is the only compression codec currently supported with HCatalog. If other compression codecs need to be added, this condition will be taken care of.
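
As a rough illustration of the rule described above, the following sketch shows one possible reading of it, combined with the default from the review description (--compress alone falls back to GZIP). It is not the code in ImportTool.java. The class name, method name, and error message are hypothetical, and rejecting an explicitly requested non-snappy codec for HCatalog imports is an assumption.

```java
import java.util.Locale;

// Hypothetical validation sketch; not the actual Sqoop option handling.
class HCatCodecCheckSketch {

    static final String DEFAULT_CODEC = "gzip";

    // Returns the codec name to use, or null when compression is off.
    static String resolveCodec(boolean hcatalogImport, boolean compress,
                               String requestedCodec) {
        if (!compress) {
            return null; // --compress not given: no compression at all
        }
        if (requestedCodec == null) {
            return DEFAULT_CODEC; // --compress alone: default codec
        }
        String codec = requestedCodec.toLowerCase(Locale.ROOT);
        // Assumed rule: with HCatalog, only snappy may be requested explicitly.
        if (hcatalogImport && !codec.equals("snappy")) {
            throw new IllegalArgumentException(
                "Only the snappy codec is supported with HCatalog imports, got: "
                + requestedCodec);
        }
        return codec;
    }

    public static void main(String[] args) {
        System.out.println(resolveCodec(true, true, null));
        System.out.println(resolveCodec(true, true, "snappy"));
    }
}
```

Note that the check depends only on whether the import targets HCatalog, not on the storage format, which matches the statement that there is no limitation on file format.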


- Shashank


Re: Review Request 37353: Support snappy compression in Sqoop Import with HCatalog.The Jira is SQOOP-2331

Posted by Venkat Ranganathan <n....@live.com>.

> On Sept. 28, 2015, 11:30 a.m., Jarek Cecho wrote:
> > Thanks for taking a look, Shashank, and my apologies for the late review on this one.
> > 
> > Would you mind adding tests for the newly added functionality?

Offline, Shashank asked for help regarding this 

==


I am getting the below error while running the snappy compression test 
java.lang.RuntimeException: native snappy library not available
        at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:123)
        at org.apache.hadoop.hive.ql.io.CodecPool.getCompressor(CodecPool.java:101)
        at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:1148)
        at org.apache.hadoop.hive.ql.io.RCFile$Writer.close(RCFile.java:1259)
        at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$1.close(RCFileOutputFormat.java:92)
        at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.close(FileRecordWriterContainer.java:150)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I have also tried copying the native snappy library into the lib folder and running the command shown below:

ant clean test -Dhadoopversion=100 -Dtestcase=HCatalogImportTest -Djava.library.path=lib/libsnappy.so

But I am still facing the same issue. I have spent a lot of time on this but am unable to run the snappy compression test from my local machine.

==

I don't think we need to add unit tests for the actual compression, only for checking that the command options take effect. Integration tests (those that don't start with the prefix Test) can be used to test the compression itself. For example, we have TestHCatalog* and *HCatalogTest test cases.
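
One detail in the command quoted above may be worth checking: java.library.path is a list of directories that the JVM searches for native libraries, while the command passes the path of the .so file itself. The small diagnostic below is a sketch with hypothetical class and method names. It only reports library-path entries that are not existing directories. Fixing the path may still not resolve the error, since Hadoop's Snappy support may also depend on its own native library being loadable, and the property may not reach the JVM that runs the tests.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Sqoop or Hadoop.
class LibraryPathCheck {

    // Returns the entries of a java.library.path-style string that are
    // not existing directories (for example, a path to a .so file).
    static List<String> nonDirectoryEntries(String libraryPath) {
        List<String> bad = new ArrayList<>();
        if (libraryPath == null || libraryPath.isEmpty()) {
            return bad;
        }
        String separator = Pattern.quote(File.pathSeparator);
        for (String entry : libraryPath.split(separator)) {
            if (entry.isEmpty()) {
                continue; // ignore empty segments such as "a::b"
            }
            if (!new File(entry).isDirectory()) {
                bad.add(entry);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Inspect the library path of the JVM running this check.
        String path = System.getProperty("java.library.path", "");
        for (String entry : nonDirectoryEntries(path)) {
            System.out.println("Not a directory: " + entry);
        }
    }
}
```

Under this check, a value such as lib/libsnappy.so would be flagged, whereas lib would pass if that directory exists.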


- Venkat

