Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2013/07/31 23:53:48 UTC
[jira] [Updated] (HADOOP-9802) Support Snappy codec on Windows.
[ https://issues.apache.org/jira/browse/HADOOP-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated HADOOP-9802:
----------------------------------
Attachment: HADOOP-9802-branch-1-win.1.patch
This work started on branch-1-win, so I'm attaching the patch for that branch first. A trunk patch will follow soon. Here is a summary of the changes:
# Update the runtime library path used in hadoop.cmd so that snappy.dll can be loaded from lib/native if the build bundled snappy into the distro.
# build.xml changes to call javah on Windows.
# Visual Studio project file changes to compile the C code.
# Windows-specific dynamic library loading code.
# Minor changes to the C code to guarantee the correct calling convention and to move a few variable declarations to the top of the function, because MSVC's C compiler doesn't support C99 and rejects declarations that follow statements.
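The Windows-specific loading and the C89 declaration rule can be illustrated together. The following is a minimal sketch, not the code from the patch: it shows how a {{LoadLibraryA}}/{{GetProcAddress}} path typically sits beside the POSIX {{dlopen}}/{{dlsym}} path, with every local variable declared before the first statement of its block so that MSVC accepts it. The helper names ({{open_library}}, {{load_symbol}}, {{close_library}}, {{resolve_snappy_compress}}), the {{SNAPPY_LIBRARY_NAME}} macro, and the Linux soname {{libsnappy.so.1}} are illustrative assumptions. The function-pointer type mirrors {{snappy_compress}} from Snappy's C bindings ({{snappy-c.h}}), with the status enum represented as {{int}} so the header isn't required.

```c
/* Sketch only: portable dynamic loading written in C89 style
 * (declarations before statements) so MSVC's C compiler accepts it. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;
#define SNAPPY_LIBRARY_NAME "snappy.dll"      /* DLL name from the patch summary */
#else
#include <dlfcn.h>
typedef void *lib_handle;
#define SNAPPY_LIBRARY_NAME "libsnappy.so.1"  /* assumed soname on Linux */
#endif

/* Mirrors snappy_compress() from snappy-c.h; the snappy_status enum is
 * represented as int here (assumption) to avoid including the header. */
typedef int (*snappy_compress_fn)(const char *input, size_t input_length,
                                  char *compressed, size_t *compressed_length);

/* Open a shared library by file name; returns NULL on failure. */
static lib_handle open_library(const char *name)
{
#ifdef _WIN32
    return LoadLibraryA(name);
#else
    return dlopen(name, RTLD_LAZY);
#endif
}

/* Look up an exported symbol; returns NULL if it is missing. */
static void *load_symbol(lib_handle lib, const char *symbol)
{
    void *sym;               /* C89: declared before any statement */

    assert(lib != NULL);
#ifdef _WIN32
    sym = (void *)GetProcAddress(lib, symbol);
#else
    dlerror();               /* clear any stale error state */
    sym = dlsym(lib, symbol);
#endif
    if (sym == NULL)
        fprintf(stderr, "symbol not found: %s\n", symbol);
    return sym;
}

/* Release a handle obtained from open_library(). */
static void close_library(lib_handle lib)
{
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}

/* Hypothetical helper: resolve snappy_compress, or NULL if absent.
 * Casting void * to a function pointer is allowed by POSIX and by
 * MSVC, though strict ISO C does not guarantee it. */
static snappy_compress_fn resolve_snappy_compress(lib_handle lib)
{
    void *sym;

    sym = load_symbol(lib, "snappy_compress");
    return (snappy_compress_fn)sym;
}
```

A caller would open {{SNAPPY_LIBRARY_NAME}}, resolve {{snappy_compress}}, and treat the codec as unavailable if either step returns NULL. Note that {{LoadLibraryA}} resolves a bare file name such as snappy.dll through the standard Windows DLL search order, which includes the directories listed on PATH.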
Assuming you have Snappy itself deployed to C:\snappy, here is the easiest way to test it:
{code}
ant clean test-core -Dwindows=true -Dsnappy.prefix=C:\snappy -Dtestcase=TestCodec
{code}
I also successfully tested creating a distro with snappy bundled:
{code}
ant clean tar -Dwindows=true -Dforrest.home=C:\apache-forrest-0.9 -Dbundle.snappy=true -Dsnappy.prefix=C:\snappy
{code}
Then, I used that distro to test running a wordcount MR job that compresses its output:
{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar wordcount -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec /input /output
{code}
Then, I ran a grep MR job using the snappy-compressed file as input to verify that the codec could decompress successfully:
{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar grep /output/part* /grepout Apache
{code}
(My input file was our LICENSE.txt file, which is why I grepped for "Apache" in my test.)
Big thanks to [~chuanliu] who started a lot of this work.
> Support Snappy codec on Windows.
> --------------------------------
>
> Key: HADOOP-9802
> URL: https://issues.apache.org/jira/browse/HADOOP-9802
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Affects Versions: 3.0.0, 1-win, 2.1.1-beta
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-9802-branch-1-win.1.patch
>
>
> Build and test the existing Snappy codec on Windows.