Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2013/07/31 23:53:48 UTC

[jira] [Updated] (HADOOP-9802) Support Snappy codec on Windows.

     [ https://issues.apache.org/jira/browse/HADOOP-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HADOOP-9802:
----------------------------------

    Attachment: HADOOP-9802-branch-1-win.1.patch

This work started on branch-1-win, so I'm attaching the patch for that.  I'll provide a trunk patch soon too.  Here is a summary of the changes:
# Update the runtime library path used in hadoop.cmd so that snappy.dll can be loaded from lib/native if the build bundled snappy into the distro.
# Change build.xml to call javah on Windows.
# Change the Visual Studio project file to compile the C code.
# Add Windows-specific dynamic library loading code.
# Make minor changes to the C code to guarantee the correct calling convention and to move a few variable declarations to the top of their functions, because MSVC doesn't support C99 and therefore rejects declarations that appear after statements.
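
To illustrate the last item, here is a minimal sketch of the two kinds of C changes. It is not taken from the patch: the {{CODEC_CALL}} macro, the {{max_len_fn}} type, and both functions are hypothetical names made up for this example, and the choice of {{__cdecl}} is an assumption. The point is only that the calling convention is spelled out in the function pointer type, and that every declaration precedes the first statement, which a C89 compiler requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (not from the patch): state the calling convention
   explicitly under MSVC, and expand to nothing on other compilers. */
#ifdef _MSC_VER
#define CODEC_CALL __cdecl
#else
#define CODEC_CALL
#endif

/* Pointer type for a function looked up at runtime. The calling convention
   is part of the type, so caller and callee cannot disagree about it. */
typedef size_t (CODEC_CALL *max_len_fn)(size_t source_length);

/* Stand-in for a dynamically loaded library function. The bound it returns
   is made up for this example. */
size_t CODEC_CALL example_max_len(size_t source_length)
{
    return source_length + 32;
}

/* Sums the bound over several buffer lengths. */
size_t total_worst_case(max_len_fn fn, const size_t *lengths, size_t count)
{
    /* C89 style: all declarations come before the first statement. */
    size_t i;
    size_t total;

    assert(fn != NULL);

    total = 0;
    for (i = 0; i < count; i++) {
        total += fn(lengths[i]);
    }
    return total;
}
```

Calling {{total_worst_case(example_max_len, lengths, 2)}} with lengths 0 and 6 returns 70 under this made-up bound. Writing {{size_t total = 0;}} after the {{assert}} instead would be valid C99 but is the kind of declaration that had to move.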

Assuming you have Snappy itself deployed to C:\snappy, here is the easiest way to test it:

{code}
ant clean test-core -Dwindows=true -Dsnappy.prefix=C:\snappy -Dtestcase=TestCodec
{code}

I also successfully tested creating a distro with snappy bundled:

{code}
ant clean tar -Dwindows=true -Dforrest.home=C:\apache-forrest-0.9 -Dbundle.snappy=true -Dsnappy.prefix=C:\snappy
{code}

Then, I used that distro to test running a wordcount MR job that compresses its output:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar wordcount -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec /input /output
{code}

Then, I ran a grep MR job using the snappy-compressed file as input to verify that the codec could decompress successfully:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar grep /output/part* /grepout Apache
{code}

(My input file was our LICENSE.txt file, which is why I grepped for "Apache" in my test.)

Big thanks to [~chuanliu] who started a lot of this work.

                
> Support Snappy codec on Windows.
> --------------------------------
>
>                 Key: HADOOP-9802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9802
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.0.0, 1-win, 2.1.1-beta
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-9802-branch-1-win.1.patch
>
>
> Build and test the existing Snappy codec on Windows.
