Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2013/08/05 23:16:49 UTC

[jira] [Updated] (HADOOP-9802) Support Snappy codec on Windows.

     [ https://issues.apache.org/jira/browse/HADOOP-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HADOOP-9802:
----------------------------------

    Attachment: HADOOP-9802-trunk.1.patch

I'm attaching the trunk patch.  This one was a bit more involved, because I had to fully port the CMake logic and account for some enhancements in the trunk version of the Snappy code:

# Updated native.vcxproj and pom.xml to support all of the same build options as on Linux: snappy.prefix, snappy.lib, snappy.include, require.snappy, and bundle.snappy.  (No need to update BUILDING.txt with special instructions for Windows.)
# Refactored a common {{GetLibraryName}} function into libwinutils.
# Updated hadoop-project-dist/pom.xml for bundling Snappy into the right sub-directory on Windows.
# Updated hadoop-project/pom.xml to make snappy.dll available in tests.

Here are testing instructions, assuming that you have the Snappy code deployed in C:\snappy and snappy.dll in C:\snappy\lib.  If you're going to run MR jobs for full end-to-end testing, then you'll need to manually set the PATH environment variable to include the directories containing hadoop.dll and snappy.dll before launching NodeManager.  This is documented in YARN-1025.
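
As a quick sanity check before launching NodeManager, the following minimal sketch reports whether both DLLs can be found through PATH. It assumes Python is available on the test machine. The helper name find_on_path is hypothetical and not part of Hadoop. It only inspects the PATH entries it is given, so it does not model the other locations Windows searches when loading a DLL.

```python
import os


def find_on_path(file_names, path_value, sep=os.pathsep):
    """Map each file name to the first PATH directory that contains it.

    A file that is not found in any directory maps to None.
    find_on_path is a hypothetical helper, not a Hadoop API.
    """
    # Skip empty entries, which appear when PATH has doubled separators.
    dirs = [d for d in path_value.split(sep) if d]
    found = {}
    for name in file_names:
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


if __name__ == "__main__":
    # The DLL names come from the instructions above.
    result = find_on_path(
        ["hadoop.dll", "snappy.dll"], os.environ.get("PATH", "")
    )
    for name, directory in result.items():
        print(f"{name}: {directory or 'NOT FOUND on PATH'}")
```

If either DLL is reported as not found, add its directory to PATH in the same command prompt session that will start NodeManager.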

From working directory hadoop-common-project/hadoop-common, here is a unit test that covers it:

{code}
mvn --offline clean test -Dsnappy.lib=C:\snappy\lib -Dsnappy.include=C:\snappy -Drequire.snappy -Dtest=TestCodec
{code}

This creates a distro with Snappy bundled:

{code}
mvn --offline -Pdist -Dtar -Dsnappy.lib=C:/snappy/lib -Dsnappy.include=C:/snappy -Drequire.snappy -Dbundle.snappy -DskipTests clean package
{code}

The require.snappy flag causes the build to fail if it doesn't find Snappy:

{code}
mvn --offline -Pdist -Dtar -Dsnappy.lib=C:/nosnappy/lib -Dsnappy.include=C:/nosnappy -Drequire.snappy -Dbundle.snappy -DskipTests clean package
{code}

I successfully tested the distro by running an MR job with Snappy compression for the output:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar wordcount -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec /input /output
{code}

Then, I ran another MR job reading the Snappy compressed file as input:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar grep /output/part* /grepout Apache
{code}

I also tested the refactored {{GetLibraryName}} code by running {{TestNativeLibraryChecker}} and the {{NativeLibraryChecker}} command through the shell.

I also verified that the native build on Linux still works and still bundles Snappy correctly.

Chuan, how does this look?  If it looks good, do you think we should backport some of this into the branch-1-win patch too for consistency?

                
> Support Snappy codec on Windows.
> --------------------------------
>
>                 Key: HADOOP-9802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9802
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.0.0, 1-win, 2.1.1-beta
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-9802-branch-1-win.1.patch, HADOOP-9802-trunk.1.patch
>
>
> Build and test the existing Snappy codec on Windows.
