Posted to user@hbase.apache.org by "Oliver Meyn (GBIF)" <om...@gbif.org> on 2012/01/09 11:42:11 UTC

snappy error during completebulkload

Hi all,

I'm trying to do bulk loading into a table with snappy compression enabled and I'm getting an exception complaining about a missing native snappy library, namely:

12/01/09 11:16:53 WARN snappy.LoadSnappy: Snappy native library not loaded
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: native snappy library not available
	at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89)

First, to be clear, everything in this chain works fine if I don't use compression, and using the hbase shell to 'put' to the compression-enabled table also works fine.

Here's what I'm doing (rough versions of the commands are sketched below):
- use importtsv to generate the hfiles.  I'm passing -Dhfile.compression=snappy on the command line, as per a mailing list email from Lars G that I found while googling.  The import runs without errors, but I don't know how to test whether the hfiles are actually compressed.
- use completebulkload to move the hfiles into the cluster.  This is where I get the exception.  I'm running the command from my OS X workstation, targeting a remote HDFS and HBase (both running on the same cluster).
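
For reference, the two commands look roughly like this (the table name, column spec and paths are placeholders, and the jar name obviously depends on the installed version):

  # generate hfiles with importtsv in bulk-output mode, requesting snappy compression
  hadoop jar $HBASE_HOME/hbase-<version>.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,d:col1 \
    -Dimporttsv.bulk.output=/path/to/hfiles \
    -Dhfile.compression=snappy \
    mytable /path/to/input.tsv

  # then move the generated hfiles into the live table
  hadoop jar $HBASE_HOME/hbase-<version>.jar completebulkload \
    /path/to/hfiles mytable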

My environment:
- HBase/Hadoop are CDH3u2, fully distributed
- workstation is OS X 10.6

It seems really weird that compression (native compression even more so) should be required by a command that is in theory just moving files from one place on a remote filesystem to another.  Any light shed would be appreciated.

Thanks,
Oliver
--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org


Re: snappy error during completebulkload

Posted by "Oliver Meyn (GBIF)" <om...@gbif.org>.
Thanks Todd, that makes more sense now.  I gave up on trying to build the native libraries on OS X (presumably not officially supported because it's such a PITA) and instead ran the command from a CentOS machine, where it worked flawlessly out of the box.

Cheers,
Oliver

On 2012-01-09, at 7:21 PM, Todd Lipcon wrote:

> On Mon, Jan 9, 2012 at 2:42 AM, Oliver Meyn (GBIF) <om...@gbif.org> wrote:
>> It seems really weird that compression (native compression even more so) should be required by a command that is in theory just moving files from one place on a remote filesystem to another.  Any light shed would be appreciated.
> 
> The issue is that the completebulkload script does actually open the
> files to read their metadata as well as the first/last key in the
> file. This is necessary to figure out which region each file belongs
> in. So, you do need the compression support on whatever machine you
> run completebulkload from.
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
> 


--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org


Re: snappy error during completebulkload

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Jan 9, 2012 at 2:42 AM, Oliver Meyn (GBIF) <om...@gbif.org> wrote:
> It seems really weird that compression (native compression even more so) should be required by a command that is in theory just moving files from one place on a remote filesystem to another.  Any light shed would be appreciated.

The issue is that the completebulkload script does actually open the
files to read their metadata as well as the first/last key in the
file. This is necessary to figure out which region each file belongs
in. So, you do need the compression support on whatever machine you
run completebulkload from.
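
Incidentally, you can see the same dependency (and also check whether your hfiles actually came out snappy-compressed) by dumping a file's metadata with the HFile tool, e.g. (flags from memory, so check the usage output):

  # prints the hfile's metadata, including the compression codec it was written with
  hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /path/to/one/hfile

On a machine without the native snappy libraries, that command should fail with the same "native snappy library not available" error.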

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: snappy error during completebulkload

Posted by Jeff Whiting <je...@qualtrics.com>.
Sounds like the snappy library isn't installed on the machine, or Java can't find the native
library.  I think you need the hadoop-0.20-native package installed (via apt or yum).
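
On a CDH3 box that would be something along these lines (package name as above; the apt equivalent works on Debian/Ubuntu):

  # install the native hadoop library package mentioned above
  sudo yum install hadoop-0.20-native

  # then verify that HBase can actually load the snappy codec
  hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test snappy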

~Jeff

On 1/9/2012 3:42 AM, Oliver Meyn (GBIF) wrote:
> Hi all,
>
> I'm trying to do bulk loading into a table with snappy compression enabled and I'm getting an exception complaining about a missing native snappy library, namely:
>
> 12/01/09 11:16:53 WARN snappy.LoadSnappy: Snappy native library not loaded
> Exception in thread "main" java.io.IOException: java.lang.RuntimeException: native snappy library not available
> 	at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89)
>
> First, to be clear, everything in this chain works fine if I don't use compression, and using the hbase shell to 'put' to the compression-enabled table also works fine.
>
> Here's what I'm doing:
> - use importtsv to generate the hfiles.  I'm passing -Dhfile.compression=snappy on the command line, as per a mailing list email from Lars G that I found while googling.  The import runs without errors, but I don't know how to test whether the hfiles are actually compressed.
> - use completebulkload to move the hfiles into the cluster.  This is where I get the exception.  I'm running the command from my OS X workstation, targeting a remote HDFS and HBase (both running on the same cluster).
>
> My environment:
> - HBase/Hadoop are CDH3u2, fully distributed
> - workstation is OS X 10.6
>
> It seems really weird that compression (native compression even more so) should be required by a command that is in theory just moving files from one place on a remote filesystem to another.  Any light shed would be appreciated.
>
> Thanks,
> Oliver
> --
> Oliver Meyn
> Software Developer
> Global Biodiversity Information Facility (GBIF)
> +45 35 32 15 12
> http://www.gbif.org
>

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com