You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2006/09/15 07:58:22 UTC

[jira] Created: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
-----------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-538
                 URL: http://issues.apache.org/jira/browse/HADOOP-538
             Project: Hadoop
          Issue Type: Improvement
    Affects Versions: 0.6.1
            Reporter: Arun C Murthy
         Assigned To: Arun C Murthy
             Fix For: 0.7.0


There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12449297 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Doug,
> 1. There's some debug output in bin/hadoop, an 'echo'. 
Newer patch forthcoming to fix this oversight...

> The patch should not include binaries: these should be built somehow by build.xml. 
I have attached the .so for linux built via 'ant compile-native'... we will need to check this in for people who don't want to install the C compilation environment as we had discussed previously. Am I missing something here?

> On second thought, the custom libs should probably go in build/native and the prebuilt in lib/native.
Just to clarify: you'd want the 'custom built' libhadoop.so to override the 'prebuilt' one in lib/native on 'ant package' ?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12446647 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

To further annotate my previous comment: I believe the perception today is that src/c++ contains code to 'access' hadoop via C/C++; I think that is something worth carrying on, which is why I'd put the native code which is 'core' hadoop in src/native.

A related problem which came up during a discussion with Owen: we cannot let a single 'core-hadoop' library i.e. libhadoop.so have a direct dependency on libz.so (zlib shared-object)... this is because in future we will then need libhadoop.so have a dependency on liblzo.so (lzo library) and so on... the issue is that it will force people to install both libz.so and liblzo.so even if they want to use only of of them.

Solutions:
a) Have multiple libhadoopzlib.so libhadooplzo.so and so on. This will mean we will have multiple so's to track and load in hadoop, which could turn out to be a maintenece nightmare.
b) Supply stubs for each 'piece' i.e. zlib/lzo which have 'weak' symbols and let them get overridden on loading the actual .so.
c) Ensure the native code doesn't actually do a -lz (or -lzo) in the Makefile, but rely on dlopen/dlsym for necessary libz/liblzo calls. This means libhadoop.so doens't have a direct dependency on either libz.so/liblzo.so, is still a single share-obj with all of hadoop's native code and also lets people load whatever pieces of native code they like without paying for the rest.

Given it's relative simplicity I'd go for option c) - it's easy and effective.

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437687 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

+1  This looks like a good direction to me.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12442098 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

I don't think clean-native is worth having in the build.xml, since it isn't used by the clean target and I don't see when you'd want to clean the native stuff without cleaning the java stuff.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448232 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Forgot to mention: please 
$ cd lib/native/prebuilt/Linux-i386-32
$ tar -xzvf ~/linux-libraries.tgz

before running ant tasks on this patch... thanks!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Owen O'Malley updated HADOOP-538:
---------------------------------

    Fix Version/s: 0.9.0
                       (was: 0.8.0)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Doug Cutting updated HADOOP-538:
--------------------------------

    Fix Version/s: 0.8.0
                       (was: 0.7.0)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12447983 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

The CompressionCodec interface should not extend Configurable.
The DefaultCodec should extend Configured.
The rationale is that CompressionCodecs in general do not need to be configured. If the Codecs are Configurable, they will be configured by ReflectionUtils.newInstance(Class).

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12445272 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Thanks for the detailed feedback to Owen/Sameer, i'll put up an updated patch asap... though I admit I hadn't thought about 32-bit jvm on a 64-bit OS! :)

Meanwhile, one of the nice side-effects of this patch will be to enable the GzipCodec to work with SequenceFiles. 

Context: gzip is just zlib algo + extra headers. java.util.zip.GZIP{Input|Output}Stream and hence existing GzipCodec won't work with SequenceFile due the fact that java.util.zip.GZIP{Input|Output}Streams will try to read/write gzip headers in the constructors which won't work in SequenceFiles since we typically read data from disk onto buffers, these buffers are empty on startup/after-reset and cause the java.util.zip.GZIP{Input|Output}Streams to fail.

The upshot of this patch is that newer (zlib-1.2.*) can deal with this directly (java.util.zip is zlib-1.1.*), which means we can use them in SequenceFile. However, the downside is that people will need to have native hadoop code for getting this benefit. If people strongly feel we need this funcationality without native hadoop code, IMHO not critical since gzip is zlib+headers i.e. exact compression etc., then I guess we can track it via a separate jira issue... would people object to me enabling GzipCodec to work with SequenceFile for now only with native code in? If the native code isn't present I can print out a warning very early and exit...

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438395 ] 
            
Andrzej Bialecki  commented on HADOOP-538:
------------------------------------------

If we are to permit use of a native libhadoop there is at least one more thing that could be optimized - the annoying issue of getting the amount of free disk space, which currently requires running a Unix / cygwin utility and parsing its text output.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12446163 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

If I apply this patch:

ant clean tar fails with:

BUILD FAILED
/home/oom/work/clean-hadoop/build.xml:456: /home/oom/work/clean-hadoop/build/hadoop-0.7.3-dev/lib/native not found.

It also has a conflict with the mainline over one of the javadoc comments. 

It also causes a new warning from javadoc, which needs to be fixed.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12443720 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

src/java/org/apache/hadoop/io/compress/zip/ZlibCompressor.java and Decompressor:
   1. The fields should be private.
   2. The enum fields should be declared as enums rather than ints and the conversion done when the native methods are called.
   3. More missing javadoc *smile*

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538.patch
                HADOOP-538_benchmarks.tgz

Here's a patch incorporating 2 major features:
a) A framework for decoupling the '(de)compressor' from the 'stream' as discussed on this issue i.e. the {Com|Decom}pressor interfaces and a concrete implementation of a Data{Com|Decom}pression{Out|In}putStream for plugging the (de)compressors.
b) A straight-forward wrapper over the popular zlib algorithm (http://www.zlib.net) using a 'direct' java.nio.ByteBuffer - this gives us 60%-70% improvement in performance over existing java.util.zip.{De|In}flater (benchmarks attached).

I have also added some amount of framework (tweaks in build.xml to build the native libhadoop.so, Makefiles, src/native directory etc.) so that any future 'native' code written for hadoop will fit into the native-code framework (src/native dir and libhadoop.so) fairly easily... I have built the infrastructure so that only 1 share-object (libhadoop.so) is created by the top-level Makefile (src/native/Makefile) and it should have *all* of the necessary 'native' code for hadoop.

I'd appreciate feedback/improvements to any of the above stuff...

Doug, could you please treat this patch as 99.9% ready and hold-off committing it (assuming you are satisfied with it) so as to get as much feedback over this as possible? Thanks!


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Status: Patch Available  (was: In Progress)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12446345 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

IMHO src/native is par for the course since src/c++ contains librecordio and libhdfs (libmapred in future?) which are sortof 'add-ons' to hadoop while HADOOP-538 is 'core-hadoop' in a way. Hence we don't want to mix these up in src/c++.

Another reason I wouldn't go down the path of src/c++/libhadoopzip because very soon we might need src/c++/libhadooplzo (for lzo) or something similar... and we don't want to proliferate share-objects needed for hadoop.

Given all this and the fact that we can house all of 'native' hadoop code (as and when required) in a single 'src/native' I would go for the current scheme, which can nicely create a single libhadoop.so for all of hadoop's native code.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438481 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

> we need a class that is responsible for trying to load libhadoop.so and having fall back replacements

+1

I question whether DF belongs here.  Each thing here requires two implementations.  So we should only do it in places where it is critical.  We've found the improvements of native codecs to be dramatic.  Do we think that DF is causing performance problems?  Because we will still need the non-native version of DF as a fallback, for OSes and architectures that we won't provide pre-built binaries for in releases.

It would also be good to minimize the number of libhadoop binaries we commit to subversion.  I'd argue that we only check in libhadoop-linux-i386.so.  Even that will be impossible to rebuild in nightly builds, since those are made on solaris x86.  We will require that any time anything in the native code is changed, all pre-built binaries be re-generated as a part of the commit of the source changes.  Since linux-i386 is both the dominant development and deployment environment, I think it makes sense to ship with it alone pre-built.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061030.tgz

Here's an updated patch incorporating all of Sameer's and Owen's comments... 

... I have also updated TestSequenceFile.java to accept a '-codec' argument and TestCodec.java to automatically test GzipCodec.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437727 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Couple of insights gleaned from a discussion with Owen:

a) The proposed DataCompression{In|Out}putStreams should probably be 'package private' instead of public, since it is more of an implementation detail; hence it probably shouldn't affect the public api.

b) The {Com|Decom}pressor interfaces should not have any public apis which expose the 'nio direct buffer' details - presumably they are implementation details. 

  The public apis should only be:

  public int deflate(byte[] b, int off, int len);   // Compressor
  and 
  public int inflate(byte[] b, int off, int len);   // Decompressor

  We should leave it to the actual zlib/lzo/* (de)compressors to use nio as they see fit.

Thanks Owen!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061114.patch

Here is another patch incorporating Doug's feedback...

a) Custom built libraries in build directory override the ones in lib/native, and hence there is no prebuilt/custom distinction

b) I have reworked to account for a 'compile.native' property which controls whether the native stuff is built/tested/packaged...
$ ant -Dcompile.native=true package    #compile native-hadoop
$ ant package                                    #no native hadoop

  I have had to define a new target: 'compile-core-classes' which replaces existing 'compile-core' and redefined 'compile-core' as:

  <target name="compile-core" depends="compile-core-classes,compile-core-native">
   </target>

  The alternative to get 'ant -Dcompile.native=true package' or 'ant -Dcompile.native=true test' to work was to add a 'depends="compile-native"' to every target which has a 'depends="compile-core"' which I felt was error-prone and difficult to maintain...



> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_20061114.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061108.tgz

Here is another patch incorporating all of Owen's comments for review...

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448019 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

1. Naming of .so's

The naming was done as exists so as to ensure we can easily 'know' whether libhadoop.so is loaded or libhadoop-{platform-specific}.so and print out diagnostic information from NativeCodeLoader...

However if people feel strongly about it, here is a slight variation:

lib/native/prebuilt/Linux-i386-32/libhadoop.so  //pre-built, checked-in
build/hadoop-*/lib/native/prebuilt/Linux-i386-32/libhadoop.so // prebuilt packaged location 

build/hadoop-*/lib/native/custom/Linux-i386-32/libhadoop.so // custom-built packaged location

2. package-native

Unfortunately we cannot just 'copy' libraries because of the need to invoke 'libtool' from the package-native target, this is because only libtool understands the platform-specific conventions about 'libraries' i.e. .so v/s .dll and so on... and hence a simple copy would not suffice. Also we cannot invoke libtool from 'package' since we can't assume that 'libtool' exists without 'compile-native'.... thus the need for a separate 'package-native' target.

3. CompressionCodec and 'Configurable'

I'll get this done, with a small caveat: We cannot get DefaultCodec to 'extend' Configured since it would mean that we will need to add a new constructor to DefaultCodec (there is an implicit constructor today) to satisfy the 'Configured' class, thus breaking the existing public api. I can get around this by 
public class DefaultCodec implements Configurable, CompressionCodec {
  //...
}
Does that fly?




> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12447943 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

I think the naming of the .so's is confusing. How about:

lib/native/Linux-i386-32/libhadoop.so     // prebuilt library checked into svn
build/hadoop-*/lib/native/prebuilt/Linux-i386-32/libhadoop.so    // prebuilt packaged location

The compiled library would still be packaged into:
build/hadoop-*/lib/native/Linux-i386-32/libhadoop.so

I think we should remove the "package-native" target and just make the "package" target copy any compiled libraries (for all platforms) into the package directory.

So to do a full build, it would be:

ant clean compile-native tar



> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437447 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

While working on this I've realised that the 'custom compressor' framework we have built as a part of HADOOP-441 isn't the most flexible one or complete.

Specifically the existing framework only lets us plug-in custom compress/decompress 'streams' (e.g. a bzip2 input/output stream) while in many cases it is sufficient to use an existing 'stream' and just plug-in a custom deflater/inflater (e.g. native-zlib or lzo inflater/deflater pair)... java.util.zip's {De|In}flater classes just haven't been designed with this kind of functionality in mind; making them unsuitable.

Hence I would like to propose that we add a org.apache.hadoop.io.compress.{Com|Decom}pressor interface which custom {de}compressors can implement and plug into an existing {de}compression stream... further I would also like to propose that the above {Com|Decom}pressor interfaces have the same interfaces as the public methods in java.util.zip.{De|In}flater ie.

public interface Compressor {
  public void setInput(byte[] b, int off, int len);
  public boolean needsInput();
  public void setDictionary(byte[] b, int off, int len);
  public void finish();
  public boolean finished();
  public int deflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
  public int deflate(byte[] b, int off, int len); // for native methods without nio
  public void reset();
  public void end();
}

public interface Decompressor {
  public void setInput(byte[] b, int off, int len);
  public boolean needsInput();
  public void setDictionary(byte[] b, int off, int len);
  public boolean needsDictionary();
  public void finish();
  public boolean finished();
  public int inflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
  public int inflate(byte[] b, int off, int len); // for native methods without nio
  public void reset();
  public void end();
}

  On the same trajectory we will need to supply a pair of input/output streams which can take objects implementing the above interfaces to achieve actual compression/decompression. Again java.util.zip.{De|In}flater{Out|In}put streams won't suffice since they weren't designed with these in mind. 

 I would like to propose org.apache.hadoop.io.compress.Compression{In|Out}putStreams, but they are already taken; how about org.apache.hadoop.io.compress.DataCompression{In|Out}putStreams? 

 With existing Compression{Out|In}putStreams and the above {Com|Decom}pressor/DataCompression{In|Out}putStreams we should have a sufficiently complete abstractions to support 'custom codecs'...

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12443706 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

I think that "ant compile-native" should create:

build/native/libhadoop-${os}-${arch}.so

and package should move build/native/libhadoop-*.so to the root level of the distribution parallel with the jar files. Then the NativeCodeLoader should always load "hadoop-${os}-${arch}".



> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Sameer Paranjpye updated HADOOP-538:
------------------------------------

    Component/s: io

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Status: Patch Available  (was: Open)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_20061114.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061107.tgz

Here is a patch incorporating things put forth in the last comment i.e. single libhadoop.so which has no direct dependency on libz, relies on dlopen/dlsym and also lets people load whatever pieces of native code they like without paying for the rest.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12444950 ] 
            
Sameer Paranjpye commented on HADOOP-538:
-----------------------------------------

Some more comments on the native code:

Makefiles etc.

- We should compile with '-Wall'. 
- What about 64-bit vs 32-bit builds, this makefile will generate the OS default, which may be 
incompatible with the JVM used. For instance on a 64-bit machine with a 32-bit JVM this will generate
a 64-bit library, which will be incompatible with the JVM. We should check the JVM version and compile
with -m32 or -m64 as appropriate.

-----------

Index: src/native/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c

+JNIEXPORT jlong JNICALL
+Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init(
+	JNIEnv *env, jclass class, jint level, jint strategy
+	) {
+	// Create a z_stream
+    z_stream *stream = malloc(sizeof(z_stream));
+    if (!stream) {
+		THROW(env, "java/lang/OutOfMemoryError", NULL);
+		return (jlong)0;
+    } 
+    memset((void*)stream, 0, sizeof(z_stream));
...
...
...
+
+    return (jlong)((int)stream);
+}
+

Is this the only way to return a void* to the JVM? This is a bad, unsafe cast, not 64-bit safe. 
If a cast like this is needed it should be '(jlong)(stream)'.


+JNIEXPORT jint JNICALL
+Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect(
+	JNIEnv *env, jobject this
+	) {
+	// Get members of ZlibCompressor
+    z_stream *stream = (z_stream *)
+  				((int)(*env)->GetLongField(env, this, ZlibCompressor_stream));
...
...
...
+}

Cast to int is not 64-bit safe.


-------------
Index: src/native/org/apache/hadoop/io/compress/zlib/ZlibDecompressor.c


+JNIEXPORT jlong JNICALL
+Java_org_apache_hadoop_io_compress_zlib_ZlibDecompressor_init(
+	JNIEnv *env, jclass cls
+	) {
+    z_stream *stream = malloc(sizeof(z_stream));
+    memset((void*)stream, 0, sizeof(z_stream));
+
...
...
...
+	return (jint)((int)stream);
+}


Cast is not 64-bit safe.



-----------------
Index: src/native/org/apache/hadoop/io/compress/zlib/org_apache_hadoop_io_compress_zlib_util.h

+// TODO: Fix the below macro for 64-bit systems
+// Helpful macro to convert a jlong to z_stream*
+#define ZSTREAM(stream) ((z_stream*)((int)stream))


Act on the comment, please. :)



-----------------
Index: src/native/org_apache_hadoop.h

+ 
+ #define THROW(env, exception_name, message) \
+ 			{ \
+			 	jclass ecls = (*env)->FindClass(env, exception_name); \
+			 	if (ecls) { \
+					(*env)->ThrowNew(env, ecls, message); \
+		 		} \
+ 			}
+		 			 			


This macro creates a local reference to a class in jclass and does not destroy it, causing a memory leak.
Add a '(*env)->DestroyLocalReference(env, ecls)' statement.



> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12446332 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

I think that we should avoid adding a new top-level directory in src. Furthermore libhadoop is pretty general for an optimization package.

Maybe src/c++/libhadoopzip or something would be better.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061026.tgz

Here is another stab at native-zlib incorporating all of Owen's comments... the significant feature being better support for mutli-arch builds of hadoop. Appreciate any feedback...

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12443707 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

NativeCodeLoader:
   1. The Class & public methods need to have javadoc
   2. The log field should be private.
   3. Using a static Configuration is pretty broken, because it:
      a. causes the config files to be re-read
      b. ignores mapred-default.xml
      c. doesn't let the application control it
      As a result, I think the test for hadoop.native.code and the conf field should be removed.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment:     (was: HADOOP-538_20061030.tgz)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12449465 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

> I have attached the .so for linux  [ ... ] Am I missing something here?

Not really.  Mostly I was checking that I wasn't missing something, that there wasn't something special about these.  Please don't bother to submit these with future versions of the patch.  Rather, I'll generate them and commit them.  To my thinking, generated files should be generated by the committer and not included in the patch.

> you'd want the 'custom built' libhadoop.so to override the 'prebuilt' one in lib/native on 'ant package'

Yes.  The contents of the build directory should override pre-built binaries.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061030.tgz

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448217 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

getPrimitiveArrayCritical has been observed to have really bad performance in some cases. Part of the problem is that you have no control over if it will copy or not. So some minor changes to the code completely thrash your performance.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438483 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

Let me refine that a bit: if we ship with any pre-built binaries, I think it should be libhadoop-linux-i386.so only.  All current committers will have no trouble maintaining this, and it will make the out-of-the-box experience faster for the majority of production installations.  It's not ideal, surely, but it would not currently be much of a burden, and will provide substantial benefit.  If it becomes burdensome, or the benefit seems small, then we should revisit this.  Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438205 ] 
            
Dawid Weiss commented on HADOOP-538:
------------------------------------

Don't know if it's at all relevant, but if you must stick to pure Java and still want to avoid SUN's buggy compression streams, we have an implementation of inflater/deflater and GZIP wrappers on top of jzlib. It works just fine for us (although is about 50% slower compared to the JDK version, which, although buggy, has certain native implementations).

http://svn.sourceforge.net/viewvc/carrot2/trunk/carrot2/components/carrot2-util-gzip/

Dawid


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12447951 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

The ZlibCompressor has dead nativeZlibLoaded code.

The ZlibFactory getGzip{Decom,Com}pressor methods should never return null, because their contract should be that they always return a valid implementation regardless of whether the native library was loaded. I see that the problem is that you want to use the native Gzip stream stuff, which doesn't provide a {com,decom}pressor. I guess I'd vote for just using:

public CompressionInputStream createInputStream(InputStream in) throws IOException {
  if (ZlibFactory.isNativeZlibLoaded()) {
    return new DecompressorStream(...);
  } else {
    return new GzipInputStream(in);
  }
}

and removing the extra ZlibFactory.getGzip* methods.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448840 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

This is looking really close.  A few comments:

1. There's some debug output in bin/hadoop, an 'echo'.

2. The patch should not include binaries: these should be built somehow by build.xml.

3. I don't like custom/prebuilt distinction.  How about we just put prebuilt stuff in lib/, and custom be in build/lib/?  A deploy-native target can copy things from build/lib/ to lib/, and, if that part of lib/ is stored in subversion, these files will be committed.  The "package" target will always place whatever's in build/lib into the lib/ directory of the packaged code.

4. We should define a "compile.native" property in build.xml that determines whether the native code is compiled, tested and packaged.  By default this will be unset, although we may want to set it for releases and nightly builds.

The only reason to check any binaries into subversion is for folks who: (1) build from trunk rather than using a release; and (2) don't have the C compliation environment installed.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061005.tgz

Here is an updated patch incorporating the following:
* Completely integrated with GNU autotools (autoconf, automake, libtool) to ease building of native-hadoop on a wide-variety of platforms.
* Use GNU libtool to create the actual native-hadoop library to ensure platform specfic builds and also uses java.lang.System.loadLibrary consistently to handle platform specific mappings.
* The 'fallback' feature and libhadoop-linux-i386.so provided by default as discussed here.
* Complete integration with build.xml and autotools. I have put in 3 new targets: compile-native, clean-native and package-native which need to be called explicitly. 
* I have also ensured that no 'generated files' are ever placed in src/native directory; only in build/ to ensure 'ant clean' works as of today. This is accomplished via 'vpath' tweaks to the Makefiles.
* Uses -Djava.library.path (in both bin/hadoop and build.xml) to help find the native hadoop library.

I'd appreciate any feedback on any of the above; particularly since I'm a autotools n00bie...

I haven't tested this on cygwin yet; on my todo list. 

Doug - 
* Could you please copy the attached libhadoop-linux-i386.so in trunk/lib/native first before attempting to build native-hadoop?
* Also, please treat this as 'almost complete' again and hold off committing it? I feel we should skip this patch for release 0.7... thanks!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437448 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Addendum:

The above DataCompression{In|Out}putStream should obviate the need for the existing org.apache.hadoop.io.compress.DefaultCodec.Resetable{De|In}flater{Out|In}putStreams and they can be removed safely without affecting any existing code...

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Arun C Murthy updated HADOOP-538:
---------------------------------

    Attachment: HADOOP-538_20061011.tgz

Here's another patch incorporating all features listed here: http://issues.apache.org/jira/browse/HADOOP-538#action_12439996 with a few additions:
* org.apache.hadoop.io.compress.zlib.{Com|Decom}pressor classes can now encode/decode both zlib and gzip formats. This is used in GzipCodec now.
* I have added a switch: 'hadoop.native.library' in hadoop-default.xml to let users control whether or not they wish to use the native code.
* Fixed an issue with autom4te which generated autom4te.cache in src/native

Doug - could you please copy the attached libhadoop-linux-i386.so in trunk/lib/native first before attempting to build native-hadoop? It will need to be committed also... thanks!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12444771 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

A couple more comments:

The buffer sized in all the new code should use the conf.getInt("io.file.buffer.size") for the size rather than 4k.

The CompressionCodec sub-classes (in particular DefaultCodec) need to be Configurable and use the Configuration that is given to them rather than a default Configuraiton.

The built in gzip codec is still going to be broken when used inside of SequenceFiles.

Do the C source directories really need to be nested so deeply?

Since we already have src/c++, wouldn't it make more sense to use src/c++/libhadoop/.. instead of src/native/..

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Work started: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Work on HADOOP-538 started by Arun C Murthy.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12442097 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

Just started looking through the patch, but please set your emacs and/or eclipse to use spaces instead of tabs. It makes it hard to read when the indentation isn't even. Thanks!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12446181 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

I think the build-failure is due to the fact that you will need to copy the attached libhadoop-i386-32.so into trunk/lib/native manually first, could you please try that? Thanks!

I'll put up another patch with the javadoc fixes, a new patch is necessiated anyway by the new source headers that need to be used as per HADOOP-658...

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438331 ] 
            
Dawid Weiss commented on HADOOP-538:
------------------------------------

Doug is correct -- compiling native code is a pain and including it in the runtime is even more pain. I have a project that uses JNI to bind to a shared library, which in turn calls other libraries. Apart from setting up the build process in the ANT build file, which is itself quite tricky -- the same set of tools (gcc) on Linux produces *.so, whereas on Cygwin it creates *.dll files -- you need to set up java.library.path to point to your shared objects AND a regular path (Windows) or LD_LIBRARY_PATH (Linux-ish) so that the system can find native dependencies. In the end, it's a real pain to compile and maintain.

The suggestion to fallback to pure java is I think a good one. It allows you to start playing with Hadoop on heterogenic environment and without much hassle and then tune it later (by placing appropriate libraries wherever the system can find them).

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438478 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

Ok, I think we need a class that is responsible for trying to load libhadoop.so and having fall back replacements.

package org.apache.hadoop.util;

public class NativeCode {
  private static boolean inFallback = true;

  public static Class<DF> getDFClass() { ... }

  public static Class<CompressionCodec> getDefaultCodecClass() { ... }

  public static Class<CompressionCodec> getGzipCodecClass(){...}

  static {
     // try to load libhadoop.so and set fallback flag appropriately
  }
}

Does that look reasonable?

I think that we'd be fine with libhadoop-${os.name}-${os.arch}.so for the most part. Especially if we keep C++ out of the APIs. *smile* I don't think we should try to determine the distro or get into that game.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438492 ] 
            
Owen O'Malley commented on HADOOP-538:
--------------------------------------

I was mostly putting in DF since it had been mentioned and I didn't think the NativeCode class should be buried up in org.apache.hadoop.io.compress. *smile*

I think that forking the process and parsing the results is far more error-prone that a simple syscall. But that is a side issue in my opinion. I mostly wanted to set the granularity of NativeCode as returning classes with the appropriate implementation.


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437531 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

The above {Com|Decom}pressor interfaces will also need a 

public boolean usesDirectBuffer();

api so as to advertise if they are based on 'native' code and can take advantage of nio's direct buffers. There is no reason of course to stipulate it since we very well could be using a 'pure java' codec in future.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448027 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Wrt ZlibFactory, I agree it looks a little ugly for now - and above direction.

However I feel we can do sligthly better by getting ZlibFactory to return the 'Class' instead of the instance i.e.
public Class getZlibCompressorClass() 
&
public Class getZlibDecompressorClass() 

(This of course is Owen's original suggestion...)

This will require that ReflectionUtils has a new method:

    public static Object newInstance(Class theClass, Object[] arguments, 
        Configuration conf) {
        // Constructor signature
        Class[] constructorSignature = new Class[arguments.length];
        for (int i=0; i < arguments.length; ++i) {
          constructorSignature[i] = arguments[i].getClass();
        }
        
        Object result;
        
        try {
            Constructor meth = theClass.getConstructor(constructorSignature);
            meth.setAccessible(true);
            result = meth.newInstance(arguments);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        
        if (conf != null) {
            if (result instanceof Configurable) {
                ((Configurable) result).setConf(conf);
            }
            if (conf instanceof JobConf && 
                    result instanceof JobConfigurable) {
                ((JobConfigurable)result).configure((JobConf) conf);
            }
        }
        return result;
    }

which can then be used to call constructors with arguments (ala http://java.sun.com/docs/books/tutorial/reflect/object/arg.html). 

This is necessary since I will need to call Zlib{Com|Decom}pressor with arguments to make it 'behave' like a Gzip {com|decom}pressor. 

The api can probably be useful elsewhere too...

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438384 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Good points Doug; thanks!

Some thoughts...

I like the 'fallback' mechanism, it should ease hadoop adoption for people on heterogenous environments, I'll get to work on this one.

However I'm a little skeptical about 'ship pre-built libhadoop.so' part for the following reasons:
a) We will need to maintain versions for different h/w architectures and linux distros (Definition of 'standard platforms'? -> cygwin and RHEL/ubuntu/FC{4|5}).
b) glibc compatibility (and also stuff like whether it is compiled with -nptl or not and so on); agreed it isn't very common, but something which might bite us once in a while. (Hmm... differing glibc across linux distros?)
c) Though we don't have it so far, a single line of C++ code in src/native will make things far worse due to C++ ABI issues; means we will need to maintain different versions of libhadoop for gcc-2x, gcc-3x, gcc-4x etc, even for 'standard platforms'.

Hence I'd like to propose we only have the 'fallback' mechanism - if you need libhadoop.so, please build it yourself. We'll do all we can to ease the second part.

I'll also start work to use the GNU build system (autoconf/automake/libtool) for building libhadoop.so, this should ease the 'please build it yourself' part for people.

Thoughts?


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438210 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

A lot of Hadoop developers might not have the right development environment to build the native libraries.  For example, I use Ubuntu, and had previously manually installed gcc, make and a few other things in order to get libhdfs to compile, but to get this to compile I had to install zlib1g-dev.  I have not yet tried to build this under cygwin, and don't even know what packages are required there.

Do you think it would be possible to commit pre-built versions of libhadoop.so for common platforms?  Minimally we could include 32-bit builds for cygwin and linux.  This will complicate the process of changing these, but the alternative is to complicate the release process, since we currently don't require non-developers to have much more than a JRE.

And/or we can fallback (with a warning) to using either Sun's implementation or jzlib when an appropriate libhadoop.so is not available.  Then we could still decide to ship with a pre-built version (e.g., linux only, which would be easy, since releases are all built on linux) but things would still work fine out-of-the-box on other platforms and can be optimized by those who wish to install the required developer tools.

A corallary to these approaches is that the generated libhadoop file name should probably include the OS and architecture names.

Thoughts?


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12447815 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Addendum: I've had issues getting this whole native framework to work on cygwin i.e. the autotool chain, shell scripts etc. - appreciate anyone helping out on that one...

Meanwhile can we get this out to people on *nix platform, since everything is optional stuff anyway, and if there is sufficient interest we can track this as a separate issue if anyone is really interested on deploying a largish hadoop on cygwin-based cluster?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448890 ] 
            
Doug Cutting commented on HADOOP-538:
-------------------------------------

On second thought, the custom libs should probably go in build/native and the prebuilt in lib/native.

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12438503 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Sounds like shipping libhadoop-linux-i386.so along should be a sweet-spot between performance and maintainence/release costs for now... I'll get this done; and the NativeCodeLoader too. Thanks!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Doug Cutting updated HADOOP-538:
--------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I just committed this.  Thanks, Arun, this was a big one!

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_20061114.patch, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-538?page=all ]

Doug Cutting updated HADOOP-538:
--------------------------------

    Status: Open  (was: Patch Available)

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_20061108.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12448212 ] 
            
Raghu Angadi commented on HADOOP-538:
-------------------------------------

FYI:  If a user creates a lot of such streams, it could be affected by known issue with direct buffers (see HADOOP-637). 

If there was a way in JNI to copy from java array to C array directly, we could avoid direct buffer allocation in Java and do a malloc() inside C. In fact JNI documentation says GetPrimitiveArrayCritical() does not always do a copy. 


> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.9.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_20061030.tgz, HADOOP-538_20061107.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira