Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2007/12/10 22:57:43 UTC

[jira] Commented: (PIG-39) BufferedPositionedInputStream drastically reduces read performance because it doesn't override read([], o, l) in InputStream

    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550206 ] 

Olga Natkovich commented on PIG-39:
-----------------------------------

I incorporated the change and ran performance tests. Unfortunately, I did not see any change in performance. Looking at the Hadoop code, I think it already buffers the data, so our code is just reading data that is already cached in memory.

I am still going to commit the patch since this is a bug.

> BufferedPositionedInputStream drastically reduces read performance because it doesn't override read([], o, l) in InputStream
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-39
>                 URL: https://issues.apache.org/jira/browse/PIG-39
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>         Environment: Java 1.6, Mac OS X 10.5
>            Reporter: Sam Pullara
>
> A simple fix can have a huge effect on the performance of certain kinds of Pig programs:
> Index: src/org/apache/pig/impl/io/BufferedPositionedInputStream.java
> ===================================================================
> --- src/org/apache/pig/impl/io/BufferedPositionedInputStream.java	(revision 597597)
> +++ src/org/apache/pig/impl/io/BufferedPositionedInputStream.java	(working copy)
> @@ -49,7 +49,14 @@
>          pos += rc;
>          return rc;
>      }
> -    
> +
> +    @Override
> +    public int read(byte b[], int off, int len) throws IOException {
> +        int read = in.read(b, off, len);
> +        pos += read;
> +        return read;
> +    }
> +
>      /**
>       * Returns the current position in the tracked InputStream.
>       */
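
For context on why the override matters: the default InputStream.read(byte[], int, int) fills the array by calling the single-byte read() in a loop, so a wrapper that only overrides read() turns every bulk read into one call per byte. The sketch below shows the same idea as the quoted patch on a small stand-in class. PositionTrackingStream and getPosition() are hypothetical names chosen for illustration, not the actual Pig class. It also assumes one behaviour the patch as posted does not have: the position is only advanced when bytes were actually read, so the -1 returned at end of stream does not move it backwards.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical stand-in for a position-tracking stream wrapper.
 * Not the Pig implementation; names are illustrative only.
 */
class PositionTrackingStream extends InputStream {
    private final InputStream in;
    private long pos = 0;

    PositionTrackingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            pos++; // count the byte only if one was actually read
        }
        return c;
    }

    // Bulk read delegated to the wrapped stream in a single call,
    // instead of falling back to InputStream's byte-by-byte default.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            pos += n; // skip the update on -1 (end of stream) and 0
        }
        return n;
    }

    /** Returns the number of bytes read so far. */
    long getPosition() {
        return pos;
    }

    public static void main(String[] args) throws IOException {
        PositionTrackingStream s =
            new PositionTrackingStream(new ByteArrayInputStream(new byte[10]));
        byte[] buf = new byte[4];
        while (s.read(buf, 0, buf.length) != -1) {
            // drain the stream in chunks of up to 4 bytes
        }
        System.out.println("position after draining: " + s.getPosition());
    }
}
```

Whether the bulk override gives a measurable speedup depends on what sits underneath: if the wrapped stream already buffers, the per-byte calls hit memory rather than the disk or network, which is consistent with the test result reported in the comment above.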
