Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/08/04 08:57:14 UTC

[jira] Created: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Improve ByteStream by removing all synchronized method calls
------------------------------------------------------------

                 Key: HIVE-720
                 URL: https://issues.apache.org/jira/browse/HIVE-720
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


org.apache.hadoop.hive.serde2.ByteStream has two inner classes, Input and Output, which inherit from ByteArrayInputStream and ByteArrayOutputStream respectively.
Both of these classes have many synchronized methods, which adds locking overhead to every call and makes them slow.

We should let ByteStream.Input and ByteStream.Output inherit directly from InputStream and OutputStream so that we don't need to call synchronized methods at all. This will help LazySimpleSerDe, ColumnarSerDe, and LazyBinarySerDe.
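
For illustration only, here is a minimal sketch of what the proposal could look like. It is not the code from the attached patches: the class name NonSyncSketch, the nested Input and Output classes, and the accessor names (reset, getCount, getData) are all hypothetical. The point is simply that both classes extend InputStream and OutputStream directly and declare no synchronized methods, so they are only safe when used from a single thread.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical sketch; not the classes from the HIVE-720 patches. */
final class NonSyncSketch {

  /** Growable byte buffer that extends OutputStream with no synchronized methods. */
  static final class Output extends OutputStream {
    private byte[] buf;
    private int count;

    Output(int initialCapacity) {
      buf = new byte[Math.max(1, initialCapacity)];
    }

    @Override
    public void write(int b) {
      ensureCapacity(count + 1);
      buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
      if (off < 0 || len < 0 || len > src.length - off) {
        throw new IndexOutOfBoundsException();
      }
      ensureCapacity(count + len);
      System.arraycopy(src, off, buf, count, len);
      count += len;
    }

    /** Grows the backing array (at least doubling it) when more room is needed. */
    private void ensureCapacity(int needed) {
      if (needed > buf.length) {
        buf = Arrays.copyOf(buf, Math.max(buf.length * 2, needed));
      }
    }

    /** Discards the written bytes but keeps the backing array for reuse. */
    void reset() { count = 0; }

    int getCount() { return count; }

    /** Returns the backing array itself (no copy); only the first getCount() bytes are valid. */
    byte[] getData() { return buf; }
  }

  /** Reads from a caller-supplied byte array with no synchronized methods. */
  static final class Input extends InputStream {
    private byte[] buf = new byte[0];
    private int pos;
    private int end;

    /** Points this stream at data[offset .. offset + length) without copying. */
    void reset(byte[] data, int offset, int length) {
      buf = data;
      pos = offset;
      end = offset + length;
    }

    @Override
    public int read() {
      return pos < end ? (buf[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
      if (off < 0 || len < 0 || len > dst.length - off) {
        throw new IndexOutOfBoundsException();
      }
      if (len == 0) {
        return 0;
      }
      if (pos >= end) {
        return -1;
      }
      int n = Math.min(len, end - pos);
      System.arraycopy(buf, pos, dst, off, n);
      pos += n;
      return n;
    }

    @Override
    public int available() { return end - pos; }
  }
}
```

A serializer could reuse one Output instance per row by calling reset() before writing, then hand getData() and getCount() to an Input via reset(...) to read the bytes back without copying.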


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739129#action_12739129 ] 

Zheng Shao commented on HIVE-720:
---------------------------------

Overall it looks good. Some nitpicks about class names:
1. ByteArrayInputBuffer -> NonSyncByteArrayInputStream, ByteArrayOutputBuffer -> NonSyncByteArrayOutputStream.
2. HiveDataInputBuffer -> NonSyncDataInputBuffer, HiveDataOutputBuffer -> NonSyncDataOutputBuffer

Also can we put the new classes into common/io instead of common?

Also, what is the reason that HiveDataInputBuffer inherits FilterInputStream, while HiveDataOutputBuffer inherits DataOutputStream?
We might have discussed it before but I cannot remember. Can you put some comments into the code?





[jira] Commented: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739343#action_12739343 ] 

He Yongqiang commented on HIVE-720:
-----------------------------------

Will do 1 and 2, and move the two classes into common.io.
>>what is the reason that HiveDataInputBuffer inherits FilterInputStream, while HiveDataOutputBuffer inherits DataOutputStream?
Let me figure out why and add some comments.




[jira] Commented: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739361#action_12739361 ] 

Zheng Shao commented on HIVE-720:
---------------------------------

+1. Will commit if tests pass.




[jira] Updated: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-720:
------------------------------

    Attachment: HIVE-720.1.patch

HIVE-720.1.patch tries to reuse some of the existing io code. It passed the tests in my local environment.




[jira] Commented: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738919#action_12738919 ] 

He Yongqiang commented on HIVE-720:
-----------------------------------

Hive has its own HiveDataInputBuffer and HiveDataOutputBuffer. We can reuse a lot of code here.




[jira] Resolved: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HIVE-720.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0
     Release Note: HIVE-720. Improve ByteStream by removing all synchronized method calls. (Yongqiang He via zshao)
     Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang!




[jira] Assigned: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-720:
-------------------------------

    Assignee: He Yongqiang




[jira] Updated: (HIVE-720) Improve ByteStream by removing all synchronized method calls

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-720:
------------------------------

    Attachment: HIVE-720.2.patch

