Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/03/27 01:39:50 UTC
[jira] Created: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
LazySimpleSerDe to directly serialize (append) int/long/byte/short etc to UTF-8 buffer
--------------------------------------------------------------------------------------
Key: HIVE-375
URL: https://issues.apache.org/jira/browse/HIVE-375
Project: Hadoop Hive
Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Zheng Shao
LazySimpleSerDe currently serializes all data into a StringBuilder, then converts it to a String and then to Text.
Even when the data is of type int/long/byte/short, we still do that unnecessary conversion.
We should directly serialize/append int/long/byte/short to a UTF-8 buffer.
This is a very simple change, but it is expected to save 2-3% of the running time of a typical mapper (on a group-by query with some int/long columns).
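The difference between the two paths can be sketched as follows. This is a minimal illustration, not the code from the attached patches: the class name `DirectIntSerialization` and the methods `writeViaString` and `writeDirect` are hypothetical, and a `ByteArrayOutputStream` stands in for whatever UTF-8 buffer the SerDe actually uses. It relies only on the fact that decimal digits and the minus sign are ASCII, so their bytes are already valid UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the actual LazySimpleSerDe code.
public class DirectIntSerialization {

    /** Indirect path: number -> StringBuilder -> String -> UTF-8 bytes. */
    static void writeViaString(ByteArrayOutputStream out, long value) {
        StringBuilder sb = new StringBuilder();
        sb.append(value);
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    /** Direct path: append the decimal digits as ASCII bytes, with no String in between. */
    static void writeDirect(ByteArrayOutputStream out, long value) {
        if (value == 0) {
            out.write('0');
            return;
        }
        byte[] scratch = new byte[20]; // enough for "-9223372036854775808"
        int pos = scratch.length;
        boolean negative = value < 0;
        // Work on the non-positive value so that Long.MIN_VALUE does not overflow.
        long v = negative ? value : -value;
        while (v != 0) {
            // v % 10 is in [-9, 0] here, so subtracting it yields the digit character.
            scratch[--pos] = (byte) ('0' - (v % 10));
            v /= 10;
        }
        if (negative) {
            scratch[--pos] = '-';
        }
        out.write(scratch, pos, scratch.length - pos);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDirect(out, -42L);
        // Decode only for display; the serialized form is the byte buffer itself.
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because byte, short, and int all widen to long, a single method like `writeDirect` covers the four types named above; a real implementation might prefer per-type variants to avoid 64-bit arithmetic.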
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-375:
----------------------------
Description:
LazySimpleSerDe currently serializes all data into a StringBuilder, then converts it to a String and then to Text.
Even when the data is of type int/long/byte/short, we still do that unnecessary conversion.
We should directly serialize/append int/long/byte/short to a UTF-8 buffer.
This is a very simple change, but it is expected to save 2-3% of the running time of a typical mapper (on a group-by query with some int/long columns), and this blocks HIVE-266.
was:
LazySimpleSerDe currently serializes all data into a StringBuilder, then converts it to a String and then to Text.
Even when the data is of type int/long/byte/short, we still do that unnecessary conversion.
We should directly serialize/append int/long/byte/short to a UTF-8 buffer.
This is a very simple change, but it is expected to save 2-3% of the running time of a typical mapper (on a group-by query with some int/long columns).
[jira] Assigned: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo reassigned HIVE-375:
----------------------------------
Assignee: Zheng Shao
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-375:
----------------------------
Attachment: HIVE-375.1.patch
LazySimpleSerDe now uses a UTF-8 buffer for serialization.
Also added direct UTF-8 serialization for byte, short, int, and long.
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-375:
----------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-375:
--------------------------------
Fix Version/s: 0.4.0
(was: 0.3.0)
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-375:
----------------------------
Attachment: HIVE-375.2.patch
Added 2 tests and fixed a bug.
[jira] Updated: (HIVE-375) LazySimpleSerDe to directly serialize
(append) int/long/byte/short etc to UTF-8 buffer
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-375:
----------------------------
Resolution: Fixed
Fix Version/s: 0.3.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks, Zheng.
> LazySimpleSerDe to directly serialize (append) int/long/byte/short etc to UTF-8 buffer
> --------------------------------------------------------------------------------------
>
> Key: HIVE-375
> URL: https://issues.apache.org/jira/browse/HIVE-375
> Project: Hadoop Hive
> Issue Type: Improvement
> Affects Versions: 0.3.0
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Fix For: 0.3.0
>
> Attachments: HIVE-375.1.patch, HIVE-375.2.patch
>
>
> LazySimpleSerDe currently serializes all data into a StringBuilder, then converts it to a String and then to Text.
> Even when the data is of type int/long/byte/short, we still do that unnecessary conversion.
> We should directly serialize/append int/long/byte/short to a UTF-8 buffer.
> This is a very simple change, but it is expected to save 2-3% of the running time of a typical mapper (on a group-by query with some int/long columns), and this blocks HIVE-266.