Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/05/21 23:28:12 UTC

[jira] [Updated] (PIG-4656) Improve String serialization and comparator performance in BinInterSedes

     [ https://issues.apache.org/jira/browse/PIG-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4656:
------------------------------------
    Fix Version/s:     (was: 0.16.0)
                   0.17.0

> Improve String serialization and comparator performance in BinInterSedes
> ------------------------------------------------------------------------
>
>                 Key: PIG-4656
>                 URL: https://issues.apache.org/jira/browse/PIG-4656
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4656-1.patch
>
>
> Two major optimizations can be done:
>   -  PIG-1472 added multiple data types to store different sizes (byte, short, int). This can be simplified by using WritableUtils.writeVInt. There is no difference for byte and short compared to the current approach, but for int it could be beneficial, since a lot of numbers could be written with 3 bytes instead of 4. For example, 32768 is written using 3 bytes with WritableUtils.writeVInt, whereas currently 4 bytes (int) are used.
>   -  String comparison in BinInterSedesTupleRawComparator initializes a String for each value being compared. It should instead compare the bytes directly, like Text.Comparator.
> {code}
> str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
> str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
> {code}
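
To make the two points above concrete, here is a minimal Java sketch. It is not Pig or Hadoop code: the class and method names are hypothetical, vintSize reimplements the size rule of Hadoop's WritableUtils.getVIntSize as I understand it (counting only the vint bytes, not any type marker BinInterSedes writes), and compareBytes is a plain unsigned lexicographic comparison in the spirit of what Text.Comparator does, without constructing any String.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical, not part of Pig or Hadoop.
final class VIntAndRawCompareSketch {

    // Number of bytes a value occupies as a variable-length integer,
    // assuming the same rule as Hadoop's WritableUtils.getVIntSize.
    static int vintSize(long i) {
        if (i >= -112 && i <= 127) {
            return 1; // small values fit in a single byte
        }
        if (i < 0) {
            i ^= -1L; // one's complement for negative values
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(i);
        return (dataBits + 7) / 8 + 1; // data bytes plus one length/sign byte
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    // For UTF-8 data this orders by code point, which can differ from
    // String.compareTo for characters outside the Basic Multilingual Plane.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int k = 0; k < n; k++) {
            int x = b1[s1 + k] & 0xff;
            int y = b2[s2 + k] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2; // the shorter range sorts first when it is a prefix
    }

    public static void main(String[] args) {
        System.out.println("vint size of 32768: " + vintSize(32768));
        byte[] a = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] b = "apricot".getBytes(StandardCharsets.UTF_8);
        int c = compareBytes(a, 0, a.length, b, 0, b.length);
        System.out.println("apple sorts before apricot: " + (c < 0));
    }
}
```

In a raw comparator the byte arrays, offsets, and lengths would come from the serialized buffers (for instance bb1.array(), bb1.position(), casz1 in the snippet above), so no String needs to be allocated per comparison. Whether the code point ordering is acceptable for existing sort behavior is something the actual patch would have to confirm.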


