Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/05/13 23:56:00 UTC

[jira] [Commented] (PIG-4506) binstorage fails to write biginteger

    [ https://issues.apache.org/jira/browse/PIG-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542802#comment-14542802 ] 

Rohini Palaniswamy commented on PIG-4506:
-----------------------------------------

  The fix might be easy, but it is not a good one: it wastes one byte per BigInteger and BigDecimal.

{code}
case DataType.BIGINTEGER:
    out.writeByte(DataType.BIGINTEGER);
    writeDatum(out, ((BigInteger)val).toByteArray());
    break;
case DataType.BIGDECIMAL:
    out.writeByte(DataType.BIGDECIMAL);
    writeDatum(out, ((BigDecimal)val).toString());
{code}

Instead of writing DataType.BIGINTEGER + length of byte array + byte array, the code writes DataType.BIGINTEGER + DataType.BYTEARRAY + length of byte array + byte array. For BigDecimal it is DataType.BIGDECIMAL + DataType.CHARARRAY/DataType.BIGCHARARRAY + short/int length of byte array + byte array. We should get rid of the nested DataType.BYTEARRAY and DataType.CHARARRAY/DataType.BIGCHARARRAY markers. Though delegating to writeDatum makes for easy coding, it is inefficient.
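
To make the one-byte overhead concrete, here is a minimal sketch that builds both layouts for a BigInteger and compares their sizes. The class name, the marker values, and the use of an int length are assumptions made for illustration; they are not taken from Pig's DataReaderWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;

class LayoutOverheadSketch {
    // Placeholder type markers for this sketch only (assumed values).
    static final byte BIGINTEGER = 65;
    static final byte BYTEARRAY = 50;

    // Nested layout: outer marker + inner BYTEARRAY marker + int length + bytes.
    static byte[] nested(BigInteger val) {
        return encode(val, true);
    }

    // Flat layout: outer marker + int length + bytes.
    static byte[] flat(BigInteger val) {
        return encode(val, false);
    }

    private static byte[] encode(BigInteger val, boolean withInnerMarker) {
        try {
            byte[] raw = val.toByteArray();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(BIGINTEGER);
            if (withInnerMarker) {
                out.writeByte(BYTEARRAY); // the redundant byte
            }
            out.writeInt(raw.length);
            out.write(raw);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BigInteger val = new BigInteger("123456789012345678901234567890");
        System.out.println("nested: " + nested(val).length + " bytes");
        System.out.println("flat:   " + flat(val).length + " bytes");
    }
}
```

The two encodings differ by exactly one byte per value, whatever the magnitude of the number.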

Can we extract the writing and reading of the byte array and its length into shared code and reuse that instead of calling writeDatum on a DataByteArray? For BigDecimal, we can always do out.writeShort(length), as the length of the BigDecimal string should not be > 65535.
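
A rough sketch of what that extraction could look like follows. It is not the actual patch: the class name, the helper names (writeBytes, readBytes, roundTrip), and the marker values are hypothetical, and the real change would live in DataReaderWriter and use the DataType constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class BigNumberCodecSketch {
    // Placeholder type markers for this sketch only (assumed values).
    static final byte BIGINTEGER = 65;
    static final byte BIGDECIMAL = 70;

    // Shared helper: int length followed by the raw bytes, no nested type marker.
    static void writeBytes(DataOutput out, byte[] bytes) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static byte[] readBytes(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return bytes;
    }

    static void writeBigInteger(DataOutput out, BigInteger val) throws IOException {
        out.writeByte(BIGINTEGER);
        writeBytes(out, val.toByteArray());
    }

    static void writeBigDecimal(DataOutput out, BigDecimal val) throws IOException {
        byte[] utf8 = val.toString().getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 65535) {
            throw new IOException("BigDecimal string too long for a short length");
        }
        out.writeByte(BIGDECIMAL);
        out.writeShort(utf8.length); // read back with readUnsignedShort()
        out.write(utf8);
    }

    static Object readDatum(DataInput in) throws IOException {
        byte type = in.readByte();
        switch (type) {
            case BIGINTEGER: {
                return new BigInteger(readBytes(in));
            }
            case BIGDECIMAL: {
                byte[] utf8 = new byte[in.readUnsignedShort()];
                in.readFully(utf8);
                return new BigDecimal(new String(utf8, StandardCharsets.UTF_8));
            }
            default:
                throw new IOException("Unexpected data type " + type + " found in stream");
        }
    }

    // Convenience for trying the sketch: encode a value, then decode it again.
    static Object roundTrip(Object val) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (val instanceof BigInteger) {
                writeBigInteger(out, (BigInteger) val);
            } else {
                writeBigDecimal(out, (BigDecimal) val);
            }
            out.flush();
            return readDatum(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new BigInteger("-123456789012345678901234567890")));
        System.out.println(roundTrip(new BigDecimal("12345.678900")));
    }
}
```

Note that changing the on-disk layout this way would also require the reader to be changed in step, and data already written with the nested layout would no longer be readable by the new reader.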

> binstorage fails to write biginteger
> ------------------------------------
>
>                 Key: PIG-4506
>                 URL: https://issues.apache.org/jira/browse/PIG-4506
>             Project: Pig
>          Issue Type: Bug
>          Components: data, impl
>            Reporter: Savvas Savvides
>            Assignee: Savvas Savvides
>             Fix For: 0.15.0
>
>         Attachments: PIG-4506-1.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When trying to store a biginteger using binstorage the following error is issued (The error might manifest elsewhere too):
> java.lang.RuntimeException: Unexpected data type -1 found in stream
> This is caused by a bug in the writeDatum method of the DataReaderWriter.java class. When writeDatum is called with a BigInteger as an argument, the BigInteger is converted to a byte[] and the writeDatum method is recursively called on the byte[]. writeDatum cannot handle byte[] objects but instead expects DataByteArray objects.
> Suggested fix - wrap byte[] to DataByteArray:
> change this line:
> _writeDatum(out, ((BigInteger)val).toByteArray());_
> to this:
> _writeDatum(out, new DataByteArray(((BigInteger)val).toByteArray()));_


