Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/03/26 02:52:41 UTC

[jira] [Commented] (DRILL-5385) Vector serializer fails to read saved SV2

    [ https://issues.apache.org/jira/browse/DRILL-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942109#comment-15942109 ] 

ASF GitHub Bot commented on DRILL-5385:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/800

    DRILL-5385: Vector serializer fails to read saved SV2

    Unit testing revealed that the VectorAccessibleSerializable class claims
    to serialize SV2s but, in fact, does not: it writes them but does not
    read them back, resulting in corrupted data on read.
    
    Fortunately, no code appears to serialize SV2s at present. Still, it is
    a bug and needs to be fixed.
    
    The first task is to add serialization code for the SV2.
    
    That revealed that the recently added code to save DrillBufs using a
    shared buffer had a bug: it relied on the writer index to know how much
    data is in the buffer. It turns out that SV2 buffers don’t set this
    index, so a new version of the write function takes a write length.
    
    Closer inspection of the read code then revealed duplicated code, so
    DrillBuf allocation moved into a new version of the read function that
    both allocates the DrillBuf and reads into it.
    
    It turns out that value vectors, but not SV2s, can be built from a
    DrillBuf. Added a matching constructor to the SV2 class.
    
    Finally, cleaned up the code a bit to make it easier to follow. Also
    allowed test code to access the handy timer already present in the code.
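
The writer-index problem described above can be illustrated with a minimal sketch. It does not use DrillBuf; SimpleBuf, saveUsingWriterIndex and save are hypothetical names invented for illustration. The sketch assumes the SV2 fills its buffer through absolute-index writes, which (as with Netty's ByteBuf set methods) leave the writer index untouched.

```java
import java.util.Arrays;

// Hypothetical, simplified model; DrillBuf itself is not used here.
final class WriteLengthSketch {

    static final class SimpleBuf {
        final byte[] data;
        int writerIndex; // advanced only by write(), never by set()

        SimpleBuf(int capacity) {
            data = new byte[capacity];
        }

        // Sequential write: stores the byte and advances the writer index.
        void write(byte b) {
            data[writerIndex++] = b;
        }

        // Absolute write: stores the byte, leaves the writer index alone.
        void set(int index, byte b) {
            data[index] = b;
        }
    }

    // Mirrors the bug: trusts the writer index to decide how much to save.
    static byte[] saveUsingWriterIndex(SimpleBuf buf) {
        return Arrays.copyOf(buf.data, buf.writerIndex);
    }

    // Mirrors the fix: the caller states how many bytes to save.
    static byte[] save(SimpleBuf buf, int length) {
        return Arrays.copyOf(buf.data, length);
    }

    public static void main(String[] args) {
        SimpleBuf sv2Like = new SimpleBuf(4);
        for (int i = 0; i < 4; i++) {
            sv2Like.set(i, (byte) (i + 1)); // writer index stays at 0
        }
        System.out.println("via writer index: " + saveUsingWriterIndex(sv2Like).length + " bytes");
        System.out.println("via explicit length: " + save(sv2Like, 4).length + " bytes");
    }
}
```

In this model the writer-index variant saves nothing for a buffer filled by absolute writes, while the explicit-length variant saves all four bytes.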

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5385

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/800.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #800
    
----
commit 26a6cf9eed5347a06640bd72fb3720ea9369c001
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-26T02:51:43Z

    DRILL-5385: Vector serializer fails to read saved SV2
    
----


> Vector serializer fails to read saved SV2
> -----------------------------------------
>
>                 Key: DRILL-5385
>                 URL: https://issues.apache.org/jira/browse/DRILL-5385
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
>
> Drill provides the {{VectorAccessibleSerializable}} class to write a record batch to a stream, and to read that batch from a stream. Record batches can carry an indirection vector (a so-called selection vector 2 or SV2).
> The code that writes batches writes the SV2 to the stream. However, the code that deserializes batches initializes the SV2 but does not read it from the stream.
> The result is that vector deserialization reads the wrong bytes and the saved values are corrupted on read.
> Note that this issue was found via unit testing. At present, the only production use of this code is in the external sort, which serializes batches without an indirection vector.
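
The corruption described in the issue can be sketched as follows. This is not Drill code: the class, the method names and the byte layout (2-byte SV2 entries followed by 4-byte values) are assumptions chosen only to show how a reader that never consumes the SV2 bytes ends up decoding values from the wrong offset.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the batch serializer; not Drill code.
final class Sv2StreamSketch {

    // Assumed layout: SV2 entries (2 bytes each), then values (4 bytes each).
    static byte[] writeBatch(short[] sv2, int[] values) {
        ByteBuffer out = ByteBuffer.allocate(sv2.length * 2 + values.length * 4);
        for (short entry : sv2) {
            out.putShort(entry);
        }
        for (int value : values) {
            out.putInt(value);
        }
        return out.array();
    }

    // With consumeSv2 == false the reader mimics the bug: it never reads
    // the SV2 bytes, so value decoding starts at the SV2 data.
    static int[] readValues(byte[] data, int sv2Count, int valueCount, boolean consumeSv2) {
        ByteBuffer in = ByteBuffer.wrap(data);
        if (consumeSv2) {
            for (int i = 0; i < sv2Count; i++) {
                in.getShort();
            }
        }
        int[] values = new int[valueCount];
        for (int i = 0; i < valueCount; i++) {
            values[i] = in.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        short[] sv2 = {2, 0};
        int[] values = {100, 200, 300};
        byte[] data = writeBatch(sv2, values);
        System.out.println("SV2 consumed: " + Arrays.toString(readValues(data, 2, 3, true)));
        System.out.println("SV2 skipped:  " + Arrays.toString(readValues(data, 2, 3, false)));
    }
}
```

Only the reader that consumes the SV2 bytes first returns the values that were written; the other one interprets the SV2 bytes as the start of the value data.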


