Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/04 03:14:27 UTC

[jira] [Commented] (DRILL-4028) Merge Drill parquet modifications back into the mainline project

    [ https://issues.apache.org/jira/browse/DRILL-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988742#comment-14988742 ] 

ASF GitHub Bot commented on DRILL-4028:
---------------------------------------

GitHub user jaltekruse opened a pull request:

    https://github.com/apache/drill/pull/236

    DRILL-4028: Get off parquet fork

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaltekruse/incubator-drill parquet-update-squash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #236
    
----
commit afb72c81bbba69346c48c77852f2429bae47dea4
Author: Jason Altekruse <al...@gmail.com>
Date:   2015-09-04T18:09:23Z

    DRILL-4028: Part 1 - Remove references to the shaded version of a Jackson @JsonCreator annotation from parquet, replace with proper fasterxml version.

commit 0f51a6bf341699aa7f14457b2c49097e84fff936
Author: Jason Altekruse <al...@gmail.com>
Date:   2015-09-04T18:17:21Z

    DRILL-4028: Part 2 - Fixing imports using the wrong parquet packages after rebase.
    
    clean up imports in generated source template

commit 4feb538da813f2f1a974337f5e6874866c3cd350
Author: Jason Altekruse <al...@gmail.com>
Date:   2015-09-14T18:13:04Z

    DRILL-4028: Part 3 - Fixing issues with the Drill parquet read and write paths after merging the Drill parquet fork back into mainline.
    
    Fixed the issue with the writer: the RecordConsumer in the ParquetRecordWriter needed to be flushed.
    
    Consolidate page reading code
    
    Fix buffer sizes; the uncompressed and compressed sizes were swapped.
    
    The issue was a mismatch in the usage of byte buffers. Even though the position of a buffer was being set, that seemed to be ignored in the setSafe method on the varbinary vector. I needed to pass in the offset as it seems to just read from the beginning of the buffer. I'm not sure this is how ByteBuffers are supposed to be used, but we seem to make use of this pattern commonly so I'm not sure it could be easily refactored.
    
    Added test code to print additional context when an ordered comparison of two datasets fails in a test.
    
    Removing usage of Drill classes from DirectCodecFactory, getting it ready to be moved into the parquet codebase.
    
    Fix up parquet API usage in Hive Module.
    
    Fix dictionary reading. I think the changes may speed up reading dictionary-encoded files by avoiding an extra copy.
    
    Adding a unit test to read and write all types in parquet; the decimal types and interval year have some issues.
    
    Use direct codec factory from new package in the parquet library now that it has been moved.
    
    Moving the test for Direct Codec Factory out of the Drill source as the class itself has been moved.
    
    Small fix after consolidating two different ByteBuffer based implementations of BytesInput.
    
    Small fixes to accommodate interface changes.
    
    Small changes to remove direct references to DirectCodecFactory. This class is not accessible outside of parquet, but an instance with the same contract is now available through a new factory method on CodecFactory.
    
    Fixed failing test using miniDFS when reading a larger parquet file.

----
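
The ByteBuffer mismatch described in Part 3 above can be illustrated with a small standalone sketch. It uses only java.nio and does not touch Drill or Parquet code. The class ByteBufferOffsetSketch and the helper copyAbsolute are hypothetical names; copyAbsolute is an assumed stand-in for a consumer that reads a buffer with absolute indexing, which may or may not match what setSafe on the varbinary vector actually does. The point it shows is that absolute reads on a java.nio.ByteBuffer do not consult position(), so the offset has to be passed explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferOffsetSketch {

    // Hypothetical stand-in for a setSafe-style copy. It reads with absolute
    // indexing, so the source buffer's position() is never consulted.
    static byte[] copyAbsolute(ByteBuffer src, int start, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = src.get(start + i); // absolute get: ignores position()
        }
        return out;
    }

    public static void main(String[] args) {
        // A 6-byte header followed by a 5-byte value.
        ByteBuffer page = ByteBuffer.wrap("HEADERvalue".getBytes(StandardCharsets.US_ASCII));
        int offset = 6;
        int length = 5;

        // Setting the position alone does not change what an absolute read sees.
        page.position(offset);

        // A consumer that starts at index 0 reads the header bytes.
        String fromZero = new String(copyAbsolute(page, 0, length), StandardCharsets.US_ASCII);

        // Passing the offset explicitly reads the intended bytes.
        String fromOffset = new String(copyAbsolute(page, offset, length), StandardCharsets.US_ASCII);

        System.out.println(fromZero + " vs " + fromOffset); // prints "HEADE vs value"
    }
}
```

Relative reads such as get() with no index do honor the position, so an alternative under the same assumptions would be to hand the consumer a slice() of the positioned buffer instead of an explicit offset.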


> Merge Drill parquet modifications back into the mainline project
> ----------------------------------------------------------------
>
>                 Key: DRILL-4028
>                 URL: https://issues.apache.org/jira/browse/DRILL-4028
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>             Fix For: 1.3.0
>
>
> Drill has been maintaining a fork of Parquet for over a year. The changes need to make it back into the main repository so we don't have to bother merging in all of the new changes from the master repository into the fork.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)