Posted to dev@parquet.apache.org by Jason Altekruse <al...@gmail.com> on 2014/10/07 07:51:14 UTC

High performance vectorized reader meeting notes

Hello Parquet team,

I wanted to report the results of a discussion between the Drill team and
the engineers at Netflix working to make Parquet run faster with Presto.
As we have said in the last few hangouts, we both want to contribute back
to parquet-mr to add features and improve performance. We thought it would
be good to sit down and speak directly about our real goals and the best
next steps to get an engineering effort started to accomplish them.

Below is a summary of the meeting.

- Meeting notes

    - Attendees:

        - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
        - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra

- Minutes

    - Introductions / Background

    - Netflix

        - Working on providing interactive SQL querying to users
        - Have chosen Presto as the query engine and Parquet as the
          high-performance data storage format
        - Presto is providing the needed speed in some cases, but other cases
          are missing optimizations that could be avoiding reads
        - Have already started some development and investigation, and have
          identified key goals
        - Some initial benchmarks with DWRF, a modified ORC reader written by
          the Presto team, show that such gains are possible with a different
          reader implementation
        - Goals
            - Filter pushdown
                - Skipping reads based on filter evaluation on one or more columns
                - This can happen at several granularities: row group, page,
                  record/value
            - Late/lazy materialization
                - For columns not involved in a filter, avoid materializing them
                  entirely until they are known to be needed after evaluating a
                  filter on other columns

    - Drill

        - The Drill engine uses an in-memory vectorized representation of records
        - For scalar and repeated types we have implemented a fast vectorized
          reader that is optimized to transform between Parquet's on-disk format
          and our in-memory format
        - This currently produces performant table scans, but has no facility for
          filter pushdown
        - Major goals going forward
            - Filter pushdown
                - Decide on the best way to incorporate filter pushdown into our
                  current implementation, or figure out a way to leverage existing
                  work in the parquet-mr library to accomplish this goal
            - Late/lazy materialization
                - See above
            - Contribute existing code back to Parquet
                - The Drill Parquet reader has a very strong emphasis on
                  performance and a clear interface to consume; sufficiently
                  separated from Drill, it could prove very useful for other
                  projects

    - First steps

        - The Netflix team will share some of their thoughts and research from
          working with the DWRF code
            - We can have a discussion based off of this: which aspects are done
              well, and any opportunities they may have missed that we can
              incorporate into our design
            - Do further investigation and ask the existing community for guidance
              on existing parquet-mr features or planned APIs that may provide the
              desired functionality
        - We will begin a discussion of an API for the new functionality (a rough
          sketch of what such an API could look like follows these notes)
            - Some outstanding thoughts for down the road
                - The Drill team has an interest in very late materialization for
                  data stored in dictionary-encoded pages, such as running a join
                  or filter on the dictionary and then going back to the reader to
                  grab all of the values in the data that match the needed members
                  of the dictionary (see the second sketch below)
                    - This is a later consideration, but it is part of the reason
                      we are opening up the design discussion early: so that the
                      API can be flexible enough to allow this in the future, even
                      if it is not implemented right away
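
To make the filter pushdown and late materialization goals above a bit more
concrete for the upcoming API discussion, here is a rough sketch, in Java, of
the shape of reader interface we have been talking about. None of these types
exist in parquet-mr or Drill today; every name below is a placeholder for
discussion, not a proposal of final signatures.

    // Hypothetical sketch only -- none of these types exist in parquet-mr yet.
    import java.io.Closeable;
    import java.io.IOException;

    /** A batch of decoded values for one column, similar in spirit to Drill's value vectors. */
    interface ColumnVector {
      int size();                 // number of values currently held
      boolean isNull(int index);  // nullability derived from definition levels
      long getLong(int index);    // typed accessors would exist per Parquet primitive type
    }

    /** A column whose values are only decoded ("materialized") when first touched. */
    interface LazyColumnVector extends ColumnVector {
      boolean isMaterialized();   // false until a value has actually been read
    }

    /** Placeholder for the min/max/null-count statistics kept in footers and page headers. */
    interface ColumnStatistics {}

    /** A predicate that can be evaluated against statistics or against individual values. */
    interface ColumnPredicate {
      /** Row-group or page granularity: can this unit possibly contain a match? */
      boolean canDrop(ColumnStatistics stats);
      /** Record/value granularity: should this row be kept? */
      boolean keep(ColumnVector values, int index);
    }

    /** A vectorized reader supporting filter pushdown and lazy materialization. */
    interface VectorizedParquetReader extends Closeable {
      /** Register a predicate; the reader may then skip whole row groups or pages. */
      void pushDownFilter(String columnPath, ColumnPredicate predicate);

      /**
       * Read the next batch of rows that pass the pushed-down filters.
       * Filter columns are decoded eagerly; other projected columns come back
       * as LazyColumnVector and are only decoded if the engine touches them.
       * Returns the number of rows in the batch, or -1 at end of input.
       */
      int nextBatch(ColumnVector[] reuse) throws IOException;
    }

The intent is that row-group/page skipping and lazy decoding live behind the
reader interface, so each engine can keep its own in-memory representation.

For the dictionary-encoded case mentioned at the end of the notes, here is a
second minimal sketch of the kind of hook that would let a join or filter run
against the dictionary before any data pages are decoded; again, all of these
names are made up for illustration only.

    // Hypothetical sketch of dictionary-level late materialization; not an existing API.
    import java.io.IOException;
    import java.util.BitSet;

    interface DictionaryAwareColumnReader {
      /** Number of distinct entries in the dictionary for this column chunk. */
      int dictionarySize() throws IOException;

      /** Raw bytes of one dictionary entry, e.g. for probing a join hash table. */
      byte[] dictionaryEntry(int id) throws IOException;

      /**
       * The engine evaluates its join or filter against the dictionary entries and
       * hands back the ids that survived; the reader then materializes only the
       * rows whose encoded value is one of those ids, skipping everything else.
       * Returns the number of rows materialized into the reusable array.
       */
      int readRowsMatching(BitSet matchingDictionaryIds, long[] reuse) throws IOException;
    }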

Re: High performance vectorized reader meeting notes

Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Hi Ryan,

I just created this JIRA for it:
https://issues.apache.org/jira/browse/PARQUET-131

Comments and suggestions are welcome.

Thanks,
Zhenxiao

Re: High performance vectorized reader meeting notes

Posted by Ryan Blue <bl...@cloudera.com>.
Hi everyone,

Is there a JIRA issue tracking the vectorized reader API? Brock and I 
have been working through how we would integrate this with Hive and have 
a few questions and comments. Thanks!

rb

-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: High performance vectorized reader meeting notes

Posted by Brock Noland <br...@cloudera.com>.
Hi,

Great! I will take a look soon.

Cheers!
Brock

Re: High performance vectorized reader meeting notes

Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Thanks Jacques.

Here is the gist:
https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30

Comments and suggestions are appreciated.

Thanks,
Zhenxiao

Re: High performance vectorized reader meeting notes

Posted by Jacques Nadeau <ja...@apache.org>.
You can't send attachments. Can you post it as a Google Doc or gist?

Re: High performance vectorized reader meeting notes

Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Thanks Brock and Jason.

I have just drafted a proposed API for the vectorized Parquet reader
(attached to this email). Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

Re: High performance vectorized reader meeting notes

Posted by Brock Noland <br...@cloudera.com>.
Hi,

The Hive + Parquet community is very interested in improving the performance
of Hive + Parquet, and of Parquet generally, and we would like to contribute
to the Parquet vectorization and lazy materialization effort. Please add me
to any future meetings on this topic.

BTW, here is the JIRA tracking this effort from the Hive side:
https://issues.apache.org/jira/browse/HIVE-8120

Brock

Re: High performance vectorized reader meeting notes

Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Thanks Jason.

Yes, Netflix is using Presto and Parquet for our big data platform
(http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).

The fastest format currently in Presto is ORC, not DWRF (Parquet is fast,
but not as fast as ORC). We are referring to ORC, not Facebook's DWRF
implementation.

We already have Parquet working in Presto, and we would definitely like to
get it as fast as ORC.

Facebook has built native support for ORC in Presto, which does not use the
ORCRecordReader at all. They parse the ORC footer themselves and implement
predicate pushdown by skipping row groups, vectorization by introducing
type-specific vectors, and lazy materialization by introducing LazyVectors
(their code, i.e. their pull request, has not been committed yet). We are
planning to do similar optimizations for Parquet in Presto.
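
To make the row-group skipping idea concrete, here is a minimal sketch
(the RowGroupInfo type below is a hypothetical stand-in for the per-row-group
footer metadata, not parquet-mr's or Presto's actual API): evaluate the
predicate against each row group's min/max statistics and never read the
groups that cannot possibly contain a match.

import java.util.ArrayList;
import java.util.List;

public final class RowGroupPruner {
  /** Hypothetical view of the per-row-group statistics kept in the footer. */
  public interface RowGroupInfo {
    long minValue(String column);
    long maxValue(String column);
  }

  /**
   * Keep only the row groups whose [min, max] range for the given column can
   * possibly satisfy "column = value"; all other groups are never read.
   */
  public static List<RowGroupInfo> prune(List<RowGroupInfo> rowGroups,
                                         String column, long value) {
    List<RowGroupInfo> selected = new ArrayList<>();
    for (RowGroupInfo rg : rowGroups) {
      if (rg.minValue(column) <= value && value <= rg.maxValue(column)) {
        selected.add(rg);
      }
    }
    return selected;
  }
}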

For the ParquetRecordReader, we need additional APIs to read the next batch
of values and to read in a vector of values. For example, here are the
related APIs in the ORC code:

/**
 * Read the next row batch. The size of the batch to read cannot be
 * controlled by the callers. Callers need to look at
 * VectorizedRowBatch.size of the returned object to know the batch size read.
 * @param previousBatch a row batch object that can be reused by the reader
 * @return the row batch that was read
 * @throws java.io.IOException
 */
VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

And here is the related API in the Presto code that is used for ORC support
in Presto:

public void readVector(int columnIndex, Object vector);
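
As a concrete starting point for discussion, the Parquet-side analogue could
look roughly like the following (only a sketch: ParquetVectorizedReader is a
made-up name, and the nested ColumnVector and VectorizedRowBatch interfaces
are placeholders, not the existing Hive/ORC or Presto classes of the same
name):

import java.io.Closeable;
import java.io.IOException;

/** Sketch of a batch-oriented Parquet read interface mirroring the methods above. */
public interface ParquetVectorizedReader extends Closeable {

  /**
   * Read the next batch of rows, reusing previousBatch when possible.
   * Callers check the returned batch's size to see how many rows were read.
   */
  VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

  /** Fill the given vector with the next run of values for one projected column. */
  void readVector(int columnIndex, ColumnVector vector) throws IOException;

  /** Hypothetical minimal column vector: typed values plus a null indicator. */
  interface ColumnVector {
    int size();
    boolean isNull(int row);
  }

  /** Hypothetical row batch: one ColumnVector per projected column. */
  interface VectorizedRowBatch {
    int size();
    ColumnVector column(int columnIndex);
  }
}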

For lazy materialization, we may also consider adding LazyVectors or
LazyBlocks, so that values are not materialized until they are accessed by
the Operator.
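
For illustration only, a LazyVector along these lines might defer the column
decode until first use (just a sketch under the assumption that the reader
hands the vector a decode callback; this is not Presto's or parquet-mr's
actual design):

import java.util.function.Supplier;

/** Sketch of a lazily materialized column: nothing is decoded until first access. */
public final class LazyVector<T> {
  private final Supplier<T[]> loader; // e.g. a closure that decodes one column chunk
  private T[] values;                 // filled in on first access

  public LazyVector(Supplier<T[]> loader) {
    this.loader = loader;
  }

  /** Decodes the column on the first call; later calls reuse the decoded values. */
  public T get(int row) {
    if (values == null) {
      values = loader.get();
    }
    return values[row];
  }

  public boolean isMaterialized() {
    return values != null;
  }
}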

Any comments and suggestions are appreciated.

Thanks,
Zhenxiao


Re: High performance vectorized reader meeting notes

Posted by Jason Altekruse <al...@gmail.com>.
Hello All,

No updates from me yet, just sending out another message for some of the
Netflix engineers who were still only subscribed to the Google Group mailing
list. This will allow them to respond directly with their research on the
optimized ORC reader for consideration in the design discussion.

-Jason

>