Posted to dev@parquet.apache.org by Brock Noland <br...@cloudera.com> on 2014/11/01 21:03:16 UTC

Re: High performance vectorized reader meeting notes

Hi,

Great! I will take a look soon.

Cheers!
Brock

On Mon, Oct 27, 2014 at 11:18 PM, Zhenxiao Luo <zl...@netflix.com> wrote:

>
> Thanks Jacques.
>
> Here is the gist:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
>
> Comments and Suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
>> You can't send attachments.  Can you post as google doc or gist?
>>
>> On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <zl...@netflix.com.invalid>
>> wrote:
>>
>> >
>> > Thanks Brock and Jason.
>> >
>> > I just drafted proposed APIs for a vectorized Parquet reader (attached
>> > to this email). Any comments and suggestions are appreciated.
>> >
>> > Thanks,
>> > Zhenxiao
>> >
>> > On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <br...@cloudera.com>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> The Hive + Parquet community is very interested in improving the
>> >> performance of Hive + Parquet and of Parquet generally. We are very
>> >> interested in contributing to the Parquet vectorization and lazy
>> >> materialization effort. Please add me to any future meetings on this
>> >> topic.
>> >>
>> >> BTW here is the JIRA tracking this effort from the Hive side:
>> >> https://issues.apache.org/jira/browse/HIVE-8120
>> >>
>> >> Brock
>> >>
>> >> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <zluo@netflix.com.invalid
>> >
>> >> wrote:
>> >>
>> >> > Thanks Jason.
>> >> >
>> >> > Yes, Netflix is using Presto and Parquet for our Big Data Platform (
>> >> > http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>> >> > ).
>> >> >
>> >> > The fastest format currently in Presto is ORC, not DWRF (Parquet is
>> >> > fast, but not as fast as ORC). We are referring to ORC, not Facebook's
>> >> > DWRF implementation.
>> >> >
>> >> > We already have Parquet working in Presto. We definitely would like
>> >> > to get it as fast as ORC.
>> >> >
>> >> > Facebook has added native support for ORC in Presto, which does not
>> >> > use the ORCRecordReader at all. They parse the ORC footer and do
>> >> > predicate pushdown by skipping row groups, vectorization by
>> >> > introducing type-specific vectors, and lazy materialization by
>> >> > introducing LazyVectors (their code has not been committed yet; I mean
>> >> > their pull request). We are planning to do similar optimizations for
>> >> > Parquet in Presto.
>> >> >
>> >> > For the ParquetRecordReader, we need additional APIs to read the next
>> >> > batch of values and to read in a vector of values. For example, here
>> >> > are the related APIs in the ORC code:
>> >> >
>> >> > /**
>> >> >  * Read the next row batch. The size of the batch to read cannot be
>> >> >  * controlled by the caller. Callers need to look at
>> >> >  * VectorizedRowBatch.size of the returned object to know the batch
>> >> >  * size read.
>> >> >  * @param previousBatch a row batch object that can be reused by the
>> >> >  *                      reader
>> >> >  * @return the row batch that was read
>> >> >  * @throws java.io.IOException
>> >> >  */
>> >> > VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch)
>> >> >     throws IOException;
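A minimal sketch of how a caller might drive such a batch-at-a-time API, modeled on the signature quoted above. `VectorizedRowBatch` and `BatchReader` here are small illustrative stand-ins, not Hive's or Presto's real classes:

```java
import java.io.IOException;

public class BatchReaderSketch {
    // Stand-in for Hive's VectorizedRowBatch: one long column, reused across calls.
    static class VectorizedRowBatch {
        int size;                     // number of rows the reader actually filled
        long[] col = new long[1024];  // fixed capacity; the reader picks the batch size
    }

    interface BatchReader {
        // Mirrors the quoted API: returns the batch read; caller checks batch.size.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;
    }

    // Toy reader serving the values 0..total-1 in batches.
    static BatchReader rangeReader(long total) {
        return new BatchReader() {
            long next = 0;
            public VectorizedRowBatch nextBatch(VectorizedRowBatch prev) {
                VectorizedRowBatch batch = (prev != null) ? prev : new VectorizedRowBatch();
                batch.size = (int) Math.min(batch.col.length, total - next);
                for (int i = 0; i < batch.size; i++) {
                    batch.col[i] = next++;
                }
                return batch;
            }
        };
    }

    // Typical consumer loop: reuse the batch object, stop on an empty batch.
    static long sumAll(BatchReader reader) throws IOException {
        long sum = 0;
        VectorizedRowBatch batch = null;
        while ((batch = reader.nextBatch(batch)).size > 0) {
            for (int i = 0; i < batch.size; i++) {
                sum += batch.col[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Sums 0..2999, read in three 1024-row batches plus a 952-row tail.
        System.out.println(sumAll(rangeReader(3000)));
    }
}
```

The property the javadoc calls out is visible here: the reader, not the caller, decides `batch.size`, and the same batch object is handed back to `nextBatch` for reuse.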
>> >> >
>> >> > And here is the related API in the Presto code, which is used for ORC
>> >> > support in Presto:
>> >> >
>> >> > public void readVector(int columnIndex, Object vector);
>> >> >
>> >> > For lazy materialization, we may also consider adding LazyVectors or
>> >> > LazyBlocks, so that values are not materialized until they are
>> >> > accessed by the Operator.
>> >> >
>> >> > Any comments and suggestions are appreciated.
>> >> >
>> >> > Thanks,
>> >> > Zhenxiao
>> >> >
>> >> >
>> >> > On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <
>> >> altekrusejason@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hello All,
>> >> > >
>> >> > > No updates from me yet, just sending out another message for some
>> >> > > of the Netflix engineers that were still just subscribed to the
>> >> > > Google group mail. This will allow them to respond directly with
>> >> > > their research on the optimized ORC reader for consideration in the
>> >> > > design discussion.
>> >> > >
>> >> > > -Jason
>> >> > >
>> >> > > On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <
>> >> > altekrusejason@gmail.com
>> >> > > >
>> >> > > wrote:
>> >> > >
>> >> > > > Hello Parquet team,
>> >> > > >
>> >> > > > I wanted to report the results of a discussion between the Drill
>> >> > > > team and the engineers at Netflix working to make Parquet run
>> >> > > > faster with Presto. As we have said in the last few hangouts, we
>> >> > > > both want to make contributions back to parquet-mr to add features
>> >> > > > and improve performance. We thought it would be good to sit down
>> >> > > > and speak directly about our real goals and the best next steps to
>> >> > > > get an engineering effort started to accomplish these goals.
>> >> > > >
>> >> > > > Below is a summary of the meeting.
>> >> > > >
>> >> > > > - Meeting notes
>> >> > > >
>> >> > > >    - Attendees:
>> >> > > >
>> >> > > >        - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
>> >> > > >
>> >> > > >        - MapR (Drill team): Jacques Nadeau, Jason Altekruse,
>> >> > > >          Parth Chandra
>> >> > > >
>> >> > > > - Minutes
>> >> > > >
>> >> > > >    - Introductions/Background
>> >> > > >
>> >> > > >    - Netflix
>> >> > > >
>> >> > > >        - Working on providing interactive SQL querying to users
>> >> > > >
>> >> > > >        - Have chosen Presto as the query engine and Parquet as the
>> >> > > >          high-performance data storage format
>> >> > > >
>> >> > > >        - Presto is providing the needed speed in some cases, but
>> >> > > >          others are missing optimizations that could be avoiding
>> >> > > >          reads
>> >> > > >
>> >> > > >        - Have already started some development and investigation,
>> >> > > >          and have identified key goals
>> >> > > >
>> >> > > >        - Some initial benchmarks with a modified ORC reader (DWRF),
>> >> > > >          written by the Presto team, show that such gains are
>> >> > > >          possible with a different reader implementation
>> >> > > >
>> >> > > >        - Goals
>> >> > > >
>> >> > > >            - Filter pushdown
>> >> > > >
>> >> > > >                - Skipping reads based on filter evaluation on one
>> >> > > >                  or more columns
>> >> > > >
>> >> > > >                - This can happen at several granularities: row
>> >> > > >                  group, page, record/value
>> >> > > >
>> >> > > >            - Late/lazy materialization
>> >> > > >
>> >> > > >                - For columns not involved in a filter, avoid
>> >> > > >                  materializing them entirely until they are known
>> >> > > >                  to be needed after evaluating a filter on other
>> >> > > >                  columns
>> >> > > >
>> >> > > >    - Drill
>> >> > > >
>> >> > > >        - The Drill engine uses an in-memory vectorized
>> >> > > >          representation of records
>> >> > > >
>> >> > > >        - For scalar and repeated types we have implemented a fast
>> >> > > >          vectorized reader that is optimized to transform between
>> >> > > >          Parquet's on-disk and our in-memory format
>> >> > > >
>> >> > > >        - This is currently producing performant table scans, but
>> >> > > >          has no facility for filter pushdown
>> >> > > >
>> >> > > >        - Major goals going forward
>> >> > > >
>> >> > > >            - Filter pushdown
>> >> > > >
>> >> > > >                - Decide the best implementation for incorporating
>> >> > > >                  filter pushdown into our current implementation,
>> >> > > >                  or figure out a way to leverage existing work in
>> >> > > >                  the parquet-mr library to accomplish this goal
>> >> > > >
>> >> > > >            - Late/lazy materialization
>> >> > > >
>> >> > > >                - See above
>> >> > > >
>> >> > > >            - Contribute existing code back to Parquet
>> >> > > >
>> >> > > >                - The Drill Parquet reader has a very strong
>> >> > > >                  emphasis on performance and a clear interface to
>> >> > > >                  consume; if sufficiently separated from Drill, it
>> >> > > >                  could prove very useful for other projects
>> >> > > >
>> >> > > >    - First steps
>> >> > > >
>> >> > > >        - The Netflix team will share some of their thoughts and
>> >> > > >          research from working with the DWRF code
>> >> > > >
>> >> > > >            - We can have a discussion based off of this: which
>> >> > > >              aspects are done well, and any opportunities they may
>> >> > > >              have missed that we can incorporate into our design
>> >> > > >
>> >> > > >            - Do further investigation and ask the existing
>> >> > > >              community for guidance on existing parquet-mr
>> >> > > >              features or planned APIs that may provide the desired
>> >> > > >              functionality
>> >> > > >
>> >> > > >        - We will begin a discussion of an API for the new
>> >> > > >          functionality
>> >> > > >
>> >> > > >            - Some outstanding thoughts for down the road
>> >> > > >
>> >> > > >                - The Drill team has an interest in very late
>> >> > > >                  materialization for data stored in
>> >> > > >                  dictionary-encoded pages, such as running a join
>> >> > > >                  or filter on the dictionary and then going back
>> >> > > >                  to the reader to grab all of the values in the
>> >> > > >                  data that match the needed members of the
>> >> > > >                  dictionary
>> >> > > >
>> >> > > >                    - This is a later consideration, but it is part
>> >> > > >                      of the reason we are opening up the design
>> >> > > >                      discussion early: so that the API can be
>> >> > > >                      flexible enough to allow this in the future,
>> >> > > >                      even if it is not implemented right away
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
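The dictionary-based late materialization described in the meeting notes above could look roughly like this. This is an illustrative sketch under stated assumptions (all names are hypothetical, not Drill's or Presto's actual code): the filter is evaluated once per distinct dictionary entry, and rows are then selected by their cheap integer dictionary ids without decoding values that fail the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DictionaryFilterSketch {
    // Returns the row indices whose decoded value satisfies the predicate,
    // without re-evaluating the predicate per row.
    static List<Integer> matchingRows(String[] dictionary, int[] ids,
                                      Predicate<String> filter) {
        // One predicate evaluation per distinct value in the dictionary.
        boolean[] pass = new boolean[dictionary.length];
        for (int d = 0; d < dictionary.length; d++) {
            pass[d] = filter.test(dictionary[d]);
        }
        // Per row, only a cheap array lookup on the dictionary id.
        List<Integer> rows = new ArrayList<>();
        for (int r = 0; r < ids.length; r++) {
            if (pass[ids[r]]) {
                rows.add(r);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] dict = {"apple", "banana", "cherry"};
        int[] ids = {0, 1, 1, 2, 0};  // dictionary-encoded column of 5 rows
        // Rows 1 and 2 reference "banana", the only entry starting with "b".
        System.out.println(matchingRows(dict, ids, s -> s.startsWith("b")));
    }
}
```

For a low-cardinality column this turns per-row value evaluation into per-row array indexing, which is the gain the notes are after; the reader would then materialize only the surviving rows.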

Re: High performance vectorized reader meeting notes

Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Hi Ryan,

I just created this JIRA for it:
https://issues.apache.org/jira/browse/PARQUET-131

Comments and suggestions are welcome.

Thanks,
Zhenxiao

On Mon, Nov 10, 2014 at 10:59 AM, Ryan Blue <bl...@cloudera.com> wrote:

> Hi everyone,
>
> Is there a JIRA issue tracking the vectorized reader API? Brock and I have
> been working through how we would integrate this with Hive and have a few
> questions and comments. Thanks!
>
> rb
>
>

Re: High performance vectorized reader meeting notes

Posted by Ryan Blue <bl...@cloudera.com>.
Hi everyone,

Is there a JIRA issue tracking the vectorized reader API? Brock and I 
have been working through how we would integrate this with Hive and have 
a few questions and comments. Thanks!

rb



-- 
Ryan Blue
Software Engineer
Cloudera, Inc.