Posted to dev@parquet.apache.org by Brock Noland <br...@cloudera.com> on 2014/11/01 21:03:16 UTC
Re: High performance vectorized reader meeting notes
Hi,
Great! I will take a look soon.
Cheers!
Brock
On Mon, Oct 27, 2014 at 11:18 PM, Zhenxiao Luo <zl...@netflix.com> wrote:
>
> Thanks Jacques.
>
> Here is the gist:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
>
> Comments and suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
>> You can't send attachments. Can you post it as a Google Doc or gist?
>>
>> On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <zl...@netflix.com.invalid>
>> wrote:
>>
>> >
>> > Thanks Brock and Jason.
>> >
>> > I just drafted proposed APIs for a vectorized Parquet reader (attached
>> > to this email). Any comments and suggestions are appreciated.
>> >
>> > Thanks,
>> > Zhenxiao
>> >
>> > On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <br...@cloudera.com>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> The Hive + Parquet community is very interested in improving the
>> >> performance of Hive + Parquet and of Parquet generally. We are very
>> >> interested in contributing to the Parquet vectorization and lazy
>> >> materialization effort. Please add me to any future meetings on this
>> >> topic.
>> >>
>> >> BTW here is the JIRA tracking this effort from the Hive side:
>> >> https://issues.apache.org/jira/browse/HIVE-8120
>> >>
>> >> Brock
>> >>
>> >> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <zluo@netflix.com.invalid>
>> >> wrote:
>> >>
>> >> > Thanks Jason.
>> >> >
>> >> > Yes, Netflix is using Presto and Parquet for our Big Data Platform (
>> >> > http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>> >> > ).
>> >> >
>> >> > The fastest format currently in Presto is ORC, not DWRF (Parquet is
>> >> > fast, but not as fast as ORC). We are referring to ORC, not Facebook's
>> >> > DWRF implementation.
>> >> >
>> >> > We already have Parquet working in Presto. We definitely would like to
>> >> > get it as fast as ORC.
>> >> >
>> >> > Facebook has added native support for ORC in Presto, which does not
>> >> > use the ORCRecordReader at all. They parse the ORC footer, and do
>> >> > predicate pushdown by skipping row groups, vectorization by
>> >> > introducing type-specific vectors, and lazy materialization by
>> >> > introducing LazyVectors (their code has not been committed yet; I mean
>> >> > their pull request). We are planning to do similar optimizations for
>> >> > Parquet in Presto.
>> >> >
>> >> > For the ParquetRecordReader, we need additional APIs to read the next
>> >> > batch of values and to read in a vector of values. For example, here
>> >> > are the related APIs in the ORC code:
>> >> >
>> >> > /**
>> >> >  * Read the next row batch. The size of the batch to read cannot be
>> >> >  * controlled by the callers. Callers need to look at
>> >> >  * VectorizedRowBatch.size of the returned object to know the batch
>> >> >  * size read.
>> >> >  * @param previousBatch a row batch object that can be reused by the
>> >> >  *                      reader
>> >> >  * @return the row batch that was read
>> >> >  * @throws java.io.IOException
>> >> >  */
>> >> > VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch)
>> >> >     throws IOException;
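For illustration, the batch-at-a-time contract quoted above can be sketched as follows. This is a minimal toy, not the actual ORC classes: `VectorizedRowBatch` and `ToyReader` here are simplified stand-ins, and the real API carries type-specific column vectors rather than a single `long[]`.

```java
import java.io.IOException;

public class BatchReadSketch {
    // Simplified stand-in for ORC's VectorizedRowBatch: one fixed-capacity
    // column vector plus the count of rows actually filled in.
    static class VectorizedRowBatch {
        final long[] col;
        int size;
        VectorizedRowBatch(int capacity) { col = new long[capacity]; }
    }

    // A toy reader that serves values out of an array, batch by batch.
    static class ToyReader {
        private final long[] data;
        private int pos = 0;
        ToyReader(long[] data) { this.data = data; }

        // Mirrors the quoted contract: the caller cannot choose the batch
        // size and must inspect batch.size after the call; the previous
        // batch object is reused when one is passed in.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previous) throws IOException {
            VectorizedRowBatch batch =
                (previous != null) ? previous : new VectorizedRowBatch(4);
            batch.size = Math.min(batch.col.length, data.length - pos);
            for (int i = 0; i < batch.size; i++) {
                batch.col[i] = data[pos + i];
            }
            pos += batch.size;
            return batch;
        }
    }

    public static void main(String[] args) throws IOException {
        ToyReader reader = new ToyReader(new long[] {1, 2, 3, 4, 5, 6, 7});
        VectorizedRowBatch batch = null;
        long sum = 0;
        int batches = 0;
        while (true) {
            batch = reader.nextBatch(batch);   // batch object is reused
            if (batch.size == 0) break;        // end of input
            batches++;
            for (int i = 0; i < batch.size; i++) sum += batch.col[i];
        }
        System.out.println(sum + " " + batches); // prints "28 2"
    }
}
```

Seven values read through a capacity-4 batch arrive as two batches (4 rows, then 3), which is why the caller must look at `batch.size` each time rather than assume a fixed batch length.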
>> >> >
>> >> > And here are the related APIs in the Presto code, which are used for
>> >> > ORC support in Presto:
>> >> >
>> >> > public void readVector(int columnIndex, Object vector);
>> >> >
>> >> > For lazy materialization, we may also consider adding LazyVectors or
>> >> > LazyBlocks, so that values are not materialized until they are
>> >> > accessed by the Operator.
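The LazyVector idea in the last paragraph can be sketched like this. All names here are illustrative assumptions, not an existing parquet-mr or Presto API; the point is only that decoding is deferred until an operator first touches the vector.

```java
import java.util.function.Supplier;

public class LazyVectorSketch {
    // Hypothetical LazyVector: the column's values are not decoded until
    // first access, so columns an operator never reads are never paid for.
    static class LazyVector {
        private final Supplier<long[]> loader; // deferred decode of the column
        private long[] values;                 // null until materialized
        static int loads = 0;                  // count decodes, for the demo

        LazyVector(Supplier<long[]> loader) { this.loader = loader; }

        long get(int i) {
            if (values == null) {              // materialize on first access
                values = loader.get();
                loads++;
            }
            return values[i];
        }
    }

    public static void main(String[] args) {
        // Two columns; the "query" below only ever touches column a.
        LazyVector a = new LazyVector(() -> new long[] {10, 20, 30});
        LazyVector b = new LazyVector(() -> new long[] {1, 2, 3});

        long sum = 0;
        for (int i = 0; i < 3; i++) {
            if (a.get(i) >= 20) sum += a.get(i); // b is never materialized
        }
        // One decode happened (column a); column b's loader never ran.
        System.out.println(sum + " " + LazyVector.loads); // prints "50 1"
    }
}
```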
>> >> >
>> >> > Any comments and suggestions are appreciated.
>> >> >
>> >> > Thanks,
>> >> > Zhenxiao
>> >> >
>> >> >
>> >> > On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <
>> >> > altekrusejason@gmail.com> wrote:
>> >> >
>> >> > > Hello All,
>> >> > >
>> >> > > No updates from me yet, just sending out another message for some of
>> >> > > the Netflix engineers that were still just subscribed to the Google
>> >> > > group mail. This will allow them to respond directly with their
>> >> > > research on the optimized ORC reader for consideration in the design
>> >> > > discussion.
>> >> > >
>> >> > > -Jason
>> >> > >
>> >> > > On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <
>> >> > > altekrusejason@gmail.com> wrote:
>> >> > >
>> >> > > > Hello Parquet team,
>> >> > > >
>> >> > > > I wanted to report the results of a discussion between the Drill
>> >> > > > team and the engineers at Netflix working to make Parquet run
>> >> > > > faster with Presto. As we have said in the last few hangouts, we
>> >> > > > both want to make contributions back to parquet-mr to add features
>> >> > > > and improve performance. We thought it would be good to sit down
>> >> > > > and speak directly about our real goals and the best next steps to
>> >> > > > get an engineering effort started to accomplish these goals.
>> >> > > >
>> >> > > > Below is a summary of the meeting.
>> >> > > >
>> >> > > > - Meeting notes
>> >> > > >
>> >> > > >   - Attendees:
>> >> > > >     - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
>> >> > > >     - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth
>> >> > > >       Chandra
>> >> > > >
>> >> > > >   - Minutes
>> >> > > >
>> >> > > >     - Introductions / Background
>> >> > > >
>> >> > > >       - Netflix
>> >> > > >         - Working on providing interactive SQL querying to users
>> >> > > >         - Have chosen Presto as the query engine and Parquet as
>> >> > > >           the high-performance data storage format
>> >> > > >         - Presto is providing the needed speed in some cases, but
>> >> > > >           others are missing optimizations that could be avoiding
>> >> > > >           reads
>> >> > > >         - Have already started some development and investigation,
>> >> > > >           and have identified key goals
>> >> > > >         - Some initial benchmarks with a modified ORC reader
>> >> > > >           (DWRF), written by the Presto team, show that such gains
>> >> > > >           are possible with a different reader implementation
>> >> > > >         - Goals
>> >> > > >           - Filter pushdown
>> >> > > >             - Skipping reads based on filter evaluation on one or
>> >> > > >               more columns
>> >> > > >             - This can happen at several granularities: row group,
>> >> > > >               page, record/value
>> >> > > >           - Late/lazy materialization
>> >> > > >             - For columns not involved in a filter, avoid
>> >> > > >               materializing them entirely until they are known to
>> >> > > >               be needed after evaluating a filter on other columns
>> >> > > >
>> >> > > >       - Drill
>> >> > > >         - The Drill engine uses an in-memory vectorized
>> >> > > >           representation of records
>> >> > > >         - For scalar and repeated types we have implemented a fast
>> >> > > >           vectorized reader that is optimized to transform between
>> >> > > >           Parquet's on-disk format and our in-memory format
>> >> > > >         - This is currently producing performant table scans, but
>> >> > > >           has no facility for filter pushdown
>> >> > > >
>> >> > > >     - Major goals going forward
>> >> > > >
>> >> > > >       - Filter pushdown
>> >> > > >         - Decide the best implementation for incorporating filter
>> >> > > >           pushdown into our current implementation, or figure out
>> >> > > >           a way to leverage existing work in the parquet-mr
>> >> > > >           library to accomplish this goal
>> >> > > >       - Late/lazy materialization
>> >> > > >         - See above
>> >> > > >       - Contribute existing code back to Parquet
>> >> > > >         - The Drill Parquet reader has a very strong emphasis on
>> >> > > >           performance and a clear interface to consume;
>> >> > > >           sufficiently separated from Drill, it could prove very
>> >> > > >           useful for other projects
>> >> > > >
>> >> > > >     - First steps
>> >> > > >
>> >> > > >       - The Netflix team will share some of their thoughts and
>> >> > > >         research from working with the DWRF code
>> >> > > >         - We can then have a discussion based off of this: which
>> >> > > >           aspects are done well, and any opportunities they may
>> >> > > >           have missed that we can incorporate into our design
>> >> > > >       - Do further investigation and ask the existing community
>> >> > > >         for guidance on existing parquet-mr features or planned
>> >> > > >         APIs that may provide the desired functionality
>> >> > > >       - We will begin a discussion of an API for the new
>> >> > > >         functionality
>> >> > > >
>> >> > > >     - Some outstanding thoughts for down the road
>> >> > > >
>> >> > > >       - The Drill team has an interest in very late
>> >> > > >         materialization for data stored in dictionary-encoded
>> >> > > >         pages, such as running a join or filter on the dictionary
>> >> > > >         and then going back to the reader to grab all of the
>> >> > > >         values in the data that match the needed members of the
>> >> > > >         dictionary
>> >> > > >         - This is a later consideration, but it is part of the
>> >> > > >           reason we are opening up the design discussion early:
>> >> > > >           so that the API can be flexible enough to allow this in
>> >> > > >           the future, even if it is not implemented right away
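The row-group granularity of filter pushdown mentioned in the goals above can be sketched as follows; `RowGroupStats` is a hypothetical stand-in for Parquet's per-column-chunk min/max metadata, not a real parquet-mr class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupSkipSketch {
    // Hypothetical footer-level statistics for one column of a row group
    // (stand-in for Parquet column chunk min/max metadata).
    static class RowGroupStats {
        final long min, max;
        RowGroupStats(long min, long max) { this.min = min; this.max = max; }
    }

    // Row-group-granularity pushdown for the predicate "col > threshold":
    // a group whose max is <= threshold cannot contain a matching row, so
    // its pages are never read or decoded at all.
    static List<Integer> groupsToRead(List<RowGroupStats> groups, long threshold) {
        List<Integer> selected = new ArrayList<>();
        for (int g = 0; g < groups.size(); g++) {
            if (groups.get(g).max > threshold) selected.add(g);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = Arrays.asList(
            new RowGroupStats(0, 9),    // skipped: max 9 <= 10
            new RowGroupStats(5, 42),   // read: may contain matches
            new RowGroupStats(11, 20)); // read: may contain matches
        System.out.println(groupsToRead(groups, 10)); // prints "[1, 2]"
    }
}
```

The same stats-vs-predicate check works at page granularity with page-level statistics; record-level filtering then only runs inside the groups and pages that survive.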
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
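The dictionary-based late materialization idea at the end of the notes above can be sketched like this. The dictionary/index layout is only a conceptual model of Parquet dictionary encoding, and none of these names are real parquet-mr APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class DictionaryFilterSketch {
    public static void main(String[] args) {
        // A dictionary-encoded column: a small dictionary of distinct
        // values plus a stream of indices into it (conceptually how Parquet
        // dictionary pages work; the names here are illustrative).
        String[] dictionary = {"ERROR", "INFO", "WARN"};
        int[] indices = {1, 1, 0, 2, 1, 0, 1};

        // Step 1: evaluate the filter against the dictionary only - at most
        // one comparison per distinct value instead of one per row.
        boolean[] keep = new boolean[dictionary.length];
        for (int d = 0; d < dictionary.length; d++) {
            keep[d] = dictionary[d].equals("ERROR");
        }

        // Step 2: scan the (cheap) index stream and materialize only the
        // rows whose dictionary entry matched the filter.
        List<Integer> matchingRows = new ArrayList<>();
        for (int row = 0; row < indices.length; row++) {
            if (keep[indices[row]]) matchingRows.add(row);
        }
        System.out.println(matchingRows); // prints "[2, 5]"
    }
}
```

A join can use the same trick: probe the dictionary against the build side first, then return to the reader for only the rows whose dictionary entries matched.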
Re: High performance vectorized reader meeting notes
Posted by Zhenxiao Luo <zl...@netflix.com.INVALID>.
Hi Ryan,
I just created this JIRA for it:
https://issues.apache.org/jira/browse/PARQUET-131
Comments and suggestions are welcome.
Thanks,
Zhenxiao
On Mon, Nov 10, 2014 at 10:59 AM, Ryan Blue <bl...@cloudera.com> wrote:
> Hi everyone,
>
> Is there a JIRA issue tracking the vectorized reader API? Brock and I have
> been working through how we would integrate this with Hive and have a few
> questions and comments. Thanks!
>
> rb
>
Re: High performance vectorized reader meeting notes
Posted by Ryan Blue <bl...@cloudera.com>.
Hi everyone,
Is there a JIRA issue tracking the vectorized reader API? Brock and I
have been working through how we would integrate this with Hive and have
a few questions and comments. Thanks!
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.