
Posted to dev@drill.apache.org by Hanumath Rao Maduri <ha...@gmail.com> on 2018/11/01 04:10:00 UTC

[Agenda] Drill developer meetup 2019

Drill Developers,


I am quite excited to announce the details of the Drill developers day
2019. I have consolidated the topics from our earlier discussions and
prioritized them according to the votes.


MapR has offered to host it on Nov 14th in the Training Room downstairs.


Here is the exact location:


Training Room at

4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.


Please find the agenda for the meetup below.



*Lunch starts at 12:00PM.*


*[12:25 - 12:40] Welcome*

   - Recap on last year's activities
   - Preview of this year's focus

*[12:40 - 1:00] Storage plugins*



   - Adding new storage plugins for the following:
      - Netflix Iceberg, Kudu (some code already exists), Cassandra,
      Elasticsearch, Carbondata, ORC/XML file formats, Spark
      RDD/DataFrames/Datasets, Graph databases & more
   - Improving documentation related to storage plugins


*[1:00 - 1:45] Schema discovery & Evolution*



   - Creation and management of schemas
   - Handling schema changes in certain common cases
   - Handling NULL values elegantly
   - Schema learning (similar to the MSGpack plugin)
   - Query hints

*[1:45 - 2:30] Metadata Management*



   - Defining an abstraction layer for various types of metadata: views,
   schema, statistics, security
   - Underlying storage for metadata: what are the options and their
   trade-offs?
   - Hive metastore
   - Parquet metadata cache (Parquet-specific, for row group metadata)
   - Ease of using the Parquet files generated by other engines (like Spark)


*[2:30 - 2:45] Break*


*[2:45 - 4:00] Resource management*



   - Resource limits per query
   - Optimal memory assignment for blocking operators based on stats
   - Enhancing the blocking and exchange operators to live within memory
   limits
   - Aligning with admission control/queueing (YARN concepts)
   - Query scheduling based on queues using tagging and costing
   - Drill on Kubernetes


*[4:00 - 4:20] Apache Arrow*

   - Benefits of integrating Apache Drill with Apache Arrow
   - Possible trade-offs & implementation hurdles

*[4:20 - 4:40] Performance Improvements*

   - Efficient handling of Broadcast/Semi/Anti-Semi joins
   - Drill statistics handling
   - Optimizing the complex Parquet reader

Thanks,
-Hanu

Re: [Agenda] Drill developer meetup 2018

Posted by hanu mapr <ha...@gmail.com>.

Hello All,

Please find the presentations from the Drill developer day at the link below.
https://drive.google.com/drive/folders/1VjKkqCKghrrbmgAyDY7h2bhoF65QB57r

Thanks,

On Wed, Nov 14, 2018 at 8:32 AM Hanumanth Maduri <ha...@gmail.com> wrote:

>
> Hello Drillers,
>
> Here is the webex link for remote attendees.
> Remote attendees can join at
> https://mapr.webex.com/mapr/j.phpMTID=ma05d8b5406acdb6292d5b81c79240a38
>
> Thanks
>
>
> > On Nov 2, 2018, at 11:25 AM, Abhishek Girish <ag...@apache.org> wrote:
> >
> > Charles, I'm sure we'll have a link for remote folks to join - will share
> > it closer to the day.
> >
> >> On Thu, Nov 1, 2018 at 1:58 PM hanu mapr <ha...@gmail.com> wrote:
> >>
> >> Hello All,
> >>
> >> There was typo for the year in the mail. It should be 2018 instead of
> 2019.
> >> Thanks Aman for correcting it.
> >>
> >> Regards,
> >> -Hanu
> >>
> >>> On Thu, Nov 1, 2018 at 6:30 AM Charles Givre <cg...@gmail.com> wrote:
> >>>
> >>> Hi Hanumath,
> >>> This looks great!!  Will you be streaming the event for those of us not
> >> in
> >>> the Bay Area?
> >>> Thx,
> >>> — C
> >>>
> >>>> On Nov 1, 2018, at 00:10, Hanumath Rao Maduri <ha...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Drill Developers,
> >>>>
> >>>>
> >>>> I am quite excited to announce the details of the Drill developers day
> >>>> 2018. I have consolidated the topics from our earlier discussions and
> >>>> prioritized them according to the votes.
> >>>>
> >>>>
> >>>> MapR has offered to host it on Nov 14th in Training room downstairs.
> >>>>
> >>>>
> >>>> Here is the exact location
> >>>>
> >>>>
> >>>> Training Room at
> >>>>
> >>>> 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.
> >>>>
> >>>>
> >>>> Please find the agenda for the meetup.
> >>>>
> >>>>
> >>>>
> >>>> *Lunch starts at 12:00PM.*
> >>>>
> >>>>
> >>>> *[12:25 - 12:40] Welcome *
> >>>>
> >>>>  - Recap on last year's activities
> >>>>  - Preview of this year's focus
> >>>>
> >>>> *[12:40 - 1:00] Storage plugins*
> >>>>
> >>>>
> >>>>
> >>>>  - Adding new storage plugins for the following:
> >>>>     - Netflix Iceberg, Kudu(some code already exists), Cassandra,
> >>>>     Elasticsearch, Carbondata, ORC/XML file formats, Spark
> >>>>     RDD/DataFrames/Datasets, Graph databases & more
> >>>>  - Improving documentation related to Storage plugins
> >>>>
> >>>>
> >>>> *[1:00 - 1:45] Schema discovery & Evolution*
> >>>>
> >>>>
> >>>>
> >>>>  - Creation, management of schema
> >>>>  - Handling schema changes in certain common cases
> >>>>  - Handling NULL values elegantly
> >>>>  - Schema learning (similar to MSGpack plugin)
> >>>>  - Query hints
> >>>>
> >>>> *[1:45 - 2:30] Metadata Management*
> >>>>
> >>>>
> >>>>
> >>>>  - Defining an abstraction layer for various types of metadata: views,
> >>>>  schema, statistics, security
> >>>>  - Underlying storage for metadata: what are the options and their
> >>>>  trade-offs?
> >>>>  - Hive metastore
> >>>>  - Parquet metadata cache (parquet specific for row group metadata)
> >>>>  - Ease of using the parquet files generated by other engines (like
> >>> spark)
> >>>>
> >>>>
> >>>> *[2:30 - 2:45] Break*
> >>>>
> >>>>
> >>>> *[2:45 - 4:00] Resource management*
> >>>>
> >>>>
> >>>>
> >>>>  - Resource limits per query
> >>>>  - Optimal memory assignment for blocking operators based on stats
> >>>>  - Enhancing the blocking and exchange operators to live within memory
> >>>>  limits
> >>>>  - Aligning with admission control/queueing (YARN concepts)
> >>>>  - Query scheduling based on queues using tagging and costing
> >>>>  - Drill on kubernetes
> >>>>
> >>>>
> >>>> *[4:00 - 4:20] Apache Arrow*
> >>>>
> >>>>  - Benefits of integrating Apache Drill with Apache Arrow
> >>>>  - Possible trade-offs & implementation hurdles
> >>>>
> >>>> *[4:20 - 4:40] **Performance Improvements*
> >>>>
> >>>>  - Efficient handling of Broadcast/Semi/Anti Semi join
> >>>>  - Drill Statistics handling
> >>>>  - Optimizing complex Parquet reader
> >>>>
> >>>> Thanks,
> >>>> -Hanu
> >>>
> >>>
> >>
>

Re: [Agenda] Drill developer meetup 2018

Posted by Hanumanth Maduri <ha...@gmail.com>.

Hello Drillers,

Here is the WebEx link for remote attendees:
https://mapr.webex.com/mapr/j.phpMTID=ma05d8b5406acdb6292d5b81c79240a38

Thanks



Re: [Agenda] Drill developer meetup 2018

Posted by Abhishek Girish <ag...@apache.org>.

Charles, I'm sure we'll have a link for remote folks to join - will share
it closer to the day.


Re: [Agenda] Drill developer meetup 2018

Posted by hanu mapr <ha...@gmail.com>.

Hello All,

There was a typo in the year in the mail. It should be 2018 instead of 2019.
Thanks Aman for correcting it.

Regards,
-Hanu


Re: [Agenda] Drill developer meetup 2019

Posted by Charles Givre <cg...@gmail.com>.

Hi Hanumath, 
This looks great!!  Will you be streaming the event for those of us not in the Bay Area?
Thx,
— C
