Posted to dev@drill.apache.org by Hanumath Rao Maduri <ha...@gmail.com> on 2018/08/20 19:35:07 UTC

Drill Hangout tomorrow 08/21

The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let
us know should you have a topic for tomorrow's hangout. We will also ask
for topics at the beginning of the hangout.

Hangout Link -
https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Regards,
Hanu

Re: Drill Hangout tomorrow 08/21

Posted by Oleksandr Kalinin <al...@gmail.com>.

Hi Vitalii,

I added a comment to JIRA.

Best Regards,
Alex

On Wed, Sep 12, 2018 at 6:47 PM Vitalii Diravka <vi...@gmail.com>
wrote:

> Oleksandr,
>
> You couldn't connect to this hangout meeting. But you can share your ideas
> in the answer to our last comment regarding Drill Metastore [1].
> Could you please take a look?
>
> [1]
>
> https://issues.apache.org/jira/browse/DRILL-6552?focusedCommentId=16612437&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612437
>
> Kind regards
> Vitalii
>
>
> On Wed, Aug 22, 2018 at 8:28 AM Hanumath Rao Maduri <ha...@gmail.com>
> wrote:
>
> > Hangout attendees on 08/21:
> > Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam,
> Vitalli,
> > Vova, Parth, Olek
> >
> > Vitalli and Vova gave a presentation on Drill Metadata management
> project.
> >
> > Some of the questions which were discussed during the discussion.
> > 1) Gautam suggested to use native operators for collecting stats instead
> of
> > aggregation operators.
> > 2) The metadata API should be made abstract such that metastore can use a
> > dfs or hive metastore etc.
> > 3) Schema change exception can be minimized by hive metastore but not
> > totally overcome.
> > 4) Discussion on how to refresh the metadata.
> > 5) Caching the metadata and discussion on what problems the eariler
> caching
> > solutions had in Drill.
> >
> >
> > Further metadata discussion will be continued in the next hangout.
> >
> > -Hanu
> >
> > On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka <
> vitalii.diravka@gmail.com
> > >
> > wrote:
> >
> > > Hi Alex,
> > >
> > > The issues pointed by you really exist. And using of HMS is still open
> > > question.
> > >
> > > The main goal is to make Drill Metastore API, which can be used for
> > > different Drill data sources. Then to adapt current Parquet metadata
> > cache
> > > files mechanism to this API.
> > > It will be the first implementation. The second one could be HMS.
> > > Although it has limitations, it has also benefits: it is easy to
> leverage
> > > it in Drill, a lot of projects already use HMS (Spark, Presto ...),
> > > so for some users it can be a good choice for storing metadata.
> > >
> > > Other implementations for Drill Metastore could be discussed (MetaCat,
> > > WhereHow, new own implementation based on HBase/MapR-DB).
> > >
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <al...@gmail.com>
> > > wrote:
> > >
> > > > Hi Volodymyr,
> > > >
> > > > Just recalling on recent discussions in DEV list, it would be
> > interesting
> > > > to see if following topics are addressed in the Drill metadata
> > management
> > > > initiative:
> > > >
> > > > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > > > Just to substantiate this point of view from practical experience,
> and
> > if
> > > > we reflect on ambition to integrate and operate Drill in
> > mission-critical
> > > > environment, following aspects could be listed:
> > > >   - Need of DBA support if cluster is subject to service level
> > > > objectives/agreements, which is somehow remote from Hadoop world.
> Need
> > of
> > > > strong DBA skills if resulting DB workload is challenging in terms of
> > > > performance tuning.
> > > >   - Common RDBMS setups offer active-standby HA model. In secure
> > > > environments, e.g. environments which are subject to PCI-DSS
> > compliancy,
> > > > that implies frequent OS patching and reboot (in reality every 30
> days
> > > > max), thus causing an additional coordination effort and service
> outage
> > > for
> > > > duration of the failovers.
> > > >   - Active-active HA clusters like Galera / Percona are free of above
> > > > disadvantage, but require specific skill set which is not widespread
> in
> > > DBA
> > > > community. Also they are sensitive to even disk IO performance across
> > the
> > > > cluster which may require additional hardware adjustment and IO
> > > isolation.
> > > >   - Need of backup / restore mechanism, which is probably lesser of
> > > > concerns
> > > >
> > > > 2. Bottleneck in foreman when performing initial metadata collection
> > (and
> > > > eventually pruning) on large amount of Parquet files
> > > >   - From discussion in the mailing list it was not fully clear
> whether
> > > > metastore will address it
> > > >   - Or shall this discussion be continued outside of metastore
> > initiative
> > > > from your point of view?
> > > >
> > > > I hope it would be OK with you and Vitalii to share some thoughts on
> > > this.
> > > >
> > > > Thanks & Best Regards,
> > > > Alex
> > > >
> > > > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <
> > > volodymyr@apache.org
> > > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I and Vitalii Diravka want to give the presentation with our ideas
> > > > > connected with Drill Metadata management project (DRILL-6552
> > > > > <https://issues.apache.org/jira/browse/DRILL-6552>).
> > > > >
> > > > > We will be happy to discuss it and choose the right way for further
> > > > > development.
> > > > >
> > > > > Kind regards,
> > > > > Volodymyr Vysotskyi
> > > > >
> > > > >
> > > > > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <
> > > hanu.ncr@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > The Apache Drill Hangout will be held tomorrow at 10:00am PST;
> > please
> > > > let
> > > > > > us know should you have a topic for tomorrow's hangout. We will
> > also
> > > > ask
> > > > > > for topics at the beginning of the hangout.
> > > > > >
> > > > > > Hangout Link -
> > > > > >
> > > >
> > https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > > > > >
> > > > > > Regards,
> > > > > > Hanu
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Drill Hangout tomorrow 08/21

Posted by Vitalii Diravka <vi...@gmail.com>.

Oleksandr,

You couldn't connect to this hangout meeting, but you can share your ideas
in a reply to our last comment regarding the Drill Metastore [1].
Could you please take a look?

[1]
https://issues.apache.org/jira/browse/DRILL-6552?focusedCommentId=16612437&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612437

Kind regards
Vitalii


On Wed, Aug 22, 2018 at 8:28 AM Hanumath Rao Maduri <ha...@gmail.com>
wrote:

> Hangout attendees on 08/21:
> Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam, Vitalli,
> Vova, Parth, Olek
>
> Vitalli and Vova gave a presentation on Drill Metadata management project.
>
> Some of the questions which were discussed during the discussion.
> 1) Gautam suggested to use native operators for collecting stats instead of
> aggregation operators.
> 2) The metadata API should be made abstract such that metastore can use a
> dfs or hive metastore etc.
> 3) Schema change exception can be minimized by hive metastore but not
> totally overcome.
> 4) Discussion on how to refresh the metadata.
> 5) Caching the metadata and discussion on what problems the eariler caching
> solutions had in Drill.
>
>
> Further metadata discussion will be continued in the next hangout.
>
> -Hanu
>
> On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka <vitalii.diravka@gmail.com
> >
> wrote:
>
> > Hi Alex,
> >
> > The issues pointed by you really exist. And using of HMS is still open
> > question.
> >
> > The main goal is to make Drill Metastore API, which can be used for
> > different Drill data sources. Then to adapt current Parquet metadata
> cache
> > files mechanism to this API.
> > It will be the first implementation. The second one could be HMS.
> > Although it has limitations, it has also benefits: it is easy to leverage
> > it in Drill, a lot of projects already use HMS (Spark, Presto ...),
> > so for some users it can be a good choice for storing metadata.
> >
> > Other implementations for Drill Metastore could be discussed (MetaCat,
> > WhereHow, new own implementation based on HBase/MapR-DB).
> >
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <al...@gmail.com>
> > wrote:
> >
> > > Hi Volodymyr,
> > >
> > > Just recalling on recent discussions in DEV list, it would be
> interesting
> > > to see if following topics are addressed in the Drill metadata
> management
> > > initiative:
> > >
> > > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > > Just to substantiate this point of view from practical experience, and
> if
> > > we reflect on ambition to integrate and operate Drill in
> mission-critical
> > > environment, following aspects could be listed:
> > >   - Need of DBA support if cluster is subject to service level
> > > objectives/agreements, which is somehow remote from Hadoop world. Need
> of
> > > strong DBA skills if resulting DB workload is challenging in terms of
> > > performance tuning.
> > >   - Common RDBMS setups offer active-standby HA model. In secure
> > > environments, e.g. environments which are subject to PCI-DSS
> compliancy,
> > > that implies frequent OS patching and reboot (in reality every 30 days
> > > max), thus causing an additional coordination effort and service outage
> > for
> > > duration of the failovers.
> > >   - Active-active HA clusters like Galera / Percona are free of above
> > > disadvantage, but require specific skill set which is not widespread in
> > DBA
> > > community. Also they are sensitive to even disk IO performance across
> the
> > > cluster which may require additional hardware adjustment and IO
> > isolation.
> > >   - Need of backup / restore mechanism, which is probably lesser of
> > > concerns
> > >
> > > 2. Bottleneck in foreman when performing initial metadata collection
> (and
> > > eventually pruning) on large amount of Parquet files
> > >   - From discussion in the mailing list it was not fully clear whether
> > > metastore will address it
> > >   - Or shall this discussion be continued outside of metastore
> initiative
> > > from your point of view?
> > >
> > > I hope it would be OK with you and Vitalii to share some thoughts on
> > this.
> > >
> > > Thanks & Best Regards,
> > > Alex
> > >
> > > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <
> > volodymyr@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I and Vitalii Diravka want to give the presentation with our ideas
> > > > connected with Drill Metadata management project (DRILL-6552
> > > > <https://issues.apache.org/jira/browse/DRILL-6552>).
> > > >
> > > > We will be happy to discuss it and choose the right way for further
> > > > development.
> > > >
> > > > Kind regards,
> > > > Volodymyr Vysotskyi
> > > >
> > > >
> > > > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <
> > hanu.ncr@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > The Apache Drill Hangout will be held tomorrow at 10:00am PST;
> please
> > > let
> > > > > us know should you have a topic for tomorrow's hangout. We will
> also
> > > ask
> > > > > for topics at the beginning of the hangout.
> > > > >
> > > > > Hangout Link -
> > > > >
> > >
> https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > > > >
> > > > > Regards,
> > > > > Hanu
> > > > >
> > > >
> > >
> >
>

Re: Drill Hangout tomorrow 08/21

Posted by Hanumath Rao Maduri <ha...@gmail.com>.

Hangout attendees on 08/21:
Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam, Vitalii,
Vova, Parth, Olek

Vitalii and Vova gave a presentation on the Drill Metadata management project.

Some of the points raised during the discussion:
1) Gautam suggested using native operators for collecting stats instead of
aggregation operators.
2) The metadata API should be abstract enough that the metastore can be
backed by a DFS, the Hive metastore, etc.
3) Schema change exceptions can be minimized by the Hive metastore but not
eliminated entirely.
4) Discussion on how to refresh the metadata.
5) Caching the metadata, and what problems the earlier caching solutions
had in Drill.


Further metadata discussion will be continued in the next hangout.

-Hanu

On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka <vi...@gmail.com>
wrote:

> Hi Alex,
>
> The issues pointed by you really exist. And using of HMS is still open
> question.
>
> The main goal is to make Drill Metastore API, which can be used for
> different Drill data sources. Then to adapt current Parquet metadata cache
> files mechanism to this API.
> It will be the first implementation. The second one could be HMS.
> Although it has limitations, it has also benefits: it is easy to leverage
> it in Drill, a lot of projects already use HMS (Spark, Presto ...),
> so for some users it can be a good choice for storing metadata.
>
> Other implementations for Drill Metastore could be discussed (MetaCat,
> WhereHow, new own implementation based on HBase/MapR-DB).
>
>
> Kind regards
> Vitalii
>
>
> On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <al...@gmail.com>
> wrote:
>
> > Hi Volodymyr,
> >
> > Just recalling on recent discussions in DEV list, it would be interesting
> > to see if following topics are addressed in the Drill metadata management
> > initiative:
> >
> > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > Just to substantiate this point of view from practical experience, and if
> > we reflect on ambition to integrate and operate Drill in mission-critical
> > environment, following aspects could be listed:
> >   - Need of DBA support if cluster is subject to service level
> > objectives/agreements, which is somehow remote from Hadoop world. Need of
> > strong DBA skills if resulting DB workload is challenging in terms of
> > performance tuning.
> >   - Common RDBMS setups offer active-standby HA model. In secure
> > environments, e.g. environments which are subject to PCI-DSS compliancy,
> > that implies frequent OS patching and reboot (in reality every 30 days
> > max), thus causing an additional coordination effort and service outage
> for
> > duration of the failovers.
> >   - Active-active HA clusters like Galera / Percona are free of above
> > disadvantage, but require specific skill set which is not widespread in
> DBA
> > community. Also they are sensitive to even disk IO performance across the
> > cluster which may require additional hardware adjustment and IO
> isolation.
> >   - Need of backup / restore mechanism, which is probably lesser of
> > concerns
> >
> > 2. Bottleneck in foreman when performing initial metadata collection (and
> > eventually pruning) on large amount of Parquet files
> >   - From discussion in the mailing list it was not fully clear whether
> > metastore will address it
> >   - Or shall this discussion be continued outside of metastore initiative
> > from your point of view?
> >
> > I hope it would be OK with you and Vitalii to share some thoughts on
> this.
> >
> > Thanks & Best Regards,
> > Alex
> >
> > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <
> volodymyr@apache.org
> > >
> > wrote:
> >
> > > Hi all,
> > >
> > > I and Vitalii Diravka want to give the presentation with our ideas
> > > connected with Drill Metadata management project (DRILL-6552
> > > <https://issues.apache.org/jira/browse/DRILL-6552>).
> > >
> > > We will be happy to discuss it and choose the right way for further
> > > development.
> > >
> > > Kind regards,
> > > Volodymyr Vysotskyi
> > >
> > >
> > > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <
> hanu.ncr@gmail.com
> > >
> > > wrote:
> > >
> > > > The Apache Drill Hangout will be held tomorrow at 10:00am PST; please
> > let
> > > > us know should you have a topic for tomorrow's hangout. We will also
> > ask
> > > > for topics at the beginning of the hangout.
> > > >
> > > > Hangout Link -
> > > >
> > https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > > >
> > > > Regards,
> > > > Hanu
> > > >
> > >
> >
>

Re: Drill Hangout tomorrow 08/21

Posted by Vitalii Diravka <vi...@gmail.com>.

Hi Alex,

The issues you pointed out really exist, and whether to use HMS (Hive
Metastore) is still an open question.

The main goal is to create a Drill Metastore API that can be used for
different Drill data sources, and then to adapt the current Parquet
metadata cache file mechanism to this API.
That will be the first implementation. The second one could be HMS.
Although it has limitations, it also has benefits: it is easy to leverage
in Drill, and a lot of projects already use HMS (Spark, Presto ...),
so for some users it can be a good choice for storing metadata.

Other implementations of the Drill Metastore could be discussed (MetaCat,
WhereHows, a new in-house implementation based on HBase/MapR-DB).


Kind regards
Vitalii


On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <al...@gmail.com>
wrote:

> Hi Volodymyr,
>
> Just recalling on recent discussions in DEV list, it would be interesting
> to see if following topics are addressed in the Drill metadata management
> initiative:
>
> 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> Just to substantiate this point of view from practical experience, and if
> we reflect on ambition to integrate and operate Drill in mission-critical
> environment, following aspects could be listed:
>   - Need of DBA support if cluster is subject to service level
> objectives/agreements, which is somehow remote from Hadoop world. Need of
> strong DBA skills if resulting DB workload is challenging in terms of
> performance tuning.
>   - Common RDBMS setups offer active-standby HA model. In secure
> environments, e.g. environments which are subject to PCI-DSS compliancy,
> that implies frequent OS patching and reboot (in reality every 30 days
> max), thus causing an additional coordination effort and service outage for
> duration of the failovers.
>   - Active-active HA clusters like Galera / Percona are free of above
> disadvantage, but require specific skill set which is not widespread in DBA
> community. Also they are sensitive to even disk IO performance across the
> cluster which may require additional hardware adjustment and IO isolation.
>   - Need of backup / restore mechanism, which is probably lesser of
> concerns
>
> 2. Bottleneck in foreman when performing initial metadata collection (and
> eventually pruning) on large amount of Parquet files
>   - From discussion in the mailing list it was not fully clear whether
> metastore will address it
>   - Or shall this discussion be continued outside of metastore initiative
> from your point of view?
>
> I hope it would be OK with you and Vitalii to share some thoughts on this.
>
> Thanks & Best Regards,
> Alex
>
> On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <volodymyr@apache.org
> >
> wrote:
>
> > Hi all,
> >
> > I and Vitalii Diravka want to give the presentation with our ideas
> > connected with Drill Metadata management project (DRILL-6552
> > <https://issues.apache.org/jira/browse/DRILL-6552>).
> >
> > We will be happy to discuss it and choose the right way for further
> > development.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <hanu.ncr@gmail.com
> >
> > wrote:
> >
> > > The Apache Drill Hangout will be held tomorrow at 10:00am PST; please
> let
> > > us know should you have a topic for tomorrow's hangout. We will also
> ask
> > > for topics at the beginning of the hangout.
> > >
> > > Hangout Link -
> > >
> https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > >
> > > Regards,
> > > Hanu
> > >
> >
>

Re: Drill Hangout tomorrow 08/21

Posted by Oleksandr Kalinin <al...@gmail.com>.

Hi Volodymyr,

Recalling recent discussions on the dev list, it would be interesting
to see whether the following topics are addressed in the Drill metadata
management initiative:

1. Avoiding a repetition of Hive's mistakes (mainly, relying on an RDBMS)
To substantiate this point of view from practical experience, and
considering the ambition to integrate and operate Drill in mission-critical
environments, the following aspects could be listed:
  - Need for DBA support if the cluster is subject to service level
objectives/agreements, which is somewhat remote from the Hadoop world. Need
for strong DBA skills if the resulting DB workload is challenging in terms
of performance tuning.
  - Common RDBMS setups offer an active-standby HA model. In secure
environments, e.g. environments subject to PCI-DSS compliance, that
implies frequent OS patching and reboots (in reality every 30 days
max), causing additional coordination effort and a service outage for
the duration of each failover.
  - Active-active HA clusters like Galera / Percona are free of the above
disadvantage, but require a specific skill set which is not widespread in
the DBA community. They are also sensitive to the evenness of disk IO
performance across the cluster, which may require additional hardware
adjustment and IO isolation.
  - Need for a backup / restore mechanism, which is probably the lesser of
the concerns

2. Bottleneck in the foreman when performing initial metadata collection (and
eventually pruning) on a large number of Parquet files
  - From the discussion on the mailing list it was not fully clear whether
the metastore will address it
  - Or, from your point of view, should this discussion be continued
outside of the metastore initiative?

I hope it would be OK with you and Vitalii to share some thoughts on this.

Thanks & Best Regards,
Alex

On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <vo...@apache.org>
wrote:

> Hi all,
>
> Vitalii Diravka and I would like to give a presentation of our ideas
> for the Drill Metadata management project (DRILL-6552
> <https://issues.apache.org/jira/browse/DRILL-6552>).
>
> We will be happy to discuss it and choose the right way for further
> development.
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <ha...@gmail.com>
> wrote:
>
> > The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let
> > us know should you have a topic for tomorrow's hangout. We will also ask
> > for topics at the beginning of the hangout.
> >
> > Hangout Link -
> > https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> >
> > Regards,
> > Hanu
> >
>

Re: Drill Hangout tomorrow 08/21

Posted by Volodymyr Vysotskyi <vo...@apache.org>.

Hi all,

Vitalii Diravka and I would like to give a presentation on our ideas for
the Drill Metadata management project (DRILL-6552
<https://issues.apache.org/jira/browse/DRILL-6552>).

We will be happy to discuss it and choose the right way for further
development.

Kind regards,
Volodymyr Vysotskyi

On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <ha...@gmail.com>
wrote:

> The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let
> us know should you have a topic for tomorrow's hangout. We will also ask
> for topics at the beginning of the hangout.
>
> Hangout Link -
> https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
>
> Regards,
> Hanu
>
