Posted to dev@hudi.apache.org by Purushotham Pushpavanthar <pu...@gmail.com> on 2020/01/06 10:17:24 UTC

Re: Regards to have Athena Metastore Sync

Hi Vinoth,

Since the *schema registry* is the source of truth and the Hive metastore is
a translation of it, having an option to update multiple metastores in Hudi
would help in this case.
Similar to what Syed mentioned, the same Hudi dataset can be exposed in
multiple places like Athena, Redshift Spectrum, on-prem Presto, Hive, etc.,
where the datasets' metadata is not shared with each other.

Regards,
Purushotham Pushpavanthar



On Wed, 1 Jan 2020 at 00:46, Vinoth Chandar <vi...@apache.org> wrote:

> Can one of the AWS folks please chime in here? IIRC I saw some tweets
> mentioning Hudi/Athena support is in the works.
> Not sure myself.
>
> On Sun, Dec 29, 2019 at 11:33 PM Syed Abdul Kather <in...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > We have built a "CDC pipeline with Apache Hudi and Debezium". It
> > works very well in our production.
> >
> > But we have an in-house Ambari cluster with a Hive metastore for all ETL
> > purposes and Athena for all analytics purposes. To make the Hudi table
> > work on Athena, we have preserved only the latest version and created
> > the table in Parquet format.
> >
> > Right now the Hive metastore gets updated by Hudi itself. But to keep the
> > Athena metastore in sync, we have written a separate script to manage it.
> > That does not look like the right approach, as only the affected
> > partitions need to be updated on the Athena side.
> >
> > Please suggest the right approach here.
> >
> >             Thanks and Regards,
> >         S SYED ABDUL KATHER
> >
>
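
Syed's point above, that only the affected partitions need to be updated on the Athena side, could look roughly like the sketch below. It is not Hudi functionality and not the script mentioned in the thread: the function name, table name, and S3 path are hypothetical, and it assumes a Hive-style `key=value` partition directory layout. It only builds the `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement that such a script might submit to Athena.

```python
from typing import Dict, List


def build_add_partitions_ddl(table: str, base_path: str,
                             partitions: List[Dict[str, str]]) -> str:
    """Build one ALTER TABLE statement registering only the given partitions.

    Hypothetical helper for illustration; assumes Hive-style key=value
    directories under base_path.
    """
    if not partitions:
        raise ValueError("no partitions to add")
    base = base_path.rstrip("/")
    clauses = []
    for spec in partitions:
        # Partition spec, e.g. dt='2020-01-05'
        spec_sql = ", ".join(f"{key}='{value}'" for key, value in spec.items())
        # Matching directory, e.g. dt=2020-01-05 (assumed layout)
        rel_path = "/".join(f"{key}={value}" for key, value in spec.items())
        clauses.append(f"PARTITION ({spec_sql}) LOCATION '{base}/{rel_path}/'")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS " + " ".join(clauses)


if __name__ == "__main__":
    # Only the partitions touched by the latest write are passed in.
    print(build_add_partitions_ddl(
        "orders",
        "s3://example-bucket/orders/",
        [{"dt": "2020-01-05"}, {"dt": "2020-01-06"}],
    ))
```

Because the statement uses `IF NOT EXISTS`, re-running it for an already registered partition should be harmless, which keeps such a script safe to retry.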

Re: Regards to have Athena Metastore Sync

Posted by Vinoth Chandar <vi...@apache.org>.
>>multiple places like Athena, Redshift Spectrum, on prem Presto, Hive etc
I was under the impression that most of them can already read the schema
from the Hive metastore? It's at least true of Presto. The Glue catalog in
AWS again adheres to the Hive metastore protocol/APIs, so Hudi (IIRC) can
register to the Glue catalog using the same mechanism.

I am supportive if we can establish that we need to support additional
metastores explicitly.

On Mon, Jan 6, 2020 at 5:13 AM Syed Abdul Kather <in...@gmail.com> wrote:

> Hi Vinoth,
>
> As discussed by Puru, please advise on supporting multiple
> metastores, or whether there is a better way.
>             Thanks and Regards,
>         S SYED ABDUL KATHER

Re: Regards to have Athena Metastore Sync

Posted by Syed Abdul Kather <in...@gmail.com>.
Hi Vinoth,

As discussed by Puru, please advise on supporting multiple
metastores, or whether there is a better way.
            Thanks and Regards,
        S SYED ABDUL KATHER


