Posted to dev@iceberg.apache.org by Owen O'Malley <ow...@gmail.com> on 2019/01/07 21:09:20 UTC

Re: Iceberg and Hive

The group has moved to the Apache infrastructure, so we should use
dev@iceberg.apache.org.

What is required, but not started, is for someone to implement Hive's
RawStore API with an Iceberg backend. That would let you use Hive SQL
commands to manipulate the Iceberg tables.

.. Owen


On Mon, Jan 7, 2019 at 1:01 PM 'Vladi Feigin' via Iceberg Developers <
iceberg-devel@googlegroups.com> wrote:

> Hello,
>
> I am still a bit confused about how Iceberg interacts with Hive (the
> metastore). In our case we have many Hive tables and a lot of Spark and
> Presto jobs reading, creating, and writing to Hive.
> Moving to Iceberg, even gradually, raises a few questions:
> 1. Are new tables created via Iceberg visible (by Spark/Presto) in the Hive
> metastore as well?
> 2. Should we somehow migrate existing Hive tables to be supported by
> Iceberg?
> 3. Is there any impact on the existing (Spark, Presto) jobs when moving to
> Iceberg?
>
> I understand that creating a new system from scratch with Iceberg is
> probably easier than moving projects that heavily use the Hive metastore,
> but this is the use case in a lot of projects nowadays.
> Thank you,
> Vladi Feigin
>
> --
> You received this message because you are subscribed to the Google Groups
> "Iceberg Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to iceberg-devel+unsubscribe@googlegroups.com.
> To post to this group, send email to iceberg-devel@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/iceberg-devel/5d38541c-f73f-471f-b8db-5430238c4376%40googlegroups.com
> <https://groups.google.com/d/msgid/iceberg-devel/5d38541c-f73f-471f-b8db-5430238c4376%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

Re: Iceberg and Hive

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Vladi,

I'll add a little to Owen's answer for context. Owen was right that using
an Iceberg table in Hive will require some work implementing the RawStore
API. But the `iceberg-hive` module currently uses the Hive Metastore to
keep track of Iceberg metadata.

An Iceberg table isn't a Hive table. Iceberg requires extra metadata and
doesn't meet assumptions that Hive makes about data because Iceberg tracks
what files are in a table differently. Still, most people want to use a
Hive Metastore instance to track Iceberg tables because they already have
one. That's what iceberg-hive provides. It stores Iceberg's root metadata
location and ensures changes to that location through the Iceberg library
are atomic.
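
As a rough illustration of that last point, here is a toy Python model of an atomic swap of a table's root metadata location. It is only a sketch: `ToyMetastore`, `swap_location`, `CommitFailed`, and the `s3://bucket/...` paths are hypothetical names, not part of iceberg-hive or the Hive Metastore API, and compare-and-swap is just one common way to get this property, not necessarily how iceberg-hive does it.

```python
import threading


class CommitFailed(Exception):
    """Raised when the table's metadata pointer moved since it was read."""


class ToyMetastore:
    """Hypothetical stand-in for a metastore: one metadata location per table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}

    def current_location(self, table):
        with self._lock:
            return self._locations.get(table)

    def swap_location(self, table, expected, new):
        """Point `table` at `new` only if it still points at `expected`."""
        with self._lock:
            if self._locations.get(table) != expected:
                raise CommitFailed(f"{table}: pointer is no longer {expected!r}")
            self._locations[table] = new


store = ToyMetastore()
store.swap_location("db.events", None, "s3://bucket/events/metadata/v1.json")

# Two writers read the same base version, then race to commit.
base = store.current_location("db.events")
store.swap_location("db.events", base, "s3://bucket/events/metadata/v2-a.json")
try:
    store.swap_location("db.events", base, "s3://bucket/events/metadata/v2-b.json")
    second_writer_committed = True
except CommitFailed:
    # The losing writer has to re-read the pointer and retry on top of v2-a.
    second_writer_committed = False

print(store.current_location("db.events"), second_writer_committed)
```

The point of the sketch is that a reader always sees one complete metadata version or the other, never a mix of the two commits.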

While you can use iceberg-hive to keep track of tables, engines still need
to use iceberg-hive to access Iceberg tables, too. Right now, the only one
that does this is Presto, in the open PR. I also need to update the Spark
support to use iceberg-hive by default and not just HDFS-based tables. This
is an issue we intend to get done for the 1.0 release.

I hope that helps!

rb



-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg and Hive

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Arvind,

Iceberg tables don't work like Hive tables and you can't use Presto's Hive
support to read them. The pull request for Presto adds a new type of Presto
catalog for Iceberg tables that has its own implementation for calculating
splits. Once splits are calculated, Iceberg reuses some of the same classes
for reading Parquet files.
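
To make "its own implementation for calculating splits" a little more concrete, here is a toy Python sketch of planning splits from a list of tracked data files instead of from a metastore partition listing. This is not the Presto connector code: `DataFile`, `Split`, `plan_splits`, the `event_day` field, and the target size are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int
    event_day: str  # partition value recorded alongside the file (hypothetical)


@dataclass(frozen=True)
class Split:
    path: str
    start: int
    length: int


def plan_splits(files, wanted_days, target_bytes):
    """Skip files whose recorded partition value cannot match, then cut the
    remaining files into byte ranges of at most `target_bytes`."""
    splits = []
    for f in files:
        if f.event_day not in wanted_days:
            continue  # pruned without ever listing partitions in a metastore
        for start in range(0, f.size_bytes, target_bytes):
            length = min(target_bytes, f.size_bytes - start)
            splits.append(Split(f.path, start, length))
    return splits


table_files = [
    DataFile("data/a.parquet", 250, "2019-01-07"),
    DataFile("data/b.parquet", 100, "2019-01-08"),
    DataFile("data/c.parquet", 90, "2019-01-09"),
]
splits = plan_splits(table_files, {"2019-01-07", "2019-01-08"}, target_bytes=128)
for s in splits:
    print(s)
```

Each resulting split is just a file path plus a byte range, which is the kind of unit an existing Parquet reader can consume.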

rb



-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg and Hive

Posted by Arvind Pruthi <ap...@linkedin.com>.
@Ryan Blue, my understanding is that Presto typically gets a list of partitions from HMS, caches it, and applies predicates against this list in response to queries. Since you use Presto at Netflix, how does this work with Iceberg if Iceberg hides the partition list? Did you have to rewrite portions of Presto to make it happen, or was a new Presto connector enough?

Thanks
Arvind



Re: Iceberg and Hive

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
> when iceberg-hive will be integrated into Presto and Spark - does it mean
> that an Iceberg table created in Spark or Presto will be recorded in HMS
> and visible to other engines

These tables are visible to any HMS client, but not readable. An engine
needs Iceberg support to read and write Iceberg tables.

> Operations like: "listPartitions" that are a regular feature in Hive
> Metastore Client API don't appear to be very straightforward in Iceberg.

Iceberg doesn't support these operations because it is trying to hide
partitioning. See http://iceberg.apache.org/partitioning/
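
A small Python sketch may help show what "hiding partitioning" means in practice. It is a toy model under my own assumptions, not Iceberg code: `day_transform`, `scan`, and the in-memory `partitions` dict are hypothetical. The idea it illustrates is that the partition value is derived from a column by a transform, so writers and readers only ever deal with the column itself.

```python
from datetime import date, datetime


def day_transform(ts: datetime) -> date:
    """Partition transform: the partition value is derived from the column."""
    return ts.date()


# Writers only supply rows; the partition value is computed, never typed in.
rows = [
    {"id": 1, "ts": datetime(2019, 1, 7, 21, 9)},
    {"id": 2, "ts": datetime(2019, 1, 8, 15, 13)},
    {"id": 3, "ts": datetime(2019, 1, 9, 18, 29)},
]
partitions = {}
for row in rows:
    partitions.setdefault(day_transform(row["ts"]), []).append(row)


def scan(ts_from: datetime, ts_to: datetime):
    """Filter on `ts` only; the days to read are inferred from the filter."""
    first_day, last_day = day_transform(ts_from), day_transform(ts_to)
    touched = [d for d in sorted(partitions) if first_day <= d <= last_day]
    matches = [
        r for d in touched for r in partitions[d] if ts_from <= r["ts"] <= ts_to
    ]
    return touched, matches


touched_days, result = scan(datetime(2019, 1, 8, 0, 0), datetime(2019, 1, 8, 23, 59))
print(touched_days, [r["id"] for r in result])
```

Because the query never names a partition, there is nothing for a client to enumerate up front, which is why a `listPartitions`-style call has no natural place in this model.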

> Owen and you mentioned "an implementation of Hive's RawStore API". Is it
> something in your plans?

I don't intend to work on Hive support, but I would certainly help if
someone wanted to contribute support to Hive. We have no need for Hive
support because we use primarily Presto and Spark.

On Tue, Jan 8, 2019 at 2:07 PM 'Vladi Feigin' via Iceberg Developers <
iceberg-devel@googlegroups.com> wrote:

> @Arvind - Agree with you. Smooth integration with Hive is very important,
> or even critical, IMHO.
> The Iceberg guys' vision for this is very interesting.
>
> @Ryan - Owen and you mentioned "an implementation of Hive's RawStore API".
> Is it something in your plans?
> Maybe we can help make progress on it and contribute this part.
>
> Regarding "tracking Iceberg tables in Metastore": when iceberg-hive
> is integrated into Presto and Spark, does it mean that an Iceberg
> table created in Spark or Presto will be recorded in HMS and visible to
> other engines? Like today, when a table created by Spark in Hive is visible
> in Presto and vice versa?
> Thank you,
> Vladi


-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg and Hive

Posted by Arvind Pruthi <ap...@linkedin.com>.
Vladi,
We have similarities to what you describe. 

While I agree that what Owen mentioned about an implementation of Hive's RawStore API will be really useful, I don't believe it fully answers Vladi's question. I think the main concern here is the smooth migration of existing clients to Iceberg tables, and what happens in a hybrid world where not all tables have been switched to Iceberg. @Owen O'Malley, @rdblue@netflix.com, do you have any thoughts on this?

I have an additional concern: operations like "listPartitions" that are a regular feature of the Hive Metastore Client API don't appear to be very straightforward in Iceberg. Aren't Presto and other clients used to getting a list of partitions? Any thoughts on these would be very helpful.

Thanks,
Arvind 
