Posted to user@hive.apache.org by Elliot West <te...@gmail.com> on 2020/01/08 14:27:44 UTC

Apache Iceberg integration

Hello,

We're considering working on an integration of Apache Iceberg with Hive,
initially so that the latest snapshot of Iceberg tables can be queried via
Hive, but later to allow the writing of data using the Iceberg table format.

I wanted to first check for the existence and status of any similar efforts
so that we do not find ourselves duplicating work unnecessarily. I've
checked both the Iceberg and Hive projects and can find no issues that
suggest that such an integration is underway or planned (only HIVE-19457
<https://issues.apache.org/jira/browse/HIVE-19457>, which I raised and which
remains open).

If one or more efforts are underway, we'd certainly be open to contributing.
If not, we'd be keen to capture any thoughts from the community on
preferred or recommended technical approaches.

I see that some work occurred on MR In/Out formats
<https://github.com/guilload/incubator-iceberg/pull/1> which might serve as
a foundation, so we'll certainly be investigating those further.

Thanks,

Elliot.

Re: Apache Iceberg integration

Posted by Feng Lu <fe...@google.com>.
For someone like me who is new to the Hive community, is there a
(semi-)formal process for contributing a large-scale feature like
Hive-Iceberg integration?
For example, a Hive improvement proposal, community voting, development and
code review, release, etc.

Thank you and sorry for derailing this conversation a bit.


On Thu, Jan 9, 2020 at 3:43 AM Peter Vary <pv...@cloudera.com> wrote:

> Hi Elliot,
>
> I think it would be really worthwhile to have Iceberg integration with Hive.
> Minimally for reading through the available interfaces, then handling
> schema evolution / schema synchronization etc.
> Later, having the possibility to write to an Iceberg table would be good as
> well, but integrating it with the current Hive ACID implementation could be
> a tougher nut to crack.
>
> That said, I would definitely like to at least review and help during the
> implementation if someone takes on contributing this feature.
>
> Thanks,
> Peter

Re: Apache Iceberg integration

Posted by Peter Vary <pv...@cloudera.com>.
Hi Elliot,

I think it would be really worthwhile to have Iceberg integration with Hive.
Minimally for reading through the available interfaces, then handling schema evolution / schema synchronization etc.
Later, having the possibility to write to an Iceberg table would be good as well, but integrating it with the current Hive ACID implementation could be a tougher nut to crack.

That said, I would definitely like to at least review and help during the implementation if someone takes on contributing this feature.

Thanks,
Peter
