Posted to dev@iceberg.apache.org by Peter Vary <pv...@cloudera.com.INVALID> on 2020/08/26 16:22:49 UTC

Hive Iceberg writes

Hi Team,

We are thinking about implementing HiveOutputFormat, so writes through Hive can work as well.
Is anybody working on this? Do you know of any ongoing effort related to Hive writes?
Asking because we would like to prevent duplicate effort.
Also, if anyone has good pointers for an Iceberg newbie to get started, that would be great.

Thanks,
Peter 


Re: Hive Iceberg writes

Posted by Peter Vary <pv...@cloudera.com.INVALID>.
Uploaded a working implementation for unpartitioned tables.
Those who are interested can take a look here: https://github.com/apache/iceberg/pull/1407

> On Aug 31, 2020, at 16:34, Peter Vary <pv...@cloudera.com> wrote:
> 
> Thanks everyone for the quick answers.
> As Adrian suggested, I will post a WIP patch in the next few days.
> 
> Thanks,
> Peter


Re: Hive Iceberg writes

Posted by Peter Vary <pv...@cloudera.com.INVALID>.
Thanks everyone for the quick answers.
As Adrian suggested, I will post a WIP patch in the next few days.

Thanks,
Peter

> On Aug 27, 2020, at 19:35, RD <rd...@gmail.com> wrote:
> 
> Our stance has been similar at LinkedIn. Hive writes are not a priority for us, as we plan to move more and more of our Hive workloads to Spark SQL.
> 
> -R


Re: Hive Iceberg writes

Posted by RD <rd...@gmail.com>.
Our stance has been similar at LinkedIn. Hive writes are not a priority for
us, as we plan to move more and more of our Hive workloads to Spark SQL.

-R

On Thu, Aug 27, 2020 at 10:18 AM Edgar Rodriguez
<ed...@airbnb.com.invalid> wrote:

> Hi folks,
>
> We have not started to work on this either, but we've discussed
> internally whether or not to support Hive writes. Our first priority
> right now is getting Hive reads in production to have read compatibility
> with our existing Hive clients. We'd be interested in this; however, at
> Airbnb we're moving to Spark, so writes in Hive most likely won't be at the
> top of our list.
>
> Thanks!
>
> Cheers,
>
> --
> Edgar R
>

Re: Hive Iceberg writes

Posted by Edgar Rodriguez <ed...@airbnb.com.INVALID>.
Hi folks,

We have not started to work on this either, but we've discussed
internally whether or not to support Hive writes. Our first priority
right now is getting Hive reads in production to have read compatibility
with our existing Hive clients. We'd be interested in this; however, at
Airbnb we're moving to Spark, so writes in Hive most likely won't be at the
top of our list.

Thanks!

Cheers,

On Thu, Aug 27, 2020 at 12:53 AM Mass Dosage <ma...@gmail.com> wrote:

> We're definitely interested in this too but haven't started work on it
> yet. It has been discussed at our community syncs as something quite a few
> people are interested in, so if nobody else responds, a good starting point
> would probably be an early WIP PR that everyone can follow and contribute
> to.
>
> Thanks,
>
> Adrian

-- 
Edgar R

Re: Hive Iceberg writes

Posted by Mass Dosage <ma...@gmail.com>.
We're definitely interested in this too but haven't started work on it yet.
It has been discussed at our community syncs as something quite a few
people are interested in, so if nobody else responds, a good starting point
would probably be an early WIP PR that everyone can follow and contribute
to.

Thanks,

Adrian

On Wed, 26 Aug 2020 at 17:35, Ryan Blue <rb...@netflix.com.invalid> wrote:

> I think Edgar and Adrian, who have been contributing support for ORC and
> Hive, are interested in this as well.
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Hive Iceberg writes

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I think Edgar and Adrian, who have been contributing support for ORC and
Hive, are interested in this as well.

On Wed, Aug 26, 2020 at 9:22 AM Peter Vary <pv...@cloudera.com.invalid>
wrote:

> Hi Team,
>
> We are thinking about implementing HiveOutputFormat, so writes through
> Hive can work as well.
> Is anybody working on this? Do you know of any ongoing effort related to
> Hive writes?
> Asking because we would like to prevent duplicate effort.
> Also, if anyone has good pointers for an Iceberg newbie to get started, that
> would be great.
>
> Thanks,
> Peter
>
>

-- 
Ryan Blue
Software Engineer
Netflix