Posted to dev@iceberg.apache.org by Marton Bod <ma...@gmail.com> on 2020/08/26 15:57:58 UTC

Question about Iceberg release cadence

Hi Team,

I was wondering whether a release cadence is already in place for
Iceberg, e.g. approximately how often releases will take place. Which
commits/features are candidates for release in the near term?

We're looking to integrate Iceberg into Hive; however, the current 0.9.1
release does not yet contain the StorageHandler code in iceberg-mr. Knowing
the approximate release timelines would help greatly with our integration
planning.

Of course, happy to get involved with ongoing dev/stability efforts to help
achieve a new release of this module.

Thanks a lot,
Marton

Re: Question about Iceberg release cadence

Posted by Saisai Shao <sa...@gmail.com>.
Would like to get the structured streaming reader into the next release :).
I will spend time on addressing the new feedback.

Thanks
Saisai

On Thu, Aug 27, 2020 at 10:36 PM Mass Dosage <ma...@gmail.com> wrote:

> I'm all for a release. The only thing still required for basic Hive read
> support (other than documentation of course!) is producing a *single* jar
> that can be added to Hive's classpath; the PR for that is at
> https://github.com/apache/iceberg/pull/1267.
>
> Thanks,
>
> Adrian
>
> On Thu, 27 Aug 2020 at 01:26, Anton Okolnychyi
> <ao...@apple.com.invalid> wrote:
>
>> +1 on releasing structured streaming source. I should be able to do one
>> more review round tomorrow.
>>
>> - Anton
>>
>> On 26 Aug 2020, at 17:12, Jungtaek Lim <ka...@gmail.com>
>> wrote:
>>
>> I hope we include Spark structured streaming read as well in the next
>> release; it was proposed in Feb this year and is still pending. Quoting my
>> comment on the benefit of streaming reads in Spark:
>>
>>> This would be the major feature to close the use-case gap for
>>> structured streaming between Delta Lake and Iceberg. There's a technical
>>> limitation in Spark structured streaming itself (the global watermark), which
>>> requires a workaround: splitting the query into multiple queries with
>>> intermediate storage that supports end-to-end exactly-once. Delta Lake covers
>>> this case, and I would really like to see it covered by Iceberg as well.
>>> I see there is a lot of work in progress on the milestone (and these are
>>> great features which should be done), but after this we would cover both batch
>>> and streaming workloads in Spark, which is a huge step forward
>>> for Spark users.
>>
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Thu, Aug 27, 2020 at 1:13 AM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Hi Marton,
>>>
>>> 0.9.0 was released about 6 weeks ago, so I don't think we've planned
>>> when the next release will be yet. I think it's a good idea to release
>>> soon, though. The Flink sink is close to being ready as well and I'd like
>>> to get both of those released so that the contributors can start using them.
>>>
>>> Seems like a good question for the broader community: how about a
>>> release in the next month or so for Hive reads and the Flink sink?
>>>
>>> rb
>>>
>>> On Wed, Aug 26, 2020 at 8:58 AM Marton Bod <ma...@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I was wondering whether a release cadence is already in place for
>>>> Iceberg, e.g. approximately how often releases will take place. Which
>>>> commits/features are candidates for release in the near term?
>>>>
>>>> We're looking to integrate Iceberg into Hive; however, the current
>>>> 0.9.1 release does not yet contain the StorageHandler code in iceberg-mr.
>>>> Knowing the approximate release timelines would help greatly with our
>>>> integration planning.
>>>>
>>>> Of course, happy to get involved with ongoing dev/stability efforts to
>>>> help achieve a new release of this module.
>>>>
>>>> Thanks a lot,
>>>> Marton
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>

Re: Question about Iceberg release cadence

Posted by Mass Dosage <ma...@gmail.com>.
I'm all for a release. The only thing still required for basic Hive read
support (other than documentation of course!) is producing a *single* jar
that can be added to Hive's classpath; the PR for that is at
https://github.com/apache/iceberg/pull/1267.

Thanks,

Adrian


Re: Question about Iceberg release cadence

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
+1 on releasing structured streaming source. I should be able to do one more review round tomorrow.

- Anton



Re: Question about Iceberg release cadence

Posted by Jungtaek Lim <ka...@gmail.com>.
I hope we include Spark structured streaming read as well in the next
release; it was proposed in Feb this year and is still pending. Quoting my
comment on the benefit of streaming reads in Spark:

> This would be the major feature to close the use-case gap for structured
> streaming between Delta Lake and Iceberg. There's a technical limitation in
> Spark structured streaming itself (the global watermark), which requires a
> workaround: splitting the query into multiple queries with intermediate
> storage that supports end-to-end exactly-once. Delta Lake covers this case,
> and I would really like to see it covered by Iceberg as well.
> I see there is a lot of work in progress on the milestone (and these are
> great features which should be done), but after this we would cover both
> batch and streaming workloads in Spark, which is a huge step forward
> for Spark users.


Thanks,
Jungtaek Lim (HeartSaVioR)


Re: Question about Iceberg release cadence

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Hi Marton,

0.9.0 was released about 6 weeks ago, so I don't think we've planned when
the next release will be yet. I think it's a good idea to release soon,
though. The Flink sink is close to being ready as well and I'd like to get
both of those released so that the contributors can start using them.

Seems like a good question for the broader community: how about a release
in the next month or so for Hive reads and the Flink sink?

rb



-- 
Ryan Blue
Software Engineer
Netflix