Posted to dev@iceberg.apache.org by Edgar Rodriguez <ed...@airbnb.com> on 2019/08/07 20:57:51 UTC

Iceberg in Spark 3.0.0

Hi everyone,

I was wondering if there's a branch tracking the changes happening in Spark
3.0.0 for Iceberg. The DataSource V2 API has changed substantially from the
one implemented in the Iceberg master branch, and since Spark 3.0.0 would
allow us to introduce Spark SQL support, it seems worthwhile to start
tracking those changes and evaluating the support as it evolves.

Thanks.

Cheers,
-- 
Edgar Rodriguez

Re: Iceberg in Spark 3.0.0

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I just created a spark-3 branch that is up to date with master:
https://github.com/apache/incubator-iceberg/tree/spark-3

Please create PRs against that branch. Thanks!

On Sun, Nov 24, 2019 at 6:22 PM Saisai Shao <sa...@gmail.com> wrote:

> Thanks guys for your reply.
>
> Hi Ryan, would you please help to create a spark-3.0 branch, so we could
> submit our PRs against that branch.
>
> Best regards,
> Saisai
>
> On Sat, Nov 23, 2019 at 2:03 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> I agree, let's create a spark-3.0 branch to start with. We've been
>> building vectorization this way using the vectorized-reads branch.
>>
>> In the long term, we may want to split Spark into separate modules for
>> 2.x and 3.x in the same branch, but for now we can at least get everything
>> working with a 3.0 branch.
>>
>> On Fri, Nov 22, 2019 at 8:34 AM John Zhuge <jz...@apache.org> wrote:
>>
>>> +1 for Iceberg branch
>>>
>>> Thanks for the contribution from you and your team!
>>>
>>> On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi
>>> <ao...@apple.com.invalid> wrote:
>>>
>>>> +1 on having a branch in Iceberg as we have for vectorized reads.
>>>>
>>>> - Anton
>>>>
>>>> On 22 Nov 2019, at 02:26, Saisai Shao <sa...@gmail.com> wrote:
>>>>
>>>> Hi Ryan and team,
>>>>
>>>> Thanks a lot for your response. I was wondering how we should share our
>>>> branch. One possible way is to maintain a forked Iceberg repo with a
>>>> Spark 3.0.0-preview branch; another is to create a branch in the upstream
>>>> Iceberg repo. I'm inclined to choose the second way, so that the
>>>> community can review and contribute to it.
>>>>
>>>> I would like to hear your suggestions.
>>>>
>>>> Best regards,
>>>> Saisai
>>>>
>>>>
>>>> On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>
>>>>> Sounds great, thanks Saisai!
>>>>>
>>>>> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Anton, I will share our branch soon.
>>>>>>
>>>>>> Best regards,
>>>>>> Saisai
>>>>>>
>>>>>> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi
>>>>>> <ao...@apple.com.invalid> wrote:
>>>>>>
>>>>>>> I think it would be great if you can share what you have, Saisai.
>>>>>>> That way, we can all collaborate and ensure we build a full 3.0 integration
>>>>>>> as soon as possible.
>>>>>>>
>>>>>>> - Anton
>>>>>>>
>>>>>>>
>>>>>>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Anton,
>>>>>>>
>>>>>>> Thanks for bringing this up. We already have an internal branch building
>>>>>>> against Spark 3.0 (the master branch, actually), and we're actively working
>>>>>>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>>>>>>> could share ours if the community would like to do so.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Saisai
>>>>>>>
>>>>>>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi
>>>>>>> <ao...@apple.com.invalid> wrote:
>>>>>>>
>>>>>>>> I think it is a good time to create a branch to build our 3.0
>>>>>>>> integration as the 3.0 preview was released.
>>>>>>>> What does everyone think? Has anybody started already?
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <
>>>>>>>> edgar.rodriguez@airbnb.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>>
>>>>>>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>>>>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>>>>>>> for Spark 3.0.
>>>>>>>>>
>>>>>>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Seems like there're nightly snapshots built in
>>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>>>>>>> I've started setting something up with these snapshots so I can probably
>>>>>>>> start working on this.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> --
>>>>>>>> Edgar Rodriguez
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>>>>
>>>>
>>>>
>>>
>>> --
>>> John Zhuge
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg in Spark 3.0.0

Posted by Saisai Shao <sa...@gmail.com>.
Thanks guys for your reply.

Hi Ryan, could you please create a spark-3.0 branch so that we can
submit our PRs against it?

Best regards,
Saisai

On Sat, Nov 23, 2019 at 2:03 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I agree, let's create a spark-3.0 branch to start with. We've been
> building vectorization this way using the vectorized-reads branch.
>
> In the long term, we may want to split Spark into separate modules for 2.x
> and 3.x in the same branch, but for now we can at least get everything
> working with a 3.0 branch.
>
> On Fri, Nov 22, 2019 at 8:34 AM John Zhuge <jz...@apache.org> wrote:
>
>> +1 for Iceberg branch
>>
>> Thanks for the contribution from you and your team!
>>
>> On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi
>> <ao...@apple.com.invalid> wrote:
>>
>>> +1 on having a branch in Iceberg as we have for vectorized reads.
>>>
>>> - Anton
>>>
>>> On 22 Nov 2019, at 02:26, Saisai Shao <sa...@gmail.com> wrote:
>>>
>>> Hi Ryan and team,
>>>
>>> Thanks a lot for your response. I was wondering how we should share our
>>> branch. One possible way is to maintain a forked Iceberg repo with a
>>> Spark 3.0.0-preview branch; another is to create a branch in the upstream
>>> Iceberg repo. I'm inclined to choose the second way, so that the
>>> community can review and contribute to it.
>>>
>>> I would like to hear your suggestions.
>>>
>>> Best regards,
>>> Saisai
>>>
>>>
>>> On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>>> Sounds great, thanks Saisai!
>>>>
>>>> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Anton, I will share our branch soon.
>>>>>
>>>>> Best regards,
>>>>> Saisai
>>>>>
>>>>> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi
>>>>> <ao...@apple.com.invalid> wrote:
>>>>>
>>>>>> I think it would be great if you can share what you have, Saisai.
>>>>>> That way, we can all collaborate and ensure we build a full 3.0 integration
>>>>>> as soon as possible.
>>>>>>
>>>>>> - Anton
>>>>>>
>>>>>>
>>>>>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Anton,
>>>>>>
>>>>>> Thanks for bringing this up. We already have an internal branch building
>>>>>> against Spark 3.0 (the master branch, actually), and we're actively working
>>>>>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>>>>>> could share ours if the community would like to do so.
>>>>>>
>>>>>> Best regards,
>>>>>> Saisai
>>>>>>
>>>>>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi
>>>>>> <ao...@apple.com.invalid> wrote:
>>>>>>
>>>>>>> I think it is a good time to create a branch to build our 3.0
>>>>>>> integration as the 3.0 preview was released.
>>>>>>> What does everyone think? Has anybody started already?
>>>>>>>
>>>>>>> - Anton
>>>>>>>
>>>>>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>
>>>>>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>>>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>>>>>> for Spark 3.0.
>>>>>>>>
>>>>>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>>>>>
>>>>>>>
>>>>>>> Seems like there're nightly snapshots built in
>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>>>>>> I've started setting something up with these snapshots so I can probably
>>>>>>> start working on this.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> --
>>>>>>> Edgar Rodriguez
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>> --
>> John Zhuge
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Iceberg in Spark 3.0.0

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I agree, let's create a spark-3.0 branch to start with. We've been building
vectorization this way using the vectorized-reads branch.

In the long term, we may want to split Spark into separate modules for 2.x
and 3.x in the same branch, but for now we can at least get everything
working with a 3.0 branch.

On Fri, Nov 22, 2019 at 8:34 AM John Zhuge <jz...@apache.org> wrote:

> +1 for Iceberg branch
>
> Thanks for the contribution from you and your team!
>
> On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi
> <ao...@apple.com.invalid> wrote:
>
>> +1 on having a branch in Iceberg as we have for vectorized reads.
>>
>> - Anton
>>
>> On 22 Nov 2019, at 02:26, Saisai Shao <sa...@gmail.com> wrote:
>>
>> Hi Ryan and team,
>>
>> Thanks a lot for your response. I was wondering how we should share our
>> branch. One possible way is to maintain a forked Iceberg repo with a
>> Spark 3.0.0-preview branch; another is to create a branch in the upstream
>> Iceberg repo. I'm inclined to choose the second way, so that the
>> community can review and contribute to it.
>>
>> I would like to hear your suggestions.
>>
>> Best regards,
>> Saisai
>>
>>
>> On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>
>>> Sounds great, thanks Saisai!
>>>
>>> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Anton, I will share our branch soon.
>>>>
>>>> Best regards,
>>>> Saisai
>>>>
>>>> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi
>>>> <ao...@apple.com.invalid> wrote:
>>>>
>>>>> I think it would be great if you can share what you have, Saisai. That
>>>>> way, we can all collaborate and ensure we build a full 3.0 integration as
>>>>> soon as possible.
>>>>>
>>>>> - Anton
>>>>>
>>>>>
>>>>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>>>>>
>>>>> Hi Anton,
>>>>>
>>>>> Thanks for bringing this up. We already have an internal branch building
>>>>> against Spark 3.0 (the master branch, actually), and we're actively working
>>>>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>>>>> could share ours if the community would like to do so.
>>>>>
>>>>> Best regards,
>>>>> Saisai
>>>>>
>>>>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi
>>>>> <ao...@apple.com.invalid> wrote:
>>>>>
>>>>>> I think it is a good time to create a branch to build our 3.0
>>>>>> integration as the 3.0 preview was released.
>>>>>> What does everyone think? Has anybody started already?
>>>>>>
>>>>>> - Anton
>>>>>>
>>>>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>
>>>>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>>>>> for Spark 3.0.
>>>>>>>
>>>>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>>>>
>>>>>>
>>>>>> Seems like there're nightly snapshots built in
>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>>>>> I've started setting something up with these snapshots so I can probably
>>>>>> start working on this.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Cheers,
>>>>>> --
>>>>>> Edgar Rodriguez
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>
> --
> John Zhuge
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg in Spark 3.0.0

Posted by John Zhuge <jz...@apache.org>.
+1 for Iceberg branch

Thanks for the contribution from you and your team!

On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi
<ao...@apple.com.invalid> wrote:

> +1 on having a branch in Iceberg as we have for vectorized reads.
>
> - Anton
>
> On 22 Nov 2019, at 02:26, Saisai Shao <sa...@gmail.com> wrote:
>
> Hi Ryan and team,
>
> Thanks a lot for your response. I was wondering how we should share our
> branch. One possible way is to maintain a forked Iceberg repo with a
> Spark 3.0.0-preview branch; another is to create a branch in the upstream
> Iceberg repo. I'm inclined to choose the second way, so that the
> community can review and contribute to it.
>
> I would like to hear your suggestions.
>
> Best regards,
> Saisai
>
>
> On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> Sounds great, thanks Saisai!
>>
>> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com>
>> wrote:
>>
>>> Thanks Anton, I will share our branch soon.
>>>
>>> Best regards,
>>> Saisai
>>>
>>> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi
>>> <ao...@apple.com.invalid> wrote:
>>>
>>>> I think it would be great if you can share what you have, Saisai. That
>>>> way, we can all collaborate and ensure we build a full 3.0 integration as
>>>> soon as possible.
>>>>
>>>> - Anton
>>>>
>>>>
>>>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>>>>
>>>> Hi Anton,
>>>>
>>>> Thanks for bringing this up. We already have an internal branch building
>>>> against Spark 3.0 (the master branch, actually), and we're actively working
>>>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>>>> could share ours if the community would like to do so.
>>>>
>>>> Best regards,
>>>> Saisai
>>>>
>>>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi
>>>> <ao...@apple.com.invalid> wrote:
>>>>
>>>>> I think it is a good time to create a branch to build our 3.0
>>>>> integration as the 3.0 preview was released.
>>>>> What does everyone think? Has anybody started already?
>>>>>
>>>>> - Anton
>>>>>
>>>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>
>>>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>>>> for Spark 3.0.
>>>>>>
>>>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>>>
>>>>>
>>>>> Seems like there're nightly snapshots built in
>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>>>> I've started setting something up with these snapshots so I can probably
>>>>> start working on this.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Edgar Rodriguez
>>>>>
>>>>>
>>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>

-- 
John Zhuge

Re: Iceberg in Spark 3.0.0

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
+1 on having a branch in Iceberg as we have for vectorized reads.

- Anton

> On 22 Nov 2019, at 02:26, Saisai Shao <sa...@gmail.com> wrote:
> 
> Hi Ryan and team,
> 
> Thanks a lot for your response. I was wondering how we should share our branch. One possible way is to maintain a forked Iceberg repo with a Spark 3.0.0-preview branch; another is to create a branch in the upstream Iceberg repo. I'm inclined to choose the second way, so that the community can review and contribute to it.
> 
> I would like to hear your suggestions.
> 
> Best regards,
> Saisai
> 
> 
> On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> Sounds great, thanks Saisai!
> 
> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sai.sai.shao@gmail.com> wrote:
> Thanks Anton, I will share our branch soon.
> 
> Best regards,
> Saisai
> 
> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
> I think it would be great if you can share what you have, Saisai. That way, we can all collaborate and ensure we build a full 3.0 integration as soon as possible.
> 
> - Anton
> 
> 
>> On 18 Nov 2019, at 02:08, Saisai Shao <sai.sai.shao@gmail.com> wrote:
>> 
>> Hi Anton, 
>> 
>> Thanks for bringing this up. We already have an internal branch building against Spark 3.0 (the master branch, actually), and we're actively working on it. I think it is a good idea to create an upstream Spark 3.0 branch; we could share ours if the community would like to do so.
>> 
>> Best regards,
>> Saisai
>> 
>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi <aokolnychyi@apple.com.invalid> wrote:
>> I think it is a good time to create a branch to build our 3.0 integration as the 3.0 preview was released.
>> What does everyone think? Has anybody started already?
>> 
>> - Anton
>> 
>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <edgar.rodriguez@airbnb.com> wrote:
>>> 
>>> 
>>> 
>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rblue@netflix.com> wrote:
>>> I think it's a great idea to branch and get ready for Spark 3.0.0. Right now, I'm focused on getting a release out, but I can review patches for Spark 3.0.
>>> 
>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>> 
>>> Seems like there're nightly snapshots built in https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ - I've started setting something up with these snapshots so I can probably start working on this.
>>>  
>>> Thanks!
>>> 
>>> Cheers,
>>> -- 
>>> Edgar Rodriguez
>> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


Re: Iceberg in Spark 3.0.0

Posted by Saisai Shao <sa...@gmail.com>.
Hi Ryan and team,

Thanks a lot for your response. I was wondering how we should share our
branch. One possible way is to maintain a forked Iceberg repo with a
Spark 3.0.0-preview branch; another is to create a branch in the upstream
Iceberg repo. I'm inclined to choose the second way, so that the
community can review and contribute to it.

I would like to hear your suggestions.

Best regards,
Saisai


On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Sounds great, thanks Saisai!
>
> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com>
> wrote:
>
>> Thanks Anton, I will share our branch soon.
>>
>> Best regards,
>> Saisai
>>
>> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
>>
>>> I think it would be great if you can share what you have, Saisai. That
>>> way, we can all collaborate and ensure we build a full 3.0 integration as
>>> soon as possible.
>>>
>>> - Anton
>>>
>>>
>>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>>>
>>> Hi Anton,
>>>
>>> Thanks for bringing this up. We already have an internal branch building
>>> against Spark 3.0 (the master branch, actually), and we're actively working
>>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>>> could share ours if the community would like to do so.
>>>
>>> Best regards,
>>> Saisai
>>>
>>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi
>>> <ao...@apple.com.invalid> wrote:
>>>
>>>> I think it is a good time to create a branch to build our 3.0
>>>> integration as the 3.0 preview was released.
>>>> What does everyone think? Has anybody started already?
>>>>
>>>> - Anton
>>>>
>>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>
>>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>>> for Spark 3.0.
>>>>>
>>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>>
>>>>
>>>> Seems like there're nightly snapshots built in
>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>>> I've started setting something up with these snapshots so I can probably
>>>> start working on this.
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> --
>>>> Edgar Rodriguez
>>>>
>>>>
>>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Iceberg in Spark 3.0.0

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Sounds great, thanks Saisai!

On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao <sa...@gmail.com> wrote:

> Thanks Anton, I will share our branch soon.
>
> Best regards,
> Saisai
>
> On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
>
>> I think it would be great if you can share what you have, Saisai. That
>> way, we can all collaborate and ensure we build a full 3.0 integration as
>> soon as possible.
>>
>> - Anton
>>
>>
>> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>>
>> Hi Anton,
>>
>> Thanks for bringing this up. We already have an internal branch building
>> against Spark 3.0 (the master branch, actually), and we're actively working
>> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
>> could share ours if the community would like to do so.
>>
>> Best regards,
>> Saisai
>>
>> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
>>
>>> I think it is a good time to create a branch to build our 3.0
>>> integration as the 3.0 preview was released.
>>> What does everyone think? Has anybody started already?
>>>
>>> - Anton
>>>
>>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>>> wrote:
>>>
>>>
>>>
>>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> I think it's a great idea to branch and get ready for Spark 3.0.0.
>>>> Right now, I'm focused on getting a release out, but I can review patches
>>>> for Spark 3.0.
>>>>
>>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>>
>>>
>>> Seems like there're nightly snapshots built in
>>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>>> I've started setting something up with these snapshots so I can probably
>>> start working on this.
>>>
>>> Thanks!
>>>
>>> Cheers,
>>> --
>>> Edgar Rodriguez
>>>
>>>
>>>
>>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg in Spark 3.0.0

Posted by Saisai Shao <sa...@gmail.com>.
Thanks Anton, I will share our branch soon.

Best regards,
Saisai

On Mon, Nov 18, 2019 at 6:54 PM Anton Okolnychyi <ao...@apple.com.invalid> wrote:

> I think it would be great if you can share what you have, Saisai. That
> way, we can all collaborate and ensure we build a full 3.0 integration as
> soon as possible.
>
> - Anton
>
>
> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
>
> Hi Anton,
>
> Thanks for bringing this up. We already have an internal branch building
> against Spark 3.0 (the master branch, actually), and we're actively working
> on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
> could share ours if the community would like to do so.
>
> Best regards,
> Saisai
>
> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
>
>> I think it is a good time to create a branch to build our 3.0 integration
>> as the 3.0 preview was released.
>> What does everyone think? Has anybody started already?
>>
>> - Anton
>>
>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
>> wrote:
>>
>>
>>
>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> I think it's a great idea to branch and get ready for Spark 3.0.0. Right
>>> now, I'm focused on getting a release out, but I can review patches for
>>> Spark 3.0.
>>>
>>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>>
>>
>> Seems like there're nightly snapshots built in
>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
>> I've started setting something up with these snapshots so I can probably
>> start working on this.
>>
>> Thanks!
>>
>> Cheers,
>> --
>> Edgar Rodriguez
>>
>>
>>
>

Re: Iceberg in Spark 3.0.0

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
I think it would be great if you can share what you have, Saisai. That way, we can all collaborate and ensure we build a full 3.0 integration as soon as possible.

- Anton


> On 18 Nov 2019, at 02:08, Saisai Shao <sa...@gmail.com> wrote:
> 
> Hi Anton, 
> 
> Thanks for bringing this up. We already have an internal branch building against Spark 3.0 (the master branch, actually), and we're actively working on it. I think it is a good idea to create an upstream Spark 3.0 branch; we could share ours if the community would like to do so.
> 
> Best regards,
> Saisai
> 
> On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
> I think it is a good time to create a branch to build our 3.0 integration as the 3.0 preview was released.
> What does everyone think? Has anybody started already?
> 
> - Anton
> 
>> On 8 Aug 2019, at 23:47, Edgar Rodriguez <edgar.rodriguez@airbnb.com> wrote:
>> 
>> 
>> 
>> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rblue@netflix.com> wrote:
>> I think it's a great idea to branch and get ready for Spark 3.0.0. Right now, I'm focused on getting a release out, but I can review patches for Spark 3.0.
>> 
>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>> 
>> Seems like there're nightly snapshots built in https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ - I've started setting something up with these snapshots so I can probably start working on this.
>>  
>> Thanks!
>> 
>> Cheers,
>> -- 
>> Edgar Rodriguez
> 


Re: Iceberg in Spark 3.0.0

Posted by Saisai Shao <sa...@gmail.com>.
Hi Anton,

Thanks for bringing this up. We already have an internal branch building
against Spark 3.0 (the master branch, actually), and we're actively working
on it. I think it is a good idea to create an upstream Spark 3.0 branch; we
could share ours if the community would like to do so.

Best regards,
Saisai

On Mon, Nov 18, 2019 at 1:40 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:

> I think it is a good time to create a branch to build our 3.0 integration
> as the 3.0 preview was released.
> What does everyone think? Has anybody started already?
>
> - Anton
>
> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com>
> wrote:
>
>
>
> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> I think it's a great idea to branch and get ready for Spark 3.0.0. Right
>> now, I'm focused on getting a release out, but I can review patches for
>> Spark 3.0.
>>
>> Anyone know if there are nightly builds of Spark 3.0 to test with?
>>
>
> Seems like there're nightly snapshots built in
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ -
> I've started setting something up with these snapshots so I can probably
> start working on this.
>
> Thanks!
>
> Cheers,
> --
> Edgar Rodriguez
>
>
>

Re: Iceberg in Spark 3.0.0

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
I think it is a good time to create a branch to build our 3.0 integration as the 3.0 preview was released.
What does everyone think? Has anybody started already?

- Anton

> On 8 Aug 2019, at 23:47, Edgar Rodriguez <ed...@airbnb.com> wrote:
> 
> 
> 
> On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rblue@netflix.com> wrote:
> I think it's a great idea to branch and get ready for Spark 3.0.0. Right now, I'm focused on getting a release out, but I can review patches for Spark 3.0.
> 
> Anyone know if there are nightly builds of Spark 3.0 to test with?
> 
> Seems like there're nightly snapshots built in https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/ - I've started setting something up with these snapshots so I can probably start working on this.
>  
> Thanks!
> 
> Cheers,
> -- 
> Edgar Rodriguez


Re: Iceberg in Spark 3.0.0

Posted by Edgar Rodriguez <ed...@airbnb.com>.
On Thu, Aug 8, 2019 at 3:37 PM Ryan Blue <rb...@netflix.com> wrote:

> I think it's a great idea to branch and get ready for Spark 3.0.0. Right
> now, I'm focused on getting a release out, but I can review patches for
> Spark 3.0.
>
> Anyone know if there are nightly builds of Spark 3.0 to test with?
>

It seems there are nightly snapshots built at
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/3.0.0-SNAPSHOT/
I've started setting something up with these snapshots, so I can probably
start working on this.
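
For anyone who wants to try the same thing, here is a minimal sketch of how a
Gradle build might pull those snapshots, written in the Gradle Kotlin DSL. It
is only an illustration under assumptions: the repository root is taken from
the URL above, while the `java` plugin, the `compileOnly` scope, and the use
of a Kotlin DSL build file are my own choices and not how the Iceberg build
is actually set up.

```kotlin
// build.gradle.kts (hypothetical sketch, not the Iceberg build file)
plugins {
    java
}

repositories {
    mavenCentral()
    maven {
        // Apache snapshots repository: the root of the URL quoted above.
        url = uri("https://repository.apache.org/content/repositories/snapshots/")
        // Only resolve -SNAPSHOT versions from this repository.
        mavenContent { snapshotsOnly() }
    }
}

dependencies {
    // Scala 2.12 build of Spark SQL at the 3.0.0 snapshot version.
    // compileOnly is an assumption: Spark is usually provided at runtime.
    compileOnly("org.apache.spark:spark-sql_2.12:3.0.0-SNAPSHOT")
}
```

Snapshot artifacts change from night to night, so a build like this is not
reproducible and is only meant for tracking the API while it evolves.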

Thanks!

Cheers,
-- 
Edgar Rodriguez

Re: Iceberg in Spark 3.0.0

Posted by Ryan Blue <rb...@netflix.com>.
One more thing:

> Spark 3.0.0 has several changes regarding DataSource V2, it would be
> better to evaluate the changes and do the design by also considering 3.0
> changes

This actually goes the other way. We’ve been influencing the design of
DataSourceV2 based on what we need for Iceberg. I’m tracking DSv2 very
closely and I don’t expect any surprises.

On Thu, Aug 8, 2019 at 3:36 PM Ryan Blue <rb...@netflix.com> wrote:

> I think it's a great idea to branch and get ready for Spark 3.0.0. Right
> now, I'm focused on getting a release out, but I can review patches for
> Spark 3.0.
>
> Anyone know if there are nightly builds of Spark 3.0 to test with?
>
> On Wed, Aug 7, 2019 at 7:34 PM Saisai Shao <sa...@gmail.com> wrote:
>
>> I agree that we should have a branch to track the changes for Spark
>> 3.0.0. Spark 3.0.0 has several changes to DataSource V2, so it would be
>> better to evaluate them and take the 3.0 changes into account in the
>> design.
>>
>> My two cents :)
>>
>> Best regards,
>> Saisai
>>
>> On Thu, Aug 8, 2019 at 4:58 AM Edgar Rodriguez <ed...@airbnb.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I was wondering if there's a branch tracking the changes happening in
>>> Spark 3.0.0 for Iceberg. The DataSource V2 API has substantially changed
>>> from the one implemented in Iceberg master branch and since Spark 3.0.0
>>> would allow us to introduce Spark SQL support then it seems interesting to
>>> start tracking those changes to start evaluating some of the support as it
>>> evolves.
>>>
>>> Thanks.
>>>
>>> Cheers,
>>> --
>>> Edgar Rodriguez
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg in Spark 3.0.0

Posted by Ryan Blue <rb...@netflix.com>.
I think it's a great idea to branch and get ready for Spark 3.0.0. Right
now, I'm focused on getting a release out, but I can review patches for
Spark 3.0.

Anyone know if there are nightly builds of Spark 3.0 to test with?

On Wed, Aug 7, 2019 at 7:34 PM Saisai Shao <sa...@gmail.com> wrote:

> I agree that we should have a branch to track the changes for Spark
> 3.0.0. Spark 3.0.0 has several changes to DataSource V2, so it would be
> better to evaluate them and take the 3.0 changes into account in the
> design.
>
> My two cents :)
>
> Best regards,
> Saisai
>
> On Thu, Aug 8, 2019 at 4:58 AM Edgar Rodriguez <ed...@airbnb.com> wrote:
>
>> Hi everyone,
>>
>> I was wondering if there's a branch tracking the changes happening in
>> Spark 3.0.0 for Iceberg. The DataSource V2 API has substantially changed
>> from the one implemented in Iceberg master branch and since Spark 3.0.0
>> would allow us to introduce Spark SQL support then it seems interesting to
>> start tracking those changes to start evaluating some of the support as it
>> evolves.
>>
>> Thanks.
>>
>> Cheers,
>> --
>> Edgar Rodriguez
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg in Spark 3.0.0

Posted by Saisai Shao <sa...@gmail.com>.
I agree that we should have a branch to track the changes for Spark
3.0.0. Spark 3.0.0 has several changes to DataSource V2, so it would be
better to evaluate them and take the 3.0 changes into account in the
design.

My two cents :)

Best regards,
Saisai

On Thu, Aug 8, 2019 at 4:58 AM Edgar Rodriguez <ed...@airbnb.com> wrote:

> Hi everyone,
>
> I was wondering if there's a branch tracking the changes happening in
> Spark 3.0.0 for Iceberg. The DataSource V2 API has substantially changed
> from the one implemented in Iceberg master branch and since Spark 3.0.0
> would allow us to introduce Spark SQL support then it seems interesting to
> start tracking those changes to start evaluating some of the support as it
> evolves.
>
> Thanks.
>
> Cheers,
> --
> Edgar Rodriguez
>