Posted to dev@spark.apache.org by Michael Heuer <he...@gmail.com> on 2019/11/22 16:55:54 UTC

Spark 2.4.5 release for Parquet and Avro dependency updates?

Hello,

Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet <https://mail-archives.apache.org/mod_mbox/parquet-dev/201911.mbox/%3C8357699C-9295-4EB0-A39E-B3538D71795B@gmail.com%3E>).

Might there be any desire to cut a Spark 2.4.5 release so that users can pick up these changes independently of all the other changes in Spark 3.0?

Thank you in advance,

   michael

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Posted by Sean Owen <sr...@gmail.com>.
I haven't been following this closely, but I'm aware that there are
some tricky compatibility problems between Avro and Parquet, both of
which are used in Spark. That's made it pretty hard to update in 2.x.
master/3.0 is on Parquet 1.10.1 and Avro 1.8.2. Just a general
question: is that the best combo going forward? Because the time to
update would be right about now for Spark 3. Backporting to 2.x is
pretty unlikely, though.

On Fri, Nov 22, 2019 at 12:45 PM Michael Heuer <he...@gmail.com> wrote:
>
> Hello,
>
> I am sorry for asking a somewhat inappropriate question.
>
> For context, our projects depend on a fix that is in Parquet master but not yet released. Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against the Parquet 1.11.0 RC to pick up the fix and run successfully on Spark 2.4.x, which bundles Parquet 1.10.1, without various classpath workarounds.
>
> I see now that Spark policy requires the Avro upgrade to wait until Spark 3.0, and since the Parquet 1.11.0 RC currently depends on Avro 1.9.1, it may also have to wait. I'll continue to think about this within the scope of the Parquet community.
>
> Thank you for the clarification,
>
>    michael
>
>
> On Nov 22, 2019, at 12:07 PM, Dongjoon Hyun <do...@gmail.com> wrote:
>
> Hi, Michael.
>
> I'm not sure Apache Spark is in a state close to what you want.
>
> First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro 1.8.2, as do the `master` and `branch-2.4` branches. Cutting a new release would not provide what you want.
>
> Do we have a PR on the master branch? If not, before starting to discuss releases, could you open a PR on the master branch first? The same goes for Parquet.
>
> Second, we want to keep Apache Spark 3.0.0 as compatible as possible. An incompatible change could be a reason for rejection even on the `master` branch for Apache Spark 3.0.0.
>
> Lastly, we may consider backporting once it lands on the `master` branch for 3.0.
> However, as Nan Zhu said, a dependency-upgrade backport PR is -1 by default. Usually it's allowed only in serious cases like a security issue or production outage.
>
> Bests,
> Dongjoon.
>
>
> On Fri, Nov 22, 2019 at 9:00 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>
>> Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the same execution.
>>
>> Michael, please be more careful about announcing compatibility problems in other communities. If you've observed problems, let's find out the root cause first.
>>
>> rb
>>
>> On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <he...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet).
>>>
>>> Might there be any desire to cut a Spark 2.4.5 release so that users can pick up these changes independently of all the other changes in Spark 3.0?
>>>
>>> Thank you in advance,
>>>
>>>    michael
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Posted by Michael Heuer <he...@gmail.com>.
Hello,

I am sorry for asking a somewhat inappropriate question.

For context, our projects depend on a fix that is in Parquet master but not yet released. Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against the Parquet 1.11.0 RC to pick up the fix and run successfully on Spark 2.4.x, which bundles Parquet 1.10.1, without various classpath workarounds.
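One classpath workaround of the sort mentioned here is to ship the newer Parquet jars alongside the application and ask Spark to prefer them over its bundled Parquet 1.10.1. A minimal sketch as a spark-submit invocation; the jar paths and application class are illustrative examples, not names from this thread, and the userClassPathFirst options are marked experimental in Spark, so this is an untested sketch rather than a recommendation:

```shell
# Sketch: supply Parquet 1.11.0 jars with the application and prefer them
# over the Parquet 1.10.1 that Spark 2.4.x puts on the classpath.
# Jar names and the application class are illustrative examples.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars parquet-common-1.11.0.jar,parquet-column-1.11.0.jar,parquet-hadoop-1.11.0.jar \
  --class com.example.MyApp \
  myapp.jar
```

Another common approach is shading and relocating Parquet inside the application jar, which avoids userClassPathFirst (and its potential for new conflicts) at the cost of rebuilding the application.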

I see now that Spark policy requires the Avro upgrade to wait until Spark 3.0, and since the Parquet 1.11.0 RC currently depends on Avro 1.9.1, it may also have to wait. I'll continue to think about this within the scope of the Parquet community.

Thank you for the clarification,

   michael


> On Nov 22, 2019, at 12:07 PM, Dongjoon Hyun <do...@gmail.com> wrote:
> 
> Hi, Michael.
> 
> I'm not sure Apache Spark is in a state close to what you want.
> 
> First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro 1.8.2, as do the `master` and `branch-2.4` branches. Cutting a new release would not provide what you want.
> 
> Do we have a PR on the master branch? If not, before starting to discuss releases, could you open a PR on the master branch first? The same goes for Parquet.
> 
> Second, we want to keep Apache Spark 3.0.0 as compatible as possible. An incompatible change could be a reason for rejection even on the `master` branch for Apache Spark 3.0.0.
> 
> Lastly, we may consider backporting once it lands on the `master` branch for 3.0.
> However, as Nan Zhu said, a dependency-upgrade backport PR is -1 by default. Usually it's allowed only in serious cases like a security issue or production outage.
> 
> Bests,
> Dongjoon.
> 
> 
> On Fri, Nov 22, 2019 at 9:00 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the same execution.
> 
> Michael, please be more careful about announcing compatibility problems in other communities. If you've observed problems, let's find out the root cause first.
> 
> rb
> 
> On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <heuermh@gmail.com <ma...@gmail.com>> wrote:
> Hello,
> 
> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet <https://mail-archives.apache.org/mod_mbox/parquet-dev/201911.mbox/%3C8357699C-9295-4EB0-A39E-B3538D71795B@gmail.com%3E>).
> 
> Might there be any desire to cut a Spark 2.4.5 release so that users can pick up these changes independently of all the other changes in Spark 3.0?
> 
> Thank you in advance,
> 
>    michael
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, Michael.

I'm not sure Apache Spark is in a state close to what you want.

First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro
1.8.2, as do the `master` and `branch-2.4` branches. Cutting a new release
would not provide what you want.
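As a quick way to confirm which Avro version a Spark build actually resolves, the dependency tree can be inspected from the root of a Spark source checkout. A sketch assuming Maven and Spark's bundled build script:

```shell
# Sketch: print the Avro artifacts in Spark's resolved dependency tree,
# run from the root of a Spark source checkout.
./build/mvn dependency:tree -Dincludes=org.apache.avro
```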

Do we have a PR on the master branch? If not, before starting to discuss
releases, could you open a PR on the master branch first? The same goes for
Parquet.

Second, we want to keep Apache Spark 3.0.0 as compatible as possible.
An incompatible change could be a reason for rejection even on the `master`
branch for Apache Spark 3.0.0.

Lastly, we may consider backporting once it lands on the `master` branch for
3.0. However, as Nan Zhu said, a dependency-upgrade backport PR is -1 by
default. Usually it's allowed only in serious cases like a security issue or
production outage.

Bests,
Dongjoon.


On Fri, Nov 22, 2019 at 9:00 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a
> runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the
> same execution.
>
> Michael, please be more careful about announcing compatibility problems in
> other communities. If you've observed problems, let's find out the root
> cause first.
>
> rb
>
> On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <he...@gmail.com> wrote:
>
>> Hello,
>>
>> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that
>> Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on
>> dev@parquet
>> <https://mail-archives.apache.org/mod_mbox/parquet-dev/201911.mbox/%3C8357699C-9295-4EB0-A39E-B3538D71795B@gmail.com%3E>
>> ).
>>
>> Might there be any desire to cut a Spark 2.4.5 release so that users can
>> pick up these changes independently of all the other changes in Spark 3.0?
>>
>> Thank you in advance,
>>
>>    michael
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a
runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the
same execution.

Michael, please be more careful about announcing compatibility problems in
other communities. If you've observed problems, let's find out the root
cause first.

rb

On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <he...@gmail.com> wrote:

> Hello,
>
> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that
> Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on
> dev@parquet
> <https://mail-archives.apache.org/mod_mbox/parquet-dev/201911.mbox/%3C8357699C-9295-4EB0-A39E-B3538D71795B@gmail.com%3E>
> ).
>
> Might there be any desire to cut a Spark 2.4.5 release so that users can
> pick up these changes independently of all the other changes in Spark 3.0?
>
> Thank you in advance,
>
>    michael
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

Posted by Nan Zhu <zh...@gmail.com>.
I am not sure it is good practice to have breaking changes in
dependencies in maintenance releases.

On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <he...@gmail.com> wrote:

> Hello,
>
> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that
> Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on
> dev@parquet
> <https://mail-archives.apache.org/mod_mbox/parquet-dev/201911.mbox/%3C8357699C-9295-4EB0-A39E-B3538D71795B@gmail.com%3E>
> ).
>
> Might there be any desire to cut a Spark 2.4.5 release so that users can
> pick up these changes independently of all the other changes in Spark 3.0?
>
> Thank you in advance,
>
>    michael
>