Posted to dev@spark.apache.org by Karen Feng <ka...@databricks.com> on 2020/02/19 00:55:29 UTC

Breaking API changes in Spark 3.0

Hi all,

I am concerned that the API-breaking changes in SPARK-25908 (as well as
SPARK-16775, and potentially others) will make the migration process from
Spark 2 to Spark 3 unnecessarily painful. For example, the removal of
SQLContext.getOrCreate will break a large number of libraries currently
built on Spark 2.

Even if library developers do not use deprecated APIs, API changes between
2.x and 3.x result in inconsistencies that require hacking around. For a
fairly small and new (2.4.3+) genomics library, I had to create a number of
shims (https://github.com/projectglow/glow/pull/155) for the source and
test code due to API changes in SPARK-25393, SPARK-27328, and SPARK-28744.
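
To give a flavor of what those shims look like, here is a minimal sketch
(an illustration of the pattern, not the actual Glow code): the library
routes every call to a version-sensitive API through a single object, so
only that object has to change per Spark version.

// Illustrative shim, not the actual code from the PR above: isolate the
// version-sensitive call behind one stable entry point.
import org.apache.spark.sql.{SQLContext, SparkSession}

object SparkShims {
  // SQLContext.getOrCreate(sc) is removed in 3.0, but going through
  // SparkSession compiles against both 2.4 and 3.0.
  def sqlContext(): SQLContext =
    SparkSession.builder().getOrCreate().sqlContext
}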

As a best practice, we should avoid breaking existing APIs to ease library
development. And to avoid dealing with similar deprecated-API issues down
the road, we should exercise more prudence when considering new API
proposals.

I'd love to see more discussion on this.



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Breaking API changes in Spark 3.0

Posted by Holden Karau <ho...@pigscanfly.ca>.
So here is my view of how removal of a common and stable API should go (to
be clear, exceptions can and do make sense); a sketch follows the list:
1) Deprecate API
2) Release replacement API
3) Provide migration guidance (ideally in deprecated annotation, but
possible in release notes or elsewhere)
4) Remove old API
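
To make step 3 concrete, here is a rough sketch of what steps 1-3 can look
like for the SQLContext.getOrCreate case that kicked off this thread
(illustrative only, and the object name is made up - this is not the exact
Spark source):

// Sketch only: the old API sticks around for a release or two, delegates
// to its replacement, and the deprecation message itself carries the
// migration guidance from step 3.
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SparkSession}

object SQLContextCompat {
  @deprecated("Use SparkSession.builder().getOrCreate() instead", "2.0.0")
  def getOrCreate(sparkContext: SparkContext): SQLContext =
    SparkSession.builder().config(sparkContext.getConf)
      .getOrCreate().sqlContext
}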

I think, ideally, we should have 1, 2, and 3 occur in a release prior to 4.
If this is not possible, I think having a quick discussion on the dev list
is reasonable given the potential impact on our users. I think the preview
release is a good opportunity for us to get an idea of whether something is
going to have a really large impact.

I think we've felt this pain before as developers building on top of Scala,
and knowing from our own experience how painful that has been, I'd like us
to minimize the amount of this kind of pain we pass on to our users.
And having the conversation is not without utility: the discussion will be
visible to users searching for it, so they can see the rationale and,
hopefully, migration suggestions.


On Wed, Feb 19, 2020 at 7:02 PM Jungtaek Lim <ka...@gmail.com>
wrote:

> On Thu, Feb 20, 2020 at 8:50 AM Jungtaek Lim <ka...@gmail.com>
> wrote:
>
>>
>> If the community really would like to build some (soft) rules/policies
>> on deprecation, I can only imagine two items:
>>
>> 1. Define a "minimum number of releases to live" (either per deprecated
>> API or globally).
>> 2. Never skip describing the reason for deprecation, and try your best to
>> describe an alternative that works the same or similarly - if the
>> alternative doesn't work exactly the same, also describe the difference
>> (optionally, maybe).
>>
>> I cannot imagine any other problems with deprecation.
>>
I think those guidelines seem reasonable to me. I've written a bit more
above about what I'd expect us to be doing as a project with as many
downstream consumers as we have.

>
>> On Thu, Feb 20, 2020 at 7:36 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> Sure. I understand the background of the following requests. So, it's a
>>> good time to decide on the criteria in order to start the discussion.
>>>
>>>     1. "to provide a reasonable migration path we’d want the replacement
>>> of the deprecated API to also exist in 2.4"
>>>     2. "We need to discuss the APIs case by case"
>>>
>>> For now, it's unclear what counts as "unnecessarily painful", which APIs
>>> are "widely used", or how small "the maintenance costs" must be.
>>>
I think these are all case by case. For example, to me, in the original
situation which kicked off the thread, SQLContext.getOrCreate probably
doesn't need to keep existing, given that we've had SparkSession.builder's
getOrCreate for several releases and it's been deprecated.


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Breaking API changes in Spark 3.0

Posted by Jungtaek Lim <ka...@gmail.com>.
I think I rushed through Karen's input and focused only on its first
sentence. Sorry about that.

As I said, I'm not sure I agree with the point about deprecation and
breaking API changes, but the thread raises another topic which seems to be
good input - practices for new API proposals. I feel that should be a
different thread to discuss, though.

Maybe we can make the deprecation of an API a "heavy-weight" operation to
mitigate the impact a bit, for example by requiring a discussion thread to
reach consensus before going through a PR. For now, you have no idea which
API is going to be deprecated and why if you only subscribe to dev@. Even
if you subscribe to issues@, you would miss it among the flood of issues.

Personally, I feel the root cause is that dev@ is very quiet compared to
the volume of PRs the community receives and the impact of the changes
those PRs make. I agree we should strike a balance here to avoid
restricting ourselves too much, but I feel there is no balance now - most
things just go through PRs without discussion. It would be ideal if we took
the time to consider this.



Re: Breaking API changes in Spark 3.0

Posted by Jungtaek Lim <ka...@gmail.com>.
Apache Spark 2.0 was released in July 2016. Assuming the project has been
trying its best to follow semantic versioning, that is "more than three
years" of waiting for breaking changes. Whatever necessary breaking changes
the community misses now will become technical debt for another 3+ years.

As for the PRs removing deprecated APIs that were pointed out first, I'm
not sure I see the problem. I roughly remember that these PRs target APIs
deprecated a couple of minor versions ago. If so, what's the matter?

If the deprecation messages don't kindly point to alternatives, then that's
a major problem the community should be concerned about and try to fix, but
it is a separate problem. The community doesn't deprecate an API just for
fun. Every deprecation has a reason, and not removing the API doesn't make
sense unless the community was mistaken about the reason for deprecation.

If the community really would like to build some (soft) rules/policies on
deprecation, I can only imagine two items:

1. Define a "minimum number of releases to live" (either per deprecated API
or globally).
2. Never skip describing the reason for deprecation, and try your best to
describe an alternative that works the same or similarly - if the
alternative doesn't work exactly the same, also describe the difference
(optionally, maybe).

I cannot imagine any other problems with deprecation.
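
As an illustration of item 2 (the wording is made up for this email, it is
not the actual Spark deprecation message), something like this would
qualify - it gives the reason, the alternative, and the difference:

// Made-up example of a deprecation message following item 2: the reason,
// the alternative, and the (non-)difference are all stated in the message.
// (Sketching a method on Dataset/DataFrame, not the actual source.)
@deprecated(
  "Renamed to make clear it creates a view, not a table; use " +
    "createOrReplaceTempView(viewName) instead. The behavior is " +
    "identical, only the name changed.",
  "2.0.0")
def registerTempTable(tableName: String): Unit =
  createOrReplaceTempView(tableName)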


Re: Breaking API changes in Spark 3.0

Posted by Dongjoon Hyun <do...@gmail.com>.
Sure. I understand the background of the following requests. So, it's a
good time to decide on the criteria in order to start the discussion.

    1. "to provide a reasonable migration path we’d want the replacement of
the deprecated API to also exist in 2.4"
    2. "We need to discuss the APIs case by case"

For now, it's unclear what counts as "unnecessarily painful", which APIs
are "widely used", or how small "the maintenance costs" must be.

I'm wondering whether the goal of Apache Spark 3.0.0 is to be 100% backward
compatible with Apache Spark 2.4.5, like Apache Kafka.
Are we going to revert all changes? If there were clear criteria, we would
not have needed to spend the long 3.0.0 development period on cleanup.

BTW, to be clear, we are talking about 2.4.5 and 3.0.0 compatibility in
this thread.

Bests,
Dongjoon.



Re: Breaking API changes in Spark 3.0

Posted by Xiao Li <li...@databricks.com>.
Like https://github.com/apache/spark/pull/23131, where we added back
unionAll.

We might need to double-check whether we removed any widely used APIs in
this release before the RC. If the maintenance costs are small, keeping
some deprecated APIs looks reasonable to me. This can help the adoption of
Spark 3.0. We need to discuss the APIs case by case.
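
For what it's worth, an alias like unionAll is about as cheap as
maintenance gets - roughly one line (a sketch in the spirit of the PR
above, not necessarily the exact source):

// Sketch: keep the old name as a thin alias on Dataset[T], so the
// maintenance cost is a single delegating method.
def unionAll(other: Dataset[T]): Dataset[T] = union(other)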

Xiao


Re: Breaking API changes in Spark 3.0

Posted by Holden Karau <ho...@pigscanfly.ca>.
So my understanding would be that, to provide a reasonable migration path,
we’d want the replacement for a deprecated API to also exist in 2.4; this
way libraries and programs can dual-target during the migration process.

Now, that isn’t always going to be doable, but it's certainly worth looking
at the situations where we aren’t providing a smooth migration path and
making sure it’s the best thing to do.
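
Concretely, dual targeting can be as simple as building one source tree
against either Spark line, selected at build time. A build.sbt sketch
(version numbers and directory layout are illustrative, not from any
particular project):

// Pick the Spark version at build time, e.g.: sbt -DsparkVersion=3.0.0 test
val sparkVersion = sys.props.getOrElse("sparkVersion", "2.4.5")

libraryDependencies +=
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"

// For APIs that genuinely differ between the two lines, keep small shims
// in version-specific source directories (src/main/spark-2, src/main/spark-3).
unmanagedSourceDirectories in Compile +=
  baseDirectory.value / s"src/main/spark-${sparkVersion.take(1)}"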

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Breaking API changes in Spark 3.0

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, Karen.

Are you saying that Spark 3 has to keep all deprecated 2.x APIs?
Could you tell us what your criteria are for `unnecessarily` or
`necessarily`?

> the migration process from Spark 2 to Spark 3 unnecessarily painful.

Bests,
Dongjoon.

