Posted to dev@iceberg.apache.org by OpenInx <op...@gmail.com> on 2022/02/21 03:03:28 UTC

[DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Hi everyone

The Spark 2.4 and Spark 3.0 runtime artifact names are currently not aligned
with those for Spark 3.1 and 3.2:

# Spark 2.4
iceberg-spark-runtime-0.13.1.jar
# Spark 3.0
iceberg-spark3-runtime-0.13.1.jar
# Spark 3.1
iceberg-spark-runtime-3.1_2.12-0.13.1.jar
# Spark 3.2
iceberg-spark-runtime-3.2_2.12-0.13.1.jar

From the Spark 3.1 and Spark 3.2 runtime artifact names, we can easily
tell:
1. Which Spark version (major.minor) the runtime jar targets
2. Which Scala version the runtime jar is compiled with

For Spark 3.0 and Spark 2.4, however, this information is not obvious from
the name. I think we kept these legacy names because they were introduced in
older Iceberg releases and we wanted to avoid renaming modules that users
already depend on, but they are indeed confusing new community users.
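To illustrate the difference, here is a minimal Python sketch that tries to recover the Spark and Scala versions from each jar name listed above. The helper `parse_runtime_jar` and the `RuntimeJarInfo` type are hypothetical, not part of Iceberg. Only the Spark 3.1/3.2-style names match the pattern; the legacy Spark 2.4 and 3.0 names yield nothing.

```python
import re
from typing import NamedTuple, Optional

# Matches the Spark 3.1/3.2 style, e.g. iceberg-spark-runtime-3.2_2.12-0.13.1.jar
NEW_STYLE = re.compile(
    r"^iceberg-spark-runtime-"
    r"(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-"
    r"(?P<iceberg>\d+(?:\.\d+)*)\.jar$"
)


class RuntimeJarInfo(NamedTuple):
    spark: str    # Spark major.minor the jar targets
    scala: str    # Scala binary version the jar was compiled with
    iceberg: str  # Iceberg release version


def parse_runtime_jar(name: str) -> Optional[RuntimeJarInfo]:
    """Return the versions encoded in a runtime jar name, or None when the
    name does not carry them (the legacy Spark 2.4 / 3.0 names)."""
    m = NEW_STYLE.match(name)
    if m is None:
        return None
    return RuntimeJarInfo(m["spark"], m["scala"], m["iceberg"])


for jar in [
    "iceberg-spark-runtime-0.13.1.jar",           # Spark 2.4
    "iceberg-spark3-runtime-0.13.1.jar",          # Spark 3.0
    "iceberg-spark-runtime-3.1_2.12-0.13.1.jar",  # Spark 3.1
    "iceberg-spark-runtime-3.2_2.12-0.13.1.jar",  # Spark 3.2
]:
    print(jar, "->", parse_runtime_jar(jar))
```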

In general, we have two options:

Option#1: Keep the current artifact names. Spark 2.4 and Spark 3.0 will then
keep using iceberg-spark-runtime-<iceberg-version>.jar and
iceberg-spark3-runtime-<iceberg-version>.jar, respectively, until they are
retired from the official Apache Iceberg repo.
Option#2: Rename the Spark 2.4 and Spark 3.0 artifacts to the generic
format:
iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar.
This gives all Spark versions a consistent naming format.
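Under option #2, every name can be produced mechanically from three inputs. A minimal sketch, assuming a hypothetical helper `runtime_jar_name` (not an Iceberg API) and treating the Scala version as an explicit input, since this thread does not say which Scala version each legacy Spark line would use:

```python
import re

# Guards that the Spark part of the name is exactly "<major>.<minor>".
_MAJOR_MINOR = re.compile(r"^\d+\.\d+$")


def runtime_jar_name(spark_version: str, scala_version: str, iceberg_version: str) -> str:
    """Format a runtime jar name using the option #2 pattern:
    iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar
    """
    # Keep only major.minor of the Spark version (e.g. "3.0.3" -> "3.0"),
    # because the proposed names encode only the Spark major.minor version.
    spark_mm = ".".join(spark_version.split(".")[:2])
    if not _MAJOR_MINOR.match(spark_mm):
        raise ValueError(f"unexpected Spark version: {spark_version!r}")
    return f"iceberg-spark-runtime-{spark_mm}_{scala_version}-{iceberg_version}.jar"


print(runtime_jar_name("3.0.3", "2.12", "0.13.1"))
```

For Spark 3.0 with Scala 2.12 and Iceberg 0.13.1, this yields iceberg-spark-runtime-3.0_2.12-0.13.1.jar, the same shape as the existing Spark 3.1 and 3.2 names.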

Personally, I prefer option#2 because it is friendlier to new community
users (although existing users will have to update the artifact name in their
pom.xml).

What is your preference?

References:
1.  I created a PR to change the artifact names, and we had some discussion
there: https://github.com/apache/iceberg/pull/4158
2.  https://github.com/apache/iceberg-docs/pull/27#discussion_r800297155

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by OpenInx <op...@gmail.com>.
So we basically agree to rename the Spark runtime artifacts. Are there any
other concerns about this PR: https://github.com/apache/iceberg/pull/4158/ ?

On Wed, Feb 23, 2022 at 1:48 AM Ryan Blue <bl...@tabular.io> wrote:

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by Ryan Blue <bl...@tabular.io>.
I initially supported not renaming for the reason that Jeff raised, but now
I'm more convinced by Kyle's argument. The current naming is confusing, and it
isn't that big of a problem to switch to a different jar. +1 to renaming.

On Sun, Feb 20, 2022 at 10:57 PM Yufei Gu <fl...@gmail.com> wrote:

-- 
Ryan Blue
Tabular

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by Yufei Gu <fl...@gmail.com>.
Agreed with Kyle. A Spark 3.0 artifact name like
iceberg-spark-runtime-3.0_2.12-0.13.1.jar is more accurate and consistent,
and less confusing for users.

On Sun, Feb 20, 2022 at 10:47 PM Kyle Bendickson <ky...@tabular.io> wrote:

Best,

Yufei

`This is not a contribution`

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by Kyle Bendickson <ky...@tabular.io>.
Thanks for bringing this up, Jeff!

Normally I’d agree that it’s not good practice to change an artifact name.
However, in this case, the artifact has already changed. The
“spark3-runtime” artifact used to cover all versions of Spark 3 (at the time,
Spark 3.0 and 3.1). It no longer does, as it’s only tested with / used for Spark 3.0.

I encounter many users who have upgraded to newer versions of Spark but
have not switched to the new artifacts named by Spark version, because
“spark3-runtime” sounds like it covers all Spark 3 versions. They then
run into subtle bugs, and upgrading that way is not a great user experience.

These users are, however, updating the Iceberg artifact to the new versions.

So I think in this case, breaking the naming has benefits. When users go to
upgrade to a newly released Iceberg version and their dependency is not
found, they will hopefully check Maven and see the new naming convention /
artifacts.

So I also support option 2, with names that include the Spark and Scala versions.
Otherwise, we will keep seeing people use the old “spark3-runtime” as they
upgrade Spark and run into subtle errors (class not found, wrong
type signatures due to version mismatch).
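Kyle's point can be made concrete: a name that encodes the Spark version lets a build or startup script catch a mismatch, while a legacy name gives it nothing to compare. A minimal sketch with a hypothetical `check_runtime_jar` function (not an Iceberg or Spark API), assuming the caller already knows the running Spark version string:

```python
import re

# Captures the Spark major.minor from an option #2 style runtime jar name.
_NEW_STYLE = re.compile(r"^iceberg-spark-runtime-(\d+\.\d+)_\d+\.\d+-[\d.]+\.jar$")


def check_runtime_jar(spark_version: str, jar_name: str) -> str:
    """Compare the running Spark version (e.g. "3.2.1") with the Spark
    major.minor encoded in an Iceberg runtime jar name.

    Returns "ok" or "mismatch", or "unknown" when the name (like the legacy
    iceberg-spark3-runtime) does not say which Spark version it targets.
    """
    m = _NEW_STYLE.match(jar_name)
    if m is None:
        return "unknown"
    running = ".".join(spark_version.split(".")[:2])
    return "ok" if m.group(1) == running else "mismatch"


# A Spark 3.2 user still on the legacy jar gets no signal from the name.
print(check_runtime_jar("3.2.1", "iceberg-spark3-runtime-0.13.1.jar"))
print(check_runtime_jar("3.2.1", "iceberg-spark-runtime-3.1_2.12-0.13.1.jar"))
```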

Users eventually have to update their pom if / when they upgrade Spark,
due to incompatibility. This way, at least, the breakage will be loud, since
there won’t be a new Iceberg version published under the old name.

Is it possible to mark the old spark3-runtime / spark-runtime artifacts as
deprecated, or otherwise point to the new artifacts in Maven?
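On the deprecation question: Maven supports relocation, where a POM published under the old coordinates carries a `<distributionManagement><relocation>` element pointing to the new ones, so that Maven can point users to the new artifact. Whether Iceberg could publish such POMs is not settled in this thread. The Python sketch below (hypothetical helper `relocation_pom`; the version number is only a placeholder) just prints one so its shape is visible.

```python
import xml.etree.ElementTree as ET


def relocation_pom(group_id: str, old_artifact: str, new_artifact: str,
                   version: str, message: str) -> str:
    """Build a minimal Maven relocation POM that redirects
    group_id:old_artifact:version to group_id:new_artifact:version."""
    project = ET.Element("project")
    for tag, text in [("modelVersion", "4.0.0"), ("groupId", group_id),
                      ("artifactId", old_artifact), ("version", version)]:
        ET.SubElement(project, tag).text = text
    dm = ET.SubElement(project, "distributionManagement")
    reloc = ET.SubElement(dm, "relocation")
    # Only the coordinate that changes is listed; groupId and version are
    # left out so they stay the same as this POM's own coordinates.
    ET.SubElement(reloc, "artifactId").text = new_artifact
    ET.SubElement(reloc, "message").text = message
    return ET.tostring(project, encoding="unicode")


print(relocation_pom(
    "org.apache.iceberg",
    "iceberg-spark3-runtime",
    "iceberg-spark-runtime-3.0_2.12",
    "0.14.0",  # placeholder release number, for illustration only
    "Renamed; use iceberg-spark-runtime-3.0_2.12",
))
```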

- Kyle

On Sun, Feb 20, 2022 at 9:41 PM Jeff Zhang <zj...@gmail.com> wrote:

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by Jeff Zhang <zj...@gmail.com>.
I don't think it is best practice to just change the artifact name of
published jars, unless we publish a new version with the new naming
convention.

On Mon, Feb 21, 2022 at 12:36 PM Jack Ye <ye...@gmail.com> wrote:

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

Posted by Jack Ye <ye...@gmail.com>.
I think option 2 is ideal, but I don't know if ASF or Maven Central has any
hard requirement for us to keep the package names published in Maven
backwards compatible. If there is such a requirement, then we cannot change
them.

As a mitigation, I noted in https://iceberg.apache.org/multi-engine-support
that the Spark 2.4 and 3.0 jar names do not follow the naming convention of
newer versions, for backwards compatibility.

Best,
Jack Ye

On Sun, Feb 20, 2022 at 7:03 PM OpenInx <op...@gmail.com> wrote:
