Posted to user@spark.apache.org by Jason Xu <ja...@gmail.com> on 2022/05/18 15:00:00 UTC

Spark 3 migration question

Hi Spark user group,

Migrating existing Spark jobs from 2.4 to 3 looks like a big challenge
given the long list of changes in the migration guide
<https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30>:
the behavior changes in Spark 3 can introduce failures or silent output
changes. That makes the migration risky unless we identify and fix each
affected job against the items in the guide.

However, the guide is fairly high level. For some items, I don't know how
to construct an example query/job that demonstrates the difference in
behavior between 2.4 and 3.x. A specific example:

   - In Spark version 2.4 and below, you can create map values with map
   type key via built-in function such as CreateMap, MapFromArrays, etc. In
   Spark 3.0, it’s not allowed to create map values with map type key with
   these built-in functions. Users can use map_entries function to convert map
   to array<struct<key, value>> as a workaround. In addition, users can still
   read map values with map type key from data source or Java/Scala
   collections, though it is discouraged.


Does anyone have an example to illustrate what this item is about?
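For context, here is my current guess at what the item means, sketched
against a 3.x spark-shell session (where the `spark` SparkSession is
predefined). I haven't been able to confirm the 2.4 side yet, so please
correct me if I'm misreading it:

    // In Spark 2.4, a query like this returned a map whose key is
    // itself a map:
    spark.sql("SELECT map(map(1, 2), 'a')")
    // In Spark 3.0, my understanding is that the same query fails
    // analysis, because a map is no longer a valid map key type.

    // The workaround from the guide: convert the inner map to
    // array<struct<key, value>> with map_entries, which is a valid
    // key type.
    spark.sql("SELECT map(map_entries(map(1, 2)), 'a')").show(false)

If that reading is right, any job that builds map-type keys via CreateMap,
MapFromArrays, etc. would need a rewrite along the map_entries lines above.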

Also, I'm curious whether anyone has done, or is doing, a large-scale
Spark 2.4 to 3.x migration? What was your experience handling the long
list of breaking changes?


Thanks,
Jason Xu