Posted to user@spark.apache.org by Jason Xu <ja...@gmail.com> on 2022/05/18 15:00:00 UTC
Spark 3 migration question
Hi Spark user group,
Migrating existing Spark jobs from 2.4 to 3 seems like a big challenge
given the long list of changes in the migration guide
<https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30>:
the behavior changes in Spark 3 could introduce failures or silent output
changes. This makes the migration risky if we don't identify and fix the
affected jobs up front.
However, the guide is a bit high level. For some items in the guide, I
don't know how to construct an example query/job that compares the
behavior between 2.4 and 3.x. A specific example:
- In Spark version 2.4 and below, you can create map values with map
type key via built-in function such as CreateMap, MapFromArrays, etc. In
Spark 3.0, it’s not allowed to create map values with map type key with
these built-in functions. Users can use map_entries function to convert map
to array<struct<key, value>> as a workaround. In addition, users can still
read map values with map type key from data source or Java/Scala
collections, though it is discouraged.
Does anyone have an example to illustrate what this item is about?
Also, I'm curious whether anyone has done, or is doing, a large-scale
Spark 2.4 to 3.x migration. What was your experience handling the long
list of breaking changes?
Thanks,
Jason Xu