Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2022/11/27 06:07:00 UTC
[jira] [Created] (SPARK-41276) Optimize constructor use of `StructType`
Yang Jie created SPARK-41276:
--------------------------------
Summary: Optimize constructor use of `StructType`
Key: SPARK-41276
URL: https://issues.apache.org/jira/browse/SPARK-41276
Project: Spark
Issue Type: Improvement
Components: MLlib, SQL
Affects Versions: 3.4.0
Reporter: Yang Jie
There are two main ways to construct `StructType`:
- Primary constructor
```scala
case class StructType(fields: Array[StructField])
```
- Use `Seq` as input constructor
```scala
def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
```
Both construction paths are widely used in Spark, but the latter requires an additional collection conversion.
This PR changes the following 3 scenarios to use the primary constructor, removing one collection conversion in each:
1. Where a `Seq` is created manually as the input, create an `Array` instead, for example:
https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
2. Where a `toSeq` call was added to the input for Scala 2.13 compatibility, call `toArray` instead, for example:
https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
3. Where the input is already an `Array`, remove the redundant `toSeq`, for example:
https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
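The effect of the three changes above can be sketched with a minimal standalone model of the pattern. This is not Spark's actual `StructType` (the real class lives in `org.apache.spark.sql.types` and carries much more machinery); it is a hypothetical case class with the same two constructor shapes, showing that the `Seq`-based `apply` pays for an extra `toArray` copy while the primary constructor takes the `Array` as-is:

```scala
// Hypothetical, dependency-free model of the StructType constructor pattern.
case class StructField(name: String, dataType: String)

// Primary constructor: takes the Array directly, no copying.
case class StructType(fields: Array[StructField])

object StructType {
  // Seq-based apply: `fields.toArray` allocates and copies a new Array,
  // which is the extra collection conversion the PR avoids.
  def apply(fields: Seq[StructField]): StructType = new StructType(fields.toArray)
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Scenario 1: build an Array up front so the primary constructor is used.
    val direct = StructType(Array(StructField("id", "int"), StructField("name", "string")))

    // Before the change: a Seq input routes through the apply above,
    // paying for the toArray conversion on construction.
    val viaSeq = StructType(Seq(StructField("id", "int"), StructField("name", "string")))

    // Both paths yield the same schema contents; only the allocation differs.
    println(direct.fields.sameElements(viaSeq.fields))
  }
}
```

Scenarios 2 and 3 are the same idea at different call sites: a `.toSeq` that was only there to feed the `Seq` overload (or was already redundant on an `Array`) is replaced or dropped so construction goes straight through the `Array` path.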
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org