Posted to issues@spark.apache.org by "Kousuke Saruta (Jira)" <ji...@apache.org> on 2021/02/20 15:05:00 UTC

[jira] [Updated] (SPARK-34484) Introduce a new syntax to represent attributes with the Catalyst DSL

     [ https://issues.apache.org/jira/browse/SPARK-34484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated SPARK-34484:
-----------------------------------
    Summary: Introduce a new syntax to represent attributes with the Catalyst DSL  (was: Introduce a new syntax to represent attributes with Catalyst DSL)

> Introduce a new syntax to represent attributes with the Catalyst DSL
> --------------------------------------------------------------------
>
>                 Key: SPARK-34484
>                 URL: https://issues.apache.org/jira/browse/SPARK-34484
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> With the Catalyst DSL (dsl/package.scala), we have two ways to represent attributes, as shown in the sketch after this list.
> 1. Symbol literals (the `'` syntax)
> 2. The `$""` syntax, which is defined in the `sql/catalyst` module using a string context.
> But both have problems.
> Regarding symbol literals, the Scala community deprecated them in Scala 2.13. We could use the `Symbol` constructor instead but, even worse, Scala will remove `Symbol` entirely in the future (https://scalacenter.github.io/scala-3-migration-guide/docs/incompatibilities/dropped-features.html). The migration guide says:
> {code}
> Although scala.Symbol is useful for migration, beware that it is deprecated and that it will be removed from the scala-library. You are recommended, as a second step, to replace them with plain string literals "xwy" or a dedicated class.
> {code}
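> Replacing the literal with the `Symbol` constructor keeps the code compiling on 2.13, but it only postpones the problem (a minimal sketch):
> {code}
> import org.apache.spark.sql.catalyst.dsl.expressions._
>
> // No deprecated symbol literal, but scala.Symbol itself
> // is slated for removal from the scala-library.
> val a = Symbol("a").string
> {code}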
> Regarding the `$""` syntax, there are two problems.
> The first is that it conflicts with another `$""` syntax defined in the `sql/core` module.
> You can easily reproduce the conflict in the Spark shell:
> {code}
> import org.apache.spark.sql.catalyst.dsl.expressions._
> val attr1 = $"attr1"
>        error: type mismatch;
>         found   : StringContext
>         required: ?{def $: ?}
>        Note that implicit conversions are not applicable because they are ambiguous:
>         both method StringToColumn in class SQLImplicits of type (sc: StringContext): spark.implicits.StringToColumn
>         and method StringToAttributeConversionHelper in trait ExpressionConversions of type (sc: StringContext): org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper
>         are possible conversion functions from StringContext to ?{def $: ?}
> {code}
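> A workaround is to call the desired conversion explicitly instead of relying on the ambiguous implicit (a sketch only; it obviously defeats the point of a concise DSL):
> {code}
> import org.apache.spark.sql.catalyst.dsl.expressions._
>
> // Explicitly pick the Catalyst conversion rather than letting
> // the compiler search for one.
> val attr1 = new StringToAttributeConversionHelper(StringContext("attr1")).$()
> {code}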
> The second problem is that we can't write `$"attr".map(StringType, StringType)`, although `'attr.map(StringType, StringType)` works.
> This seems to be a Scala compiler bug (the call apparently resolves to the single-argument `map` inherited from `TreeNode` rather than the DSL's two-argument `map`), and it will be fixed in neither `2.12` nor `2.13` (https://github.com/scala/scala/pull/7396).
> Actually, I'm working on replacing all the symbol literals with the `$""` syntax in SPARK-34443, and I ran into this problem in the following test files (the resulting compiler error is shown after the list):
> * EncoderResolutionSuite.scala
> * ComplexTypeSuite.scala
> * ObjectExpressionsSuite.scala
> * NestedColumnAliasingSuite.scala
> * ReplaceNullWithFalseInPredicateSuite.scala
> * SimplifyCastsSuite.scala
> * SimplifyConditionalSuite.scala
> {code}
> [error] /home/kou/work/oss/spark-scala-2.13/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala:212:28: too many arguments (found 2, expected 1) for method map: (f: org.apache.spark.sql.catalyst.expressions.Expression => A): Seq[A]
> [error]       $"a".map(StringType, StringType)).foreach { attr =>
> {code}
> So, it's better to have another way to represent attributes with the DSL; one possibility is sketched below.
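> For illustration, one possible direction (a hypothetical sketch, not necessarily what this ticket will end up adding) is a plain extension method on `String`, which avoids both symbol literals and the ambiguous interpolator:
> {code}
> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
>
> // Hypothetical String-based entry point for the DSL.
> implicit class DslAttr(val s: String) extends AnyVal {
>   def attr: UnresolvedAttribute = UnresolvedAttribute.quoted(s)
> }
>
> // "a".attr instead of 'a or $"a"
> val a = "a".attr
> {code}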


