Posted to issues@spark.apache.org by "Kousuke Saruta (Jira)" <ji...@apache.org> on 2021/02/20 15:02:00 UTC

[jira] [Created] (SPARK-34484) Introduce a new syntax attr() to represent attributes with Catalyst DSL

Kousuke Saruta created SPARK-34484:
--------------------------------------

             Summary: Introduce a new syntax attr() to represent attributes with Catalyst DSL
                 Key: SPARK-34484
                 URL: https://issues.apache.org/jira/browse/SPARK-34484
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta


With the Catalyst DSL (dsl/package.scala), we have two ways to represent attributes.

1. Symbol literals (the `'` syntax)
2. The `$""` syntax, defined in the `sql/catalyst` module using a string context.
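
For illustration, here is a self-contained sketch of how a `$""`-style syntax can be defined via an implicit `StringContext` conversion, mirroring the approach of `StringToAttributeConversionHelper` in `sql/catalyst`. `Attr` is a stand-in for Spark's `UnresolvedAttribute`, used here only so the snippet compiles without Spark.

{code}
// Stand-in for UnresolvedAttribute (illustration only).
case class Attr(name: String)

// Mirrors how ExpressionConversions enables $"name" via StringContext.
implicit class StringToAttrHelper(sc: StringContext) {
  def $(args: Any*): Attr = Attr(sc.s(args: _*))
}

val a = $"attr1"
println(a)  // Attr(attr1)
{code}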

But both have problems.

Regarding symbol literals, they are deprecated as of Scala 2.13. We could use the `Symbol` constructor instead but, what is worse, Scala will remove `Symbol` from the standard library entirely in the future (https://scalacenter.github.io/scala-3-migration-guide/docs/incompatibilities/dropped-features.html):

{code}
Although scala.Symbol is useful for migration, beware that it is deprecated and that it will be removed from the scala-library. You are recommended, as a second step, to replace them with plain string literals "xwy" or a dedicated class.
{code}
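
That migration path looks like the following. The constructor form still compiles on Scala 2.13, but since `Symbol` itself is slated for removal, a plain string (or a dedicated class) is the recommended end state:

{code}
// Replacement for the deprecated literal 'attr1; compiles on 2.13
// but Symbol is slated for removal from scala-library.
val bySymbol = Symbol("attr1")

// The recommended long-term form per the migration guide.
val byString = "attr1"

println(bySymbol.name == byString)  // true
{code}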

The `$""` syntax has two problems.
The first is that it conflicts with another `$""` syntax defined in the `sql/core` module.
You can easily reproduce the conflict in the Spark Shell:

{code}
import org.apache.spark.sql.catalyst.dsl.expressions._
val attr1 = $"attr1"

       error: type mismatch;
        found   : StringContext
        required: ?{def $: ?}
       Note that implicit conversions are not applicable because they are ambiguous:
        both method StringToColumn in class SQLImplicits of type (sc: StringContext): spark.implicits.StringToColumn
        and method StringToAttributeConversionHelper in trait ExpressionConversions of type (sc: StringContext): org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper
        are possible conversion functions from StringContext to ?{def $: ?}
{code}

The second problem is that we can't write `$"attr".map(StringType, StringType)`, even though `'attr.map(StringType, StringType)` works.
This appears to be a bug in the Scala compiler and will not be fixed in either `2.12` or `2.13` (https://github.com/scala/scala/pull/7396).

I'm currently working on replacing all the symbol literals with the `$""` syntax in SPARK-34443, and I hit this problem in the following test files:

* EncoderResolutionSuite.scala
* ComplexTypeSuite.scala
* ObjectExpressionsSuite.scala
* NestedColumnAliasingSuite.scala
* ReplaceNullWithFalseInPredicateSuite.scala
* SimplifyCastsSuite.scala
* SimplifyConditionalSuite.scala

{code}
[error] /home/kou/work/oss/spark-scala-2.13/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala:212:28: too many arguments (found 2, expected 1) for method map: (f: org.apache.spark.sql.catalyst.expressions.Expression => A): Seq[A]
[error]       $"a".map(StringType, StringType)).foreach { attr =>
{code}

So it would be better to have another way to represent attributes in the DSL.
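
A hypothetical sketch of what such an `attr()` helper might look like (names and types here are illustrative stand-ins for `UnresolvedAttribute` and the metadata-attaching `map` helper; the real change would live in `dsl/package.scala`). Because a plain method involves no implicit conversion and no string interpolator, it cannot be ambiguous with the `sql/core` `$""` syntax, and method chaining works without hitting the compiler bug above:

{code}
// Hypothetical sketch only: Attr stands in for UnresolvedAttribute,
// and map for the metadata-attaching helper in ExpressionConversions.
case class Attr(name: String, types: Seq[String] = Nil) {
  def map(from: String, to: String): Attr = copy(types = Seq(from, to))
}

// The proposed DSL entry point: an ordinary method, so no implicit
// conversion or string interpolator is involved.
def attr(name: String): Attr = Attr(name)

// Chaining works, unlike $"a".map(StringType, StringType):
val a = attr("a").map("StringType", "StringType")
println(a.types)  // List(StringType, StringType)
{code}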



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
