You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Ted Yu (Jira)" <ji...@apache.org> on 2020/12/26 04:14:00 UTC

[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

    [ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254944#comment-17254944 ] 

Ted Yu commented on SPARK-33915:
--------------------------------

I have experimented with two patches which allow json expression to be pushable column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() method returns UTF8String. FunctionRegistry.scala and functions.scala would have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first patch is similar):

Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)

Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
     val jsonStr = json.eval(input).asInstanceOf[UTF8String]
     if (jsonStr == null) {
       return null
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
         }
       case s: GetStructField if nestedPredicatePushdownEnabled =>
         helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+      case GetJsonObject(col, field) =>
+        Some(Seq("GetJsonObject(" + col + "," + field + ")"))
       case _ => None
     }
     helper(e).map(_.quoted)
{code}

> Allow json expression to be pushable column
> -------------------------------------------
>
>                 Key: SPARK-33915
>                 URL: https://issues.apache.org/jira/browse/SPARK-33915
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a chance to perform pushdown even if third party DB engine supports json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org