Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2023/03/21 07:56:54 UTC
[GitHub] [spark] panbingkun opened a new pull request, #40506: [SPARK-42881][SQL] get_json_object Codegen Support
panbingkun opened a new pull request, #40506:
URL: https://github.com/apache/spark/pull/40506
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA (GitHub Actions).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1480968971
> @panbingkun I think we should also update `JsonBenchmark-jdk11-results.txt`, `JsonBenchmark-jdk17-results.txt` and `JsonBenchmark-results.txt` in this PR, since `JsonBenchmark` was updated.
Done
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1630922534
> @panbingkun Please, resolve conflicts.
Done.
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1260533972
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"
-  @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+  @transient
+  private lazy val evaluator = new GetJsonObjectEvaluator(right)
   override def eval(input: InternalRow): Any = {
-    val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+    evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+    evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+    evaluator.evaluate()
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
+    val jsonEval = json.genCode(ctx)
+    val pathEval = path.genCode(ctx)
+
+    val setJson =
+      s"""
+         |if (${jsonEval.isNull}) {
+         |  $refEvaluator.setJson(null);
+         |} else {
+         |  $refEvaluator.setJson(${jsonEval.value});
+         |}
+         |""".stripMargin
+    val setPath =
+      s"""
+         |if (${pathEval.isNull}) {
+         |  $refEvaluator.setPath(null);
+         |} else {
+         |  $refEvaluator.setPath(${pathEval.value});
+         |}
+         |""".stripMargin
+
+    val resultType = CodeGenerator.boxedType(dataType)
+    val resultTerm = ctx.freshName("result")
+    ev.copy(code =
+      code"""
+         |${jsonEval.code}
+         |${pathEval.code}
+         |$setJson
+         |$setPath
+         |$resultType $resultTerm = ($resultType) $refEvaluator.evaluate();
+         |boolean ${ev.isNull} = $resultTerm == null;
+         |${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+         |if (!${ev.isNull}) {
+         |  ${ev.value} = $resultTerm;
+         |}
+         |""".stripMargin
+    )
+  }
+
+  override protected def withNewChildrenInternal(
+      newLeft: Expression, newRight: Expression): GetJsonObject =
+    copy(json = newLeft, path = newRight)
+}
+
+class GetJsonObjectEvaluator(path: Expression) extends Serializable {
+  import com.fasterxml.jackson.core.JsonToken._
+  import PathInstruction._
+  import SharedFactory._
+  import WriteStyle._
+
+  @transient
+  private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
Review Comment:
Let's put this logic `path.eval().asInstanceOf[UTF8String]` outside.
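To illustrate the suggestion, here is a minimal sketch in plain Scala. `ToyExpr`, `ToyLiteral`, `ToyPath`, `ExprBoundEvaluator` and `ValueBoundEvaluator` are hypothetical stand-ins, not Spark classes, and paths are plain `String`s rather than `UTF8String`. It only shows the shape of the change: the caller evaluates the path expression and hands the evaluator the resulting value, so the evaluator no longer needs to know about expressions.

```scala
// Hypothetical stand-ins for illustration only; none of these names come from Spark.
object ToyPath {
  // Toy parser: "$.a.b" becomes Some(List("a", "b")); null or malformed input becomes None.
  def parse(path: String): Option[List[String]] =
    if (path == null || !path.startsWith("$.")) None
    else Some(path.drop(2).split('.').toList)
}

trait ToyExpr { def eval(): String }

final case class ToyLiteral(value: String) extends ToyExpr {
  def eval(): String = value
}

// Before: the evaluator holds the expression and evaluates it internally.
class ExprBoundEvaluator(path: ToyExpr) {
  private lazy val parsedPath: Option[List[String]] = ToyPath.parse(path.eval())
  def segments: Option[List[String]] = parsedPath
}

// After: the caller evaluates the expression; the evaluator only sees the value.
class ValueBoundEvaluator(cachedPath: String) {
  private lazy val parsedPath: Option[List[String]] = ToyPath.parse(cachedPath)
  def segments: Option[List[String]] = parsedPath
}
```

With this shape the call site does the evaluation itself, for example `new ValueBoundEvaluator(pathExpr.eval())`, which mirrors the later revision of the diff where the evaluator is constructed from `path.eval().asInstanceOf[UTF8String]`.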
[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1263204348
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,114 @@ case class GetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"
-  @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+  @transient
+  private lazy val evaluator = if (path.foldable) {
+    new GetJsonObjectEvaluator(path.eval().asInstanceOf[UTF8String])
+  } else {
+    new GetJsonObjectEvaluator()
+  }
   override def eval(input: InternalRow): Any = {
-    val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+    evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+    if (!path.foldable) {
+      evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+    }
+    evaluator.evaluate()
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evaluatorClass = classOf[GetJsonObjectEvaluator].getName
+    val initEvaluator = path.foldable match {
+      case true if path.eval() != null =>
+        val cachedPath = path.eval().asInstanceOf[UTF8String]
+        val refCachedPath = ctx.addReferenceObj("cachedPath", cachedPath)
+        s"new $evaluatorClass($refCachedPath)"
+      case _ => s"new $evaluatorClass()"
+    }
+    val evaluator = ctx.addMutableState(evaluatorClass, "evaluator",
+      v => s"""$v = $initEvaluator;""", forceInline = true)
+
+    val jsonEval = json.genCode(ctx)
+    val pathEval = path.genCode(ctx)
+
+    val setJson =
+      s"""
+         |if (${jsonEval.isNull}) {
+         |  $evaluator.setJson(null);
+         |} else {
+         |  $evaluator.setJson(${jsonEval.value});
+         |}
+         |""".stripMargin
+    val setPath = if (!path.foldable) {
+      s"""
+         |if (${pathEval.isNull}) {
+         |  $evaluator.setPath(null);
+         |} else {
+         |  $evaluator.setPath(${pathEval.value});
+         |}
+         |""".stripMargin
+    } else {
+      ""
+    }
Review Comment:
Oh, nvm, I saw the default constructor sets it to null.
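A minimal sketch of why the default constructor covers the foldable-but-null case. `ToyEvaluator` and `initEvaluator` are hypothetical, the JSON document is modelled as a flat `Map` to avoid a parser, and the null handling in `evaluate` is an assumption about the behaviour rather than the PR's actual code.

```scala
// Hypothetical toy, not Spark's GetJsonObjectEvaluator.
class ToyEvaluator(private var path: String) {
  // Default constructor: no cached path, so the path starts out as null.
  def this() = this(null)

  private var json: Map[String, String] = null

  def setJson(value: Map[String, String]): Unit = { json = value }
  def setPath(value: String): Unit = { path = value }

  // Assumed behaviour: null when either input is null or the key is absent.
  def evaluate(): String =
    if (json == null || path == null) null
    else json.getOrElse(path.stripPrefix("$."), null)
}

// Mirrors the `initEvaluator` match in the diff: a foldable path that evaluates
// to null falls through to the default constructor, just like a non-foldable path.
def initEvaluator(foldable: Boolean, foldedPath: String): ToyEvaluator =
  if (foldable && foldedPath != null) new ToyEvaluator(foldedPath)
  else new ToyEvaluator()
```

Under these assumptions no separate `setPath(null)` call is needed for a foldable null path: the evaluator is created with a null path and `setPath` is never invoked for foldable paths, so the result stays null.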
[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1263209639
##########
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala:
##########
@@ -1392,4 +1393,25 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("json_object_keys(a)"), expected)
     checkAnswer(df.select(json_object_keys($"a")), expected)
   }
+
+  test("GET_JSON_OBJECT Codegen Support") {
+    withTempView("GetJsonObjectTable") {
+      val data = Seq(("1", """{"f1": "value1", "f5": 5.23}""")).toDF("key", "jstring")
+      data.createOrReplaceTempView("GetJsonObjectTable")
+      val df = sql("SELECT key, get_json_object(jstring, '$.f1') FROM GetJsonObjectTable")
+      val plan = df.queryExecution.executedPlan
+      assert(plan.isInstanceOf[WholeStageCodegenExec])
+      checkAnswer(df, Seq(Row("1", "value1")))
+    }
+  }
+
+  test("path is null") {
+    val df: DataFrame = Seq(("""{"name": "alice", "age": 5}""", "")).toDF("a", "b")
+    checkAnswer(df.selectExpr("get_json_object(a, null)"), Row(null))
+  }
Review Comment:
Could you also check if this is wholestage codegen?
##########
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala:
##########
@@ -1392,4 +1393,25 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("json_object_keys(a)"), expected)
     checkAnswer(df.select(json_object_keys($"a")), expected)
   }
+
+  test("GET_JSON_OBJECT Codegen Support") {
+    withTempView("GetJsonObjectTable") {
+      val data = Seq(("1", """{"f1": "value1", "f5": 5.23}""")).toDF("key", "jstring")
+      data.createOrReplaceTempView("GetJsonObjectTable")
+      val df = sql("SELECT key, get_json_object(jstring, '$.f1') FROM GetJsonObjectTable")
+      val plan = df.queryExecution.executedPlan
+      assert(plan.isInstanceOf[WholeStageCodegenExec])
+      checkAnswer(df, Seq(Row("1", "value1")))
+    }
+  }
+
+  test("path is null") {
+    val df: DataFrame = Seq(("""{"name": "alice", "age": 5}""", "")).toDF("a", "b")
+    checkAnswer(df.selectExpr("get_json_object(a, null)"), Row(null))
+  }
+
+  test("json is null") {
+    val df: DataFrame = Seq(("""{"name": "alice", "age": 5}""", "")).toDF("a", "b")
+    checkAnswer(df.selectExpr("get_json_object(null, '$.name')"), Row(null))
Review Comment:
ditto.
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1355839029
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -3,127 +3,128 @@ Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1047-azure
Review Comment:
Okay, I will submit a follow-up PR to complete it.
[GitHub] [spark] MaxGekk commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1630778507
@panbingkun Please, resolve conflicts.
[GitHub] [spark] HyukjinKwon commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1630071155
I'm fine w/ this but I don't feel confident enough to approve. I remember @viirya mentioned sth like if codegen is not implemented, the wholestage codegen chain is cut out, and it affects performance.
[GitHub] [spark] LuciferYang commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1630080078
> I remember @viirya mentioned sth like if codegen is not implemented, the wholestage codegen chain is cut out, and it affects performance.
This might be a valid standpoint that can support the merging of this pull request, although the benefits may not be obvious on their own.
[GitHub] [spark] xy2953396112 commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "xy2953396112 (via GitHub)" <gi...@apache.org>.
xy2953396112 commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1628946488
@panbingkun How much of a performance improvement did you see in your production environment?
[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1263203372
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,114 @@ case class GetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"
-  @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+  @transient
+  private lazy val evaluator = if (path.foldable) {
+    new GetJsonObjectEvaluator(path.eval().asInstanceOf[UTF8String])
+  } else {
+    new GetJsonObjectEvaluator()
+  }
   override def eval(input: InternalRow): Any = {
-    val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+    evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+    if (!path.foldable) {
+      evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+    }
+    evaluator.evaluate()
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evaluatorClass = classOf[GetJsonObjectEvaluator].getName
+    val initEvaluator = path.foldable match {
+      case true if path.eval() != null =>
+        val cachedPath = path.eval().asInstanceOf[UTF8String]
+        val refCachedPath = ctx.addReferenceObj("cachedPath", cachedPath)
+        s"new $evaluatorClass($refCachedPath)"
+      case _ => s"new $evaluatorClass()"
+    }
+    val evaluator = ctx.addMutableState(evaluatorClass, "evaluator",
+      v => s"""$v = $initEvaluator;""", forceInline = true)
+
+    val jsonEval = json.genCode(ctx)
+    val pathEval = path.genCode(ctx)
+
+    val setJson =
+      s"""
+         |if (${jsonEval.isNull}) {
+         |  $evaluator.setJson(null);
+         |} else {
+         |  $evaluator.setJson(${jsonEval.value});
+         |}
+         |""".stripMargin
+    val setPath = if (!path.foldable) {
+      s"""
+         |if (${pathEval.isNull}) {
+         |  $evaluator.setPath(null);
+         |} else {
+         |  $evaluator.setPath(${pathEval.value});
+         |}
+         |""".stripMargin
+    } else {
+      ""
+    }
Review Comment:
If path is foldable but evaluated value is null, don't you need to set a null path to `$evaluator`?
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1563673976
> @panbingkun Could you resolve conflicts, please.
This is done.
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1563033876
> @panbingkun Could you resolve conflicts, please.
Let me update the results of `JsonBenchmark` again. Waiting for it.
Thank you for your review! @MaxGekk
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1670579488
> @panbingkun Could you rebase this PR on the recent master and resolve conflicts, please.
@MaxGekk Done, Thanks.
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1260533574
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"
-  @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+  @transient
+  private lazy val evaluator = new GetJsonObjectEvaluator(right)
   override def eval(input: InternalRow): Any = {
-    val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+    evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+    evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+    evaluator.evaluate()
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
Review Comment:
Done
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1143306325
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"
-  @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+  @transient
+  private lazy val evaluator = new GetJsonObjectEvaluator(right)
   override def eval(input: InternalRow): Any = {
-    val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+    evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+    evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+    evaluator.evaluate()
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
+    val jsonEval = json.genCode(ctx)
+    val pathEval = path.genCode(ctx)
+
+    val setJson =
+      s"""
+         |if (${jsonEval.isNull}) {
+         |  $refEvaluator.setJson(null);
+         |} else {
+         |  $refEvaluator.setJson(${jsonEval.value});
+         |}
+         |""".stripMargin
+    val setPath =
+      s"""
+         |if (${pathEval.isNull}) {
+         |  $refEvaluator.setPath(null);
+         |} else {
+         |  $refEvaluator.setPath(${pathEval.value});
+         |}
+         |""".stripMargin
+
+    val resultType = CodeGenerator.boxedType(dataType)
+    val resultTerm = ctx.freshName("result")
+    ev.copy(code =
+      code"""
+         |${jsonEval.code + "\n"}
Review Comment:
Done
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1206118883
##########
sql/core/benchmarks/JsonBenchmark-jdk17-results.txt:
##########
@@ -7,117 +7,118 @@ OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 2964 3045 89 1.7 592.8 1.0X
-UTF-8 is set 4365 4382 18 1.1 873.1 0.7X
+No encoding 3004 3017 12 1.7 600.8 1.0X
+UTF-8 is set 4430 4446 17 1.1 886.0 0.7X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 2326 2381 52 2.1 465.2 1.0X
-UTF-8 is set 3834 3846 17 1.3 766.7 0.6X
+No encoding 2345 2392 44 2.1 469.0 1.0X
+UTF-8 is set 3832 3845 11 1.3 766.4 0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 4599 4622 26 0.2 4599.4 1.0X
-UTF-8 is set 6079 6120 62 0.2 6078.8 0.8X
+No encoding 7234 7306 71 0.1 7234.4 1.0X
+UTF-8 is set 6396 6449 57 0.2 6396.1 1.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 12217 12443 256 0.0 244340.4 1.0X
-UTF-8 is set 13720 13823 113 0.0 274409.6 0.9X
+No encoding 12800 12832 39 0.0 255996.9 1.0X
+UTF-8 is set 13818 13931 115 0.0 276350.2 0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Select 10 columns 2291 2308 18 0.4 2291.5 1.0X
-Select 1 column 1485 1491 8 0.7 1485.2 1.5X
+Select 10 columns 1948 1966 31 0.5 1947.5 1.0X
+Select 1 column 1453 1456 4 0.7 1452.6 1.3X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Short column without encoding 689 691 3 1.5 688.7 1.0X
-Short column with UTF-8 973 977 3 1.0 972.8 0.7X
-Wide column without encoding 7239 7283 71 0.1 7238.6 0.1X
-Wide column with UTF-8 9634 9667 30 0.1 9634.3 0.1X
+Short column without encoding 664 675 12 1.5 663.6 1.0X
+Short column with UTF-8 956 975 25 1.0 956.0 0.7X
+Wide column without encoding 7269 7299 34 0.1 7268.7 0.1X
+Wide column with UTF-8 10444 10474 28 0.1 10444.5 0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 17.0.7+7 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Text read 95 100 9 10.5 95.1 1.0X
-from_json 1638 1646 7 0.6 1638.5 0.1X
-json_tuple 1971 1996 39 0.5 1970.6 0.0X
-get_json_object 1799 1809 13 0.6 1799.3 0.1X
+Text read 90 91 3 11.2 89.5 1.0X
+from_json 2048 2057 12 0.5 2047.7 0.0X
+json_tuple 2334 2340 6 0.4 2334.1 0.0X
+get_json_object wholestage off 2295 2299 4 0.4 2295.4 0.0X
+get_json_object wholestage on 2158 2161 3 0.5 2158.3 0.0X
Review Comment:
ditto
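For readers comparing the two new `get_json_object` rows in the JDK 17 results above, the following sketch works out the difference from the reported "Per Row(ns)" values. `BenchmarkDelta` is a hypothetical helper name; the two numbers are copied from the table, and everything else is plain arithmetic. The "Relative" column rounds both rows to 0.0X against the "Text read" baseline, so it cannot be used for this comparison.

```scala
// Hypothetical helper; the inputs are the Per Row(ns) values from the
// "get_json_object wholestage off/on" rows of the JDK 17 results above.
object BenchmarkDelta {
  val wholestageOffNs: Double = 2295.4
  val wholestageOnNs: Double = 2158.3

  // A ratio above 1 means the wholestage-on run spent less time per row.
  val speedup: Double = wholestageOffNs / wholestageOnNs

  // Share of the per-row time that disappears with wholestage codegen on.
  val timeSavedPct: Double = (wholestageOffNs - wholestageOnNs) / wholestageOffNs * 100

  def summary: String =
    f"wholestage on: $speedup%.3fx per row, $timeSavedPct%.1f%% less time"
}
```

That is roughly a 6% reduction in per-row time in this single run. It should be read with care: other rows in the same diff, such as `from_json`, also moved noticeably between the old and new runs, so some of the difference may be run-to-run noise.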
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -4,120 +4,121 @@ Benchmark for performance of JSON parsing
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 3720 3843 121 1.3 743.9 1.0X
-UTF-8 is set 5412 5455 45 0.9 1082.4 0.7X
+No encoding 3280 3495 218 1.5 655.9 1.0X
+UTF-8 is set 4759 4780 18 1.1 951.8 0.7X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 3234 3254 33 1.5 646.7 1.0X
-UTF-8 is set 4847 4868 21 1.0 969.5 0.7X
+No encoding 2734 2780 39 1.8 546.9 1.0X
+UTF-8 is set 4421 4472 45 1.1 884.2 0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 5702 5794 101 0.2 5702.1 1.0X
-UTF-8 is set 9526 9607 73 0.1 9526.1 0.6X
+No encoding 6322 6442 138 0.2 6322.2 1.0X
+UTF-8 is set 10099 10136 49 0.1 10099.0 0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 18318 18448 199 0.0 366367.7 1.0X
-UTF-8 is set 19791 19887 99 0.0 395817.1 0.9X
+No encoding 16237 16377 154 0.0 324735.1 1.0X
+UTF-8 is set 17622 17694 71 0.0 352440.5 0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Select 10 columns 2531 2570 51 0.4 2531.3 1.0X
-Select 1 column 1867 1882 16 0.5 1867.0 1.4X
+Select 10 columns 2481 2495 14 0.4 2480.8 1.0X
+Select 1 column 1789 1792 3 0.6 1789.0 1.4X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Short column without encoding 868 875 7 1.2 868.4 1.0X
-Short column with UTF-8 1151 1163 11 0.9 1150.9 0.8X
-Wide column without encoding 12063 12299 205 0.1 12063.0 0.1X
-Wide column with UTF-8 16095 16136 51 0.1 16095.3 0.1X
+Short column without encoding 812 831 17 1.2 811.9 1.0X
+Short column with UTF-8 1150 1153 3 0.9 1149.9 0.7X
+Wide column without encoding 11707 11763 49 0.1 11707.4 0.1X
+Wide column with UTF-8 17484 17524 35 0.1 17484.2 0.0X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Text read 165 170 4 6.1 164.7 1.0X
-from_json 2339 2386 77 0.4 2338.9 0.1X
-json_tuple 2667 2730 55 0.4 2667.3 0.1X
-get_json_object 2627 2659 32 0.4 2627.1 0.1X
+Text read 149 152 4 6.7 148.9 1.0X
+from_json 2103 2124 21 0.5 2103.4 0.1X
+json_tuple 2482 2490 7 0.4 2481.7 0.1X
+get_json_object wholestage off 2241 2249 12 0.4 2240.7 0.1X
+get_json_object wholestage on 2124 2132 8 0.5 2123.9 0.1X
Review Comment:
ditto
--
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1355134245
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -3,127 +3,128 @@ Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1047-azure
Review Comment:
Sorry for the late reply, but I have to say, we also need to update `JsonBenchmark-jdk21-results.txt` ... @panbingkun
--
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
URL: https://github.com/apache/spark/pull/40506
--
[GitHub] [spark] LuciferYang commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1477426004
cc @wangyum @cloud-fan FYI
--
[GitHub] [spark] LuciferYang commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1480692403
@panbingkun I think we should also update `JsonBenchmark-jdk11-results.txt`, `JsonBenchmark-jdk17-results.txt` and `JsonBenchmark-results.txt` in this PR, since `JsonBenchmark` was updated
--
[GitHub] [spark] LuciferYang commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1479578855
hmm... I think we should refactor `JsonBenchmark` to make get_json_object run w/ and w/o code gen in one
--
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1757843168
The failed tests like this:
```
python/pyspark/sql/tests/pandas/test_pandas_map.py.test_other_than_dataframe_iter
[Errno 111] Connection refused
```
already passed a couple of commits ago. Since then, the PR has only been rebased.
+1, LGTM. Merging to master.
Thank you, @panbingkun, and @viirya and @LuciferYang for the review.
--
[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1260279671
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
override def nullable: Boolean = true
override def prettyName: String = "get_json_object"
- @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+ @transient
+ private lazy val evaluator = new GetJsonObjectEvaluator(right)
override def eval(input: InternalRow): Any = {
- val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+ evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+ evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+ evaluator.evaluate()
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
+ val jsonEval = json.genCode(ctx)
+ val pathEval = path.genCode(ctx)
+
+ val setJson =
+ s"""
+ |if (${jsonEval.isNull}) {
+ | $refEvaluator.setJson(null);
+ |} else {
+ | $refEvaluator.setJson(${jsonEval.value});
+ |}
+ |""".stripMargin
+ val setPath =
+ s"""
+ |if (${pathEval.isNull}) {
+ | $refEvaluator.setPath(null);
+ |} else {
+ | $refEvaluator.setPath(${pathEval.value});
+ |}
+ |""".stripMargin
+
+ val resultType = CodeGenerator.boxedType(dataType)
+ val resultTerm = ctx.freshName("result")
+ ev.copy(code =
+ code"""
+ |${jsonEval.code}
+ |${pathEval.code}
+ |$setJson
+ |$setPath
+ |$resultType $resultTerm = ($resultType) $refEvaluator.evaluate();
+ |boolean ${ev.isNull} = $resultTerm == null;
+ |${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+ |if (!${ev.isNull}) {
+ | ${ev.value} = $resultTerm;
+ |}
+ |""".stripMargin
+ )
+ }
+
+ override protected def withNewChildrenInternal(
+ newLeft: Expression, newRight: Expression): GetJsonObject =
+ copy(json = newLeft, path = newRight)
+}
+
+class GetJsonObjectEvaluator(path: Expression) extends Serializable {
+ import com.fasterxml.jackson.core.JsonToken._
+ import PathInstruction._
+ import SharedFactory._
+ import WriteStyle._
+
+ @transient
+ private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
Review Comment:
Hmm, this looks weird: in codegen you will call both the interpreted and the codegen evaluation of `path`?
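One way to read the concern above: the evaluator keeps the `path` expression and evaluates it in interpreted mode to build `parsedPath`, while the generated code evaluates the same expression again through `path.genCode`. A minimal, Spark-free sketch of one possible alternative follows, assuming the constant path value (if any) is handed to the evaluator up front and a non-constant path is parsed per row. `PathCachingSketch`, `ToyEvaluator`, the simplified `parsePath`, and `walk` are hypothetical illustrations, not code from this PR.

```scala
object PathCachingSketch {
  // Hypothetical stand-in for the real path parser: handles only "$.a.b" style paths.
  def parsePath(path: String): Option[List[String]] =
    if (path != null && path.startsWith("$.")) Some(path.drop(2).split('.').toList)
    else None

  // Follows the parsed field names through nested Maps.
  def walk(node: Any, steps: List[String]): Option[Any] = steps match {
    case Nil => Some(node)
    case head :: tail => node match {
      case m: Map[_, _] =>
        m.asInstanceOf[Map[String, Any]].get(head).flatMap(walk(_, tail))
      case _ => None
    }
  }

  final class ToyEvaluator(constantPath: Option[String]) {
    // Exposed only so the sketch can show how often parsing runs.
    var parseCount = 0

    private def parse(p: String): Option[List[String]] = {
      parseCount += 1
      parsePath(p)
    }

    // Parsed at most once, and only when the path is a constant.
    private lazy val cachedSteps: Option[List[String]] = constantPath.flatMap(parse)

    // When a constant path was supplied, the per-call `path` is only null-checked.
    def evaluate(doc: Map[String, Any], path: String): Option[Any] = {
      if (doc == null || path == null) None
      else {
        val steps = if (constantPath.isDefined) cachedSteps else parse(path)
        steps.flatMap(walk(doc, _))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val doc = Map[String, Any]("f1" -> "value1", "nested" -> Map("x" -> 1))
    val constant = new ToyEvaluator(Some("$.f1"))
    (1 to 3).foreach(_ => constant.evaluate(doc, "$.f1"))
    val perRow = new ToyEvaluator(None)
    (1 to 3).foreach(_ => perRow.evaluate(doc, "$.nested.x"))
    println(s"constant: ${constant.parseCount} parse(s), per-row: ${perRow.parseCount} parse(s)")
  }
}
```

With this shape the evaluator never needs the `path` expression itself, only its value, so there is a single place where the path is evaluated.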
--
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1263225024
##########
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala:
##########
@@ -1392,4 +1393,25 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
checkAnswer(df.selectExpr("json_object_keys(a)"), expected)
checkAnswer(df.select(json_object_keys($"a")), expected)
}
+
+ test("GET_JSON_OBJECT Codegen Support") {
+ withTempView("GetJsonObjectTable") {
+ val data = Seq(("1", """{"f1": "value1", "f5": 5.23}""")).toDF("key", "jstring")
+ data.createOrReplaceTempView("GetJsonObjectTable")
+ val df = sql("SELECT key, get_json_object(jstring, '$.f1') FROM GetJsonObjectTable")
+ val plan = df.queryExecution.executedPlan
+ assert(plan.isInstanceOf[WholeStageCodegenExec])
+ checkAnswer(df, Seq(Row("1", "value1")))
+ }
+ }
+
+ test("path is null") {
+ val df: DataFrame = Seq(("""{"name": "alice", "age": 5}""", "")).toDF("a", "b")
+ checkAnswer(df.selectExpr("get_json_object(a, null)"), Row(null))
+ }
+
+ test("json is null") {
+ val df: DataFrame = Seq(("""{"name": "alice", "age": 5}""", "")).toDF("a", "b")
+ checkAnswer(df.selectExpr("get_json_object(null, '$.name')"), Row(null))
Review Comment:
All Done.
--
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1206118575
##########
sql/core/benchmarks/JsonBenchmark-jdk11-results.txt:
##########
@@ -4,120 +4,121 @@ Benchmark for performance of JSON parsing
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 3150 3166 27 1.6 630.1 1.0X
-UTF-8 is set 4572 4585 12 1.1 914.4 0.7X
+No encoding 3493 3689 209 1.4 698.6 1.0X
+UTF-8 is set 4954 4984 50 1.0 990.7 0.7X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 2422 2475 50 2.1 484.4 1.0X
-UTF-8 is set 3786 3796 14 1.3 757.2 0.6X
+No encoding 2723 2771 54 1.8 544.6 1.0X
+UTF-8 is set 4092 4166 99 1.2 818.4 0.7X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 5104 5170 87 0.2 5104.0 1.0X
-UTF-8 is set 9229 9246 15 0.1 9228.7 0.6X
+No encoding 5025 5208 175 0.2 5024.7 1.0X
+UTF-8 is set 9642 9678 34 0.1 9641.7 0.5X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-No encoding 13977 14153 277 0.0 279538.0 1.0X
-UTF-8 is set 16231 16284 70 0.0 324628.3 0.9X
+No encoding 16465 22840 1784 0.0 329303.4 1.0X
+UTF-8 is set 21291 21761 785 0.0 425817.2 0.8X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Select 10 columns 2197 2232 42 0.5 2196.7 1.0X
-Select 1 column 1560 1567 9 0.6 1560.2 1.4X
+Select 10 columns 2749 2881 115 0.4 2749.2 1.0X
+Select 1 column 1951 2014 83 0.5 1950.9 1.4X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Short column without encoding 688 709 18 1.5 688.3 1.0X
-Short column with UTF-8 939 963 21 1.1 939.4 0.7X
-Wide column without encoding 8049 8102 66 0.1 8048.7 0.1X
-Wide column with UTF-8 14346 14368 28 0.1 14345.7 0.0X
+Short column without encoding 773 796 21 1.3 773.0 1.0X
+Short column with UTF-8 1096 1133 32 0.9 1096.1 0.7X
+Wide column without encoding 8231 8389 140 0.1 8230.8 0.1X
+Wide column with UTF-8 12882 13034 147 0.1 12881.9 0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Text read 101 103 2 9.9 100.5 1.0X
-from_json 1960 1965 6 0.5 1960.1 0.1X
-json_tuple 2226 2235 13 0.4 2226.3 0.0X
-get_json_object 2077 2088 12 0.5 2077.0 0.0X
+Text read 99 109 9 10.1 98.6 1.0X
+from_json 2766 2816 46 0.4 2766.0 0.0X
+json_tuple 3064 3077 11 0.3 3063.8 0.0X
+get_json_object wholestage off 2897 2917 32 0.3 2897.3 0.0X
+get_json_object wholestage on 2832 2853 22 0.4 2831.6 0.0X
Review Comment:
Update here
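For readers comparing the two new `get_json_object` rows in the hunk above, the relative gain can be computed from the best times (2897 ms with whole-stage codegen off, 2832 ms with it on), which works out to roughly 2.2%. The helper below is a hypothetical illustration, not part of `JsonBenchmark`. The standard deviations reported in the same rows (32 ms and 22 ms) are a sizeable fraction of the 65 ms difference, so the gain on this particular run should be read with some caution.

```scala
object CodegenGainSketch {
  // Percentage by which the "on" run is faster than the "off" run, by elapsed time.
  def timeReductionPct(offMs: Double, onMs: Double): Double =
    (offMs - onMs) / offMs * 100.0

  def main(args: Array[String]): Unit = {
    // Best Time(ms) values copied from the JDK 11 results quoted above.
    val pct = timeReductionPct(offMs = 2897, onMs = 2832)
    println(f"time reduction: $pct%.1f%%")
  }
}
```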
--
[GitHub] [spark] LuciferYang commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1143228608
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
override def nullable: Boolean = true
override def prettyName: String = "get_json_object"
- @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+ @transient
+ private lazy val evaluator = new GetJsonObjectEvaluator(right)
override def eval(input: InternalRow): Any = {
- val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+ evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+ evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+ evaluator.evaluate()
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
+ val jsonEval = json.genCode(ctx)
+ val pathEval = path.genCode(ctx)
+
+ val setJson =
+ s"""
+ |if (${jsonEval.isNull}) {
+ | $refEvaluator.setJson(null);
+ |} else {
+ | $refEvaluator.setJson(${jsonEval.value});
+ |}
+ |""".stripMargin
+ val setPath =
+ s"""
+ |if (${pathEval.isNull}) {
+ | $refEvaluator.setPath(null);
+ |} else {
+ | $refEvaluator.setPath(${pathEval.value});
+ |}
+ |""".stripMargin
+
+ val resultType = CodeGenerator.boxedType(dataType)
+ val resultTerm = ctx.freshName("result")
+ ev.copy(code =
+ code"""
+ |${jsonEval.code + "\n"}
Review Comment:
I don't think we need these `+ "\n"`
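As a small, Spark-free illustration of why the extra `+ "\n"` is likely redundant: in a multi-line `stripMargin` template, each interpolated fragment already sits on its own line, so appending a newline only adds blank lines. The sketch below uses the plain `s` interpolator and made-up Java fragments. The PR itself uses Spark's `code` interpolator, which this sketch does not reproduce, so treat it as an analogy.

```scala
object StripMarginSketch {
  // Made-up stand-ins for the Java fragments produced by child expressions.
  val jsonCode = "UTF8String json = row.getUTF8String(0);"
  val pathCode = "UTF8String path = row.getUTF8String(1);"

  // Template with an explicit newline appended to each fragment.
  val withExtraNewlines: String =
    s"""
       |${jsonCode + "\n"}
       |${pathCode + "\n"}
       |evaluate();
       |""".stripMargin

  // Same template relying only on the line breaks already in the template.
  val withoutExtraNewlines: String =
    s"""
       |$jsonCode
       |$pathCode
       |evaluate();
       |""".stripMargin

  // Non-blank lines of a generated snippet, in order.
  def statements(code: String): List[String] =
    code.split('\n').map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    // Same statements either way; the first variant only carries extra blank lines.
    println(statements(withExtraNewlines) == statements(withoutExtraNewlines))
    println(withExtraNewlines.count(_ == '\n') > withoutExtraNewlines.count(_ == '\n'))
  }
}
```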
--
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1479624519
> hmm... I think we should refactor `JsonBenchmark` to make get_json_object run w/ and w/o code gen in one
OK, let me do it.
--
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1623726609
> I am not sure why this should improve performance but in general LGTM. @HyukjinKwon WDYT?
In our production environment, it has indeed improved query performance for some scenarios.
--
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1630931255
The principle is similar to the following diagram (although the diagram says Hive UDF Codegen)
<img width="530" alt="image" src="https://github.com/apache/spark/assets/15246973/b748afb4-28a5-471c-a89b-a9b8dc597378">
--
[GitHub] [spark] MaxGekk commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1666814815
@panbingkun Could you rebase this PR on the recent master and resolve conflicts, please.
--
[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1143261154
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
override def nullable: Boolean = true
override def prettyName: String = "get_json_object"
- @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+ @transient
+ private lazy val evaluator = new GetJsonObjectEvaluator(right)
override def eval(input: InternalRow): Any = {
- val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+ evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+ evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+ evaluator.evaluate()
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
+ val jsonEval = json.genCode(ctx)
+ val pathEval = path.genCode(ctx)
+
+ val setJson =
+ s"""
+ |if (${jsonEval.isNull}) {
+ | $refEvaluator.setJson(null);
+ |} else {
+ | $refEvaluator.setJson(${jsonEval.value});
+ |}
+ |""".stripMargin
+ val setPath =
+ s"""
+ |if (${pathEval.isNull}) {
+ | $refEvaluator.setPath(null);
+ |} else {
+ | $refEvaluator.setPath(${pathEval.value});
+ |}
+ |""".stripMargin
+
+ val resultType = CodeGenerator.boxedType(dataType)
+ val resultTerm = ctx.freshName("result")
+ ev.copy(code =
+ code"""
+ |${jsonEval.code + "\n"}
Review Comment:
The generated code looks like this:
<img width="1057" alt="image" src="https://user-images.githubusercontent.com/15246973/226598452-fc784f17-ee57-450e-9aa7-9b4cea02e13c.png">
--
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1633648972
Friendly ping @viirya
--
[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1260277539
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala:
##########
@@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression, path: Expression)
override def nullable: Boolean = true
override def prettyName: String = "get_json_object"
- @transient private lazy val parsedPath = parsePath(path.eval().asInstanceOf[UTF8String])
+ @transient
+ private lazy val evaluator = new GetJsonObjectEvaluator(right)
override def eval(input: InternalRow): Any = {
- val jsonStr = json.eval(input).asInstanceOf[UTF8String]
+ evaluator.setJson(json.eval(input).asInstanceOf[UTF8String])
+ evaluator.setPath(path.eval(input).asInstanceOf[UTF8String])
+ evaluator.evaluate()
+ }
+
+ protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val refEvaluator = ctx.addReferenceObj("evaluator", evaluator)
Review Comment:
If you initialize `evaluator` in generated code, you may not need to make `GetJsonObjectEvaluator` extend `Serializable`?
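To make the trade-off concrete, here is a Spark-free analogy using plain Java serialization. It is an assumption that this mirrors the codegen case closely enough to be useful: an object passed along by reference has to survive serialization, whereas an object constructed on the receiving side from serializable arguments does not. `Evaluator`, `EagerHolder`, and `LazyHolder` are hypothetical names for illustration only.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationSketch {
  // Deliberately NOT Serializable, like an evaluator that is only built where it runs.
  class Evaluator(val path: String)

  // Holds the evaluator itself, so serializing the holder must serialize the evaluator.
  class EagerHolder(val evaluator: Evaluator) extends Serializable

  // Holds only the constructor argument; the evaluator is rebuilt lazily after shipping.
  class LazyHolder(val path: String) extends Serializable {
    @transient lazy val evaluator: Evaluator = new Evaluator(path)
  }

  // Returns true if Java serialization accepts the object, false if it rejects it.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new EagerHolder(new Evaluator("$.f1"))))
    println(canSerialize(new LazyHolder("$.f1")))
  }
}
```

In this analogy, `EagerHolder` plays the role of passing the evaluator through `ctx.addReferenceObj`, and `LazyHolder` plays the role of constructing it inside the generated code.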
--
[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1479567656
cc @cloud-fan @wangyum @LuciferYang
I ran the benchmark org.apache.spark.sql.execution.datasources.json.JsonBenchmark; the results are as follows:
- CodeGen for get_json_object
https://github.com/panbingkun/spark/actions/runs/4489492515/jobs/7895384637
<img width="922" alt="image" src="https://user-images.githubusercontent.com/15246973/226918094-4a943e82-151f-4d70-b29c-a16ef4087f30.png">
- No CodeGen for get_json_object
https://github.com/panbingkun/spark/actions/runs/4489490118/jobs/7895379568
<img width="920" alt="image" src="https://user-images.githubusercontent.com/15246973/226918535-df933355-bc2e-4b07-bbed-7f6010bcd42b.png">
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #40506:
URL: https://github.com/apache/spark/pull/40506#discussion_r1355992806
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -3,127 +3,128 @@ Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1047-azure
Review Comment:
Follow-up PR: https://github.com/apache/spark/pull/43346
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1756772218
@panbingkun Could you rebase this on the recent master, please?
Re: [PR] [SPARK-42881][SQL] Codegen Support for get_json_object [spark]
Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1757237699
> @panbingkun Could you rebase this on the recent master, please.
Okay.