Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/11 22:52:42 UTC
[GitHub] [spark] chitralverma commented on issue #25122:
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into
UDF test base
URL: https://github.com/apache/spark/pull/25122#issuecomment-510682452
@HyukjinKwon I've raised this PR as a WIP until I incorporate your comments. I had some doubts about the tests in pivot.sql and was hoping you could clear them up for me.
While porting 'pivot.sql', I ran the command below, and it fails under the configs `spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY`:
`build/sbt "sql/test-only *SQLQueryTestSuite -- -z pivot.sql"`
On inspection, there seems to be a discrepancy in how `null` values are handled when they pass through the UDF. The golden file contains `nan`, but the Scala UDF run produces `null` and the Python UDF run produces `None`, so the match fails.
This error persists in the port as well. As per the guide, I looked for a related JIRA but couldn't find one, so I thought I'd run this by you before creating one.
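To illustrate the mismatch outside Spark, here is a minimal plain-Python sketch. It does not reproduce Spark's actual code path. The helper `render_row` is hypothetical, and it assumes a result row is compared as plain text after each cell is converted with `str()`. The row values are taken from the failing output below.

```python
# Hypothetical helper: render a result row as text, one str() per cell.
# This only mimics a textual comparison; it is not Spark's implementation.
def render_row(row):
    return " ".join(str(value) for value in row)

# What the golden file holds for this row: a NaN in the empty pivot cell.
expected = render_row(("Java", 2013, float("nan"), 30000))

# What the Python UDF run produced for the same row: None instead of NaN.
actual = render_row(("Java", 2013, None, 30000))

# A string comparison cannot treat "nan" and "None" as equal.
mismatch = expected != actual

print(expected)
print(actual)
print(mismatch)
```

The same reasoning applies to the Scala UDF run, where the cell is rendered as `null` rather than `None`.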
Stacktrace:
```
5:21:42.536 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - udf/udf-pivot.sql - Regular Python UDF *** FAILED *** (24 seconds, 575 milliseconds)
[info] Expected "Java 2012 20000 [nan
[info] Java 2013 nan 30000
[info] dotNET 2012 15000 nan
[info] dotNET 2013 nan] 48000", but got "Java 2012 20000 [None
[info] Java 2013 None 30000
[info] dotNET 2012 15000 None
[info] dotNET 2013 None] 48000" Result did not match for query #8
[info] SELECT * FROM (
[info] SELECT course, year, earnings, udf(s) as s
[info] FROM courseSales
[info] JOIN years ON year = y
[info] )
[info] PIVOT (
[info] udf(sum(earnings))
[info] FOR s IN (1, 2)
[info] ) (SQLQueryTestSuite.scala:333)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions.assertResult(Assertions.scala:1003)
```
```
5:21:17.912 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - udf/udf-pivot.sql - Scala UDF *** FAILED *** (25 seconds, 411 milliseconds)
[info] Expected "Java 2012 20000 n[an
[info] Java 2013 nan 30000
[info] dotNET 2012 15000 nan
[info] dotNET 2013 nan] 48000", but got "Java 2012 20000 n[ull
[info] Java 2013 null 30000
[info] dotNET 2012 15000 null
[info] dotNET 2013 null] 48000" Result did not match for query #8
[info] SELECT * FROM (
[info] SELECT course, year, earnings, udf(s) as s
[info] FROM courseSales
[info] JOIN years ON year = y
[info] )
[info] PIVOT (
[info] udf(sum(earnings))
[info] FOR s IN (1, 2)
[info] ) (SQLQueryTestSuite.scala:333)
```
Any help will be appreciated. Thanks,