Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/11 22:52:42 UTC

[GitHub] [spark] chitralverma commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

URL: https://github.com/apache/spark/pull/25122#issuecomment-510682452
 
 
   @HyukjinKwon I've raised this PR as a WIP until I incorporate your comments. I had some doubts regarding the tests in pivot.sql and was hoping you could clear them up for me.
   
   While porting 'pivot.sql', I ran the command below, and it fails for the configs `spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY`:
   
   `build/sbt "sql/test-only *SQLQueryTestSuite -- -z pivot.sql"` 
   
   On inspection, there seems to be a discrepancy in how `null` values are handled when they pass through the UDF: the Scala UDF run produces `null` and the Python UDF run produces `None`, but the golden files contain `nan`, so the match fails.
   
   This error persists in the port as well. As per the guide, I looked for a related Jira but couldn't find one, so I thought I'd run this by you before creating a new one.
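   For what it's worth, the mismatch reduces to a plain string comparison: a missing pivot cell stringifies as `None` (Python) or `null` (Scala), while the golden file was generated with `nan`, and those never compare equal. A minimal sketch of the failing comparison (the cell values are hypothetical, mirroring one row of query #8's output):

```python
# The golden-file check is a string match, so None/null vs nan can never agree.
expected_cell = str(float("nan"))  # what the golden file contains: "nan"
actual_cell = str(None)            # what the Python UDF run produces: "None"

print(expected_cell)               # nan
print(actual_cell)                 # None
print(expected_cell == actual_cell)  # False -> "Result did not match"
```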
   
   Stacktraces (Regular Python UDF and Scala UDF runs):
   
   ```
   5:21:42.536 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
   [info] - udf/udf-pivot.sql - Regular Python UDF *** FAILED *** (24 seconds, 575 milliseconds)
   [info]   Expected "Java	2012	20000	[nan
   [info]   Java	2013	nan	30000
   [info]   dotNET	2012	15000	nan
   [info]   dotNET	2013	nan]	48000", but got "Java	2012	20000	[None
   [info]   Java	2013	None	30000
   [info]   dotNET	2012	15000	None
   [info]   dotNET	2013	None]	48000" Result did not match for query #8
   [info]   SELECT * FROM (
   [info]     SELECT course, year, earnings, udf(s) as s
   [info]     FROM courseSales
   [info]     JOIN years ON year = y
   [info]   )
   [info]   PIVOT (
   [info]     udf(sum(earnings))
   [info]     FOR s IN (1, 2)
   [info]   ) (SQLQueryTestSuite.scala:333)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
   [info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
   [info]   at org.scalatest.Assertions.assertResult(Assertions.scala:1003)
   
   ```
   ```
   5:21:17.912 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
   [info] - udf/udf-pivot.sql - Scala UDF *** FAILED *** (25 seconds, 411 milliseconds)
   [info]   Expected "Java	2012	20000	n[an
   [info]   Java	2013	nan	30000
   [info]   dotNET	2012	15000	nan
   [info]   dotNET	2013	nan]	48000", but got "Java	2012	20000	n[ull
   [info]   Java	2013	null	30000
   [info]   dotNET	2012	15000	null
   [info]   dotNET	2013	null]	48000" Result did not match for query #8
   [info]   SELECT * FROM (
   [info]     SELECT course, year, earnings, udf(s) as s
   [info]     FROM courseSales
   [info]     JOIN years ON year = y
   [info]   )
   [info]   PIVOT (
   [info]     udf(sum(earnings))
   [info]     FOR s IN (1, 2)
   [info]   ) (SQLQueryTestSuite.scala:333)
   ```
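   For context, the pivot in query #8 sums earnings per (course, year) for each value of `s` in (1, 2); combinations with no matching rows yield SQL NULL, which is exactly where the `null`/`None`/`nan` divergence surfaces. A rough pure-Python sketch of that aggregation, using hypothetical rows shaped like the `courseSales`/`years` join:

```python
from collections import defaultdict

# Hypothetical rows shaped like the joined input: (course, year, earnings, s).
rows = [
    ("Java", 2012, 20000, 1),
    ("Java", 2013, 30000, 2),
    ("dotNET", 2012, 15000, 1),
    ("dotNET", 2013, 48000, 2),
]

# PIVOT ... FOR s IN (1, 2): one output column per pivot value, summing
# earnings; combinations with no rows stay None (SQL NULL).
sums = defaultdict(lambda: {1: None, 2: None})
for course, year, earnings, s in rows:
    cell = sums[(course, year)]
    cell[s] = earnings if cell[s] is None else cell[s] + earnings

for (course, year), cell in sorted(sums.items()):
    print(course, year, cell[1], cell[2])
# Java 2012 20000 None
# Java 2013 None 30000
# dotNET 2012 15000 None
# dotNET 2013 None 48000
```

   This matches the "but got" side of the Python failure above; the golden file prints those None cells as `nan` instead.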
   
   Any help would be appreciated. Thanks!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org