Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/08 04:44:12 UTC

[GitHub] [spark] LuciferYang commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

LuciferYang commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149455635

   I think this PR should be backported to previous Spark versions, because when running `SPARK-39393: Do not push down predicate filters for repeated primitive fields` without this PR, I saw the following error:
   
   ```
   Last failure message: There are 1 possibly leaked file streams..
   	at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:185)
   	at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:192)
   	at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:402)
   	at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:401)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
   	at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:312)
   	at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:311)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
   	at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach(SharedSparkSession.scala:164)
   	at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach$(SharedSparkSession.scala:158)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.afterEach(ParquetFilterSuite.scala:100)
   	at org.scalatest.BeforeAndAfterEach.$anonfun$runTest$1(BeforeAndAfterEach.scala:247)
   	at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
   	at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
   	at org.scalatest.FailedStatus$.whenCompleted(Status.scala:505)
   	at org.scalatest.Status.withAfterEffect(Status.scala:373)
   	at org.scalatest.Status.withAfterEffect$(Status.scala:371)
   	at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:477)
   	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:246)
   	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
   	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   	at scala.collection.immutable.List.foreach(List.scala:431)
   	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
   	at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
   	at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   	at org.scalatest.Suite.run(Suite.scala:1112)
   	at org.scalatest.Suite.run$(Suite.scala:1094)
   	at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
   	at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   	at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
   	at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
   	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:64)
   	at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   	at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   	at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:64)
   	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
   	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
   	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
   	at scala.collection.immutable.List.foreach(List.scala:431)
   	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
   	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
   	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
   	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
   	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
   	at org.scalatest.tools.Runner$.run(Runner.scala:798)
   	at org.scalatest.tools.Runner.run(Runner.scala)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
   Caused by: java.lang.IllegalStateException: There are 1 possibly leaked file streams.
   	at org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
   	at org.apache.spark.sql.test.SharedSparkSessionBase.$anonfun$afterEach$1(SharedSparkSession.scala:165)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:150)
   	at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:162)
   	... 54 more
   ```
   
   It seems that this issue is related to parquet-mr:
   
   https://github.com/LuciferYang/parquet-mr/blob/a2da156b251d13bce1fa81eb95b555da04880bc1/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L790
   
   Line 790 may throw an `IOException` without closing `f`. That needs to be fixed in `parquet-mr`, and **this PR seems to protect Spark from the issue**.
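   
   For illustration, here is a minimal sketch of the close-on-failure pattern that would avoid such a leak; the `openAndRead`/`readFooterFrom` names and types below are placeholders I made up, not the actual `ParquetFileReader` code:
   
   ```java
   import java.io.Closeable;
   import java.io.IOException;
   import java.io.InputStream;
   
   final class CloseOnFailureSketch {
   
       // Hypothetical read path: if parsing throws after the stream was opened,
       // the stream must be closed before the exception propagates, otherwise
       // the caller never gets a handle to close and the stream leaks.
       static Footer openAndRead(StreamFactory factory) throws IOException {
           InputStream in = factory.open();        // analogous to opening `f`
           try {
               return readFooterFrom(in);          // analogous to the call that may throw an IOE
           } catch (IOException | RuntimeException e) {
               closeQuietly(in);                   // avoid the "possibly leaked file streams" failure
               throw e;
           }
       }
   
       private static void closeQuietly(Closeable c) {
           try {
               c.close();
           } catch (IOException ignored) {
               // best effort: keep the original exception as the primary failure
           }
       }
   
       // --- placeholders so the sketch is self-contained ---
       interface StreamFactory { InputStream open() throws IOException; }
       static final class Footer {}
       private static Footer readFooterFrom(InputStream in) throws IOException {
           return new Footer();                    // stand-in for the real footer-parsing logic
       }
   }
   ```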
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

