Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2019/11/03 18:36:00 UTC

[jira] [Created] (SPARK-29735) CSVDataSource leaks file system

Dongjoon Hyun created SPARK-29735:
-------------------------------------

             Summary: CSVDataSource leaks file system
                 Key: SPARK-29735
                 URL: https://issues.apache.org/jira/browse/SPARK-29735
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Dongjoon Hyun


The Jenkins log below shows {{DebugFilesystem}} reporting a leaked filesystem connection that was opened by {{TextInputCSVDataSource.readFile}} through the DSv2 CSV reader path:
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113178/consoleFull

{code}
[info] FileBasedDataSourceSuite:
[info] - Writing empty datasets should not fail - orc (309 milliseconds)
[info] - Writing empty datasets should not fail - parquet (367 milliseconds)
[info] - Writing empty datasets should not fail - csv (171 milliseconds)
[info] - Writing empty datasets should not fail - json (130 milliseconds)
[info] - Writing empty datasets should not fail - text (423 milliseconds)
[info] - SPARK-23072 Write and read back unicode column names - orc (274 milliseconds)
[info] - SPARK-23072 Write and read back unicode column names - parquet (318 milliseconds)
[info] - SPARK-23072 Write and read back unicode column names - csv (358 milliseconds)
[info] - SPARK-23072 Write and read back unicode column names - json (290 milliseconds)
[info] - SPARK-15474 Write and read back non-empty schema with empty dataframe - orc (327 milliseconds)
[info] - SPARK-15474 Write and read back non-empty schema with empty dataframe - parquet (334 milliseconds)
[info] - SPARK-23271 empty RDD when saved should write a metadata only file - orc (273 milliseconds)
[info] - SPARK-23271 empty RDD when saved should write a metadata only file - parquet (352 milliseconds)
[info] - SPARK-23372 error while writing empty schema files using orc (29 milliseconds)
[info] - SPARK-23372 error while writing empty schema files using parquet (15 milliseconds)
[info] - SPARK-23372 error while writing empty schema files using csv (12 milliseconds)
[info] - SPARK-23372 error while writing empty schema files using json (10 milliseconds)
[info] - SPARK-23372 error while writing empty schema files using text (11 milliseconds)
[info] - SPARK-22146 read files containing special characters using orc (256 milliseconds)
[info] - SPARK-22146 read files containing special characters using parquet (380 milliseconds)
[info] - SPARK-22146 read files containing special characters using csv (428 milliseconds)
[info] - SPARK-22146 read files containing special characters using json (284 milliseconds)
[info] - SPARK-22146 read files containing special characters using text (254 milliseconds)
[info] - SPARK-23148 read files containing special characters using json with multiline enabled (557 milliseconds)
[info] - SPARK-23148 read files containing special characters using csv with multiline enabled (424 milliseconds)
[info] - Enabling/disabling ignoreMissingFiles using orc (1 second, 605 milliseconds)
[info] - Enabling/disabling ignoreMissingFiles using parquet (1 second, 895 milliseconds)
09:26:51.342 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 94.0 (TID 125, amp-jenkins-worker-04.amp, executor driver): TaskKilled (Stage cancelled)
[info] - Enabling/disabling ignoreMissingFiles using csv (1 second, 672 milliseconds)
09:26:51.344 WARN org.apache.spark.DebugFilesystem: Leaked filesystem connection created at:
java.lang.Throwable
	at org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:35)
	at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:69)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
	at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.<init>(HadoopFileLinesReader.scala:65)
	at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.readFile(CSVDataSource.scala:99)
	at org.apache.spark.sql.execution.datasources.v2.csv.CSVPartitionReaderFactory.buildReader(CSVPartitionReaderFactory.scala:68)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:29)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:109)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:42)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:95)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:62)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1832)
	at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
	at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2135)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
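The trace shows {{CSVPartitionReaderFactory.buildReader}} constructing a {{HadoopFileLinesReader}}, which opens a {{FileSystem}} input stream via {{LineRecordReader.initialize}}; the leak is reported right after a {{TaskKilled}} warning, so the stream apparently stays open when the stage is cancelled mid-read. Below is a minimal sketch of the close-listener pattern used elsewhere around {{HadoopFileLinesReader}}, assuming a task-completion listener is the right hook here; {{openGuarded}} is an illustrative name, not the actual patch:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.TaskContext
import org.apache.spark.sql.execution.datasources.{HadoopFileLinesReader, PartitionedFile}

// Illustrative sketch only. Open the per-file line reader, then register a
// task-completion listener so the underlying FileSystem stream is closed
// even if the task is killed before the iterator is fully consumed.
def openGuarded(file: PartitionedFile, conf: Configuration): HadoopFileLinesReader = {
  val linesReader = new HadoopFileLinesReader(file, conf)
  Option(TaskContext.get()).foreach { ctx =>
    ctx.addTaskCompletionListener[Unit](_ => linesReader.close())
  }
  linesReader
}
{code}

Whether the missing close belongs in the reader factory or in how the DSv2 {{FilePartitionReader}} handles a killed task is not established by the trace alone; the sketch only illustrates the guard that would keep a cancelled stage from leaving the stream open.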


