You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jorge Machado (Jira)" <ji...@apache.org> on 2020/01/27 07:20:00 UTC
[jira] [Created] (SPARK-30647) When creating a custom datasource File NotFoundExpection happens

Jorge Machado created SPARK-30647:
-------------------------------------

             Summary: When creating a custom datasource File NotFoundExpection happens
                 Key: SPARK-30647
                 URL: https://issues.apache.org/jira/browse/SPARK-30647
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jorge Machado


Hello, I'm creating a datasource based on FileFormat and DataSourceRegister. 

when I pass a path or a file that has a white space it seems to fail wit the error: 
{code:java}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 213, localhost, executor driver): java.io.FileNotFoundException: File file:somePath/0019_leftImg8%20bit.png does not exist It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125) at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221) at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
{code}
I'm happy to fix this if someone tells me where I need to look.  

I think it is on org.apache.spark.rdd.InputFileBlockHolder : 
{code:java}
inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org