Posted to issues@flink.apache.org by "Andrey Zagrebin (Jira)" <ji...@apache.org> on 2019/09/13 14:25:00 UTC

[jira] [Comment Edited] (FLINK-6993) Not reading recursive files in Batch by using readTextFile when file name contains _ in starting.

    [ https://issues.apache.org/jira/browse/FLINK-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929241#comment-16929241 ] 

Andrey Zagrebin edited comment on FLINK-6993 at 9/13/19 2:24 PM:
-----------------------------------------------------------------

 I will add it to the docs


was (Author: azagrebin):
 I will add to the docs

> Not reading recursive files in Batch by using readTextFile when file name contains _ in starting.
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-6993
>                 URL: https://issues.apache.org/jira/browse/FLINK-6993
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataSet
>    Affects Versions: 1.3.0
>            Reporter: Shashank Agarwal
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>
> When I try to read files from a folder using readTextFile in batch mode with recursive.file.enumeration enabled, files whose names start with _ are not read. When I remove the leading _ from the file names, they are read fine.
> Reading also works when the path points directly to a single file; it fails only with a directory path. To reproduce the issue:
> {code}
> import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
> import org.apache.flink.configuration.Configuration
> object CSVMerge {
>   def main(args: Array[String]): Unit = {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>     // create a configuration object
>     val parameters = new Configuration
>     // set the recursive enumeration parameter
>     parameters.setBoolean("recursive.file.enumeration", true)
>     val stream = env.readTextFile("file:///Users/data")
>       .withParameters(parameters)
>     stream.print()
>   }
> }
> {code}
> When you put 2-3 text files with names like 1.txt, 2.txt, etc. in the data folder, it works fine. But when the files are named _1.txt, _2.txt, they are not read.
> Flink's BucketingSink in streaming by default puts _ before the file names, so files written by a DataStream sink cannot be read back this way.
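
The behavior reported above is consistent with a default name filter applied while enumerating a directory: entries whose names start with _ or . are treated as hidden and skipped. The following is a minimal, self-contained sketch of such a rule in plain Scala with no Flink dependency. The object name HiddenFileFilterSketch and the helper acceptedByDefault are hypothetical and exist only for illustration. In Flink, the corresponding hook is assumed here to be FileInputFormat.acceptFile, which a subclass can override to accept such files.

```scala
object HiddenFileFilterSketch {
  // Hypothetical helper modelling the assumed default rule:
  // a file name is accepted unless it starts with "_" or ".".
  def acceptedByDefault(fileName: String): Boolean =
    !fileName.startsWith("_") && !fileName.startsWith(".")

  def main(args: Array[String]): Unit = {
    // File names taken from the report, plus one dot-file for comparison.
    val names = Seq("1.txt", "2.txt", "_1.txt", "_2.txt", ".hidden")

    // Split the names into those that would be read and those that would be skipped.
    val (read, skipped) = names.partition(acceptedByDefault)

    println(s"read:    ${read.mkString(", ")}")
    println(s"skipped: ${skipped.mkString(", ")}")
  }
}
```

Under this rule, 1.txt and 2.txt are read, while _1.txt and _2.txt are skipped, which matches the symptoms described in the report.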


