Posted to user@flink.apache.org by Camelia-Elena Ciolac <ca...@inria.fr> on 2014/10/24 12:08:24 UTC

Collection of files as input

Hello, 

I am working on a use case where we have a collection of files as input. 
I am using env.createInput with an AvroInputFormat. For a single input file, it is fine to specify it as new Path(args[0]). 
But is it possible (and if so, how) to create a DataSet from a collection of files directly? 

I thought of a workaround: build one DataSet, dsUnion, to hold the union result, 
and a second DataSet, dsCurrent, into which we create an input for one file. 

read the first file into dsUnion 

in a loop, repeat: 
    read another file into dsCurrent 
    dsUnion = dsUnion.union(dsCurrent) 
until all files in the collection are processed. 
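The workaround above can be sketched in plain Java so that it runs without Flink. This is an illustration under assumptions: Java lists stand in for DataSets, each line of a text file stands in for one record, and `readOne` and `union` are hypothetical stand-ins for env.createInput and DataSet.union. Like Flink's union, the list union keeps duplicates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the union workaround; lists stand in for Flink DataSets
// and each line of a text file stands in for one record (assumption).
class UnionWorkaround {

    // Hypothetical stand-in for env.createInput(...) on a single file.
    static List<String> readOne(Path file) throws IOException {
        return Files.readAllLines(file);
    }

    // Hypothetical stand-in for dsUnion.union(dsCurrent): keeps duplicates,
    // like Flink's union, and returns a new collection.
    static List<String> union(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    // Read the first file into dsUnion, then union in every remaining file.
    static List<String> readAll(List<Path> files) throws IOException {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("at least one input file is required");
        }
        List<String> dsUnion = readOne(files.get(0));
        for (Path file : files.subList(1, files.size())) {
            List<String> dsCurrent = readOne(file);
            dsUnion = union(dsUnion, dsCurrent);
        }
        return dsUnion;
    }
}
```

In a real Flink job the same loop adds one union operator to the plan per extra file, so the plan grows with the number of files.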

Is there a simpler solution possible with Flink API? 

Thanks in advance! 
Camelia 

Re: Collection of files as input

Posted by Fabian Hueske <fh...@apache.org>.
Hi Camelia,

FileInputFormats such as the AvroInputFormat can also read all files in a
directory if the directory is given as the path.
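As a rough, stdlib-only illustration of what passing a directory as the path covers, the Java sketch below lists the regular files directly inside a directory. It assumes, rather than reproduces, the default filter of Flink's FileInputFormat (in recent versions, names starting with '.' or '_' are skipped) and, like Flink's default behaviour as assumed here, it does not descend into subdirectories. `DirectoryListing` and `filesIn` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Lists the files a directory input path would expand to (illustrative only).
class DirectoryListing {

    // Assumed to mirror FileInputFormat's default filter: skip '.' and '_' files.
    static boolean accept(Path file) {
        String name = file.getFileName().toString();
        return !name.startsWith(".") && !name.startsWith("_");
    }

    static List<Path> filesIn(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)      // subdirectories are not descended into
                    .filter(DirectoryListing::accept)
                    .sorted()                          // sorted only for a deterministic result
                    .collect(Collectors.toList());
        }
    }
}
```

With Flink itself, nothing needs to be listed by hand: the directory is passed where the single file path went before, i.e. a directory path in place of args[0] in new Path(args[0]).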

Hope that helps.

Best, Fabian
