Posted to user@flink.apache.org by Telco Phone <te...@yahoo.com> on 2017/10/23 18:47:16 UTC

Processing files

All,
I'm looking to process files in a directory as they come in via file transfer.
The files are renamed to .DONE once the transfer is complete.
These are binary files and I need to process billions per day.
What I want to do is process each file and then create a new file called .PROCESSED.
I need a task thread to process one file at a time (unsplittable=true).
The files are in /mnt/DATE/filename.DONE
They are coming in on 4-5 servers at the moment.
I can run a task manager on each host, so the files will be processed on the server where they arrive and written to the same directory.
What is the best way to build a continuous list of files to process and hand each filename to the task threads running on each host?
Hope this makes sense... 
Thanks in advance.

Re: Processing files

Posted by Sugandha Amatya <su...@gmail.com>.
I think you need to use readFile.
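
For context, readFile on StreamExecutionEnvironment can rescan a directory on an interval when called with FileProcessingMode.PROCESS_CONTINUOUSLY. The selection and marker logic around it is sketched below in plain Java, without Flink, only to illustrate the flow described in the question. The class name DoneFileScanner, the pass-through handler, and the choice to write the handler's output into the .PROCESSED file are all assumptions, not something the original post specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DoneFileScanner {
    static final String DONE = ".DONE";
    static final String PROCESSED = ".PROCESSED";

    /** Maps dir/name.DONE to dir/name.PROCESSED. */
    static Path outputFor(Path done) {
        String name = done.getFileName().toString();
        String base = name.substring(0, name.length() - DONE.length());
        return done.resolveSibling(base + PROCESSED);
    }

    /** Lists fully transferred files that have no .PROCESSED sibling yet. */
    static List<Path> listReady(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(DONE))
                .filter(p -> !Files.exists(outputFor(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Reads one file whole (no splitting), transforms it, publishes the result. */
    static Path processOne(Path done, Function<byte[], byte[]> handler) throws IOException {
        byte[] result = handler.apply(Files.readAllBytes(done));
        Path target = outputFor(done);
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, result);
        // Rename last so a half-written result never appears under the final name.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("incoming"); // stand-in for /mnt/DATE
        Files.write(dir.resolve("a.DONE"), new byte[] {1, 2, 3});
        Files.write(dir.resolve("b.partial"), new byte[] {9}); // still transferring
        for (Path p : listReady(dir)) {
            // Hypothetical handler: passes the bytes through unchanged.
            System.out.println("wrote " + processOne(p, bytes -> bytes).getFileName());
        }
        System.out.println("remaining: " + listReady(dir).size());
    }
}
```

Filtering on the .DONE suffix skips files that are still being transferred, and writing to a temporary name before the rename keeps a partial result from being mistaken for a finished one. Note that readAllBytes loads the entire file into memory, so very large files would need a streaming read instead.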
