Posted to user@flink.apache.org by madan <ma...@gmail.com> on 2018/11/01 04:47:46 UTC

Re: CsvInputFormat - read header line first

Hi Ken,

Yep, that's correct.

Thank you.

On Wed, Oct 31, 2018 at 7:24 PM Ken Krugler <kk...@transpac.com>
wrote:

> Hi Madan,
>
> If your source has a parallelism > 1, then when the CSV file is split,
> only one of the operators will get the split with the header row.
>
> So in that case, how would you communicate the column name->index
> information to the other operators?
>
> If you force a parallelism of 1 for the source, then I’m pretty sure
> you’re guaranteed that the file will be processed in order.
>
> — Ken
>
> On Oct 31, 2018, at 12:50 AM, madan <ma...@gmail.com> wrote:
>
> Hi,
>
> When we split a CSV file into multiple parts, we cannot be sure which part
> is read first. Is there any way to make sure the first part, the one with
> the header, is read first? I need to read the header line first to store a
> column name -> index mapping and use it to process the remaining records.
>
> I could read the header line from the file before submitting the job to
> Flink, but that way we open the file twice. Is there a better way to do
> this? Please suggest one.
>
> --
> Thank you.
>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>
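
For reference, the option mentioned in the original question (reading the header before submitting the job) could look roughly like the sketch below. It uses only the Java standard library and is not taken from the thread: the class name CsvHeaderIndex and both helper methods are hypothetical, and it assumes a comma-delimited header with no quoted fields. Only the first line is read before the file is closed, so the second open is cheap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of Flink.
public class CsvHeaderIndex {

    // Maps each column name in a header line to its zero-based position.
    // Assumption: plain comma delimiter, no quoted fields containing commas.
    static Map<String, Integer> parseHeader(String headerLine) {
        Map<String, Integer> index = new LinkedHashMap<>();
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i);
        }
        return index;
    }

    // Opens the file, reads only its first line, and closes it again.
    static Map<String, Integer> readHeaderIndex(Path csvFile) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                throw new IOException("Empty file: " + csvFile);
            }
            return parseHeader(header);
        }
    }

    public static void main(String[] args) throws IOException {
        // Small throwaway file so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".csv");
        Files.write(tmp, "id,name,age\n1,Ann,30\n".getBytes(StandardCharsets.UTF_8));
        try {
            Map<String, Integer> index = readHeaderIndex(tmp);
            System.out.println(index);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The resulting map is serializable, so one possible design is to build it on the client and hand it to the job's functions (for example as a constructor argument). Every parallel instance would then see the same column name -> index mapping, regardless of which one receives the split containing the header row.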

-- 
Thank you,
Madan.