Posted to user@flink.apache.org by Nirmalya Sengupta <se...@gmail.com> on 2016/04/26 15:25:24 UTC

Initializing global data

Hello Flinksters,

I need to initialize a piece of global data at the beginning of a Flink
application. I use it as READ-ONLY reference data for enriching the streamed
data that comes in at execution time, so it has to be present on all the
execution nodes. For readability, let's call it ReferableData.

The ReferableData is essentially a Map[String, UserDefinedClass]. The
source is a regular CSV file on the local file system (no HDFS, for
the time being). So, I read the CSV file in the usual manner, parse it,
create instances of UserDefinedClass, build a Map[asStringK,
asInstanceV] and make this available globally.
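
For illustration, here is a minimal sketch of that loading step using the
standard IO routines of Java. It is not taken from a real application: the
name ReferableDataLoader, the two-column "key,description" layout and the
fields of UserDefinedClass are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader; the class name and the CSV layout are assumptions.
class ReferableDataLoader {

    // Stand-in for the real UserDefinedClass (fields are assumed).
    // It is Serializable because the data has to be shipped to other nodes.
    static final class UserDefinedClass implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        final String description;

        UserDefinedClass(String key, String description) {
            this.key = key;
            this.description = description;
        }
    }

    // Parses lines of the assumed form "key,description" into a read-only map.
    // Blank lines are skipped. Quoted fields are not handled; only the first
    // comma on a line is treated as the separator.
    static Map<String, UserDefinedClass> parse(List<String> lines) {
        Map<String, UserDefinedClass> result = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",", 2);
            if (fields.length < 2) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            String key = fields[0].trim();
            result.put(key, new UserDefinedClass(key, fields[1].trim()));
        }
        return Collections.unmodifiableMap(result);
    }

    // Reads the whole CSV file from the local file system and parses it.
    static Map<String, UserDefinedClass> load(String path) throws IOException {
        return parse(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
    }
}
```

Calling ReferableDataLoader.load with the path of the CSV file would return
the Map; how that Map then reaches the execution nodes is a separate question.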

What I gather from the documents and discussions in this forum is that one
of the ways to achieve this is to use

env.getConfig().setGlobalJobParameters(parameters)

where 'parameters' is the Map (the ReferableData) that I want to create
from the CSV file.

I have two questions:

a) Is this approach of providing look-up data to all nodes a good practice?

b) Should I read the input CSV file through
ExecutionEnvironment.readTextFile or using the standard IO routines of Java?
Does the community have any preference?

-- Nirmalya




-- 
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is
where they should be.
Now put the foundation under them."


Nirmalya Sengupta
https://about.me/sengupta.nirmalya

Re: Initializing global data

Posted by Stefano Baghino <st...@radicalbit.io>.
Hi Nirmalya,

I'm not really sure setGlobalJobParameters is what you're looking for. If
the ReferableData is more than some simple configuration (and judging from
its type, it looks that way), maybe you can try to leverage broadcast
variables. You can read more about them here:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#broadcast-variables

Regarding your question on CSV, I always find the built-in APIs very handy.
I also expect them to work in parallel on distributed file systems out of
the box, so I wouldn't rewrite them (unless you have very specific needs,
of course).




-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit