Posted to user@flink.apache.org by Annemarie Burger <an...@campus.tu-berlin.de> on 2020/07/17 19:22:26 UTC

Global Hashmap & global static variable.

Hi,

I have two questions:

1. In the first part of my pipeline, which uses Flink DataStreams to process
graph edges, I'm filling up a HashMap. It maps each vertex id to the
partition this vertex is assigned to. Later in my pipeline I want to query
this HashMap again, to see in which partition exactly I can find a specific
edge, based on which partitions its two vertices are assigned to. What is
the best way to do this? I keep getting NullPointerExceptions.

2. Is there a good way to retrieve a single value at some point in my
pipeline, and then make it globally available? I was using a static
variable, but found that this led to a NullPointerException when executing
Flink standalone. Weirdly enough it worked fine in my IntelliJ.

All help is much appreciated!

Best,
Annemarie 

Re: Global Hashmap & global static variable.

Posted by Piotr Nowojski <pn...@apache.org>.

Hi Annemarie,

You are missing some basic concepts in Flink. Please take a look at [1].

> Weirdly enough it worked fine in my IntelliJ.

That is completely normal. When you execute your Flink application in a
local testing environment (IntelliJ), a single JVM runs all of the code, so
any code/operator/function that accesses a static variable accesses the
same one. But if you execute the same code on a cluster with multiple
machines (Flink is a distributed system!), there is no way for different
JVM processes to communicate via static variables.
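
To make the point about static variables concrete, here is a minimal
plain-Java sketch (no Flink dependency). It assumes the usual setup in which
a user function is serialized on the client and deserialized in another JVM.
The class and method names (StaticVsInstanceDemo, Lookup, shipToFreshJvm) are
made up for illustration, and the "other JVM" is only simulated by resetting
the static field, so treat it as an illustration rather than a reproduction
of what Flink does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StaticVsInstanceDemo {

    // Hypothetical user function; stands in for something like a map function.
    static class Lookup implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static field: belongs to the class inside one JVM and is never serialized.
        static String sharedValue;

        // Instance field: part of the object, so it travels with the serialized bytes.
        private final String configuredValue;

        Lookup(String configuredValue) {
            this.configuredValue = configuredValue;
        }

        String describe() {
            return "static=" + sharedValue + ", instance=" + configuredValue;
        }
    }

    public static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets the value both ways, "ships" the object, and reports what arrives.
    public static String shipToFreshJvm(String value) {
        Lookup.sharedValue = value;                 // set on the "client" JVM
        byte[] shipped = serialize(new Lookup(value));
        Lookup.sharedValue = null;                  // SIMULATION: a new JVM never ran the assignment
        Lookup remote = (Lookup) deserialize(shipped);
        return remote.describe();
    }

    public static void main(String[] args) {
        System.out.println(shipToFreshJvm("partition-3"));
    }
}
```

The instance field survives the round trip while the static one does not,
which is why a single value that is already known when the job is built is
usually passed through the function's constructor instead of a static field.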

Probably the same problem applies to your first issue with the HashMap.

Please read [1] carefully to see how Flink executes and distributes your
code. That should help you understand this problem.

Can you describe what you are trying to do/achieve/solve?

There is currently no way for different operators/functions to share access
to the same state. Note that different parallel instances of the same
operator/function can share the same state. This is called broadcast state
[2], but it doesn't allow for the pattern you are looking for (aggregating
results in one stage, and then using this aggregated state in another, later
stage/operator). To do this, you would have to store the state in some
external system (an external key/value store, a DB, Kafka, a file on a
distributed file system, ...) to make it visible and accessible across your
whole cluster.
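
As a rough plain-Java sketch of the external-storage idea (no Flink
dependency): one stage writes the vertex id -> partition assignments to a
file, and a later stage reads them back to look up the partitions of an
edge's two vertices. The local temp file is only a stand-in for a file on a
distributed file system, and all names here (SharedPartitionTable,
writeAssignments, readAssignments, partitionsOfEdge, demo) are hypothetical.
How the two vertex partitions map to an edge partition is not specified in
this thread, so the sketch just returns both.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;

public class SharedPartitionTable {

    // Earlier stage: persist the vertex id -> partition assignments.
    public static void writeAssignments(Path file, Map<Long, Integer> vertexToPartition) {
        Properties props = new Properties();
        vertexToPartition.forEach((vertex, partition) ->
                props.setProperty(Long.toString(vertex), Integer.toString(partition)));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "vertex id -> partition");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Later stage: load the assignments, possibly in a different JVM.
    public static Properties readAssignments(Path file) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the partitions of the edge's source and target vertex, in that order.
    public static int[] partitionsOfEdge(Properties assignments, long source, long target) {
        return new int[] {partitionOf(assignments, source), partitionOf(assignments, target)};
    }

    private static int partitionOf(Properties assignments, long vertex) {
        String partition = assignments.getProperty(Long.toString(vertex));
        if (partition == null) {
            // Fail with a clear message instead of a NullPointerException later on.
            throw new IllegalStateException("No partition recorded for vertex " + vertex);
        }
        return Integer.parseInt(partition);
    }

    // Runs both stages against a temp file; the sample data is made up.
    public static int[] demo() {
        try {
            Path file = Files.createTempFile("vertex-partitions", ".properties");
            try {
                writeAssignments(file, Map.of(1L, 0, 2L, 3));
                return partitionsOfEdge(readAssignments(file), 1L, 2L);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In a streaming job the later stage may run before all assignments have been
written, so a real lookup would also need to handle missing vertices.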

Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/concepts/flink-architecture.html
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
