Posted to user@flink.apache.org by Steven Nelson <sn...@sourceallies.com> on 2018/12/21 18:48:25 UTC

HA with HDFS question

First off, I am new to using HDFS to store things, so expect stupid
questions.

I am working on hardening our Flink cluster for production usage. This
includes setting up an HA Flink cluster, saving checkpoints and savepoints
to a central location, etc. I have a functioning HDFS setup inside an HA
Kubernetes cluster. We have successfully stored checkpoint data in the HDFS
directory.

When we specify the locations for HDFS savepoints/checkpoints/HA storage,
we specify a single namenode in the URL. My question is: how do we
implement failover when that namenode fails? We looked at putting the
namenodes behind a load balancer, but the standby namenodes attempt to
respond to writes (and fail). I figure I am missing something simple.

-Steve
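
For concreteness, a minimal sketch of the kind of setup described above,
where each Flink location is pinned to a single namenode host. The host,
port, and paths are hypothetical, and these keys would normally live in
flink-conf.yaml rather than be set in code:

import org.apache.flink.configuration.Configuration;

public class SingleNamenodePathsSketch {
    public static void main(String[] args) {
        // Every location below names one concrete namenode host, which is
        // the setup described above; host, port, and paths are made up.
        Configuration conf = new Configuration();
        conf.setString("state.checkpoints.dir",
            "hdfs://namenode-0.hdfs:8020/flink/checkpoints");
        conf.setString("state.savepoints.dir",
            "hdfs://namenode-0.hdfs:8020/flink/savepoints");
        conf.setString("high-availability.storageDir",
            "hdfs://namenode-0.hdfs:8020/flink/ha");
        // If namenode-0 fails over to its standby, these hard-coded URLs
        // still point at the old, now-standby host.
        System.out.println(conf);
    }
}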

Re: HA with HDFS question

Posted by Steven Nelson <sn...@sourceallies.com>.
Well, I have a fully functioning HDFS HA setup via a Helm chart. My
question is more about how to specify the HDFS namenode in such a way that,
if a namenode fails, Flink automatically talks to the new active namenode.
Swapnil mentioned configuring a nameservice for the HDFS namenodes, and I
was looking for clarification on that.
-Steve
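
A minimal sketch of the kind of client-side nameservice configuration in
question, assuming a logical nameservice named "myhdfs" backed by two
namenodes. The nameservice name, hostnames, and ports are illustrative
assumptions; in practice the same properties would live in hdfs-site.xml /
core-site.xml on the classpath (e.g. via HADOOP_CONF_DIR) so that Flink's
HDFS filesystem picks them up:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaClientSketch {
    public static void main(String[] args) throws Exception {
        // Define a logical nameservice ("myhdfs") backed by two namenodes.
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "myhdfs");
        conf.set("dfs.ha.namenodes.myhdfs", "nn0,nn1");
        conf.set("dfs.namenode.rpc-address.myhdfs.nn0", "namenode-0.hdfs:8020");
        conf.set("dfs.namenode.rpc-address.myhdfs.nn1", "namenode-1.hdfs:8020");
        // The failover proxy provider is what lets the client locate the
        // active namenode and retry against the other one after a failover.
        conf.set("dfs.client.failover.proxy.provider.myhdfs",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.set("fs.defaultFS", "hdfs://myhdfs");

        // Paths now reference the nameservice, not a concrete namenode host.
        FileSystem fs = FileSystem.get(URI.create("hdfs://myhdfs"), conf);
        System.out.println(fs.exists(new Path("/flink/checkpoints")));
    }
}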

On Mon, Dec 24, 2018 at 8:20 AM Andrey Zagrebin <an...@data-artisans.com>
wrote:

> Hi Steve,
>
> I think your question is specific to the HDFS HA setup.
> Flink HA addresses failover only for the job manager and job meta state.
> The storage layer for savepoints/checkpoints and its failover are the
> responsibility of the HDFS deployment.
> Flink uses HDFS as an external system, accessed via a location URL.
> I am not an expert on HDFS HA deployment. You could have a look at the
> Hadoop docs [1].
>
> Best,
> Andrey
>
> [1]
> https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>
> > On 21 Dec 2018, at 21:48, Steven Nelson <sn...@sourceallies.com>
> wrote:
> >
> > First off, I am new to using HDFS to store things, so expect stupid
> > questions.
> >
> > I am working on hardening our Flink cluster for production usage. This
> > includes setting up an HA Flink cluster, saving checkpoints and savepoints
> > to a central location, etc. I have a functioning HDFS setup inside an HA
> > Kubernetes cluster. We have successfully stored checkpoint data in the HDFS
> > directory.
> >
> > When we specify the locations for HDFS savepoints/checkpoints/HA storage,
> > we specify a single namenode in the URL. My question is: how do we
> > implement failover when that namenode fails? We looked at putting the
> > namenodes behind a load balancer, but the standby namenodes attempt to
> > respond to writes (and fail). I figure I am missing something simple.
> >
> > -Steve
>
>

Re: HA with HDFS question

Posted by Andrey Zagrebin <an...@data-artisans.com>.
Hi Steve,

I think your question is specific to the HDFS HA setup.
Flink HA addresses failover only for the job manager and job meta state.
The storage layer for savepoints/checkpoints and its failover are the responsibility of the HDFS deployment.
Flink uses HDFS as an external system, accessed via a location URL.
I am not an expert on HDFS HA deployment. You could have a look at the Hadoop docs [1].

Best,
Andrey

[1] https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
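
Once an HA nameservice like the one in the docs above is configured (and
visible to Flink, typically via HADOOP_CONF_DIR), the Flink locations can
use the logical nameservice as the URL authority instead of a single
namenode host, so the HDFS client resolves the active namenode and handles
failover itself. A minimal sketch, with the nameservice name "myhdfs" and
the paths as illustrative assumptions:

import org.apache.flink.configuration.Configuration;

public class NameservicePathsSketch {
    public static void main(String[] args) {
        // The URL authority is the logical nameservice, not a host:port,
        // so a namenode failover is transparent to Flink.
        Configuration conf = new Configuration();
        conf.setString("state.checkpoints.dir", "hdfs://myhdfs/flink/checkpoints");
        conf.setString("state.savepoints.dir", "hdfs://myhdfs/flink/savepoints");
        conf.setString("high-availability.storageDir", "hdfs://myhdfs/flink/ha");
        System.out.println(conf);
    }
}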

> On 21 Dec 2018, at 21:48, Steven Nelson <sn...@sourceallies.com> wrote:
> 
> First off, I am new to using HDFS to store things, so expect stupid questions.
> 
> I am working on hardening our Flink cluster for production usage. This includes setting up an HA Flink cluster, saving checkpoints and savepoints to a central location, etc. I have a functioning HDFS setup inside an HA Kubernetes cluster. We have successfully stored checkpoint data in the HDFS directory.
> 
> When we specify the locations for HDFS savepoints/checkpoints/HA storage, we specify a single namenode in the URL. My question is: how do we implement failover when that namenode fails? We looked at putting the namenodes behind a load balancer, but the standby namenodes attempt to respond to writes (and fail). I figure I am missing something simple.
> 
> -Steve