Posted to dev@airflow.apache.org by Ben Laird <br...@gmail.com> on 2018/09/04 15:18:12 UTC

Re: WebHdfsSensor doesn't support HDFS HA

Manu -

This is the relevant code I was referencing before:
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/webhdfs_hook.py#L54-L71

So multiple connections for a given conn_id are already built into some
hooks, but we need a way to set this from the CLI. I'll be creating a JIRA
shortly and pushing an update to the CLI for this.
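
For anyone skimming the thread, the failover pattern in that hook is roughly
"try each connection registered under the conn_id and keep the first one that
answers". Below is a minimal, dependency-free sketch of that idea, not the
hook's actual code. Namenode, first_active_namenode, fake_probe, and the
example hosts are hypothetical names used only for illustration, and the probe
stands in for whatever cheap status call the real client would make.

```python
from collections import namedtuple

# Hypothetical stand-in for a connection record; only host and port matter here.
Namenode = namedtuple("Namenode", ["host", "port"])


def first_active_namenode(candidates, probe):
    """Return the first candidate for which probe(candidate) does not raise."""
    errors = []
    for nn in candidates:
        try:
            probe(nn)  # e.g. a cheap status request against the namenode
            return nn
        except Exception as exc:  # a real hook would catch a narrower error type
            errors.append("{}:{} -> {}".format(nn.host, nn.port, exc))
    raise RuntimeError("No active namenode found: " + "; ".join(errors))


def fake_probe(nn):
    # Assumption for the demo: nn1 is in standby and rejects requests.
    if nn.host == "nn1.example.com":
        raise IOError("standby")


candidates = [
    Namenode("nn1.example.com", 50070),
    Namenode("nn2.example.com", 50070),
]
active = first_active_namenode(candidates, fake_probe)
print(active.host)
```

With two connections registered under the same conn_id, a namenode switch only
changes which candidate passes the probe; the caller does not need to know
which one is currently active.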

On Thu, Aug 30, 2018 at 2:03 AM Manu Zhang <ow...@gmail.com> wrote:

> Thanks Xiaodong, that works like a charm.
>
> Manu
>
> On Thu, Aug 30, 2018 at 11:34 AM Deng Xiaodong <xd...@gmail.com>
> wrote:
>
> > Hi Manu,
> >
> > You can set up multiple connections with the same conn_id and different
> > hosts, rather than setting them all in one single connection.
> >
> >
> > XD
> >
> > On Thu, Aug 30, 2018 at 11:17 Manu Zhang <ow...@gmail.com>
> > wrote:
> >
> > > Hi Ben,
> > >
> > > How do you set multiple connections through the Web UI (from the
> > > Connections item of the Admin pull-down list)? I tried setting a
> > > comma-separated list for a conn_id but that doesn't work.
> > >
> > > Thanks,
> > > Manu
> > >
> > >
> > > On Wed, Aug 29, 2018 at 11:31 PM Ben Laird <br...@gmail.com> wrote:
> > >
> > > > Hi Manu,
> > > >
> > > > We have the same use case as you, a primary and backup namenode. If I
> > > > understand your issue correctly, the WebHdfsSensor code checks an
> > > > iterable of Airflow connections to the namenode to find one that is
> > > > active.
> > > >
> > > > However, my issue (which I've emailed this list about) was that you
> > > > cannot set multiple connections with the same name (e.g.
> > > > webhdfs_default) through the CLI, only in the Web interface. I'm
> > > > planning on submitting a PR soon to remedy this.
> > > >
> > > > Ben
> > > >
> > > > On Wed, Aug 29, 2018 at 2:57 AM Driesprong, Fokko <fokko@driesprong.frl>
> > > > wrote:
> > > >
> > > > > Hi Manu,
> > > > >
> > > > > Thanks for raising this question. There is a PR for moving to hdfs3
> > > > > <https://github.com/apache/incubator-airflow/pull/3560>. There is
> > > > > code in the existing codebase which supports HA
> > > > > <https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
> > > > > but this might not be for the sensor.
> > > > >
> > > > > Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to
> > > > > judge how mature it is. We need to replace Snakebite for sure since it
> > > > > is only compatible with Python 2.7.
> > > > >
> > > > > Cheers, Fokko
> > > > >
> > > > >
> > > > > On Wed, Aug 29, 2018 at 04:29 Manu Zhang <owenzhang1990@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We've been using WebHdfsSensor happily to sense the state of upstream
> > > > > > tasks outputting to HDFS, except when there is a namenode switch. I've
> > > > > > opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss
> > > > > > the HDFS HA support.
> > > > > >
> > > > > > There are two solutions that I can see,
> > > > > >
> > > > > > 1. use pyarrow.hdfs which has HA support
> > > > > > 2. allow user to configure a list of namenodes
> > > > > >
> > > > > > WDYT ?
> > > > > >
> > > > > > Thanks,
> > > > > > Manu Zhang
> > > > > >
> > > > >
> > > >
> > >
> >
>