Posted to dev@gobblin.apache.org by Jay Sen <ja...@apache.org> on 2020/09/09 01:46:52 UTC

Yarn token mgmt

Hi Gobblin Dev team,

I see the configs and functionality for creating and renewing tokens
from a provided keytab file, but I didn't find any functionality that creates
tokens for a remote system.

So the question is: if we run Gobblin for a Hadoop-to-Hadoop job (source =
CopySource), how does it manage creating and renewing tokens for the
remote Hadoop cluster?

Thanks
jay

Re: Yarn token mgmt

Posted by Jay Sen <ja...@apache.org>.
Sure. Just noticed the typo.

I meant that I will create the PR that should NOT be breaking any
compatibility :)

-Jay



On Tue, Sep 15, 2020, 10:18 PM Lei Sun <le...@linkedin.com.invalid> wrote:

> Thanks Jay. Looking forward to your PR.

Re: Yarn token mgmt

Posted by Lei Sun <le...@linkedin.com.INVALID>.
Thanks Jay. Looking forward to your PR.
________________________________
From: Jay Sen <ja...@apache.org>
Sent: Tuesday, September 15, 2020 7:29 PM
To: dev@gobblin.apache.org <de...@gobblin.apache.org>
Subject: Re: Yarn token mgmt

Got it. I was able to make some changes to collect the tokens in Credential and
write them to a file, and it worked fine. I will create a PR that should be a breaking
change. Thanks


Re: Yarn token mgmt

Posted by Jay Sen <ja...@apache.org>.
Got it. I was able to make some changes to collect the tokens in Credential and
write them to a file, and it worked fine. I will create a PR that should be a breaking
change. Thanks

On Tue, Sep 15, 2020 at 10:12 AM Lei Sun <le...@linkedin.com.invalid> wrote:

> Hi Jay,
>
>
> So yes, most of YarnAppLauncher still relies on Kerberos login. We are
> using something similar to OTHER_NAMENODES, but it just happens that
> Azkaban takes over the heavy-lifting part.
>
>
> Regards,
> Lei

Re: Yarn token mgmt

Posted by Lei Sun <le...@linkedin.com.INVALID>.
Hi Jay,


So yes, most of YarnAppLauncher still relies on Kerberos login. We are using something similar to OTHER_NAMENODES, but it just happens that Azkaban takes over the heavy-lifting part.


Regards,
Lei
________________________________
From: Jay Sen <ja...@apache.org>
Sent: Friday, September 11, 2020 12:23 AM
To: dev@gobblin.apache.org <de...@gobblin.apache.org>
Subject: Re: Yarn token mgmt

Hi Lei, yes, that was helpful.

The more I looked into it, the more I see that all the current methods in YarnAppLauncher are
limited to the local Hadoop cluster with Kerberos only (as it uses
getDelegationToken for the local HDFS).
Moreover, it does not honor multiple tokens like KMS-dt or possibly others
(even though I saw other token-fetching functions, they are all for the local
cluster, not for the remote one).

The right way of doing this may be to register the remote namenodes
dynamically at runtime from the CopySource, as it appears, probably as the
long-term solution.
For now, I was able to solve it via a bunch of code changes with functionality
similar to OTHER_NAMENODES.

Thanks for the pointers.
-Jay


Re: Yarn token mgmt

Posted by Jay Sen <ja...@apache.org>.
Hi Lei, yes, that was helpful.

The more I looked into it, the more I see that all the current methods in YarnAppLauncher are
limited to the local Hadoop cluster with Kerberos only (as it uses
getDelegationToken for the local HDFS).
Moreover, it does not honor multiple tokens like KMS-dt or possibly others
(even though I saw other token-fetching functions, they are all for the local
cluster, not for the remote one).

The right way of doing this may be to register the remote namenodes
dynamically at runtime from the CopySource, as it appears, probably as the
long-term solution.
For now, I was able to solve it via a bunch of code changes with functionality
similar to OTHER_NAMENODES.

Thanks for the pointers.
-Jay
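
A minimal sketch of the OTHER_NAMENODES-style idea described above: read a comma-separated list of remote namenodes, then fetch every token each cluster issues (for example both an HDFS token and a KMS-dt token) rather than only the local HDFS one. This is an illustration only, not Gobblin's or Azkaban's implementation. The class RemoteTokenCollector, the TokenFetcher interface, and the hostnames are hypothetical; the real Hadoop call is deliberately hidden behind the interface so the sketch runs with the JDK alone.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; the names are illustrative and not Gobblin or Hadoop API. */
final class RemoteTokenCollector {

  /** Stand-in for whatever contacts a cluster and reports the token kinds it issued. */
  interface TokenFetcher {
    // Assumption: with Hadoop on the classpath, an implementation would likely wrap
    // FileSystem.get(uri, conf).addDelegationTokens(renewer, credentials).
    List<String> fetchTokenKinds(URI namenode);
  }

  /** Parses a list such as "hdfs://nn-a:8020, hdfs://nn-b:8020", dropping blanks and repeats. */
  static List<URI> parseNamenodes(String commaSeparated) {
    Set<URI> unique = new LinkedHashSet<>(); // preserves the configured order
    if (commaSeparated != null) {
      for (String part : commaSeparated.split(",")) {
        String trimmed = part.trim();
        if (!trimmed.isEmpty()) {
          unique.add(URI.create(trimmed));
        }
      }
    }
    return new ArrayList<>(unique);
  }

  /** Fetches tokens for every listed namenode and keeps all kinds per cluster. */
  static Map<URI, List<String>> collect(List<URI> namenodes, TokenFetcher fetcher) {
    Map<URI, List<String>> tokensByCluster = new LinkedHashMap<>();
    for (URI namenode : namenodes) {
      tokensByCluster.put(namenode, fetcher.fetchTokenKinds(namenode));
    }
    return tokensByCluster;
  }

  public static void main(String[] args) {
    List<URI> remotes =
        parseNamenodes("hdfs://nn-a:8020, hdfs://nn-b:8020,hdfs://nn-a:8020");
    // Fake fetcher for the demo: pretends each cluster issues an HDFS and a KMS token.
    Map<URI, List<String>> tokens =
        collect(remotes, nn -> Arrays.asList("HDFS_DELEGATION_TOKEN", "kms-dt"));
    tokens.forEach((nn, kinds) -> System.out.println(nn + " -> " + kinds));
  }
}
```

In a real launcher the collected tokens would presumably end up in a Hadoop Credentials object that is added to the current UGI or written to a token file; that step is left out here because it needs the Hadoop libraries.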

On Tue, Sep 8, 2020 at 8:59 PM Lei Sun <le...@linkedin.com.invalid> wrote:

> Hi Jay,
>
>
> Our workflow scheduler (Azkaban) did that for Gobblin, fetching remote
> tokens at the beginning of the job and adding them to the UGI.
>
> Hope it helps.
>
>
> Lei

Re: Yarn token mgmt

Posted by Lei Sun <le...@linkedin.com.INVALID>.
Hi Jay,


Our workflow scheduler (Azkaban) did that for Gobblin, fetching remote tokens at the beginning of the job and adding them to the UGI.

Hope it helps.


Lei


________________________________
From: Jay Sen <ja...@apache.org>
Sent: Tuesday, September 8, 2020 6:46 PM
To: dev@gobblin.incubator.apache.org <de...@gobblin.incubator.apache.org>
Subject: Yarn token mgmt

Hi Gobblin Dev team,

I see the configs and functionality for creating and renewing tokens
from a provided keytab file, but I didn't find any functionality that creates
tokens for a remote system.

So the question is: if we run Gobblin for a Hadoop-to-Hadoop job (source =
CopySource), how does it manage creating and renewing tokens for the
remote Hadoop cluster?

Thanks
jay