Posted to yarn-dev@hadoop.apache.org by Sergiy Matusevych <se...@gmail.com> on 2017/03/15 06:10:46 UTC

Two AMs in one YARN container?

Hi YARN developers,

I have an interesting problem that I think is related to the YARN Java
client. I am trying to launch *two* application masters in one container. To
be more specific, I am starting a Spark job on YARN and launching an Apache
REEF Unmanaged AM from the Spark Driver.

Technically, the YARN Resource Manager should not care which process each AM
runs in. However, there is a problem with the YARN Java client
implementation: there is a global UserGroupInformation object that holds
the user credentials of the current RM session. This data structure is
shared by all AMs, and when the REEF application tries to register the second
(unmanaged) AM, the client library presents all credentials to the YARN RM,
including the security token of the first (managed) AM. YARN rejects such a
registration request, throwing InvalidApplicationMasterRequestException
"Application Master is already registered".
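
The failure mode described above can be illustrated with a toy model in plain Java. This is only a sketch and does not use any Hadoop classes: `GLOBAL_TOKENS`, `ToyResourceManager`, and the token strings are hypothetical names, and the rule that the RM identifies the caller by the first token it is shown is a simplifying assumption, not a description of YARN's actual token selection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SharedCredentialsDemo {

    /** Toy stand-in for the process-wide credentials; hypothetical, not a Hadoop class. */
    static final List<String> GLOBAL_TOKENS = new ArrayList<>();

    /** Toy stand-in for the Resource Manager; hypothetical, not a Hadoop class. */
    static class ToyResourceManager {
        private final Set<String> registered = new HashSet<>();

        /** Identifies the caller by the first presented token (simplifying assumption). */
        void register(List<String> presentedTokens) {
            String caller = presentedTokens.get(0);
            if (!registered.add(caller)) {
                throw new IllegalStateException(
                    "Application Master is already registered: " + caller);
            }
        }
    }

    /** Registers two AMs through the shared credentials; returns true if the second is rejected. */
    static boolean secondRegistrationFails() {
        GLOBAL_TOKENS.clear();
        ToyResourceManager rm = new ToyResourceManager();

        // The managed AM adds its token to the shared store and registers.
        GLOBAL_TOKENS.add("token-managed-am");
        rm.register(new ArrayList<>(GLOBAL_TOKENS));

        // The unmanaged AM adds its token to the SAME store, so the first
        // AM's token is still presented ahead of it.
        GLOBAL_TOKENS.add("token-unmanaged-am");
        try {
            rm.register(new ArrayList<>(GLOBAL_TOKENS));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second registration rejected: " + secondRegistrationFails());
    }
}
```

In this model the second registration is rejected only because both AMs read from the same credential store, which mirrors the symptom reported here.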

I feel like this issue can be resolved by a relatively small update to the
YARN Java client - e.g. by introducing a new variant of the
AMRMClientAsync.registerApplicationMaster() that would take the required
security token (instead of getting it implicitly from the
UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
some sort of RM session class that would wrap all data that is currently
global. I need to think about an elegant API for it.

What do you guys think? I would love to work on this problem and send you a
pull request for the upcoming 2.9 release.

Cheers,
Sergiy.

Re: Two AMs in one YARN container?

Posted by Sergiy Matusevych <se...@gmail.com>.
Hi Jason,

Thanks a lot for your help again! Having two separate UserGroupInformation
instances is exactly what I had in mind. What I do not understand, though,
is how to make sure that our second call to .registerApplicationMaster()
will pick the right UserGroupInformation object. I would love to find a way
that does not involve any changes to the YARN client, but if we have to
patch it, of course, I agree that we need to have a generic yet minimally
invasive solution.

Thank you!
Sergiy.


On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe <jl...@yahoo-inc.com> wrote:
>
> I believe a cleaner way to solve this problem is to create two
> _separate_ UserGroupInformation objects and wrap each AM instance in a UGI
> doAs so they aren't trying to share the same credentials. This is one
> example of a token bleeding over and causing problems. I suspect trying to
> fix these one-by-one as they pop up is going to be frustrating compared to
> just ensuring the credentials remain separate as if they really were
> running in separate JVMs.
>
> Adding Daryn who knows a lot more about the UGI stuff so he can correct
> any misunderstandings on my part.
>
> Jason
>

Re: Two AMs in one YARN container?

Posted by Jason Lowe <jl...@yahoo-inc.com.INVALID>.
I believe a cleaner way to solve this problem is to create two _separate_ UserGroupInformation objects and wrap each AM instance in a UGI doAs so they aren't trying to share the same credentials. This is one example of a token bleeding over and causing problems. I suspect trying to fix these one-by-one as they pop up is going to be frustrating compared to just ensuring the credentials remain separate as if they really were running in separate JVMs.
Adding Daryn who knows a lot more about the UGI stuff so he can correct any misunderstandings on my part.
Jason
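
The idea of keeping each AM's credentials in its own context can be sketched with a toy model in plain Java. This is an illustration only, with no Hadoop classes involved: `ToyUgi`, `ToyResourceManager`, `registerCurrentAm`, and the token strings are hypothetical, and the thread-local "current user" is an assumed simplification of what a doAs scope provides. In Hadoop itself the corresponding calls would be `UserGroupInformation.createRemoteUser(...)`, `addToken(...)`, and `doAs(...)`; how the YARN client picks up the current UGI is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class SeparateContextsDemo {

    /** Toy stand-in for UserGroupInformation; hypothetical, not the Hadoop class. */
    static class ToyUgi {
        private static final ThreadLocal<ToyUgi> CURRENT = new ThreadLocal<>();
        private final List<String> tokens = new ArrayList<>();

        void addToken(String token) {
            tokens.add(token);
        }

        List<String> getTokens() {
            return new ArrayList<>(tokens);
        }

        /** Returns the context installed by the innermost enclosing doAs call. */
        static ToyUgi getCurrentUser() {
            ToyUgi ugi = CURRENT.get();
            if (ugi == null) {
                throw new IllegalStateException("called outside of doAs");
            }
            return ugi;
        }

        /** Runs the action with this object as the current context, then restores the previous one. */
        <T> T doAs(Supplier<T> action) {
            ToyUgi previous = CURRENT.get();
            CURRENT.set(this);
            try {
                return action.get();
            } finally {
                if (previous == null) {
                    CURRENT.remove();
                } else {
                    CURRENT.set(previous);
                }
            }
        }
    }

    /** Toy stand-in for the Resource Manager; hypothetical, not a Hadoop class. */
    static class ToyResourceManager {
        private final Set<String> registered = new HashSet<>();

        /** Identifies the caller by the first presented token (simplifying assumption). */
        String register(List<String> presentedTokens) {
            String caller = presentedTokens.get(0);
            if (!registered.add(caller)) {
                throw new IllegalStateException(
                    "Application Master is already registered: " + caller);
            }
            return caller;
        }
    }

    /** Registers whichever AM owns the current context, using only that context's tokens. */
    static String registerCurrentAm(ToyResourceManager rm) {
        return rm.register(ToyUgi.getCurrentUser().getTokens());
    }

    /** Registers two AMs, each inside its own context; returns the accepted identities. */
    static List<String> registerBoth() {
        ToyResourceManager rm = new ToyResourceManager();

        ToyUgi managed = new ToyUgi();
        managed.addToken("token-managed-am");
        ToyUgi unmanaged = new ToyUgi();
        unmanaged.addToken("token-unmanaged-am");

        List<String> accepted = new ArrayList<>();
        accepted.add(managed.doAs(() -> registerCurrentAm(rm)));
        accepted.add(unmanaged.doAs(() -> registerCurrentAm(rm)));
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(registerBoth());
    }
}
```

Because each registration sees only the tokens of the context it runs in, neither AM can present the other's token, which is the separation described above.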