Posted to issues@spark.apache.org by "Praneet Sharma (JIRA)" <ji...@apache.org> on 2018/11/02 06:40:00 UTC

[jira] [Comment Edited] (SPARK-20059) HbaseCredentialProvider uses wrong classloader

    [ https://issues.apache.org/jira/browse/SPARK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672622#comment-16672622 ] 

Praneet Sharma edited comment on SPARK-20059 at 11/2/18 6:39 AM:
-----------------------------------------------------------------

Hi Guys

Regarding the fix done here for yarn-cluster mode, why are we adding primaryResource? Isn't adding args.jars to childClasspath enough?

I ask this because we have a scenario where the cleanup of the primaryResource fails until the application has completed. Here are the details:
 * We have a primaryResource jar present in an *NFS*-mounted location. Our scenario is that we perform the cleanup after the Spark job starts on the cluster.
 * Up to spark-2.1.0, this cleanup worked fine for us: since we are in yarn-cluster mode, the primaryResource is no longer needed on the client once it has been uploaded to the cluster.
 * But with spark-2.2.1 (which contains this fix), when we attempt to clean up the primaryResource, a .nfsxxx file is produced, which means the spark-submit process is holding an open handle on the primaryResource for the entire application lifecycle.

What we found is that because the primaryResource is now added to the thread-context classloader within SparkSubmit.scala even in yarn-cluster mode, the cleanup does not complete on NFS. This is a regression for us, and it becomes more problematic for long-running streaming jobs, because we cannot clean up the primaryResource until the application completes, which is unacceptable for us.
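The handle in question comes from the classloader caching the jar it was given. As a minimal, self-contained sketch of that mechanism (class and file names here are hypothetical, not taken from SparkSubmit.scala): a URLClassLoader that has read an entry from a jar keeps the jar file open, and since Java 7 the loader is Closeable, so closing it is what releases the handle.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarHandleDemo {
    public static void main(String[] args) throws Exception {
        // Build a throwaway jar with one entry, standing in for the primary resource.
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("marker.txt"));
            out.write("hello".getBytes("UTF-8"));
            out.closeEntry();
        }

        URLClassLoader loader = new URLClassLoader(new URL[] { jar.toUri().toURL() });
        // Reading an entry forces the loader to open (and cache) the jar file,
        // which is the open handle that NFS later turns into a .nfsxxx ghost.
        try (InputStream in = loader.getResourceAsStream("marker.txt")) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            System.out.println(reader.readLine());
        }

        // URLClassLoader implements Closeable since Java 7; closing it releases
        // the cached jar handle, after which the file can be fully removed.
        loader.close();
        Files.delete(jar);
        if (!Files.notExists(jar)) {
            throw new IllegalStateException("jar was not deleted");
        }
        System.out.println("deleted: " + Files.notExists(jar));
    }
}
```

The point for this issue: as long as the thread-context classloader created by spark-submit holds the primaryResource jar and is never closed, the handle lives for the whole application lifecycle.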

This is how the issue can be reproduced:
 * Have the spark-examples jar present in an NFS-mounted location.
 * While performing spark-submit in yarn-cluster mode, provide the spark-examples jar as the primary resource.
 * Once the job has been submitted to the cluster, try deleting the jar in the NFS location.
 ** The jar appears to get deleted, but a .nfsxxx file is produced which blocks deletion of the directory itself until the application fully completes and spark-submit exits.

{code:bash}
export HADOOP_CONF_DIR=./conf
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1G --num-executors 1 /mountloc/spark-test/spark-examples_2.11-2.3.1.jar 1000
{code}
When the above command has submitted an application to the cluster, we attempt to delete the /mountloc/spark-test/ location (which houses the primaryResource), but the deletion does not succeed because NFS locks the directory by producing a .nfsxxx file.
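The .nfsxxx ghost file is the NFS client's "silly rename" workaround for POSIX delete-while-open semantics. A minimal Java sketch of those semantics (file names hypothetical): on a local Linux filesystem, unlinking a file that a process still has open merely removes the name, and the open stream keeps reading; an NFS client cannot keep a server-side inode alive that way, so it renames the file to .nfsxxx instead, which is exactly what blocks the directory cleanup here.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenHandleDemo {
    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("resource", ".jar");
        Files.write(f, "payload".getBytes("UTF-8"));

        // Stand-in for the handle spark-submit's classloader keeps on the jar.
        InputStream in = Files.newInputStream(f);

        // On a local POSIX filesystem the name disappears immediately, but the
        // kernel keeps the inode alive while the stream is open. On NFS the
        // client instead renames the file to .nfsxxx until the handle closes.
        Files.delete(f);

        byte[] buf = new byte[7];
        int n = in.read(buf);
        String read = new String(buf, 0, n, "UTF-8");
        in.close();
        if (!"payload".equals(read)) {
            throw new IllegalStateException("unexpected content: " + read);
        }
        System.out.println(read);
    }
}
```

This is why the deletion "succeeds" from the application's point of view while the directory remains undeletable: the name is gone, but the data is pinned by the open handle inside the still-running spark-submit process.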

Please share your thoughts on this. Let me know if any further information is needed from my side.



> HbaseCredentialProvider uses wrong classloader
> ----------------------------------------------
>
>                 Key: SPARK-20059
>                 URL: https://issues.apache.org/jira/browse/SPARK-20059
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Major
>             Fix For: 2.1.1, 2.2.0
>
>
> {{HBaseCredentialProvider}} uses the system classloader instead of the child classloader, which makes HBase jars specified with {{--jars}} fail to work, so here we should use the right classloader.
> Besides, in yarn-cluster mode jars specified with {{--jars}} are not added to the client's classpath, which makes it fail to load HBase jars and issue tokens in our scenario. Also, some customized credential providers cannot be registered in the client.
> So here I will fix these two issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org