Posted to reviews@spark.apache.org by GrahamDennis <gi...@git.apache.org> on 2014/08/11 13:06:44 UTC

[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

GitHub user GrahamDennis opened a pull request:

    https://github.com/apache/spark/pull/1890

    [SPARK-2878]: Fix custom spark.kryo.registrator

    This is a work in progress, and I'm looking for feedback on my current approach.  My aim here is to add the user jars specified in SparkConf to the Executor processes before they start any tasks, and to add these jars to the classpath of all threads.
    
    So far I have only implemented sending the user jars to the Executor processes before registering the executor with the master.  Next steps are to add these jars to all threads, and/or to share a pool of Kryo instances created with a known class loader.  This will also need to be implemented for Mesos and YARN.
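
    For concreteness, here is a minimal, purely illustrative sketch of the mechanism involved: exposing user jars to a thread via a `URLClassLoader` set as the thread's context classloader.  This is not code from this PR; the jar path and registrator class name are made up.
    
    ```scala
    import java.net.{URL, URLClassLoader}
    
    object UserJarClassLoading {
      def main(args: Array[String]): Unit = {
        // Hypothetical jar path; in practice this would come from the SparkConf /
        // application description rather than a hard-coded string.
        val userJarPaths = Seq("/tmp/app.jar")
        val urls: Array[URL] = userJarPaths.map(p => new java.io.File(p).toURI.toURL).toArray
        val loader = new URLClassLoader(urls, getClass.getClassLoader)
    
        // Installing the loader as the thread's context classloader means code on
        // this thread that resolves classes by name (e.g. a Kryo registrator
        // lookup) can see the user jar's classes.
        Thread.currentThread.setContextClassLoader(loader)
    
        // Classes from the user jar can now be loaded reflectively, e.g.:
        // val cls = Class.forName("org.example.MyRegistrator", true, loader)
      }
    }
    ```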

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GrahamDennis/spark feature/spark-2878

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1890
    
----
commit 8cb52099c0faa04234cb2741d98d2fb689f6a42c
Author: Graham Dennis <gr...@gmail.com>
Date:   2014-08-11T11:05:07Z

    [WIP][SPARK-2878]: Fix custom spark.kryo.registrator
    
    My aim here is to add the user jars specified in SparkConf to the Executor processes before they launch any tasks, and to add these jars to the classpath of all threads.
    
    So far I have only implemented sending the user jars to the Executor processes before registering the executor with the master.  Next steps are to add these jars to all threads, and/or to share a pool of Kryo instances created with a known class loader.  This will also need to be implemented for Mesos and YARN.

----




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1890




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-53213573
  
    @rxin: I haven't modified the Mesos code, and it seems that wouldn't be too hard to do, but I have no way of testing it.  Suggestions welcomed.
    
    As for YARN, the code is a little more opaque to me, but @mridulm suggests no changes may be necessary there?




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-51767200
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52862331
  
    Thanks, @GrahamDennis, for updating this. You should get a medal for making it super easy to reproduce problems!
    
    How does this fix work on YARN / Mesos?  As I understand it, you basically make the standalone cluster manager's Worker itself download the initial user jars.
    
    An alternative fix is to not initialize the data serializer until the first time updateDependencies is called.





[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52581284
  
    @rxin: No, #1972 isn't enough.  I've updated my example project to reproduce this problem; see https://github.com/GrahamDennis/spark-kryo-serialisation
    
    Running this using `spark-submit --master local-cluster[10,1,1024] --class 'org.example.SparkDriver' path-to-jar.jar` gives the following error in the executor processes:
    
    ```
    14/08/19 11:46:51 ERROR OneForOneStrategy: org.example.WrapperSerializer
    java.lang.ClassNotFoundException: org.example.WrapperSerializer
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    	at java.lang.Class.forName0(Native Method)
    	at java.lang.Class.forName(Class.java:270)
    	at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:161)
    	at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:182)
    	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:185)
    	at org.apache.spark.executor.Executor.<init>(Executor.scala:87)
    	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:60)
    	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
    	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
    	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    	at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
    	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    ```
    
    Basically, when the SparkEnv is created, which happens essentially at launch time of the Executors, the custom serialiser is loaded.  The alternatives here are either to (1) add the user jar to the Executor classpath at launch time, or (2) instantiate the serialisers lazily on demand and ensure that they are never needed before the user jars have been downloaded.
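
    A tiny, illustrative sketch of what option 2 amounts to: defer creation with a `lazy val` so nothing is instantiated until after dependencies are in place.  The names here are invented for illustration; this is not Spark's internals.
    
    ```scala
    class LazyHolder[T](create: () => T) {
      // `lazy val` defers the call to `create` until first use, so by the time
      // the serialiser class is resolved the user jars can already have been
      // added to the classloader.
      lazy val instance: T = create()
    }
    
    object LazyHolderDemo {
      def main(args: Array[String]): Unit = {
        val holder = new LazyHolder(() => {
          println("instantiating serialiser now, after jars are available")
          new java.io.ByteArrayOutputStream() // stand-in for a real serialiser
        })
        // ... download user jars / call updateDependencies here ...
        holder.instance // only now does the serialiser get created
      }
    }
    ```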
    
    Option 2 will require some care.  For example, a quick scan found the following two problems that would need to be solved.  (A) At the class level in Executor.scala we have:
    
    ```scala
    // Set the classloader for serializer
    env.serializer.setDefaultClassLoader(urlClassLoader)
    ```
    
    This code would need to move.
    
    And (B) in Executor.scala / TaskRunner.run we have:
    
    ```scala
          val ser = SparkEnv.get.closureSerializer.newInstance()
          logInfo(s"Running $taskName (TID $taskId)")
          execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
          var taskStart: Long = 0
          def gcTime = ManagementFactory.getGarbageCollectorMXBeans.map(_.getCollectionTime).sum
          val startGCTime = gcTime
    
          try {
            SparkEnv.set(env)
            Accumulators.clear()
            val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
            updateDependencies(taskFiles, taskJars)
    ```
    
    Here the `val ser` line at the top would need to move to after the `updateDependencies` call (this is a much easier fix).
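
    A sketch of that reordering, abridged from the snippet above (a fragment only, not the final patch):
    
    ```scala
          logInfo(s"Running $taskName (TID $taskId)")
          execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
    
          try {
            SparkEnv.set(env)
            Accumulators.clear()
            val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
            updateDependencies(taskFiles, taskJars)
            // Only create the serializer instance now, after the user jars from
            // this task's dependencies have been added to the classloader.
            val ser = SparkEnv.get.closureSerializer.newInstance()
    ```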




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1890#discussion_r16154021
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ---
    @@ -114,12 +115,12 @@ private[spark] class ExecutorRunner(
         case other => other
       }
     
    -  def getCommandSeq = {
    +  def getCommandSeq(userJarClassPathEntries : Seq[String]) = {
         val command = Command(
           appDesc.command.mainClass,
           appDesc.command.arguments.map(substituteVariables) ++ Seq(appId),
           appDesc.command.environment,
    -      appDesc.command.classPathEntries,
    +      appDesc.command.classPathEntries ++ userJarClassPathEntries,
    --- End diff --
    
    This should probably respect the `spark.files.userClassPathFirst` config.
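
    For illustration, the ordering could be made conditional on that flag, along these lines (a sketch only; a `conf` in scope and the surrounding wiring are assumed):
    
    ```scala
    // Order the user jars relative to the app's classpath entries based on
    // spark.files.userClassPathFirst (defaulting to false).
    val userClassPathFirst = conf.getBoolean("spark.files.userClassPathFirst", false)
    val orderedEntries =
      if (userClassPathFirst) userJarClassPathEntries ++ appDesc.command.classPathEntries
      else appDesc.command.classPathEntries ++ userJarClassPathEntries
    ```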




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52918929
  
    Unless the user tries addJar, this should not be relevant to YARN modes.
    
    Regards,
    Mridul
    
    




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52564363
  
    @pwendell: I've verified that #1972 fixes the problem I was having.  This PR also addresses a bug (unfiled, but related to SPARK-2878): you can't distribute a custom serialiser in an application jar.  I can't imagine there's significant demand for that; it was just an issue I hit while trying to diagnose SPARK-2878.  If support for custom serialisers in application jars isn't needed, then feel free to close this PR.




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by ypwais <gi...@git.apache.org>.
Github user ypwais commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-57271841
  
    Any chance this might make it into v1.2?  I'd love to use custom {Input,Output}Formats (e.g. Parquet), and I personally lost almost a day to this classloader issue (especially since it doesn't manifest in local mode).  And fixing serialization is nice too ^_^




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52581353
  
    Just to be clear: custom serialisers are *not* a pain point for me.  I'm happy as long as custom Kryo registrators can be shipped in the app jar.




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52859692
  
    I've filed a new JIRA ticket for the custom serialiser problem here: https://issues.apache.org/jira/browse/SPARK-3166, and I've updated the title of this PR to indicate that it now addresses SPARK-3166, as SPARK-2878 has since been solved by @rxin.




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-96770267
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1890#discussion_r16155353
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ---
    @@ -114,12 +115,12 @@ private[spark] class ExecutorRunner(
         case other => other
       }
     
    -  def getCommandSeq = {
    +  def getCommandSeq(userJarClassPathEntries : Seq[String]) = {
         val command = Command(
           appDesc.command.mainClass,
           appDesc.command.arguments.map(substituteVariables) ++ Seq(appId),
           appDesc.command.environment,
    -      appDesc.command.classPathEntries,
    +      appDesc.command.classPathEntries ++ userJarClassPathEntries,
    --- End diff --
    
    Good point.  Also, even with the above code, `userJarClassPathEntries` will come before the output of `bin/compute-classpath.sh` (see `buildJavaOpts` in CommandUtils.scala).  At present, `appDesc.command.classPathEntries` really comes from `spark.executor.extraClassPath`.
    
    What should the ordering be for the two values of `spark.files.userClassPathFirst`?
    
    If false, should it be `[<results of bin/compute-classpath.sh>, spark.executor.extraClassPath, userJars]`?  (Note that this differs from the present behaviour, where compute-classpath.sh comes after spark.executor.extraClassPath.)  An alternative here would be `[spark.executor.extraClassPath, <results of bin/compute-classpath.sh>, userJars]`.
    
    If true, should it be `[userJars, spark.executor.extraClassPath, <results of bin/compute-classpath.sh>]`?
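
    Spelled out as plain sequences, the two candidate orderings look like this (the paths are made up, for illustration only):
    
    ```scala
    val computed = Seq("/spark/lib/spark-assembly.jar") // <results of bin/compute-classpath.sh>
    val extra    = Seq("/etc/spark/extra.jar")          // spark.executor.extraClassPath
    val userJars = Seq("/apps/my-app.jar")              // user jars
    
    val whenFalse = computed ++ extra ++ userJars // userClassPathFirst = false
    val whenTrue  = userJars ++ extra ++ computed // userClassPathFirst = true
    ```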




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-57763605
  
    Yes, we will take a look at this for 1.2.




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-51805750
  
    FWIW I think this is already what happens in YARN, as we use Hadoop's distributed cache to send out the jars and include them on the executor classpath at startup.




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-57956359
  
    @rxin, @ash211 It would be good to have a conversation about whether this is the best approach.
    
    My approach is a fairly brute-force one: just add the application jar to the classpath.  Another approach might be to use an isolated classloader for the user application jar that only shares org.apache.spark.* with the parent classloader.  This would reduce conflicts with transitive dependencies.  I'm not familiar with this kind of classloader magic, so I don't know how feasible it would be.
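
    An illustrative sketch of that isolated-classloader idea (not Spark code; the package filter and structure are just the concept):
    
    ```scala
    import java.net.{URL, URLClassLoader}
    
    // User classes resolve against the app jar first; only JDK/Scala classes and
    // org.apache.spark.* are delegated to the parent, so Spark and the user app
    // agree on shared types while transitive dependencies stay isolated.
    class IsolatingClassLoader(userJars: Array[URL], parent: ClassLoader)
        extends URLClassLoader(userJars, null) {
    
      override def loadClass(name: String, resolve: Boolean): Class[_] = {
        if (name.startsWith("java.") || name.startsWith("scala.") ||
            name.startsWith("org.apache.spark.")) {
          // Shared types must come from the parent classloader.
          parent.loadClass(name)
        } else {
          // Everything else is looked up in the user jars (parent is null, so
          // there is no fallback to the application classpath).
          super.loadClass(name, resolve)
        }
      }
    }
    ```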




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52564521
  
    It is actually good to support that case.  Is #1972 not enough to support adding a custom serializer?




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by jjuraszek <gi...@git.apache.org>.
Github user jjuraszek commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-64792200
  
    It is crucial to have this: currently I can't use a Spark job operating on Avro objects (serializing by hand is not an option because of the many types that would need to be extended).  Is it possible to get this into some kind of snapshot distro of Spark soon?
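
    For context, the registrator at the centre of this thread is a small class named in `spark.kryo.registrator`.  A minimal sketch (the Avro-generated class name is a placeholder; real Avro support would also register a suitable Kryo serializer, e.g. from a helper library):
    
    ```scala
    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator
    
    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        // Register the concrete classes the job serializes.
        kryo.register(classOf[java.util.ArrayList[_]])
        // kryo.register(classOf[org.example.MyAvroRecord], new MyAvroRecordSerializer)
      }
    }
    ```
    
    It is enabled by setting `spark.serializer` to `org.apache.spark.serializer.KryoSerializer` and `spark.kryo.registrator` to the class name; the executor must then be able to load that class, which is exactly the classloading problem discussed here.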




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by arahuja <gi...@git.apache.org>.
Github user arahuja commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52446389
  
    I think this may be the issue I have been wrangling with for the last couple of days.  I see a variety of odd Kryo-related errors, slightly different each time:
    
    14/08/17 22:52:01 ERROR Executor: Exception in task ID 17061
    com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 12763
            at com.esotericsoftware.kryo.util.DefaultClassResolver.readCl
    
    14/08/17 22:52:01 ERROR Executor: Exception in task ID 17051
    java.lang.IndexOutOfBoundsException: Index: 5927, Size: 0
            at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    
    Looking through the executor logs I do see:
    
    14/08/17 22:52:00 ERROR KryoSerializer: Failed to run spark.kryo.registrator
    java.lang.ClassNotFoundException: org.bdgenomics.guacamole.GuacamoleKryoRegistrator
    
    Also, Sandy, we are running on YARN and still seem to see this; is there a workaround you know of?  Or is there any known workaround in general?
    
    Thanks,
    Arun
    
    




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-57956057
  
    I set the Target Version on SPARK-3166 to 1.2.0 so we can try to get this in.




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-52407167
  
    Hey @GrahamDennis thanks for an extremely thorough analysis of this issue here and on the JIRA. I think that @rxin was able to solve this in a PR that improves the way we deal with passing classloaders to our serializers. Do you mind trying that fix (which has now been merged) and seeing if it fixes your issue?
    
    https://github.com/apache/spark/pull/1972




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-54694485
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3166]: Allow custom serialiser to be sh...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-136889467
  
    @GrahamDennis let's close this PR since it's mostly gone stale and will likely not be merged. We can always re-open an updated one if necessary.




[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

Posted by GrahamDennis <gi...@git.apache.org>.
Github user GrahamDennis commented on the pull request:

    https://github.com/apache/spark/pull/1890#issuecomment-51888165
  
    I've updated my PR: now, instead of having the Executor process download the jars before registering itself with the application driver, the Worker process downloads the jars and adds them to the Executor process's classpath when it is launched.
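
    In outline, the Worker-side flow might look like the sketch below (names and URLs are assumed for illustration; in this PR the resulting paths are plumbed through `getCommandSeq` in ExecutorRunner):
    
    ```scala
    import java.io.File
    import java.net.URL
    import java.nio.file.{Files, StandardCopyOption}
    
    object WorkerJarFetch {
      // Fetch each user jar into the executor's working directory and return the
      // local paths, which can then be appended to the executor's classpath.
      def fetchUserJars(jarUrls: Seq[String], workDir: File): Seq[String] =
        jarUrls.map { urlString =>
          val fileName = urlString.split("/").last
          val target = new File(workDir, fileName)
          val in = new URL(urlString).openStream()
          try Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
          finally in.close()
          target.getAbsolutePath
        }
    }
    ```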

