Posted to user@predictionio.apache.org by Pat Ferrel <pa...@actionml.com> on 2018/05/23 15:57:31 UTC

Spark cluster error

The same CLI works using a local Spark master, but fails using a remote
master for a cluster due to a missing class def for protobuf used in HBase.
We are using the binary dist 0.12.1. Is this known? Is there a workaround?

We are now trying a source build in the hope that the class will be put in
the assembly passed to Spark. The reasoning is that the remote executors
don't contain HBase classes, but a local executor does, due to some local
classpath. If the source-built assembly does not have these classes, we
will have the same problem: namely, how to get protobuf to the executors.

Has anyone seen this?
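
For anyone hitting the same wall, here is a minimal sketch of one way to
force an HBase/storage assembly onto the executor classpath independently
of whatever pio itself hands to spark-submit. The jar path and app name
below are illustrative only, not the actual 0.12.1 layout:

import org.apache.spark.{SparkConf, SparkContext}

object ShipHBaseAssembly {
  def main(args: Array[String]): Unit = {
    // Hypothetical path; point it at the pio storage/HBase assembly you actually deploy.
    val hbaseAssembly = "/opt/pio/assembly/pio-assembly-0.12.1.jar"

    val conf = new SparkConf()
      .setAppName("hbase-classpath-check")
      // spark.jars copies the jar into each executor's work dir and adds it to the
      // executor classpath -- the same mechanism spark-submit --jars uses.
      .set("spark.jars", hbaseAssembly)
      // Alternative: if the jar already sits at the same path on every worker,
      // spark.executor.extraClassPath puts it ahead of Spark's own classpath.
      // .set("spark.executor.extraClassPath", hbaseAssembly)

    val sc = new SparkContext(conf)
    // ... run the job that needs org.apache.hadoop.hbase.protobuf.ProtobufUtil ...
    sc.stop()
  }
}

Since spark.jars is just the programmatic form of spark-submit --jars, if
this variant also fails the assembly contents are a more likely suspect
than the distribution mechanism.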

Re: Spark cluster error

Posted by Donald Szeto <do...@apache.org>.
I recall that at one point Spark switched to using a per-thread classpath so
that each job would have its own isolated classpath. That was probably
around Spark 1.5 though, so it's not likely the exact same case here. From
what version of Spark did you upgrade, and to what version?

On Tue, May 29, 2018 at 2:39 PM Pat Ferrel <pa...@occamsmachete.com> wrote:

> BTW, the way we worked around this was to scale up the driver machine to
> handle the executors too, et voila. All worked, but our normal strategy of
> using remote Spark is now somehow broken. We upgraded everything to the
> latest stable and may have messed up some config, so we're not sure where
> the problem is; just looking for a clue we haven't already thought of.
>
>
> From: Pat Ferrel <pa...@occamsmachete.com> <pa...@occamsmachete.com>
> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Date: May 29, 2018 at 2:14:23 PM
> To: Donald Szeto <do...@apache.org> <do...@apache.org>,
> user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
>
> Subject:  Re: Spark cluster error
>
> Yes, the spark-submit --jars is where we started to find the missing
> class. The class isn't found on the remote executor, so we looked in the
> jars actually downloaded into the executor's work dir. The PIO assembly
> jars are there and do have the classes. This would be in the classpath of
> the executor, right? Not sure what you are asking.
>
> Are you asking about the SPARK_CLASSPATH in spark-env.sh? The default
> should include the work subdir for the job, I believe, and it can only be
> added to, so we couldn't have messed that up if it points first to the
> work/job-number dir, right?
>
> I guess the root of my question is: how can the jars be downloaded to the
> executor's work dir and yet the classes we know are in the jar still not
> be found?
>
>
> From: Donald Szeto <do...@apache.org> <do...@apache.org>
> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Date: May 29, 2018 at 1:27:03 PM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Re: Spark cluster error
>
> Sorry, what I meant was the actual spark-submit command that PIO was
> using. It should be in the log.
>
> What Spark version was that? I recall classpath issues with certain
> versions of Spark.
>
> On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> Thanks Donald,
>>
>> We have:
>>
>>    - built pio with hbase 1.4.3, which is what we have deployed
>>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>>    - verified the assembly is passed in --jars to spark-submit
>>    - verified that the executors receive and store the assemblies in the
>>    FS work dir on the worker machines
>>    - verified that hashes match the original assembly so the class is
>>    being received by every executor
>>
>> However, the executor is unable to find the class.
>>
>> This seems just short of impossible, but it is clearly possible. How can
>> the executor deserialize the code but not find it later?
>>
>> Not sure what you mean by the classpath going into the cluster. The
>> classDef that's not found does seem to be in the pio 0.12.1 HBase
>> assembly; isn't this where it should get it?
>>
>> Thanks again
>> p
>>
>>
>> From: Donald Szeto <do...@apache.org> <do...@apache.org>
>> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
>> <us...@predictionio.apache.org>
>> Date: May 24, 2018 at 2:10:24 PM
>> To: user@predictionio.apache.org <us...@predictionio.apache.org>
>> <us...@predictionio.apache.org>
>> Subject:  Re: Spark cluster error
>>
>> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
>> Looking at Git history it has not changed in a while.
>>
>> Do you have the exact classpath that has gone into your Spark cluster?
>>
>> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:
>>
>>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>>> Spark cluster? The issue seems to be how to pass the correct code to
>>> Spark to connect to HBase:
>>>
>>> [ERROR] [TransportRequestHandler] Error while invoking
>>> RpcHandler#receive() for one-way message.
>>> [ERROR] [TransportRequestHandler] Error while invoking
>>> RpcHandler#receive() for one-way message.
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>>
>>> Now that we have these pluggable DBs, did I miss something? This works
>>> with master=local but not with a remote Spark master.
>>>
>>> I've passed the hbase-client jar in the --jars part of spark-submit,
>>> and it still fails. What am I missing?
>>>
>>>
>>> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>>> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>>> Date: May 23, 2018 at 8:57:32 AM
>>> To: user@predictionio.apache.org <us...@predictionio.apache.org>
>>> <us...@predictionio.apache.org>
>>> Subject:  Spark cluster error
>>>
>>> The same CLI works using a local Spark master, but fails using a remote
>>> master for a cluster due to a missing class def for protobuf used in
>>> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
>>> workaround?
>>>
>>> We are now trying a source build in the hope that the class will be put
>>> in the assembly passed to Spark. The reasoning is that the remote
>>> executors don't contain HBase classes, but a local executor does, due to
>>> some local classpath. If the source-built assembly does not have these
>>> classes, we will have the same problem: namely, how to get protobuf to
>>> the executors.
>>>
>>> Has anyone seen this?
>>>
>>>
>>
>

Re: Spark cluster error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
BTW, the way we worked around this was to scale up the driver machine to
handle the executors too, et voila. All worked, but our normal strategy of
using remote Spark is now somehow broken. We upgraded everything to the
latest stable and may have messed up some config, so we're not sure where
the problem is; just looking for a clue we haven't already thought of.


From: Pat Ferrel <pa...@occamsmachete.com> <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 29, 2018 at 2:14:23 PM
To: Donald Szeto <do...@apache.org> <do...@apache.org>,
user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Re: Spark cluster error

Yes, the spark-submit --jars is where we started to find the missing class.
The class isn't found on the remote executor, so we looked in the jars
actually downloaded into the executor's work dir. The PIO assembly jars are
there and do have the classes. This would be in the classpath of the
executor, right? Not sure what you are asking.

Are you asking about the SPARK_CLASSPATH in spark-env.sh? The default
should include the work subdir for the job, I believe, and it can only be
added to, so we couldn't have messed that up if it points first to the
work/job-number dir, right?

I guess the root of my question is: how can the jars be downloaded to the
executor's work dir and yet the classes we know are in the jar still not be
found?


From: Donald Szeto <do...@apache.org> <do...@apache.org>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 29, 2018 at 1:27:03 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Re: Spark cluster error

Sorry, what I meant was the actual spark-submit command that PIO was using.
It should be in the log.

What Spark version was that? I recall classpath issues with certain
versions of Spark.

On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Thanks Donald,
>
> We have:
>
>    - built pio with hbase 1.4.3, which is what we have deployed
>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>    - verified the assembly is passed in --jars to spark-submit
>    - verified that the executors receive and store the assemblies in the
>    FS work dir on the worker machines
>    - verified that hashes match the original assembly so the class is
>    being received by every executor
>
> However, the executor is unable to find the class.
>
> This seems just short of impossible, but it is clearly possible. How can
> the executor deserialize the code but not find it later?
>
> Not sure what you mean by the classpath going into the cluster. The
> classDef that's not found does seem to be in the pio 0.12.1 HBase
> assembly; isn't this where it should get it?
>
> Thanks again
> p
>
>
> From: Donald Szeto <do...@apache.org> <do...@apache.org>
> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Date: May 24, 2018 at 2:10:24 PM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Re: Spark cluster error
>
> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
> Looking at Git history it has not changed in a while.
>
> Do you have the exact classpath that has gone into your Spark cluster?
>
> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:
>
>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>> Spark cluster? The issue seems to be how to pass the correct code to
>> Spark to connect to HBase:
>>
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>
>> Now that we have these pluggable DBs, did I miss something? This works
>> with master=local but not with a remote Spark master.
>>
>> I've passed the hbase-client jar in the --jars part of spark-submit, and
>> it still fails. What am I missing?
>>
>>
>> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Date: May 23, 2018 at 8:57:32 AM
>> To: user@predictionio.apache.org <us...@predictionio.apache.org>
>> <us...@predictionio.apache.org>
>> Subject:  Spark cluster error
>>
>> The same CLI works using a local Spark master, but fails using a remote
>> master for a cluster due to a missing class def for protobuf used in
>> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
>> workaround?
>>
>> We are now trying a source build in the hope that the class will be put
>> in the assembly passed to Spark. The reasoning is that the remote
>> executors don't contain HBase classes, but a local executor does, due to
>> some local classpath. If the source-built assembly does not have these
>> classes, we will have the same problem: namely, how to get protobuf to
>> the executors.
>>
>> Has anyone seen this?
>>
>>
>

Re: Spark cluster error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Yes, the spark-submit --jars is where we started to find the missing class.
The class isn't found on the remote executor, so we looked in the jars
actually downloaded into the executor's work dir. The PIO assembly jars are
there and do have the classes. This would be in the classpath of the
executor, right? Not sure what you are asking.

Are you asking about the SPARK_CLASSPATH in spark-env.sh? The default
should include the work subdir for the job, I believe, and it can only be
added to, so we couldn't have messed that up if it points first to the
work/job-number dir, right?

I guess the root of my question is: how can the jars be downloaded to the
executor's work dir and yet the classes we know are in the jar still not be
found?
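
One way to probe that question is from inside the executors themselves. A
minimal sketch, assuming a live SparkContext sc launched by the same
spark-submit; the partition count is arbitrary and everything else is
standard JDK reflection:

// Ask each executor whether it can see the class and which jar it would be
// loaded from, without running its static initializer.
val report = sc.parallelize(1 to 8, 8).mapPartitions { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  val answer =
    try {
      val cls = Class.forName(
        "org.apache.hadoop.hbase.protobuf.ProtobufUtil",
        false, // do not initialize yet
        Thread.currentThread.getContextClassLoader)
      val src = Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("<no code source>")
      s"$host: ProtobufUtil found in $src"
    } catch {
      case t: Throwable => s"$host: ProtobufUtil NOT loadable: $t"
    }
  Iterator(answer)
}.collect().distinct

report.foreach(println)

If every executor reports the class as loadable from the downloaded
assembly, the classpath itself is probably fine and the failure is
happening later, during class initialization.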


From: Donald Szeto <do...@apache.org> <do...@apache.org>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 29, 2018 at 1:27:03 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Re: Spark cluster error

Sorry, what I meant was the actual spark-submit command that PIO was using.
It should be in the log.

What Spark version was that? I recall classpath issues with certain
versions of Spark.

On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Thanks Donald,
>
> We have:
>
>    - built pio with hbase 1.4.3, which is what we have deployed
>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>    - verified the assembly is passed in --jars to spark-submit
>    - verified that the executors receive and store the assemblies in the
>    FS work dir on the worker machines
>    - verified that hashes match the original assembly so the class is
>    being received by every executor
>
> However, the executor is unable to find the class.
>
> This seems just short of impossible, but it is clearly possible. How can
> the executor deserialize the code but not find it later?
>
> Not sure what you mean by the classpath going into the cluster. The
> classDef that's not found does seem to be in the pio 0.12.1 HBase
> assembly; isn't this where it should get it?
>
> Thanks again
> p
>
>
> From: Donald Szeto <do...@apache.org> <do...@apache.org>
> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Date: May 24, 2018 at 2:10:24 PM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Re: Spark cluster error
>
> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
> Looking at Git history it has not changed in a while.
>
> Do you have the exact classpath that has gone into your Spark cluster?
>
> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:
>
>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>> Spark cluster? The issue seems to be how to pass the correct code to
>> Spark to connect to HBase:
>>
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>
>> Now that we have these pluggable DBs, did I miss something? This works
>> with master=local but not with a remote Spark master.
>>
>> I've passed the hbase-client jar in the --jars part of spark-submit, and
>> it still fails. What am I missing?
>>
>>
>> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Date: May 23, 2018 at 8:57:32 AM
>> To: user@predictionio.apache.org <us...@predictionio.apache.org>
>> <us...@predictionio.apache.org>
>> Subject:  Spark cluster error
>>
>> The same CLI works using a local Spark master, but fails using a remote
>> master for a cluster due to a missing class def for protobuf used in
>> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
>> workaround?
>>
>> We are now trying a source build in the hope that the class will be put
>> in the assembly passed to Spark. The reasoning is that the remote
>> executors don't contain HBase classes, but a local executor does, due to
>> some local classpath. If the source-built assembly does not have these
>> classes, we will have the same problem: namely, how to get protobuf to
>> the executors.
>>
>> Has anyone seen this?
>>
>>
>

Re: Spark cluster error

Posted by Donald Szeto <do...@apache.org>.
Sorry, what I meant was the actual spark-submit command that PIO was using.
It should be in the log.

What Spark version was that? I recall classpath issues with certain
versions of Spark.

On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Thanks Donald,
>
> We have:
>
>    - built pio with hbase 1.4.3, which is what we have deployed
>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>    - verified the assembly is passed in --jars to spark-submit
>    - verified that the executors receive and store the assemblies in the
>    FS work dir on the worker machines
>    - verified that hashes match the original assembly so the class is
>    being received by every executor
>
> However, the executor is unable to find the class.
>
> This seems just short of impossible, but it is clearly possible. How can
> the executor deserialize the code but not find it later?
>
> Not sure what you mean by the classpath going into the cluster. The
> classDef that's not found does seem to be in the pio 0.12.1 HBase
> assembly; isn't this where it should get it?
>
> Thanks again
> p
>
>
> From: Donald Szeto <do...@apache.org> <do...@apache.org>
> Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Date: May 24, 2018 at 2:10:24 PM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Re: Spark cluster error
>
> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
> Looking at Git history it has not changed in a while.
>
> Do you have the exact classpath that has gone into your Spark cluster?
>
> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:
>
>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>> Spark cluster? The issue seems to be how to pass the correct code to
>> Spark to connect to HBase:
>>
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> [ERROR] [TransportRequestHandler] Error while invoking
>> RpcHandler#receive() for one-way message.
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>
>> Now that we have these pluggable DBs, did I miss something? This works
>> with master=local but not with a remote Spark master.
>>
>> I've passed the hbase-client jar in the --jars part of spark-submit, and
>> it still fails. What am I missing?
>>
>>
>> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
>> Date: May 23, 2018 at 8:57:32 AM
>> To: user@predictionio.apache.org <us...@predictionio.apache.org>
>> <us...@predictionio.apache.org>
>> Subject:  Spark cluster error
>>
>> The same CLI works using a local Spark master, but fails using a remote
>> master for a cluster due to a missing class def for protobuf used in
>> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
>> workaround?
>>
>> We are now trying a source build in the hope that the class will be put
>> in the assembly passed to Spark. The reasoning is that the remote
>> executors don't contain HBase classes, but a local executor does, due to
>> some local classpath. If the source-built assembly does not have these
>> classes, we will have the same problem: namely, how to get protobuf to
>> the executors.
>>
>> Has anyone seen this?
>>
>>
>

Re: Spark cluster error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Thanks Donald,

We have:

   - built pio with hbase 1.4.3, which is what we have deployed
   - verified that the `ProtobufUtil` class is in the pio hbase assembly
   - verified the assembly is passed in --jars to spark-submit
   - verified that the executors receive and store the assemblies in the FS
   work dir on the worker machines
   - verified that hashes match the original assembly so the class is being
   received by every executor

However, the executor is unable to find the class.

This seems just short of impossible, but it is clearly possible. How can
the executor deserialize the code but not find it later?

Not sure what you mean by the classpath going into the cluster. The
classDef that's not found does seem to be in the pio 0.12.1 HBase assembly;
isn't this where it should get it?

Thanks again
p
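
One detail that may matter here: a "NoClassDefFoundError: Could not
initialize class X" usually means the class was found but its static
initializer already failed once (for example via an
ExceptionInInitializerError caused by a dependency conflict), not that the
class is absent from the classpath. A minimal sketch for surfacing the
original failure on an executor, assuming a live SparkContext sc:

// Force ProtobufUtil's static initializer on an executor and print the root
// cause if it throws. A second attempt in the same executor JVM would only
// report the bare "Could not initialize class" message.
val probe = sc.parallelize(Seq(1), 1).map { _ =>
  try {
    Class.forName(
      "org.apache.hadoop.hbase.protobuf.ProtobufUtil",
      true, // run static init now
      Thread.currentThread.getContextClassLoader)
    "ProtobufUtil initialized fine"
  } catch {
    case t: Throwable =>
      val sw = new java.io.StringWriter
      t.printStackTrace(new java.io.PrintWriter(sw))
      sw.toString // full chain, including the original cause
  }
}.collect().head

println(probe)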


From: Donald Szeto <do...@apache.org> <do...@apache.org>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 24, 2018 at 2:10:24 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Re: Spark cluster error

0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
Looking at Git history it has not changed in a while.

Do you have the exact classpath that has gone into your Spark cluster?

On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:

> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
> Spark cluster? The issue seems to be how to pass the correct code to
> Spark to connect to HBase:
>
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>
> Now that we have these pluggable DBs, did I miss something? This works
> with master=local but not with a remote Spark master.
>
> I've passed the hbase-client jar in the --jars part of spark-submit, and
> it still fails. What am I missing?
>
>
> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
> Date: May 23, 2018 at 8:57:32 AM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Spark cluster error
>
> The same CLI works using a local Spark master, but fails using a remote
> master for a cluster due to a missing class def for protobuf used in
> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
> workaround?
>
> We are now trying a source build in the hope that the class will be put
> in the assembly passed to Spark. The reasoning is that the remote
> executors don't contain HBase classes, but a local executor does, due to
> some local classpath. If the source-built assembly does not have these
> classes, we will have the same problem: namely, how to get protobuf to
> the executors.
>
> Has anyone seen this?
>
>

Re: Spark cluster error

Posted by Donald Szeto <do...@apache.org>.
0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
Looking at Git history it has not changed in a while.

Do you have the exact classpath that has gone into your Spark cluster?
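
If the cluster ends up mixing the 0.98.5-based storage assembly with HBase
1.4.3 client jars, conflicting protobuf-related classes are one plausible
way for ProtobufUtil's static initializer to fail. A minimal sketch for
checking which of the jars handed to --jars actually contain ProtobufUtil
or com.google.protobuf classes; the jar paths below are placeholders only:

import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Placeholder paths; list whatever jars actually go into --jars / the executor classpath.
val jars = Seq(
  "/opt/pio/assembly/pio-data-hbase-assembly-0.12.1.jar",
  "/opt/hbase/lib/hbase-client-1.4.3.jar"
)

for (path <- jars) {
  val entries = new JarFile(path).entries().asScala.map(_.getName).toList
  val hasProtobufUtil =
    entries.contains("org/apache/hadoop/hbase/protobuf/ProtobufUtil.class")
  val protobufClasses =
    entries.count(n => n.startsWith("com/google/protobuf/") && n.endsWith(".class"))
  println(s"$path -> ProtobufUtil: $hasProtobufUtil, " +
    s"com.google.protobuf classes: $protobufClasses")
}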

On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <pa...@actionml.com> wrote:

> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
> Spark cluster? The issue seems to be how to pass the correct code to
> Spark to connect to HBase:
>
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>
> Now that we have these pluggable DBs, did I miss something? This works
> with master=local but not with a remote Spark master.
>
> I've passed the hbase-client jar in the --jars part of spark-submit, and
> it still fails. What am I missing?
>
>
> From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
> Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
> Date: May 23, 2018 at 8:57:32 AM
> To: user@predictionio.apache.org <us...@predictionio.apache.org>
> <us...@predictionio.apache.org>
> Subject:  Spark cluster error
>
> The same CLI works using a local Spark master, but fails using a remote
> master for a cluster due to a missing class def for protobuf used in
> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
> workaround?
>
> We are now trying a source build in the hope that the class will be put
> in the assembly passed to Spark. The reasoning is that the remote
> executors don't contain HBase classes, but a local executor does, due to
> some local classpath. If the source-built assembly does not have these
> classes, we will have the same problem: namely, how to get protobuf to
> the executors.
>
> Has anyone seen this?
>
>

Re: Spark cluster error

Posted by Pat Ferrel <pa...@actionml.com>.
A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
Spark cluster? The issue seems to be how to pass the correct code to Spark
to connect to HBase:

[ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive()
for one-way message.
[ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive()
for one-way message.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure:
Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hbase.protobuf.ProtobufUtil
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

Now that we have these pluggable DBs, did I miss something? This works
with master=local but not with a remote Spark master.

I've passed the hbase-client jar in the --jars part of spark-submit, and it
still fails. What am I missing?
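
To take pio out of the picture entirely, here is a minimal sketch that
reads through the same TableInputFormat path the stack trace goes through.
The ZooKeeper quorum and table name are placeholders; if this also dies on
the cluster with the same NoClassDefFoundError, the problem is in how the
HBase/protobuf classes reach the executors rather than in pio itself:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseReadSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-read-smoke-test"))

    // Placeholder connection details; substitute your ZooKeeper quorum and event table.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set("hbase.zookeeper.quorum", "zk-host")
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "pio_event:events_1")

    // Same input path the pio job uses: TableInputFormat evaluated on the executors.
    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"rows read: ${rdd.count()}")
    sc.stop()
  }
}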


From: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
Reply: Pat Ferrel <pa...@actionml.com> <pa...@actionml.com>
Date: May 23, 2018 at 8:57:32 AM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Spark cluster error

The same CLI works using a local Spark master, but fails using a remote
master for a cluster due to a missing class def for protobuf used in HBase.
We are using the binary dist 0.12.1. Is this known? Is there a workaround?

We are now trying a source build in the hope that the class will be put in
the assembly passed to Spark. The reasoning is that the remote executors
don't contain HBase classes, but a local executor does, due to some local
classpath. If the source-built assembly does not have these classes, we
will have the same problem: namely, how to get protobuf to the executors.

Has anyone seen this?