Posted to dev@spark.apache.org by Priya Ch <le...@gmail.com> on 2014/11/14 12:47:26 UTC

1gb file processing...task doesn't launch on all the node...Unseen exception

Hi All,

  We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02). Each node
has 32 GB of RAM, 1 TB of hard disk capacity and 8 CPU cores. We have set up
HDFS with 2 TB of capacity and a block size of 256 MB. When we try to
process a 1 GB file on Spark, we see the following exception:

14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@IMPETUS-DSRV02:41124/user/Executor#539551156] with ID 0
14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block manager NODE-DSRV05.impetus.co.in:60432 with 2.1 GB RAM
14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block manager NODE-DSRV02:47844 with 2.1 GB RAM
14/11/14 17:01:43 INFO network.ConnectionManager: Accepted connection from [NODE-DSRV05.impetus.co.in/192.168.145.195:51447]
14/11/14 17:01:43 INFO network.SendingConnection: Initiating connection to [NODE-DSRV05.impetus.co.in/192.168.145.195:60432]
14/11/14 17:01:43 INFO network.SendingConnection: Connected to [NODE-DSRV05.impetus.co.in/192.168.145.195:60432], 1 messages pending
14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on NODE-DSRV05.impetus.co.in:60432 (size: 17.1 KB, free: 2.1 GB)
14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on NODE-DSRV05.impetus.co.in:60432 (size: 14.1 KB, free: 2.1 GB)
14/11/14 17:01:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, NODE-DSRV05.impetus.co.in): java.lang.NullPointerException:
        org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
        org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        java.lang.Thread.run(Thread.java:722)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 1]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 2]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 0.0 (TID 4, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.0 (TID 5, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 3]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 6, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.1 in stage 0.0 (TID 4) on executor NODE-DSRV05.impetus.co.in: java.lang.NullPointerException (null) [duplicate 4]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.2 in stage 0.0 (TID 7, NODE-DSRV02, NODE_LOCAL, 1667 bytes)


What I see is that it couldn't launch tasks on NODE-DSRV05 and is processing
the data on a single node, i.e. NODE-DSRV02. When we tried with 360 MB of
data, I don't see any exception, but the entire processing is done by only
one node. I couldn't figure out where the issue lies.

Any suggestions on what kind of situations might cause such an issue?

Thanks,
Padma Ch

Fwd: 1gb file processing...task doesn't launch on all the node...Unseen exception

Posted by Priya Ch <le...@gmail.com>.

Hi,

I tried with try/catch blocks. In fact, inside mapPartitionsWithIndex, a
method is invoked which does the operation. I put the operations inside the
function in a try...catch block, but that's of no use; the error still
persists. Even when I commented out all the operations, a simple print
statement inside the method is not executed. The data size is 542 MB. The
HDFS block size is 64 MB, so the file has 9 blocks. I used a 2-node cluster
with a replication factor of 2.
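
As a quick check of the block count mentioned above, here is a minimal
sketch in plain Scala. The object and method names are made up for
illustration. It only does the ceiling division; the assumption that Spark
usually creates about one input partition per HDFS block when reading a
text file is a general rule of thumb, not something confirmed for this job.

```scala
object BlockCount {
  // Number of HDFS blocks needed for a file: ceiling of fileSize / blockSize.
  // Both sizes are in MB; the names are illustrative only.
  def blocks(fileSizeMb: Long, blockSizeMb: Long): Long =
    (fileSizeMb + blockSizeMb - 1) / blockSizeMb

  def main(args: Array[String]): Unit = {
    // 542 MB file with 64 MB blocks, as in the message above.
    println(blocks(542, 64)) // prints 9
  }
}
```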

When I look at the logs, it seems that it tried to launch tasks on the
other node, but TaskSetManager encountered a NullPointerException and the
job was aborted. Is this a problem with mapPartitionsWithIndex?

The same operations, when performed with a map transformation, executed
with no issues.
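
For reference, a minimal sketch of what a try/catch inside a function of
the shape taken by mapPartitionsWithIndex can look like. It is plain Scala,
so it can be tried without a cluster. The names process and safePartition
are hypothetical and stand in for the real operation, which is not shown in
this thread. The try/catch sits inside the per-record closure because Scala
iterators are lazy: a try wrapped only around iter.map(...) returns before
any record is processed and catches nothing.

```scala
import scala.util.control.NonFatal

object PartitionFunctionSketch {
  // Hypothetical per-record operation; stands in for the real processing.
  def process(line: String): Int = line.trim.length

  // Same shape as the function passed to RDD.mapPartitionsWithIndex:
  // (Int, Iterator[T]) => Iterator[U].
  // Failures are caught per record and returned as Left values.
  def safePartition(index: Int, iter: Iterator[String]): Iterator[Either[String, Int]] =
    iter.map { line =>
      try Right(process(line))
      catch {
        case NonFatal(e) => Left(s"partition $index: ${e.getClass.getName}")
      }
    }

  def main(args: Array[String]): Unit = {
    // Local check: the null record takes the NullPointerException path.
    val out = safePartition(0, Iterator("abc", null, " de ")).toList
    println(out)
  }
}
```

On a cluster, and assuming an RDD[String] named rdd, this would be passed as
rdd.mapPartitionsWithIndex(PartitionFunctionSketch.safePartition). A
try/catch of this kind only helps if the exception is thrown inside the
function body.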


Please let me know if anyone has seen the same problem.

Thanks,
Padma Ch

On Fri, Nov 14, 2014 at 7:42 PM, Akhil [via Apache Spark User List] <
ml-node+s1001560n18936h61@n3.nabble.com> wrote:

> It shows nullPointerException, your data could be corrupted? Try putting a
> try catch inside the operation that you are doing, Are you running the
> worker process on the master node also? If not, then only 1 node will be
> doing the processing. If yes, then try setting the level of parallelism and
> number of partitions while creating/transforming the RDD.
>
> Thanks
> Best Regards

Re: 1gb file processing...task doesn't launch on all the node...Unseen exception

Posted by Akhil Das <ak...@sigmoidanalytics.com>.

It shows a NullPointerException; could your data be corrupted? Try putting a
try/catch inside the operation that you are doing. Are you running the
worker process on the master node as well? If not, then only 1 node will be
doing the processing. If yes, then try setting the level of parallelism and
the number of partitions while creating/transforming the RDD.
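
A rough sketch of the parallelism and partition settings mentioned above,
assuming a Scala job submitted with spark-submit (so the master URL is not
set in code). The app name and input path are placeholders, and the value
16 is only an illustration taken from the 2 nodes x 8 cores described in
the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    // App name, path and partition counts below are placeholders.
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")
      // Default number of partitions for shuffles and parallelize.
      .set("spark.default.parallelism", "16")
    val sc = new SparkContext(conf)

    // Ask for at least 16 input partitions when the RDD is created.
    val lines = sc.textFile("hdfs:///path/to/input", 16)

    // Or redistribute an existing RDD across 16 partitions (this shuffles).
    val spread = lines.repartition(16)

    println(s"input partitions: ${lines.partitions.length}, " +
      s"after repartition: ${spread.partitions.length}")
    sc.stop()
  }
}
```

More partitions only spread the work if both workers have registered
executors with free cores.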

Thanks
Best Regards

