
Posted to dev@spark.apache.org by freedafeng <fr...@yahoo.com> on 2015/08/13 18:13:19 UTC

What does NativeMethodAccessorImpl.java do?

I am running a Spark job with only two operations: mapPartitions and then
collect(). The output of mapPartitions is very small: one integer
per partition. I saw that there is a stage 2 for this job that runs this Java
program. I am not a Java programmer. Could anyone please tell me what
this Java program does, or how to stop it from running, or at
least how to make it run faster? The collect() call is not important to me. All
the work is done in mapPartitions, which sends data out to a k-v store. It is
something like foreachPartition, but I cannot get foreachPartition() to run for
some reason. Spark 1.1.1.

Thanks!
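
For readers following the thread: the pattern described above (each partition writes its records to a key-value store and nothing needs to come back to the driver) is what foreachPartition is meant for. Below is a minimal sketch of the per-partition function, exercised locally without a cluster. InMemoryStore, make_partition_writer and the sample records are hypothetical stand-ins, not anything from the original job, and the sketch does not explain why foreachPartition failed for the poster.

```python
class InMemoryStore:
    """Hypothetical stand-in for a key-value store client."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value


def make_partition_writer(store):
    """Build a function that writes one partition's records to `store`."""

    def write_partition(records):
        # `records` is an iterator over the (key, value) pairs of one partition.
        for key, value in records:
            store.put(key, value)
        # Nothing is returned: the work is purely a side effect.

    return write_partition


# Local simulation: two "partitions" processed one after the other.
store = InMemoryStore()
writer = make_partition_writer(store)
partitions = [[("a", 1), ("b", 2)], [("c", 3)]]
for part in partitions:
    writer(iter(part))

# With a real RDD of (key, value) pairs the call would look roughly like:
#     rdd.foreachPartition(write_partition)
# Assumption: on a cluster the client connection should be opened inside
# write_partition, because the function runs on the executors.
```

foreachPartition is an action in its own right, so no separate collect() or count() is needed just to trigger the work.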



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-does-NativeMethodAccessorImpl-java-do-tp13667.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: What does NativeMethodAccessorImpl.java do?

Posted by freedafeng <fr...@yahoo.com>.

Thanks Marcelo! 

The reason I asked is that I expected my Spark job
to be a "map only" job. In other words, it should finish once
mapPartitions has run for all partitions, because the job is only
mapPartitions() plus count(), where mapPartitions yields only one integer for
each partition. The first stage, running "count at
/root/workspace/**/mapred/aerospike_calculations.py:35", completed after
a reasonably long time. I expected the job to complete right away after
the first stage finished. To my surprise, the second stage, calling
"collect at NativeMethodAccessorImpl.java:-2", runs very slowly, about as
slowly as the first stage.

I want to know what the second stage is doing.

================================UI============================
Spark Stages
Total Duration: 8.2 h
Scheduling Mode: FIFO
Active Stages: 1
Completed Stages: 2
Failed Stages: 0

Active Stages (1)

Stage Id	Description	Submitted	Duration	Tasks: Succeeded/Total	Input	Shuffle Read	Shuffle Write
2	collect at NativeMethodAccessorImpl.java:-2	2015/08/13 16:01:59	4.1 h	360/2048	375.1 GB

Completed Stages (2)

Stage Id	Description	Submitted	Duration	Tasks: Succeeded/Total	Input	Shuffle Read	Shuffle Write
1	count at /root/workspace/**/aerospike_calculations.py:35	2015/08/13 12:02:40	7.5 h	2048/2048	1785.6 GB
0	first at SerDeUtil.scala:70	2015/08/13 12:02:34	4 s	1/1	839.0 MB

Failed Stages (0)

Stage Id	Description	Submitted	Duration	Tasks: Succeeded/Total	Input	Shuffle Read	Shuffle Write	Failure Reason





Re: What does NativeMethodAccessorImpl.java do?

Posted by Marcelo Vanzin <va...@cloudera.com>.

That's not a program, it's just a class in the Java library. Spark looks at
the call stack and uses it to describe the job in the UI. If you look at
the whole stack trace you'll see more things that might tell you what's
really going on in that job.
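
To illustrate the point about call stacks, here is a simplified sketch of how a label such as "collect at NativeMethodAccessorImpl.java:-2" can come out of a stack walk. It is not Spark's actual code. The function name describe_call_site, the "org.apache.spark." prefix test and the hand-written frames (including their line numbers, apart from -2, which is what Java reports for a native method) are assumptions made for the example.

```python
def describe_call_site(frames, is_engine_frame):
    """Label a job as '<last engine method> at <first outside file>:<line>'.

    `frames` is a list of (qualified_method, filename, lineno) tuples,
    innermost call first.
    """
    last_engine_method = "<unknown>"
    for method, filename, lineno in frames:
        if is_engine_frame(method):
            # Still inside the engine: remember the short method name.
            last_engine_method = method.rsplit(".", 1)[-1]
        else:
            # First frame outside the engine is treated as the "caller".
            return "%s at %s:%d" % (last_engine_method, filename, lineno)
    return last_engine_method


def is_spark_frame(method):
    # Assumption for this sketch: engine frames are recognised by prefix.
    return method.startswith("org.apache.spark.")


# Hand-written, illustrative stack for a collect() invoked through Java
# reflection, which is how a call arriving from a Python driver would look.
frames = [
    ("org.apache.spark.rdd.RDD.collect", "RDD.scala", 774),
    ("org.apache.spark.api.java.JavaRDDLike$class.collect", "JavaRDDLike.scala", 305),
    ("sun.reflect.NativeMethodAccessorImpl.invoke0", "NativeMethodAccessorImpl.java", -2),
    ("sun.reflect.NativeMethodAccessorImpl.invoke", "NativeMethodAccessorImpl.java", 57),
]

label = describe_call_site(frames, is_spark_frame)
print(label)
```

Under these assumptions the first frame outside the engine is the reflection helper rather than any user file, so the label names NativeMethodAccessorImpl.java even though no user-visible "program" of that name is running.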

On Thu, Aug 13, 2015 at 9:13 AM, freedafeng <fr...@yahoo.com> wrote:

> I am running a Spark job with only two operations: mapPartitions and then
> collect(). The output of mapPartitions is very small: one integer
> per partition. I saw that there is a stage 2 for this job that runs this Java
> program. I am not a Java programmer. Could anyone please tell me what
> this Java program does, or how to stop it from running, or at
> least how to make it run faster? The collect() call is not important to me. All
> the work is done in mapPartitions, which sends data out to a k-v store. It is
> something like foreachPartition, but I cannot get foreachPartition() to run for
> some reason. Spark 1.1.1.
>
> Thanks!


-- 
Marcelo