Posted to user@spark.apache.org by abhiguruvayya <sh...@gmail.com> on 2014/07/03 02:35:56 UTC

Re: Spark job tracker.

Spark displays job status information on port 4040 using JobProgressListener;
does anyone know how to hook into this port and read the details?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p8697.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
I am trying to create an asynchronous thread using the Java executor service
and launch the JavaSparkContext in that thread, but it is failing with exit
code 0 (zero). I basically want to submit the Spark job in one thread and
continue doing something else after submitting. Any help on this? Thanks.
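As a sketch of the submit-and-continue pattern described above (with a plain placeholder task standing in for the Spark job, since the actual JavaSparkContext setup is not shown in this thread, and the class and method names below are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncJobLauncher {
    // Placeholder for the Spark driver logic; in the real program this is
    // where the JavaSparkContext would be created and the job run.
    static int runJob() {
        return 42; // stand-in result
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Submit the job on a background thread...
        Future<Integer> result = pool.submit(AsyncJobLauncher::runJob);
        // ...and keep doing other work on the main thread here.
        // Block only when the result is actually needed:
        System.out.println("job result: " + result.get());
        pool.shutdown();
    }
}
```

Note that Executors create non-daemon threads by default, so the JVM stays alive until shutdown() is called; wrapping the job body in a try/catch that logs any throwable can help diagnose an otherwise silent exit.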



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p11351.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Also, I am facing one issue. If I run the program in yarn-cluster mode it
works absolutely fine, but if I change it to yarn-client mode I get the
error below.

Application application_1405471266091_0055 failed 2 times due to AM
Container for appattempt_1405471266091_0055_000002 exited with exitCode:
-1000 due to: File does not exist:
********/user/hadoop/.sparkStaging/application_1405471266091_0055/commons-math3-3.0.jar

I have commons-math3-3.0.jar on the classpath, and I am uploading it to the
staging directory as well.

From the logs:

yarn.Client: Uploading file:***/commons-math3-3.0.jar to
***/user/hadoop/.sparkStaging/application_1405471266091_0055/commons-math3-3.0.jar



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10343.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Is there anything equivalent to Hadoop's "Job"
(org.apache.hadoop.mapreduce.Job) in Spark? Once I submit the Spark job I
want to concurrently read the SparkListener interface implementation methods
where I can grab the job status. I am trying to find a way to wrap the
spark-submit object in one thread and read the SparkListener interface
implementation methods in another thread.
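The two-thread arrangement described above can be sketched without any Spark classes: a shared, thread-safe status holder that listener callbacks would write and a monitoring thread reads. The status strings and thread roles here are assumptions for illustration, not Spark API:

```java
import java.util.concurrent.atomic.AtomicReference;

class JobStatusHolder {
    // Shared status: written by listener callbacks, read by a monitor thread.
    private final AtomicReference<String> status = new AtomicReference<>("PENDING");

    // Would be called from the (hypothetical) listener callbacks,
    // e.g. onJobStart / onJobEnd.
    public void update(String newStatus) {
        status.set(newStatus);
    }

    // Called from the monitoring thread.
    public String current() {
        return status.get();
    }

    public static void main(String[] args) throws Exception {
        JobStatusHolder holder = new JobStatusHolder();
        Thread submitter = new Thread(() -> {
            holder.update("RUNNING");
            // ... the job would run here ...
            holder.update("SUCCEEDED");
        });
        submitter.start();
        submitter.join(); // wait so the final status is visible
        System.out.println(holder.current()); // SUCCEEDED
    }
}
```

AtomicReference makes the cross-thread handoff safe without explicit locking; in a real application the monitor would poll current() while the submitter thread runs, rather than joining first.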



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10548.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Marcelo Vanzin <va...@cloudera.com>.
The Spark log categories are based on the actual class names. So if you
want to filter out a package's logs you need to specify the full
package name (e.g. "org.apache.spark.storage" instead of just
"spark.storage").
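Applied to logger entries like those quoted in this thread, that means using the full org.apache.spark prefix. A sketch of the corrected fragment (the appender name is taken from the poster's configuration; adjust to taste):

```properties
# Route noisy Spark logs to the file appender only, using full package names
log4j.logger.org.apache.spark.storage=INFO, myFileAppender
log4j.additivity.org.apache.spark.storage=false
log4j.logger.org.apache.spark.scheduler=INFO, myFileAppender
log4j.additivity.org.apache.spark.scheduler=false
```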

On Tue, Jul 22, 2014 at 2:07 PM, abhiguruvayya
<sh...@gmail.com> wrote:
> Thanks, I am able to load the file now. Can I turn off specific logs using
> log4j.properties? I don't want to see the logs below. How can I do this?
>
> 14/07/22 14:01:24 INFO scheduler.TaskSetManager: Starting task 2.0:129 as
> TID 129 on executor 3: ****** (NODE_LOCAL)
> 14/07/22 14:01:24 INFO scheduler.TaskSetManager: Serialized task 2.0:129 as
> 14708 bytes in 0 ms
>
> *current log4j.properties entry:*
>
> # make a file appender and a console appender
> # Print the date in ISO 8601 format
> log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
> log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c -
> %m%n
> log4j.appender.myFileAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.myFileAppender.File=spark.log
> log4j.appender.myFileAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.myFileAppender.layout.ConversionPattern=%d [%t] %-5p %c -
> %m%n
>
>
>
> # By default, everything goes to console and file
> log4j.rootLogger=INFO, myConsoleAppender, myFileAppender
>
> # The noisier spark logs go to file only
> log4j.logger.spark.storage=INFO, myFileAppender
> log4j.additivity.spark.storage=false
> log4j.logger.spark.scheduler=INFO, myFileAppender
> log4j.additivity.spark.scheduler=false
> log4j.logger.spark.CacheTracker=INFO, myFileAppender
> log4j.additivity.spark.CacheTracker=false
> log4j.logger.spark.CacheTrackerActor=INFO, myFileAppender
> log4j.additivity.spark.CacheTrackerActor=false
> log4j.logger.spark.MapOutputTrackerActor=INFO, myFileAppender
> log4j.additivity.spark.MapOutputTrackerActor=false
> log4j.logger.spark.MapOutputTracker=INFO, myFileAppender
> log4j.additivity.spark.MapOutputTracker=false
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10440.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
Marcelo

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Thanks, I am able to load the file now. Can I turn off specific logs using
log4j.properties? I don't want to see the logs below. How can I do this?

14/07/22 14:01:24 INFO scheduler.TaskSetManager: Starting task 2.0:129 as
TID 129 on executor 3: ****** (NODE_LOCAL)
14/07/22 14:01:24 INFO scheduler.TaskSetManager: Serialized task 2.0:129 as
14708 bytes in 0 ms

*current log4j.properties entry:*

# make a file appender and a console appender
# Print the date in ISO 8601 format
log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c -
%m%n
log4j.appender.myFileAppender=org.apache.log4j.RollingFileAppender
log4j.appender.myFileAppender.File=spark.log
log4j.appender.myFileAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myFileAppender.layout.ConversionPattern=%d [%t] %-5p %c -
%m%n



# By default, everything goes to console and file
log4j.rootLogger=INFO, myConsoleAppender, myFileAppender

# The noisier spark logs go to file only
log4j.logger.spark.storage=INFO, myFileAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, myFileAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, myFileAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, myFileAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, myFileAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, myFileAppender
log4j.additivity.spark.MapOutputTracker=false



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10440.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Marcelo Vanzin <va...@cloudera.com>.
You can upload your own log4j.properties using spark-submit's
"--files" argument.
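For example (the paths, class name, and jar name below are illustrative, not from the thread):

```shell
# Ship a custom log4j.properties alongside the application
spark-submit \
  --master yarn-cluster \
  --files /local/path/to/log4j.properties \
  --class com.example.MyApp \
  my-app.jar
```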

On Tue, Jul 22, 2014 at 12:45 PM, abhiguruvayya
<sh...@gmail.com> wrote:
> I fixed the error with the yarn-client mode issue which I mentioned in my
> earlier post. Now I want to edit the log4j.properties to filter out some of
> the unnecessary logs. Can you let me know where I can find this properties
> file?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10433.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
Marcelo

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
I fixed the error with the yarn-client mode issue which I mentioned in my
earlier post. Now I want to edit the log4j.properties to filter out some of
the unnecessary logs. Can you let me know where I can find this properties
file?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10433.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Marcelo Vanzin <va...@cloudera.com>.
I don't understand what you're trying to do.

The code will use log4j under the covers. The default configuration
writes log messages to stderr. In yarn-client mode that is your
terminal screen; in yarn-cluster mode it is redirected to a file by
Yarn. For the executors, output is always redirected to a file
(since they're launched by Yarn).

I don't know what you mean by "port". But if neither of those options
is what you're looking for, you need to look at providing a custom
log4j configuration that does what you want.


On Sun, Jul 20, 2014 at 11:05 PM, abhiguruvayya
<sh...@gmail.com> wrote:
> Hello Marcelo Vanzin,
>
> Can you explain a bit more about this? I tried using client mode, but can
> you explain how I can use this port to write the log or output? Thanks in
> advance!
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10287.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
Marcelo

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Hello Marcelo Vanzin,

Can you explain a bit more about this? I tried using client mode, but can
you explain how I can use this port to write the log or output? Thanks in
advance!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p10287.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Marcelo Vanzin <va...@cloudera.com>.
That output means you're running in yarn-cluster mode. So your code is
running inside the ApplicationMaster and has no access to the local
terminal.

If you want to see the output:
- try yarn-client mode, then your code will run inside the launcher process
- check the RM web ui and look at the logs for your application's AM

On Thu, Jul 10, 2014 at 10:06 AM, abhiguruvayya
<sh...@gmail.com> wrote:
> Hi Mayur, thanks so much for the explanation. It did help me. Is there a way
> I can log these details on the console rather than in the logs? As of now,
> once I start my application I see this:
>
> 14/07/10 00:48:20 INFO yarn.Client: Application report from ASM:
>          application identifier: application_1403538869175_0274
>          appId: 274
>          clientToAMToken: null
>          appDiagnostics:
>          appMasterHost: ***********
>          appQueue: default
>          appMasterRpcPort: 0
>          appStartTime: 1404978412354
>          yarnAppState: RUNNING
>          distributedFinalState: UNDEFINED
>          appTrackingUrl: ***********
>          appUser: hadoop
>
> Which port do I have to hook into to write my statistics details rather than
> this default output?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p9315.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
Marcelo

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Hi Mayur, thanks so much for the explanation. It did help me. Is there a way
I can log these details on the console rather than in the logs? As of now,
once I start my application I see this:

14/07/10 00:48:20 INFO yarn.Client: Application report from ASM: 
	 application identifier: application_1403538869175_0274
	 appId: 274
	 clientToAMToken: null
	 appDiagnostics: 
	 appMasterHost: ***********
	 appQueue: default
	 appMasterRpcPort: 0
	 appStartTime: 1404978412354
	 yarnAppState: RUNNING
	 distributedFinalState: UNDEFINED
	 appTrackingUrl: ***********
	 appUser: hadoop

Which port do I have to hook into to write my statistics details rather than
this default output?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p9315.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Mayur Rustagi <ma...@gmail.com>.
val sem = new java.util.concurrent.atomic.AtomicInteger(0)
sc.addSparkListener(new SparkListener {
  override def onTaskStart(taskStart: SparkListenerTaskStart) {
    sem.incrementAndGet()  // callbacks fire on the listener thread
  }
})
// sc is the spark context



Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Wed, Jul 9, 2014 at 4:34 AM, abhiguruvayya <sh...@gmail.com>
wrote:

> Hello Mayur,
>
> How can I implement the methods mentioned below? If you have any clue on
> this, please let me know.
>
>
>         public void onJobStart(SparkListenerJobStart arg0) {
>         }
>
>         @Override
>         public void onStageCompleted(SparkListenerStageCompleted arg0) {
>         }
>
>         @Override
>         public void onStageSubmitted(SparkListenerStageSubmitted arg0) {
>
>         }
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p9104.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

Re: Spark job tracker.

Posted by abhiguruvayya <sh...@gmail.com>.
Hello Mayur,

How can I implement the methods mentioned below? If you have any clue on
this, please let me know.


	@Override
	public void onJobStart(SparkListenerJobStart arg0) {
	}

	@Override
	public void onStageCompleted(SparkListenerStageCompleted arg0) {
	}

	@Override
	public void onStageSubmitted(SparkListenerStageSubmitted arg0) {
	}



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p9104.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark job tracker.

Posted by Mayur Rustagi <ma...@gmail.com>.
The application server doesn't provide a JSON API, unlike the cluster
interface (8080).
If you are okay with patching Spark, you can use our patch to create a JSON
API, or you can use the SparkListener interface in your application to get
that info out.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Jul 3, 2014 at 6:05 AM, abhiguruvayya <sh...@gmail.com>
wrote:

> Spark displays job status information on port 4040 using
> JobProgressListener;
> does anyone know how to hook into this port and read the details?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p8697.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>