Posted to user@spark.apache.org by Michael Nazario <mn...@palantir.com> on 2014/12/11 23:41:42 UTC

Job status from Python

In PySpark, is there a way to get the status of a currently running job? My use case is a long-running job where users have no way to tell whether the job is still making progress. It would be nice to get some indication that the job is progressing, even if it isn't very granular.

I've looked into the application detail UI, which has per-stage information (but unfortunately not in JSON format), but even then I don't necessarily know which stages correspond to a job I started.

So I guess my main questions are:

  1.  How do I get the job id of a job started in Python?
  2.  If possible, how do I get the stages which correspond to that job?
  3.  Is there any way to get information about currently running stages without parsing the Stage UI HTML page?
  4.  Has anyone approached this problem in a different way?
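To make it concrete, below is roughly the kind of thing I'm hoping is possible. This is only a sketch: it assumes PySpark exposes something like the StatusTracker API on the Scala SparkContext (which I believe is slated for Spark 1.2), and the job group name, thread setup, and polling interval are just placeholders I made up:

    # Sketch only -- assumes sc.statusTracker() and setJobGroup()
    # work from PySpark the way they do in the Scala API.
    import threading
    import time

    from pyspark import SparkContext

    sc = SparkContext(appName="job-status-demo")

    def run_job():
        # Tag this thread's jobs so they can be looked up by group id.
        sc.setJobGroup("my-long-job", "long running demo job")
        sc.parallelize(range(1000000), 100).map(lambda x: x * x).count()

    worker = threading.Thread(target=run_job)
    worker.start()

    tracker = sc.statusTracker()  # assumed API
    while worker.is_alive():
        # (1) find the job ids for the group tagged above
        for job_id in tracker.getJobIdsForGroup("my-long-job"):
            job = tracker.getJobInfo(job_id)
            if job is None:
                continue
            print("job %d: %s" % (job_id, job.status))
            # (2) the stages belonging to that job,
            # (3) queried without scraping the stage UI HTML
            for stage_id in job.stageIds:
                stage = tracker.getStageInfo(stage_id)
                if stage is not None:
                    print("  stage %d (%s): %d/%d tasks done" %
                          (stage_id, stage.name,
                           stage.numCompletedTasks, stage.numTasks))
        time.sleep(1)

    worker.join()
    sc.stop()

If something along these lines already exists (or there's a better-supported way to poll job and stage progress from the driver), pointers would be much appreciated.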