Posted to dev@spark.apache.org by Kevin Chen <kc...@palantir.com> on 2015/09/11 20:30:31 UTC

New Spark json endpoints

Hello Spark Devs,

 I noticed that [SPARK-3454], which introduces new json endpoints at
/api/v1/[path] for information previously only shown on the web UI, does not
expose several useful properties about Spark jobs that are exposed on the
web UI and on the unofficial /json endpoint.

 Specific examples include the maximum number of allotted cores per
application, amount of memory allotted to each slave, and number of cores
used by each worker. These are provided as app.cores, app.memoryperslave,
and worker.coresused in the /json endpoint, and all of them also appear on
the web UI.
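
For concreteness, here is a minimal sketch of how a client might read those fields from the /json response. The payload below is hypothetical -- only the field names (cores, memoryperslave, coresused) come from this thread; the surrounding structure and values are made up for illustration.

```python
import json

# Hypothetical sample of the unofficial /json endpoint's response. Only the
# field names mentioned above are taken from the thread; the rest is
# illustrative.
sample = """
{
  "activeapps": [{"name": "my-app", "cores": 8, "memoryperslave": 4096}],
  "workers": [{"host": "worker-1", "coresused": 4}]
}
"""

data = json.loads(sample)
for app in data["activeapps"]:
    print(app["name"], app["cores"], app["memoryperslave"])
for worker in data["workers"]:
    print(worker["host"], worker["coresused"])
```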

 Is there any specific reason that these fields are not exposed in the
public API? If not, would it be reasonable to add them to the json blobs,
possibly in a future /api/v2 API?

Thank you,
Kevin Chen




Re: New Spark json endpoints

Posted by Reynold Xin <rx...@databricks.com>.
Do we need to increment the version number if it is just strict additions?


On Wed, Sep 16, 2015 at 7:10 PM, Kevin Chen <kc...@palantir.com> wrote:

> Just wanted to bring this email up again in case there were any thoughts.
> Having all the information from the web UI accessible through a supported
> json API is very important to us; are there any objections to us adding a
> v2 API to Spark?
>
> Thanks!

Re: New Spark json endpoints

Posted by Kevin Chen <kc...@palantir.com>.
Thank you all for the feedback. I’ve created a corresponding JIRA ticket at
https://issues.apache.org/jira/browse/SPARK-10565 and updated it with a
summary of this thread.

From:  Mark Hamstra <ma...@clearstorydata.com>
Date:  Thursday, September 17, 2015 at 8:00 AM
To:  Imran Rashid <ir...@cloudera.com>
Cc:  Kevin Chen <kc...@palantir.com>, "dev@spark.apache.org"
<de...@spark.apache.org>, Matt Cheah <mc...@palantir.com>, Mingyu Kim
<mk...@palantir.com>
Subject:  Re: New Spark json endpoints

While we're at it, adding endpoints that get results by jobGroup (cf.
SparkContext#setJobGroup) instead of just for a single Job would also be
very useful to some of us.





Re: New Spark json endpoints

Posted by Mark Hamstra <ma...@clearstorydata.com>.
While we're at it, adding endpoints that get results by jobGroup (cf.
SparkContext#setJobGroup) instead of just for a single Job would also be
very useful to some of us.
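
Until such endpoints exist, a client can approximate this grouping itself. Below is a minimal sketch under the assumption of hypothetical job records that carry the group id set via SparkContext#setJobGroup; the record shape is invented for illustration, not taken from any existing API.

```python
from collections import defaultdict

# Hypothetical job records of the shape a by-jobGroup endpoint might serve;
# "jobGroup" corresponds to the group id set via SparkContext#setJobGroup.
jobs = [
    {"jobId": 0, "jobGroup": "etl", "status": "SUCCEEDED"},
    {"jobId": 1, "jobGroup": "etl", "status": "RUNNING"},
    {"jobId": 2, "jobGroup": "reporting", "status": "SUCCEEDED"},
]

# Group job ids by their jobGroup, as a by-group endpoint might do server-side.
by_group = defaultdict(list)
for job in jobs:
    by_group[job["jobGroup"]].append(job["jobId"])

print(dict(by_group))  # -> {'etl': [0, 1], 'reporting': [2]}
```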

On Thu, Sep 17, 2015 at 7:30 AM, Imran Rashid <ir...@cloudera.com> wrote:


Re: New Spark json endpoints

Posted by Imran Rashid <ir...@cloudera.com>.
Hi Kevin,

I think it would be great if you added this.  It never made it in the first
place because the original PR was already pretty bloated, and I never got
back to it.  I agree with Reynold -- you shouldn't need to bump the version
just to add new endpoints (or even to add new fields to existing endpoints).
See the guarantees we make here:

http://spark.apache.org/docs/latest/monitoring.html#rest-api

(Though if you think we should make different guarantees around versions,
that would be worth discussing as well.)

Can you file a JIRA so we can move the discussion there?  Please cc me, and
maybe also Josh Rosen (I'm not sure whether he has cycles right now, but
he's been very helpful on these issues in the past).

thanks,
Imran


On Wed, Sep 16, 2015 at 9:10 PM, Kevin Chen <kc...@palantir.com> wrote:


Re: New Spark json endpoints

Posted by Kevin Chen <kc...@palantir.com>.
Just wanted to bring this email up again in case there were any thoughts.
Having all the information from the web UI accessible through a supported
json API is very important to us; are there any objections to us adding a v2
API to Spark?

Thanks!

From:  Kevin Chen <kc...@palantir.com>
Date:  Friday, September 11, 2015 at 11:30 AM
To:  "dev@spark.apache.org" <de...@spark.apache.org>
Cc:  Matt Cheah <mc...@palantir.com>, Mingyu Kim <mk...@palantir.com>
Subject:  New Spark json endpoints
