Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/09/01 12:41:20 UTC

[jira] [Comment Edited] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455257#comment-15455257 ] 

Varun Saxena edited comment on YARN-5585 at 9/1/16 12:40 PM:
-------------------------------------------------------------

I thought about this a little more, and I think there is a way to fetch apps within a cluster without much performance impact, since this seems to be your use case.

What we can do is first get the required app IDs from the app-to-flow table, since app IDs in this table are sorted, and extract the applicable flows from there. We can then query the application table using these unique flows to get more specific information about the apps. HBase has something called MultiRowRangeFilter, which lets us specify multiple row key ranges.
We would return only those apps that we found in the app-to-flow table.
And from a performance viewpoint, we can assume there will always be a reasonable limit specified.
 
_Example:_
Assume that in a cluster we have applications from application_1111111_0001 to application_1111111_0034 (running or completed).
These apps will be stored in descending order in the app-to-flow table.
Let us say you want to get the latest 10 apps (i.e. the limit in your query is 10).
What we can do is get the first 10 apps from the app-to-flow table, i.e. application_1111111_0034 to application_1111111_0025. We can use PageFilter to return only the first 10 records. This is the result set we can return.
Assume the application IDs ending with _0034, _0031 and _0027 belong to flow1 and the rest belong to flow2. We can then use this info to query the application table.
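
To make this first step concrete, here is a rough Python sketch of it. It is only an in-memory stand-in and does not touch HBase: the dict {{app_to_flow}} and the helper {{latest_apps_by_flow}} are hypothetical names made up for illustration, and the slice merely emulates what PageFilter would do on the real, already sorted table.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the app-to-flow table: app ID -> flow.
# As in the example, _0034, _0031 and _0027 belong to flow1, the rest to flow2.
FLOW1_SUFFIXES = {34, 31, 27}
app_to_flow = {
    f"application_1111111_{n:04d}": ("flow1" if n in FLOW1_SUFFIXES else "flow2")
    for n in range(1, 35)
}


def latest_apps_by_flow(table, limit):
    """Return (latest app IDs, newest first) and a flow -> app IDs grouping."""
    # The real table is already stored newest first; sorting only emulates that.
    latest = sorted(table, reverse=True)[:limit]  # emulates a PageFilter-style limit
    by_flow = defaultdict(list)
    for app_id in latest:
        by_flow[table[app_id]].append(app_id)
    return latest, dict(by_flow)


latest, by_flow = latest_apps_by_flow(app_to_flow, 10)
print(latest[0], "...", latest[-1])
print(by_flow["flow1"])
```

The grouping by flow is what the second step needs, because each flow becomes one row key range in the application table.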

So to get detailed info for these 10 apps in a single shot from the application table, we can do the following:
* Create a MultiRowRangeFilter.
* For flow1, add the start row as {{cluster!user!flow1!application_1111111_0034}} and the stop row as {{cluster!user!flow1!application_1111111_0027}}. We can make the stop row inclusive. We can then add this start/stop row pair to the multi row range filter created.
* And for flow2, the start row can be {{cluster!user!flow2!application_1111111_0033}} and the stop row {{cluster!user!flow2!application_1111111_0024}}. If this stop row is left exclusive, the range ends at application_1111111_0025, the last of the 10 apps. We can then add this start/stop row pair to the same filter.
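
The range construction in the steps above can be sketched roughly as follows. This is plain Python, not HBase code: {{build_row_ranges}} and {{in_ranges}} are hypothetical helpers, the row key layout {{cluster!user!flow!appId}} is taken from the example, and for simplicity both stop rows are treated as inclusive (so flow2 stops at _0025). In a real implementation these pairs would presumably be handed to MultiRowRangeFilter instead.

```python
# Flow -> app IDs, as obtained from the app-to-flow lookup in the example.
by_flow = {
    "flow1": [f"application_1111111_{n:04d}" for n in (34, 31, 27)],
    "flow2": [f"application_1111111_{n:04d}" for n in (33, 32, 30, 29, 28, 26, 25)],
}


def build_row_ranges(by_flow, cluster="cluster", user="user"):
    """Return one (start_row, stop_row) pair per flow, both ends inclusive."""
    ranges = []
    for flow, app_ids in sorted(by_flow.items()):
        prefix = f"{cluster}!{user}!{flow}!"
        # Rows are stored newest first, so the newest app ID is the start row.
        ranges.append((prefix + max(app_ids), prefix + min(app_ids)))
    return ranges


def in_ranges(row_key, ranges):
    """Emulate a multi-range check; plain string order stands in for row order."""
    return any(stop <= row_key <= start for start, stop in ranges)


ranges = build_row_ranges(by_flow)

# Hypothetical application table holding all 34 apps from the example.
flow1_suffixes = {34, 31, 27}
app_table_keys = [
    f"cluster!user!{'flow1' if n in flow1_suffixes else 'flow2'}!application_1111111_{n:04d}"
    for n in range(1, 35)
]
matched = [key for key in app_table_keys if in_ranges(key, ranges)]
print(len(matched))  # 10
```

Note that a range can also cover apps of the same flow that were not in the first 10 results if any fall between the start and stop rows, which is why only apps found in the app-to-flow table should be returned.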

This would be slower than getting all apps when a flow or flow run is specified, but it would be faster than doing a full table scan of the application table, especially when the table grows large.

Maybe I can raise a separate JIRA for this and handle it there if this is a real use case.


> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>
> The TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, fromId, so that entities can be retrieved after the given fromId.
> Example: suppose applications are stored in the database as app-1, app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is difficult.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> This is very useful for pagination in the web UI.


