Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/12/19 17:21:58 UTC

[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761743#comment-15761743 ] 

Varun Saxena commented on YARN-5585:
------------------------------------

Thanks [~rohithsharma] for the patch.

bq. For single entity retrieval, when the idPrefix is not known, we need to match the column value for entityType by doing a range scan. Is there any other way to achieve this?
I am wondering whether we can use the start and stop rows of the Scan for this. We know the idprefix ranges from 0 to the max value of long, so our start row can be {{cluster!user!flow!runid!appid!entitytype!0!entityid}}, and as the stop row is not inclusive, we can call TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix on {{cluster!user!flow!runid!appid!entitytype!LONG_MAX!entityid}}. This would mean that typically only one row is scanned: we can break out of the loop as soon as the first row is found (which will be the case for almost all queries). We can also use a PageFilter of 1 to keep the Scan, and the result retrieved through it, small. Thoughts?
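A minimal, self-contained sketch of this range scan, simulated over an in-memory sorted map rather than a real HBase table. The key layout (row-key prefix, then the idprefix as 8 big-endian bytes, then the entity id) and the class UnknownPrefixLookup with its methods are hypothetical; closestNextRowKeyForPrefix follows the usual "drop trailing 0xFF bytes, then increment the last byte" approach instead of calling TimelineStorageUtils. Because the idprefix precedes the entity id in the key, the range also covers other entities of the same type, so the sketch checks each row's entity id before breaking out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: single-entity lookup with an unknown idprefix, simulated
// over an in-memory sorted map instead of an HBase table. The key layout
// (row-key prefix + 8-byte big-endian idprefix + entity id) is an assumption.
final class UnknownPrefixLookup {

  // Rows sorted in unsigned lexicographic byte order, as in HBase.
  static NavigableMap<byte[], String> newTable() {
    return new TreeMap<byte[], String>(Arrays::compareUnsigned);
  }

  // Builds a row key: prefix bytes, idprefix as 8 big-endian bytes, entity id bytes.
  // Big-endian bytes sort correctly only for non-negative idprefixes.
  static byte[] rowKey(String rowPrefix, long idPrefix, String entityId) {
    byte[] p = rowPrefix.getBytes(StandardCharsets.UTF_8);
    byte[] e = entityId.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(p.length + Long.BYTES + e.length)
        .put(p).putLong(idPrefix).put(e).array();
  }

  // Smallest key greater than every key starting with prefix; empty means "no upper bound".
  static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
    int offset = prefix.length;
    while (offset > 0 && prefix[offset - 1] == (byte) 0xFF) {
      offset--;  // a 0xFF byte cannot be incremented, so drop it
    }
    if (offset == 0) {
      return new byte[0];
    }
    byte[] next = Arrays.copyOf(prefix, offset);
    next[offset - 1]++;
    return next;
  }

  // Returns the value stored for entityId under any idprefix in [0, Long.MAX_VALUE], or null.
  static String findEntity(NavigableMap<byte[], String> table, String rowPrefix, String entityId) {
    byte[] start = rowKey(rowPrefix, 0L, entityId);  // inclusive start row
    byte[] stop = closestNextRowKeyForPrefix(rowKey(rowPrefix, Long.MAX_VALUE, entityId));  // exclusive
    NavigableMap<byte[], String> range = stop.length == 0
        ? table.tailMap(start, true)
        : table.subMap(start, true, stop, false);
    int idOffset = rowPrefix.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
    byte[] wanted = entityId.getBytes(StandardCharsets.UTF_8);
    for (Map.Entry<byte[], String> row : range.entrySet()) {
      byte[] key = row.getKey();
      // The range also contains other entity ids, so compare the entity id part.
      boolean sameEntity = key.length >= idOffset
          && Arrays.equals(key, idOffset, key.length, wanted, 0, wanted.length);
      if (sameEntity) {
        return row.getValue();  // first matching row wins; stop scanning
      }
    }
    return null;
  }
}
```

In this simulation the first row in the range need not belong to the requested entity (for example, entity c_3 under idprefix 0 sorts before entity c_2 under idprefix 7), so a page size of 1 on its own could stop at the wrong row; an entity id check, or an equivalent filter, would still be needed alongside it.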

bq. FromId can be passed as a filter, where fromId=idPrefix!entityId
As idPrefix is numeric, any separator should be fine, since we won't have to encode it. I would prefer separators that do not require URL encoding.
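A small sketch of splitting such a fromId, assuming '!' as the separator; the class FromId and its parse/encode methods are hypothetical. Because the idprefix is numeric it can never contain the separator, so splitting at the first occurrence stays unambiguous even if the entity id itself contains '!'.

```java
// Hypothetical helper for a fromId of the form "idPrefix!entityId".
final class FromId {
  // Assumed separator; any character that needs no URL encoding would do.
  static final char SEPARATOR = '!';

  final long idPrefix;
  final String entityId;

  FromId(long idPrefix, String entityId) {
    this.idPrefix = idPrefix;
    this.entityId = entityId;
  }

  // Splits at the FIRST separator: the numeric idprefix can never contain it,
  // while the entity id may.
  static FromId parse(String fromId) {
    int sep = fromId.indexOf(SEPARATOR);
    if (sep <= 0 || sep == fromId.length() - 1) {
      throw new IllegalArgumentException("Expected idPrefix!entityId, got: " + fromId);
    }
    // Throws NumberFormatException if the part before the separator is not a long.
    long idPrefix = Long.parseLong(fromId.substring(0, sep));
    return new FromId(idPrefix, fromId.substring(sep + 1));
  }

  // Inverse of parse: the value a client would send as the fromId query parameter.
  String encode() {
    return String.valueOf(idPrefix) + SEPARATOR + entityId;
  }
}
```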

bq. If we plan to reuse the same APIs.
I think we can reuse the same APIs. We can add a new query param, say idprefix, and document that retrieval will be slightly faster if idprefix is provided. I would like to know what others think about this, though.

bq. We need to handle one scenario where the same entityId is published with 2 entityIdPrefixes. The entityIdPrefix is always written, even if the user does not provide an idPrefix while publishing entities. So, if the idPrefix is not known, should we use the default idPrefix to get a row?
This will be tricky. We can follow what I mentioned in point 1 (if feasible) and break out of the loop on the first row.
If we just use 0 (the default idprefix), we won't be able to support direct queries by the user based on, say, container id or task id, where the user may not know the corresponding prefix.
Another option could be that, if more than one row is encountered for a single entity read, we send an error message indicating multiple idprefixes in the backend, which can alert the user/application to an issue on the write side.
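A sketch of that last option, also folding in the optional idprefix query param discussed above: with an idprefix, only the exact row matches; without one, every row for the entity id is collected and more than one match is reported as an error. The Row class, MultipleIdPrefixesException and readEntity are hypothetical names, and the rows are an in-memory stand-in for what a scan would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch of a single-entity read that fails loudly when one entity id
// was written under more than one idprefix. Rows are an in-memory stand-in.
final class SingleEntityRead {

  // Hypothetical stand-in for one stored row: idprefix plus entity id.
  static final class Row {
    final long idPrefix;
    final String entityId;

    Row(long idPrefix, String entityId) {
      this.idPrefix = idPrefix;
      this.entityId = entityId;
    }
  }

  // Hypothetical error that can alert the user/application to a write-side issue.
  static final class MultipleIdPrefixesException extends RuntimeException {
    MultipleIdPrefixesException(String entityId, List<Long> prefixes) {
      super("Entity " + entityId + " is stored under multiple idprefixes: " + prefixes);
    }
  }

  // Returns the matching row, or null if none; throws if the idprefix is ambiguous.
  static Row readEntity(List<Row> scannedRows, String entityId, OptionalLong idPrefix) {
    List<Row> matches = new ArrayList<>();
    for (Row row : scannedRows) {
      // With an idprefix given, only the exact (idprefix, entity id) row qualifies.
      boolean prefixOk = !idPrefix.isPresent() || row.idPrefix == idPrefix.getAsLong();
      if (prefixOk && row.entityId.equals(entityId)) {
        matches.add(row);
      }
    }
    if (matches.size() > 1) {
      List<Long> prefixes = new ArrayList<>();
      for (Row r : matches) {
        prefixes.add(r.idPrefix);
      }
      throw new MultipleIdPrefixesException(entityId, prefixes);
    }
    return matches.isEmpty() ? null : matches.get(0);
  }
}
```

The cost of this option is that duplicate detection means the read cannot stop at the first matching row, unlike breaking out of the loop as in point 1.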

> [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: oct16-hard
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, YARN-5585-YARN-5355.0002.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved after the fromId.
> Current Behavior: The default limit is set to 100. If there are 1000 entities, the REST call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: If applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, getting the next set of entities using fromId rather than querying all the entities is a very common use case. This is very useful for pagination in the web UI.
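A minimal sketch of the proposed fromId semantics, assuming the entities are held in a NavigableSet in storage order (FromIdPagination and page are hypothetical names). Following the description, fromId itself is excluded, so each page starts right after the last id of the previous page; a descending flag covers the "900 to 801" case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of limit + fromId pagination over entities kept in sorted order.
final class FromIdPagination {

  // Returns up to limit entities strictly after fromId (from the beginning when fromId is null),
  // walking the set in ascending or descending order.
  static <T> List<T> page(NavigableSet<T> entities, T fromId, int limit, boolean descending) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    NavigableSet<T> ordered = descending ? entities.descendingSet() : entities;
    // tailSet(fromId, false) excludes fromId, matching "fromId=app-5 gives app-6 onwards".
    Iterable<T> rest = fromId == null ? ordered : ordered.tailSet(fromId, false);
    List<T> result = new ArrayList<>(limit);
    for (T entity : rest) {
      if (result.size() == limit) {
        break;
      }
      result.add(entity);
    }
    return result;
  }
}
```

For example, with the integers 1 to 10 in the set, page(apps, 5, 5, false) returns [6, 7, 8, 9, 10] and page(apps, null, 5, true) returns [10, 9, 8, 7, 6]. Integers are used here because, compared as plain strings, app-10 would sort before app-2.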



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
