Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2016/11/15 17:37:58 UTC

[jira] [Comment Edited] (TEZ-3347) Vertex UI throws an error while getting vertexProgress for a killed Vertex

    [ https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667752#comment-15667752 ] 

Jonathan Eagles edited comment on TEZ-3347 at 11/15/16 5:37 PM:
----------------------------------------------------------------

[~Sreenath], what's happening is that the failed job has finished, yet the timeline server data still shows it as running. When the task attempts page queries vertexProgress, the RM proxy redirects the request to the generic application history page, which returns HTML (as expected) rather than JSON.

{quote}
http://rm.example.com:8088/proxy/application_123456789_12345/ws/v1/tez/vertexProgress?dagID=1&vertexID=10&_=xxxxxxxxxx
{quote}

gets redirected to the generic application history page (since generic application history is enabled):
{quote}
http://rm.example.com:8188/applicationhistory/app/application_123456789_12345/ws/v1/tez/vertexProgress?dagID=1&vertexID=00&_=xxxxxxx
{quote}

Since the UI does not double-check with the RM whether the application has finished before querying from this page, we can end up in this scenario.
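
A minimal sketch of that missing check, assuming the page already has the application's `state` as reported by the RM (for example from the RM's Cluster Application REST API) and the vertex status from the timeline data. The function name `shouldQueryVertexProgress` is hypothetical and not part of the Tez UI; the terminal states are YARN's `FINISHED`, `FAILED` and `KILLED`.

```javascript
// Hypothetical helper (not Tez UI code): decide whether a vertexProgress
// request is worth sending, trusting the RM's application state over the
// possibly stale timeline data.
const YARN_TERMINAL_STATES = new Set(['FINISHED', 'FAILED', 'KILLED']);

function shouldQueryVertexProgress(rmAppState, timelineVertexStatus) {
  // Once the RM reports a terminal state the AM is gone, and the proxy
  // redirects to the application history server, which answers with HTML.
  if (YARN_TERMINAL_STATES.has(rmAppState)) {
    return false;
  }
  // Otherwise only vertices that are still running have live progress.
  return timelineVertexStatus === 'RUNNING';
}

console.log(shouldQueryVertexProgress('FAILED', 'RUNNING')); // false
console.log(shouldQueryVertexProgress('RUNNING', 'RUNNING')); // true
```

The point of the design is that the timeline data can lag behind (here it still says the vertex is running), so the RM's view is consulted first.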


> Vertex UI throws an error while getting vertexProgress for a killed Vertex
> --------------------------------------------------------------------------
>
>                 Key: TEZ-3347
>                 URL: https://issues.apache.org/jira/browse/TEZ-3347
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>            Reporter: Kuhu Shukla
>         Attachments: ErrorCodeFailedVertex.png
>
>
> Given an AM that fails all its attempts, the application fails and the very first click on the killed/failed vertex throws the following error:
> {code}
>  error code: Unknown, message: expected expression, got '<'
> {code}
> It self-corrects if retried immediately after the failure.
> This is because the RM proxy redirects the call to the AHS, and the REST call is malformed for that server. Inspecting the responses showed that the URL looked something like this:
> {code}
> http://<hostname>:<ahsport>/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1&vertexID=01&_=123
> {code}
> which is not a proper REST call on the AHS.
> I think the following code can cause this issue:
> {code}
> // Load progress in parallel for v1 version of the api
>   _loadProgress: function (vertices) {
>     var that = this,
>         runningVerticesIdx = vertices
>       .filterBy('status', 'RUNNING')
>       .map(function(item) {
>         return item.get('id').split('_').splice(-1).pop();
>       });
>     if (runningVerticesIdx.length > 0) {
>       this.store.unloadAll('vertexProgress');
>       this.store.findQuery('vertexProgress', {
>         metadata: {
>           appId: that.get('applicationId'),
>           dagIdx: that.get('idx'),
>           vertexIds: runningVerticesIdx.join(',')
>         }
>       }).then(function(vertexProgressInfo) {
>           App.Helpers.emData.mergeRecords(
>             that.get('rowsDisplayed'),
>             vertexProgressInfo,
>             ['progress']
>           );
>       }).catch(function(error) {
>         error.message = "Failed to fetch vertexProgress. Application Master (AM) is out of reach. Either it's down, or CORS is not enabled for YARN ResourceManager.";
>         Em.Logger.error(error);
>         var err = App.Helpers.misc.formatError(error);
>         var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
>         App.Helpers.ErrorBar.getInstance().show(msg, err.details);
>       });
>     }
>   },
> {code}
> which uses AMInfo, which in turn gets its response based on what the loadApp method finds:
> {code}
> loadApp: function (store, appId, useCache) {
>     if(!useCache) {
>       App.Helpers.misc.removeRecord(store, 'appDetail', appId);
>       App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
>     }
>     return store.find('clusterApp', appId).catch(function () {
>       return store.find('appDetail', appId);
>     }).catch(function (error) {
>       error.message = "Couldn't get details of application %@. RM is not reachable, and history service is not enabled.".fmt(appId);
>       throw error;
>     });
>   }
> {code}
> We could check in the catch block whether the response type is not JSON, or not try to get vertexProgress at all, since the UI knows that the application/AM has failed.
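
The JSON check could look roughly like the hypothetical guard below. It assumes the caller has the raw response body and its `Content-Type` header before any JSON parsing happens; `parseVertexProgressResponse` and the `amUnavailable` flag are illustrative names, not existing Tez UI code, and the bodies used are made up.

```javascript
// Hypothetical guard (not Tez UI code): reject HTML bodies before they
// reach JSON parsing.
function parseVertexProgressResponse(contentType, body) {
  // Treat the response as HTML if the header says so, or if the body starts
  // with '<' (a JSON text can never start with '<').
  const looksLikeHtml =
    /text\/html/i.test(contentType || '') || body.trimStart().startsWith('<');
  if (looksLikeHtml) {
    // Without this check, parsing the HTML page is what produces errors
    // such as "expected expression, got '<'".
    const err = new Error(
      'vertexProgress is unavailable: the request was redirected to the ' +
        'application history server, so the AM has most likely exited.'
    );
    err.amUnavailable = true; // lets the caller tell this case apart
    throw err;
  }
  return JSON.parse(body);
}

// Usage with an illustrative HTML body, as returned after the redirect.
try {
  parseVertexProgressResponse('text/html; charset=utf-8', '<!DOCTYPE html><html></html>');
} catch (e) {
  console.log(e.amUnavailable); // true
}
```

A caller could then, for example, suppress the error bar when `amUnavailable` is set instead of reporting a CORS problem.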

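For reference, the vertex selection at the start of `_loadProgress` can be reproduced in plain JavaScript, with plain objects standing in for the Ember Data records (an assumption made so the snippet runs on its own; the IDs are illustrative). Tez vertex IDs have the form `vertex_<clusterTimestamp>_<appSeq>_<dagIdx>_<vertexIdx>`, so the last underscore-separated token is the vertex index. Because the filter trusts the status stored in the timeline data, a vertex of a dead AM that is still recorded as `RUNNING` gets selected, which is what triggers the request in the first place.

```javascript
// Plain-object stand-ins for the Ember Data vertex records (assumption).
const vertices = [
  { id: 'vertex_123456789_12345_1_00', status: 'SUCCEEDED' },
  { id: 'vertex_123456789_12345_1_01', status: 'RUNNING' },
  { id: 'vertex_123456789_12345_1_02', status: 'RUNNING' },
];

// Mirrors the filterBy/map step of _loadProgress: keep vertices whose
// timeline status is RUNNING and return the trailing vertex index of each ID.
function runningVertexIndices(vertexList) {
  return vertexList
    .filter((v) => v.status === 'RUNNING')
    .map((v) => v.id.split('_').pop()); // same result as .splice(-1).pop()
}

// The comma-joined string is what _loadProgress passes as vertexIds.
console.log(runningVertexIndices(vertices).join(',')); // 01,02
```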


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)