Posted to dev@flink.apache.org by "Victor Wong (Jira)" <ji...@apache.org> on 2019/12/31 10:22:00 UTC

[jira] [Created] (FLINK-15448) Make "ResourceID#toString" more descriptive

Victor Wong created FLINK-15448:
-----------------------------------

             Summary: Make "ResourceID#toString" more descriptive
                 Key: FLINK-15448
                 URL: https://issues.apache.org/jira/browse/FLINK-15448
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.9.1
            Reporter: Victor Wong


When running Flink on Yarn, we sometimes run into an exception like this:

{code:java}
java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id container_xxxx  timed out.
{code}

To find the host of the lost TaskManager so that we can log into it for more details, we currently have to search the earlier logs for the host information, which is time-consuming.

Maybe we could add more descriptive information to the ResourceID of Yarn containers, e.g. "container_xxx@host_name:port_number", so that messages like the one above identify the host directly.
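
The log line above appears to be built by concatenating the ResourceID into the message, which implicitly calls toString(); if so, enriching toString() alone would be enough to surface the host. Below is a minimal, self-contained sketch of that effect. DescriptiveResourceId, HeartbeatMessageDemo and timeoutMessage are hypothetical names, not Flink classes, and the message text is copied from the exception above rather than taken from Flink's source.

{code:java}
// Hypothetical ResourceID-like type whose toString() carries extra detail.
final class DescriptiveResourceId {
    private final String resourceId; // raw Yarn container id, used as the identifier
    private final String details;    // display-only string, e.g. "container_xxx@host_name:port_number"

    DescriptiveResourceId(String resourceId, String details) {
        this.resourceId = resourceId;
        this.details = details;
    }

    String getResourceIdString() {
        return resourceId;
    }

    @Override
    public String toString() {
        return details;
    }
}

final class HeartbeatMessageDemo {
    // Hypothetical helper (assumption): builds the timeout message by string
    // concatenation, which calls toString() on the id implicitly.
    static String timeoutMessage(Object resourceId) {
        return "The heartbeat of TaskManager with id " + resourceId + " timed out.";
    }

    public static void main(String[] args) {
        DescriptiveResourceId id = new DescriptiveResourceId(
                "container_xxxx", "container_xxxx@host_name:port_number");
        // Prints: The heartbeat of TaskManager with id container_xxxx@host_name:port_number timed out.
        System.out.println(timeoutMessage(id));
    }
}
{code}

The raw id stays available through getResourceIdString() for any code that previously relied on toString() returning just the container id.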

Here's a demo:


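{code:java}
class ResourceID {
  // The actual identifier (e.g. the Yarn container id); used for identity.
  private final String resourceId;
  // Human-readable description, used only for logging.
  private final String details;

  public ResourceID(String resourceId) {
    this(resourceId, resourceId);
  }

  public ResourceID(String resourceId, String details) {
    this.resourceId = resourceId;
    this.details = details;
  }

  // Raw id for callers that must not depend on toString().
  public String getResourceIdString() {
    return resourceId;
  }

  // Identity is based on resourceId only, so the extra details never affect lookups.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ResourceID)) return false;
    return resourceId.equals(((ResourceID) o).resourceId);
  }

  @Override
  public int hashCode() {
    return resourceId.hashCode();
  }

  @Override
  public String toString() {
    return details;
  }
}

// in flink-yarn
private void startTaskExecutorInContainer(Container container) {
  final String containerIdStr = container.getId().toString();
  // NodeId#toString() renders as "host:port", giving "container_xxx@host_name:port_number".
  final String containerDetail = containerIdStr + "@" + container.getNodeId();
  final ResourceID resourceId = new ResourceID(containerIdStr, containerDetail);
  ...
}
{code}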
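
Because ResourceID values serve as map keys in Flink (for example, for heartbeat targets), the extra details should presumably affect only toString() and not equality or hashing. The sketch below, under that assumption, registers a worker under the descriptive form and looks it up with the plain container id. KeyedResourceId and RegistryDemo are hypothetical names, and the container, host and port values are made up for illustration.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical ResourceID-like type: identity keyed on the raw id, details display-only.
final class KeyedResourceId {
    private final String resourceId;
    private final String details;

    KeyedResourceId(String resourceId) {
        this(resourceId, resourceId);
    }

    KeyedResourceId(String resourceId, String details) {
        this.resourceId = resourceId;
        this.details = details;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyedResourceId)) return false;
        return resourceId.equals(((KeyedResourceId) o).resourceId);
    }

    @Override
    public int hashCode() {
        return resourceId.hashCode();
    }

    @Override
    public String toString() {
        return details;
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        Map<KeyedResourceId, String> workers = new HashMap<>();
        // Registered with the descriptive form (host is known when the container starts).
        workers.put(new KeyedResourceId("container_01", "container_01@host-a:8041"), "worker-a");
        // Looked up with only the plain container id; still found, since details are ignored.
        System.out.println(workers.get(new KeyedResourceId("container_01"))); // prints "worker-a"
    }
}
{code}

If equals() or hashCode() also took details into account, the lookup above would return null whenever one side lacked the host information.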

--
This message was sent by Atlassian Jira
(v8.3.4#803005)