Posted to dev@gobblin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/05/05 21:22:00 UTC

[jira] [Work logged] (GOBBLIN-1823) Improving Container Calculation and Allocation Methodology

     [ https://issues.apache.org/jira/browse/GOBBLIN-1823?focusedWorklogId=860826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860826 ]

ASF GitHub Bot logged work on GOBBLIN-1823:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/May/23 21:21
            Start Date: 05/May/23 21:21
    Worklog Time Spent: 10m 
      Work Description: ZihanLi58 opened a new pull request, #3692:
URL: https://github.com/apache/gobblin/pull/3692

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1823
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   **Problem**: Container counts can drift out of sync when Yarn allocates "ghost containers": containers for which onContainerAllocated() is never called, but for which onContainersCompleted() is still called when the container is eventually released.
   In the onContainerAllocated() method, we add the container to the containerMap using the container ID as the key, and increase the allocated count for the container's tag.
   In the onContainersCompleted() method, we remove the container from the containerMap and decrease the count. However, in some cases the containerMap does not contain the ID; we ignore this and still decrease the allocated count for the tag. We do this because onContainersCompleted() is sometimes called before onContainerAllocated() for the same container.
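
   The drift described above can be reproduced with a minimal sketch. This is an illustrative model only, not Gobblin's actual code: the class `NaiveContainerTracker`, its method signatures, and the string container IDs and tags are all assumptions made for the example (in particular, the tag is passed in directly to keep the sketch short).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the bookkeeping described above.
// It is NOT the Gobblin implementation; names and signatures are assumed.
class NaiveContainerTracker {
  // container ID -> tag of the container
  private final Map<String, String> containerMap = new HashMap<>();
  // tag -> number of containers believed to be allocated
  private final Map<String, Integer> allocatedPerTag = new HashMap<>();

  void onContainerAllocated(String containerId, String tag) {
    containerMap.put(containerId, tag);
    allocatedPerTag.merge(tag, 1, Integer::sum);
  }

  // Mirrors the described behavior: the count is decreased even when
  // the container ID was never recorded in containerMap.
  void onContainerCompleted(String containerId, String tag) {
    containerMap.remove(containerId);
    allocatedPerTag.merge(tag, -1, Integer::sum);
  }

  int allocatedCount(String tag) {
    return allocatedPerTag.getOrDefault(tag, 0);
  }

  int trackedContainers() {
    return containerMap.size();
  }

  public static void main(String[] args) {
    NaiveContainerTracker tracker = new NaiveContainerTracker();
    tracker.onContainerAllocated("container_1", "tagA");
    // Ghost container: the completion callback fires, the allocation never did.
    tracker.onContainerCompleted("container_ghost", "tagA");
    System.out.println(tracker.trackedContainers() + " tracked, count="
        + tracker.allocatedCount("tagA"));
  }
}
```

   Running `main` prints `1 tracked, count=0`: one container is still live in the map, but the per-tag count already claims none are allocated.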
   
   **Solution**
   1. Add the removedContainerID map to track containers that have been released before onContainerAllocated() is called
   2. Go through the container map to check whether the assigned Helix instance is alive, and release the container when the instance has not been alive for more than 10 minutes
   3. Add TIME_OUT and COMPLETED as un-retryable partition states and log them to improve debuggability.
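
   Steps 1 and 2 could look roughly like the sketch below. It is one possible reading of the list above, not the code in this PR: the class `ReconcilingContainerTracker`, the `findContainersToRelease` helper, and the idea of passing in a set of container IDs whose Helix instance is currently live are all hypothetical simplifications. Only the removed-ID tracking and the 10-minute threshold come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of steps 1 and 2; names and signatures are assumed.
class ReconcilingContainerTracker {
  // 10 minutes, the threshold mentioned in step 2.
  static final long MAX_INSTANCE_DOWNTIME_MS = 10 * 60 * 1000L;

  private final Map<String, String> containerMap = new HashMap<>();
  private final Map<String, Integer> allocatedPerTag = new HashMap<>();
  // Step 1: IDs whose completion arrived before (or without) an allocation.
  private final Set<String> removedContainerIds = new HashSet<>();
  // Step 2: when each container's instance was first observed as not live.
  private final Map<String, Long> firstSeenDownMs = new HashMap<>();

  void onContainerAllocated(String containerId, String tag) {
    if (removedContainerIds.remove(containerId)) {
      return; // Already completed; do not count it as allocated.
    }
    containerMap.put(containerId, tag);
    allocatedPerTag.merge(tag, 1, Integer::sum);
  }

  void onContainerCompleted(String containerId) {
    String tag = containerMap.remove(containerId);
    if (tag == null) {
      // Unknown ID: remember it instead of decreasing any count.
      removedContainerIds.add(containerId);
      return;
    }
    firstSeenDownMs.remove(containerId);
    allocatedPerTag.merge(tag, -1, Integer::sum);
  }

  // Returns tracked containers whose instance has been down for longer
  // than the threshold; the caller would then release them.
  List<String> findContainersToRelease(Set<String> liveContainerIds, long nowMs) {
    List<String> toRelease = new ArrayList<>();
    for (String containerId : containerMap.keySet()) {
      if (liveContainerIds.contains(containerId)) {
        firstSeenDownMs.remove(containerId); // Instance is back; reset the timer.
      } else {
        long since = firstSeenDownMs.computeIfAbsent(containerId, k -> nowMs);
        if (nowMs - since > MAX_INSTANCE_DOWNTIME_MS) {
          toRelease.add(containerId);
        }
      }
    }
    return toRelease;
  }

  int allocatedCount(String tag) {
    return allocatedPerTag.getOrDefault(tag, 0);
  }
}
```

   In this sketch the per-tag count only changes for containers that were actually recorded, so an out-of-order or ghost completion leaves it untouched. Timestamps are passed in explicitly so the downtime check can be exercised without a real clock.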
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   Unit tests cover the existing functions. It is hard to add a unit test for a bad Yarn container or a Helix disconnection.
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       4. Subject does not end with a period
       5. Subject uses the imperative mood ("add", not "adding")
       6. Body wraps at 72 characters
       7. Body explains "what" and "why", not "how"
   
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 860826)
    Remaining Estimate: 0h
            Time Spent: 10m

> Improving Container Calculation and Allocation Methodology
> ----------------------------------------------------------
>
>                 Key: GOBBLIN-1823
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1823
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Zihan Li
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Container counts can drift out of sync when Yarn allocates "ghost containers": containers for which onContainerAllocated() is never called, but for which onContainersCompleted() is still called when the container is eventually released.
> In the onContainerAllocated() method, we add the container to the containerMap using the container ID as the key, and increase the allocated count for the container's tag.
> In the onContainersCompleted() method, we remove the container from the containerMap and decrease the count. However, in some cases the containerMap does not contain the ID; we ignore this and still decrease the allocated count for the tag. We do this because onContainersCompleted() is sometimes called before onContainerAllocated() for the same container.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)