You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2019/01/21 16:13:00 UTC

[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup

    [ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748059#comment-16748059 ] 

Peter Bacsko commented on YARN-9100:
------------------------------------

Thanks for the improvements Szilard. Some thoughts:



1.
{noformat}
87	} catch (InterruptedException e) {
88	// On any interrupt, break the loop and continue execution.
89	break;{noformat}
At least log something in case of an {{InterruptedException.}} Also, in cases like this, restoring the interrupted flag with {{Thread.currentThread.interrupt()}} is desirable.

2. In {{logStatement()}} you log twice if TRACE is enabled (I guess?)

3. Use SLF4J as an API, not Commons Logging.

4. You don't log anything in case of a timeout.

5. You can define both {{check}} and {{nonNullCheck}} at the same time. There are two problems with this. First, ordinary {{check}} is not used in the code. Second, if both are used, then the result of {{nonNullCheck}} is simply ignored.

 
In general I feel that having the retry logic in a separate class a bit of an overengineering. It would be justified it the patch modified classes other than {{GpuResourceAllocator}}. But for only a single class, it looks like an overkill.
Also, check out this project, which might be good for us: https://github.com/rholder/guava-retrying

 

> Add tests for GpuResourceAllocator and do minor code cleanup
> ------------------------------------------------------------
>
>                 Key: YARN-9100
>                 URL: https://issues.apache.org/jira/browse/YARN-9100
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9100.001.patch, YARN-9100.002.patch, YARN-9100.003.patch
>
>
> Add tests for GpuResourceAllocator and do minor code cleanup
> - Improved log and exception messages
> - Added some new debug logs
> - Some methods are named like *Copy, these are returning copies of internal data structures. The word "copy" is just a noise in their name, so they have been renamed. Additionally, the copied data structures modified to be immutable.
> - The waiting loop in method assignGpus were decoupled into a new class, RetryCommand. 
> Some more words about the new class RetryCommand: 
> There are some similar waiting loops in the code in: AMRMClient, AMRMClientAsync and even in GenericTestUtils (see waitFor method). RetryCommand could be a future replacement of these duplicated code, as it gives a solution to this waiting loop problem in a generic way.
> The only downside of the usage of RetryCommand in GpuResourceAllocator (startGpuAssignmentLoop) is the ugly exception handling part, but that's solely because how Java deals with checked exceptions vs. lambdas. If there's a cleaner way to solve the exception handling, I'm open for any suggestions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org