Posted to yarn-issues@hadoop.apache.org by "Agshin Kazimli (Jira)" <ji...@apache.org> on 2021/01/07 11:08:00 UTC

[jira] [Commented] (YARN-7200) SLS generates a realtimetrack.json file but that file is missing the closing ']'

    [ https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260416#comment-17260416 ] 

Agshin Kazimli commented on YARN-7200:
--------------------------------------

Hi [~snemeth],
 Thanks for the review.

I've investigated the points you described above and would like to share my view of that scenario.

By design, SLS reads job information statically from the JSON file and creates AMs for these jobs right after starting the Resource Manager and Node Managers. SLSRunner.startAM() (org.apache.hadoop.yarn.sls) is invoked to create AMSimulators from the input traces (SLS, RUMEN or SYNTH) and add them to amMap, which maps each job ID to its corresponding AMSimulator.

*The call hierarchy of AMSimulator creation from SLS trace*
{code:java}
(org.apache.hadoop.yarn.sls)

SLSRunner.startAM()
   SLSRunner.startAMFromSLSTrace(String inputTrace)
      SLSRunner.createAMForJob(Map jsonJob)
         SLSRunner.runNewAM(String jobType, String user,
                           String jobQueue, String oldJobId, long jobStartTimeMS,
                           long jobFinishTimeMS, List<ContainerSimulator> containerList,
                           Resource amContainerResource, String labelExpr)
            SLSRunner.runNewAM(String jobType, String user,
                           String jobQueue, String oldJobId, long jobStartTimeMS,
                           long jobFinishTimeMS, List<ContainerSimulator> containerList,
                           ReservationId reservationId, long deadline, Resource amContainerResource,
                           String labelExpr, Map<String, String> params)

{code}

1. SLSRunner.startAM() invokes the corresponding function to create AMs from the given input trace, i.e. _SLS, RUMEN, SYNTH_.
2. SLSRunner.startAMFromSLSTrace() reads the input trace (JSON file) and invokes SLSRunner.createAMForJob() for every job.
3. SLSRunner.createAMForJob() takes the jsonJob map and invokes SLSRunner.runNewAM() as many times as the given job count.
4. There are 3 different SLSRunner.runNewAM() functions, because the _SLS, RUMEN, SYNTH_ traces differ slightly. One of them is the base function, which the other SLSRunner.runNewAM() functions invoke.
5. In SLSRunner.runNewAM(), the AMSimulator is created and initialized with the given parameters, including the heartbeatInterval argument. Then a new entry (jobID, amSim) is added to amMap.
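
To illustrate steps 4 and 5, here is a minimal sketch of the delegation pattern, not the actual SLSRunner code. The class AmRegistrySketch, the nested AmSim type and the reduced parameter lists are hypothetical stand-ins; only the names runNewAM and amMap are taken from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for SLSRunner; NOT the real Hadoop class. */
class AmRegistrySketch {

  /** Hypothetical stand-in for AMSimulator, reduced to a few fields. */
  static final class AmSim {
    final String jobType;
    final String oldJobId;
    final long jobStartTimeMS;
    final long heartbeatIntervalMs;

    AmSim(String jobType, String oldJobId, long jobStartTimeMS,
          long heartbeatIntervalMs) {
      this.jobType = jobType;
      this.oldJobId = oldJobId;
      this.jobStartTimeMS = jobStartTimeMS;
      this.heartbeatIntervalMs = heartbeatIntervalMs;
    }
  }

  // Maps job ID -> simulator, in the spirit of amMap.
  private final Map<String, AmSim> amMap = new LinkedHashMap<>();
  // Assumed default; the real value comes from the SLS configuration.
  private final long defaultHeartbeatMs;

  AmRegistrySketch(long defaultHeartbeatMs) {
    this.defaultHeartbeatMs = defaultHeartbeatMs;
  }

  /** Shorter overload: fills in the default and delegates to the base one. */
  void runNewAM(String jobType, String oldJobId, long jobStartTimeMS) {
    runNewAM(jobType, oldJobId, jobStartTimeMS, defaultHeartbeatMs);
  }

  /** Base overload: creates the simulator and registers it in the map. */
  void runNewAM(String jobType, String oldJobId, long jobStartTimeMS,
                long heartbeatIntervalMs) {
    AmSim amSim =
        new AmSim(jobType, oldJobId, jobStartTimeMS, heartbeatIntervalMs);
    amMap.put(oldJobId, amSim);
  }

  AmSim get(String jobId) {
    return amMap.get(jobId);
  }

  int size() {
    return amMap.size();
  }

  public static void main(String[] args) {
    AmRegistrySketch runner = new AmRegistrySketch(1000L);
    runner.runNewAM("mapreduce", "job_1", 0L);
    runner.runNewAM("mapreduce", "job_2", 5000L, 250L);
    System.out.println("registered AMs: " + runner.size());
  }
}
```

Whichever overload the caller uses, the map insertion happens in one place, the base function.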

At the end of SLSRunner.startAM(), numAMs is set to amMap.size() and remainingApps is then assigned the value of numAMs:

{code:java}
numAMs = amMap.size();
remainingApps = numAMs;
{code}

My conclusion is that the creation of AMs is not bound to any other thread: they are created from the static trace info, each job ID is mapped to its AMSimulator, and remainingApps is assigned the size of this map. To support my argument, I added some LOG statements to check whether the AMs are created and added to the map immediately. As expected, they are. In the scenario you mentioned, AMSimulators can have different heartBeatInterval and start times, but that does not happen as part of this process. As described above, SLSRunner.runNewAM() initializes the AMSimulators, which extend TaskRunner.Task, which in turn implements the Runnable interface. The mapping of these AMSimulators, however, happens on the same thread.
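
The argument can be sketched roughly as follows. This is an illustration under assumptions, not SLS code: SameThreadRegistrationSketch, SimTask, registeredOn and runAll are hypothetical names, and a plain ScheduledExecutorService stands in for the SLS task runner. The point is only that the map and the counters are filled on the calling thread, while the Runnable tasks execute later on pool threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; the names below are not part of Hadoop SLS. */
class SameThreadRegistrationSketch {

  /** Hypothetical simulator task: a Runnable with its own start delay. */
  static final class SimTask implements Runnable {
    final String jobId;
    final long startDelayMs;
    volatile String ranOn; // name of the thread that executed run()

    SimTask(String jobId, long startDelayMs) {
      this.jobId = jobId;
      this.startDelayMs = startDelayMs;
    }

    @Override
    public void run() {
      ranOn = Thread.currentThread().getName();
    }
  }

  final Map<String, SimTask> amMap = new LinkedHashMap<>();
  // Records which thread put each entry into amMap.
  final Map<String, String> registeredOn = new LinkedHashMap<>();
  int numAMs;
  int remainingApps;

  /** Creates and registers all tasks on the caller's thread. */
  void startAM(String[] jobIds, long[] startDelaysMs) {
    for (int i = 0; i < jobIds.length; i++) {
      SimTask task = new SimTask(jobIds[i], startDelaysMs[i]);
      amMap.put(task.jobId, task);
      registeredOn.put(task.jobId, Thread.currentThread().getName());
    }
    // Same two assignments as at the end of SLSRunner.startAM().
    numAMs = amMap.size();
    remainingApps = numAMs;
  }

  /** Runs the tasks later, on pool threads, each after its own delay. */
  void runAll() {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
    for (SimTask task : amMap.values()) {
      pool.schedule(task, task.startDelayMs, TimeUnit.MILLISECONDS);
    }
    pool.shutdown(); // already scheduled delayed tasks still run by default
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    SameThreadRegistrationSketch sls = new SameThreadRegistrationSketch();
    sls.startAM(new String[] {"job_1", "job_2", "job_3"},
                new long[] {0L, 20L, 40L});
    System.out.println("remainingApps before any task ran: "
        + sls.remainingApps);
    sls.runAll();
    for (SimTask task : sls.amMap.values()) {
      System.out.println(task.jobId + " registered on "
          + sls.registeredOn.get(task.jobId) + ", ran on " + task.ranOn);
    }
  }
}
```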


> SLS generates a realtimetrack.json file but that file is missing the closing ']'
> --------------------------------------------------------------------------------
>
>                 Key: YARN-7200
>                 URL: https://issues.apache.org/jira/browse/YARN-7200
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>            Reporter: Grant Sohn
>            Assignee: Agshin Kazimli
>            Priority: Minor
>              Labels: newbie, newbie++
>         Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch, YARN-7200.003.patch, snemeth-testing-20201113.zip
>
>
> File hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java shows:
> {noformat}
>   void tearDown() throws Exception {
>     if (metricsLogBW != null)  {
>       metricsLogBW.write("]");
>       metricsLogBW.close();
>     }
>     ....
> {noformat}
> So the exit logic is flawed.


