Posted to issues@hive.apache.org by "Subramanyam Pattipaka (JIRA)" <ji...@apache.org> on 2017/02/17 00:41:41 UTC

[jira] [Updated] (HIVE-15947) Enhance Templeton service job operations reliability

     [ https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramanyam Pattipaka updated HIVE-15947:
-----------------------------------------
    Attachment: HIVE-15947.patch

Attaching a patch with the changes. It introduces the new configs. I verified that the changes work on a real cluster with 400 job submit requests, with the clients continuing to make requests until those jobs completed. I also added unit tests to verify the behavior of concurrent job requests.

> Enhance Templeton service job operations reliability
> ----------------------------------------------------
>
>                 Key: HIVE-15947
>                 URL: https://issues.apache.org/jira/browse/HIVE-15947
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Subramanyam Pattipaka
>            Assignee: Subramanyam Pattipaka
>         Attachments: HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation requests. It simply accepts all operations and tries to run them. If a large number of concurrent job submit requests arrive, the time to submit job operations can increase significantly. Templeton uses HDFS to store the staging files for a job. If HDFS storage can't keep up with a large number of requests and throttles them, job submission can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications. Client applications may want a predictable, low response time for successful requests, or else a throttle response telling the client to wait for some time before re-requesting the job operation.
> In this JIRA, I am trying to address the following job operations:
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity because they differ in their use of cluster resources such as YARN and HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, which controls the maximum number of concurrent active job submissions within Templeton, and to use this config to keep response times under control. If a new job submission request sees that there are already templeton.job.submit.exec.max-procs jobs being submitted concurrently, the request will fail with HTTP error 503 with the reason
>    "Too many concurrent job submission requests received. Please wait for some time before retrying."
>  
> The client is expected to catch this response and retry after waiting for some time. The default value for the config templeton.job.submit.exec.max-procs is '0'. This means that by default job submission requests are always accepted. The limit needs to be enabled based on requirements.
> We can have similar behavior for the Status and List operations with the configs templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs respectively.
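
For readers following the description above, here is a minimal Java sketch of the throttling idea: a fixed number of slots per operation type, an immediate 503 when all slots are busy, and a client that waits and retries. It is an illustration only and is not taken from HIVE-15947.patch. The class name JobRequestLimiter, the method names, and the retry parameters are all hypothetical. The only details from the issue are the "0 means unlimited" default and the 503 response.

```java
import java.util.concurrent.Semaphore;
import java.util.function.IntSupplier;

/** Hypothetical limiter; names are illustrative, not those used in the patch. */
final class JobRequestLimiter {
    static final int HTTP_OK = 200;
    static final int HTTP_SERVICE_UNAVAILABLE = 503;

    // null means "no limit", matching the default max-procs value of 0.
    private final Semaphore permits;

    JobRequestLimiter(int maxProcs) {
        this.permits = maxProcs > 0 ? new Semaphore(maxProcs) : null;
    }

    /** Runs the operation if a slot is free; otherwise returns 503 without waiting. */
    int execute(Runnable operation) {
        if (permits != null && !permits.tryAcquire()) {
            return HTTP_SERVICE_UNAVAILABLE;
        }
        try {
            operation.run();
            return HTTP_OK;
        } finally {
            // Reached only when a slot was taken (or when no limit is configured).
            if (permits != null) {
                permits.release();
            }
        }
    }

    /** Client side (assumed policy): retry on 503 with a fixed wait, up to maxAttempts calls. */
    static int submitWithRetry(IntSupplier request, int maxAttempts, long waitMillis) {
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == HTTP_SERVICE_UNAVAILABLE; attempt++) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        JobRequestLimiter submitLimiter = new JobRequestLimiter(1);
        // A second submission arriving while the first is still running is rejected.
        submitLimiter.execute(() ->
            System.out.println("nested request status: " + submitLimiter.execute(() -> { })));
    }
}
```

Under this sketch, each of the three operations (submit, status, list) would get its own limiter instance sized from its own config value, so a flood of submissions would not block status calls. Whether the patch structures it this way is not stated in the issue.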



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)