Posted to dev@oozie.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/09/09 23:04:29 UTC
[jira] [Commented] (OOZIE-1885) Query optimization for StatusTransitService
[ https://issues.apache.org/jira/browse/OOZIE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127548#comment-14127548 ]
Rohini Palaniswamy commented on OOZIE-1885:
-------------------------------------------
Good idea. The condition w.startTimestamp <= :matTime should not be in the query. Use distinct instead of group by.
> A join query is always more CPU and memory intensive.
Not necessarily. A join is only expensive if too many records are selected. Also, this is a sub-query that involves no join. Too many values in the "IN" clause will slow it down a bit, but this query will certainly not be CPU or memory intensive, as there will be at most 2000 coordinators in the database. I have not seen anyone run more coordinators than that.
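To make the two suggestions above concrete, here is a rough sketch of how the query from the issue description might look with the startTimestamp condition dropped and distinct used in the sub-query. The entity and field names are taken from the proposed query; the class name, the constant name and the in-memory helper are hypothetical and only illustrate which rows the query is meant to select. The status list is copied from the proposal and may need adjusting (the existing Java code also checks PAUSED and the pending flag).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of Oozie.
class CoordTransitQuerySketch {

    // Hypothetical JPQL: the proposed query without "w.startTimestamp <= :matTime"
    // and with "distinct" instead of "group by". The "<> 'IGNORED'" test is left
    // out because the status list already excludes IGNORED.
    static final String COORD_JOBS_FOR_TRANSIT =
            "select w.id, w.status, w.pending from CoordinatorJobBean w "
            + "where (w.statusStr = 'PREP' or w.statusStr = 'RUNNING' "
            + "or w.statusStr = 'RUNNINGWITHERROR' or w.statusStr = 'PAUSEDWITHERROR') "
            + "and w.id in (select distinct a.jobId from CoordinatorActionBean a "
            + "where a.lastModifiedTimestamp >= :lastModifiedTime)";

    // Statuses accepted by the outer query (copied from the proposal).
    static final Set<String> STATUSES = new HashSet<>(
            Arrays.asList("PREP", "RUNNING", "RUNNINGWITHERROR", "PAUSEDWITHERROR"));

    // Minimal stand-ins for the two beans, with only the fields the query touches.
    static final class Job {
        final String id;
        final String status;
        Job(String id, String status) { this.id = id; this.status = status; }
    }

    static final class Action {
        final String jobId;
        final long lastModified;
        Action(String jobId, long lastModified) { this.jobId = jobId; this.lastModified = lastModified; }
    }

    // In-memory equivalent of the query: the set plays the role of the
    // "select distinct a.jobId ..." sub-query, the loop that of the outer filter.
    static List<String> selectJobIds(List<Job> jobs, List<Action> actions, long lastModifiedTime) {
        Set<String> recentlyTouched = new HashSet<>();
        for (Action a : actions) {
            if (a.lastModified >= lastModifiedTime) {
                recentlyTouched.add(a.jobId);
            }
        }
        List<String> result = new ArrayList<>();
        for (Job j : jobs) {
            if (STATUSES.contains(j.status) && recentlyTouched.contains(j.id)) {
                result.add(j.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("c1", "RUNNING"), new Job("c2", "SUCCEEDED"), new Job("c3", "RUNNING"));
        List<Action> actions = Arrays.asList(
                new Action("c1", 100L), new Action("c1", 200L),
                new Action("c2", 200L), new Action("c3", 50L));
        // c1 has recent actions and a matching status; c2 has the wrong status;
        // c3 has no action modified since time 100.
        System.out.println(selectJobIds(jobs, actions, 100L)); // prints [c1]
    }
}
```

With this shape the service would issue one query per run instead of one action query plus one job lookup per coordinator ID, and the size of the "IN" set is bounded by the number of coordinators.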
> Query optimization for StatusTransitService
> -------------------------------------------
>
> Key: OOZIE-1885
> URL: https://issues.apache.org/jira/browse/OOZIE-1885
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
>
> {code}
> private void coordTransit() throws JPAExecutorException, CommandException {
>     List<CoordinatorJobBean> pendingJobCheckList = null;
>     if (lastInstanceStartTime == null) {
>         LOG.info("Running coordinator status service first instance");
>         // this is the first instance, we need to check for all pending jobs;
>         pendingJobCheckList = jpaService.execute(new CoordJobsGetPendingJPAExecutor(limit));
>     }
>     else {
>         LOG.info("Running coordinator status service from last instance time = "
>                 + DateUtils.formatDateOozieTZ(lastInstanceStartTime));
>         // this is not the first instance, we should only check jobs
>         // that have actions or jobs been
>         // updated >= start time of last service run;
>         List<CoordinatorActionBean> actionsList = CoordActionQueryExecutor.getInstance().getList(
>                 CoordActionQuery.GET_COORD_ACTIONS_BY_LAST_MODIFIED_TIME, lastInstanceStartTime);
>         Set<String> coordIds = new HashSet<String>();
>         for (CoordinatorActionBean action : actionsList) {
>             coordIds.add(action.getJobId());
>         }
>         pendingJobCheckList = new ArrayList<CoordinatorJobBean>();
>         for (String coordId : coordIds.toArray(new String[coordIds.size()])) {
>             CoordinatorJobBean coordJob;
>             try {
>                 coordJob = CoordJobQueryExecutor.getInstance().get(CoordJobQuery.GET_COORD_JOB, coordId);
>             }
>             catch (JPAExecutorException jpaee) {
>                 if (jpaee.getErrorCode().equals(ErrorCode.E0604)) {
>                     LOG.warn("Exception happened during StatusTransitRunnable; Coordinator Job doesn't exist", jpaee);
>                     continue;
>                 } else {
>                     throw jpaee;
>                 }
>             }
>             // Running coord job might have pending false
>             Job.Status coordJobStatus = coordJob.getStatus();
>             if ((coordJob.isPending() || coordJobStatus.equals(Job.Status.PAUSED)
>                     || coordJobStatus.equals(Job.Status.RUNNING)
>                     || coordJobStatus.equals(Job.Status.RUNNINGWITHERROR)
>                     || coordJobStatus.equals(Job.Status.PAUSEDWITHERROR))
>                     && !coordJobStatus.equals(Job.Status.IGNORED)) {
>                 pendingJobCheckList.add(coordJob);
>             }
>         }
>         pendingJobCheckList.addAll(CoordJobQueryExecutor.getInstance().getList(
>                 CoordJobQuery.GET_COORD_JOBS_CHANGED, lastInstanceStartTime));
>     }
>     aggregateCoordJobsStatus(pendingJobCheckList);
> }
> }
> {code}
> This could be done in a single SQL query, something like:
> select w.id, w.status, w.pending from CoordinatorJobBean w
> where w.startTimestamp <= :matTime
> and (w.statusStr = 'PREP' or w.statusStr = 'RUNNING' or w.statusStr = 'RUNNINGWITHERROR' or w.statusStr = 'PAUSEDWITHERROR')
> and w.statusStr <> 'IGNORED'
> and w.id in (select a.jobId from CoordinatorActionBean a where a.lastModifiedTimestamp >= :lastModifiedTime group by a.jobId)
> Same for bundleTransit().