
Posted to user@uima.apache.org by priyank sharma <pr...@orkash.com> on 2017/11/09 10:19:15 UTC

DUCC's job goes into an infinite loop

Hi all,

I have a problem with our DUCC cluster: a job process gets stuck and keeps
processing the same batch again and again. When the batch exceeds the
maximum duration, it gets the reason (extraordinary status) "CanceledByUser"
and is then restarted with the same IDs. This usually happens after 15 to
20 days and goes away after restarting the DUCC cluster. When I check the
data store that the CAS consumer ingests into, the data for this batch never
gets ingested, so most probably this data is not being processed.

How can I check whether this data is being processed?

Are resources the issue? And why does the batch get processed after
restarting the cluster?

We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.



-- 
Thanks and Regards
Priyank Sharma

Re: DUCC's job goes into an infinite loop

Posted by Lou DeGenaro <lo...@gmail.com>.

Are you running with a shared file system on your cluster?  Is your user
log directory located there?  Look at the DUCC daemon log files located in
$DUCC_HOME/logs. They should provide some clues as to what is wrong.  Feel
free to post (non-confidential versions of) them here for a second opinion.
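For a first pass over $DUCC_HOME/logs, a small script can pull out the lines that look like errors. This is a minimal sketch, not a DUCC tool: scan_logs is a hypothetical helper, and the keyword list (ERROR, SEVERE, Exception, OutOfMemoryError) is an assumption about what is worth flagging, not an official list of DUCC log levels.

```python
import os
import re
from pathlib import Path

# Keywords worth flagging; an assumed starting set, not an official DUCC list.
DEFAULT_PATTERNS = ("ERROR", "SEVERE", "Exception", "OutOfMemoryError")


def scan_logs(log_dir, patterns=DEFAULT_PATTERNS):
    """Return (relative path, line number, text) for each line containing a pattern."""
    root = Path(log_dir)
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps one bad byte from aborting the whole scan
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    hits.append((str(path.relative_to(root)), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumes DUCC_HOME is set in the environment, as on a DUCC head node.
    log_dir = Path(os.environ.get("DUCC_HOME", ".")) / "logs"
    if not log_dir.is_dir():
        print(f"no log directory at {log_dir}")
    else:
        for name, lineno, text in scan_logs(log_dir):
            print(f"{name}:{lineno}: {text}")
```

Each hit is printed as file:line: text. Plain substring matching is crude and will also flag harmless lines, so treat the output as a reading list for the daemon logs rather than a diagnosis.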

Lou.

On Fri, Nov 10, 2017 at 12:11 AM, priyank sharma <pr...@orkash.com>
wrote:

> There is nothing on the work item page or the performance page of the web
> server. There is only one log file, for the main node; there are no log files
> for the other two nodes. The DUCC job processes are not able to pick up the
> data from the data source, and no UIMA aggregator is working for those batches.
>
> Is the issue caused by the Java heap space? We are giving 4 GB of RAM to the
> job process.
>
> Attaching the log file.
>
> Thanks and Regards
> Priyank Sharma
>
> On Thursday 09 November 2017 04:33 PM, Lou DeGenaro wrote:
>
>> The first place to look is in your job's logs. Visit the ducc-mon jobs
>> page ducchost:42133/jobs.jsp, then click on the id of your job. Examine
>> the logs by clicking on each log file name, looking for any revealing
>> information.
>>
>> Feel free to post non-confidential snippets here, or, if you'd like to chat
>> in real time, we can use HipChat.
>>
>> Lou.
>>
>
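On the Java heap question in the quoted reply (4 GB per job process): if a job process exhausts its heap, the JVM throws java.lang.OutOfMemoryError, and when that error is logged or printed, the standard HotSpot message text can be searched for. The sketch below counts two such messages per file. heap_report is a hypothetical helper, not part of DUCC, and it assumes the job's log files sit under one directory and end in .log; check that against your own user log directory.

```python
import re
from pathlib import Path

# Standard HotSpot OutOfMemoryError messages; illustrative, not exhaustive.
HEAP_SIGNALS = {
    "heap_space": re.compile(r"java\.lang\.OutOfMemoryError: Java heap space"),
    "gc_overhead": re.compile(r"java\.lang\.OutOfMemoryError: GC overhead limit exceeded"),
}


def heap_report(log_dir):
    """Map each *.log file under log_dir to {signal: count}, omitting clean files."""
    root = Path(log_dir)
    report = {}
    # Assumption: the job's log files end in .log; adjust the glob if yours differ.
    for path in sorted(root.rglob("*.log")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        counts = {}
        for label, regex in HEAP_SIGNALS.items():
            n = len(regex.findall(text))
            if n:
                counts[label] = n
        if counts:
            report[str(path.relative_to(root))] = counts
    return report
```

An empty result does not rule out memory pressure; it only means these particular messages were not found in the files scanned.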