Posted to user@uima.apache.org by "reshu.agarwal" <re...@orkash.com> on 2014/11/12 06:45:02 UTC

DUCC-1.1.0: Machines are going down very frequently

Hi,

When I tried DUCC 1.1.0 on multiple machines, I ran into a problem with 
the machines' up/down status. I have configured two machines, and they 
keep going down one after the other. This causes the DUCC services to be 
disabled and jobs to be initialized again and again.

DUCC 1.0.0 was working fine on the same machines.

How can I fix this problem? I have also compared the ducc.properties 
files for both versions. Both use the same configuration to check heartbeats.

The re-initialization of jobs is increasing the processing time. Can I 
change or reconfigure this behavior?

Services are getting disabled automatically, and hovering the mouse over 
the disabled status shows an excessive initialization error, but the logs 
do not show any errors.

I have to use DUCC 1.0.0 instead of DUCC 1.1.0.

Thanks in advance.

-- 
*Reshu Agarwal*


Re: DUCC-1.1.0: Machines are going down very frequently

Posted by Lou DeGenaro <lo...@gmail.com>.
Also, do all the daemons on the System -> Daemons page show status "up"?

Have a look at the Broker page of the live demo on Apache here:
http://uima-ducc-vm.apache.org:42133/system.broker.jsp and compare it with
yours. Do all of the Topics appear with consumers > 0?

On Mon, Nov 17, 2014 at 6:48 AM, Lou DeGenaro <lo...@gmail.com>
wrote:

> Reshu,
>
> Have you tried looking at the log files in DUCC's log directory for signs
> of errors or exceptions?  Are any daemons producing core dumps?
>
> Lou.

Re: DUCC-1.1.0: Machines are going down very frequently

Posted by "reshu.agarwal" <re...@orkash.com>.
Lou,

I looked for any sign of errors or exceptions but didn't find any.

Reshu.
On 11/17/2014 05:18 PM, Lou DeGenaro wrote:
> Reshu,
>
> Have you tried looking at the log files in DUCC's log directory for signs
> of errors or exceptions?  Are any daemons producing core dumps?
>
> Lou.


Re: DUCC-1.1.0: Machines are going down very frequently

Posted by Lou DeGenaro <lo...@gmail.com>.
Reshu,

Have you tried looking at the log files in DUCC's log directory for signs
of errors or exceptions?  Are any daemons producing core dumps?
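
One way to make that log check systematic is sketched below in Java. It is only an illustration: the directory name "logs", the ".log" file suffix, and the marker strings "ERROR", "SEVERE" and "Exception" are assumptions, not something DUCC guarantees, and the class name is hypothetical. Adjust them to match the actual installation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of DUCC: lists log lines that look like errors.
final class LogErrorScanSketch {

    // Assumed markers; the real log format may use different level names.
    static boolean looksSuspicious(String line) {
        return line.contains("ERROR") || line.contains("SEVERE") || line.contains("Exception");
    }

    // Walks the directory tree and returns "file:lineNumber: text" for each hit.
    static List<String> scan(Path logDir) throws IOException {
        List<Path> logFiles;
        try (Stream<Path> paths = Files.walk(logDir)) {
            logFiles = paths
                    .filter(p -> Files.isRegularFile(p)
                            && p.getFileName().toString().endsWith(".log"))
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : logFiles) {
            // ISO-8859-1 accepts any byte sequence, so odd characters cannot abort the scan.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (looksSuspicious(lines.get(i))) {
                    hits.add(file + ":" + (i + 1) + ": " + lines.get(i));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // "logs" is a placeholder default; pass the real log directory as the first argument.
        Path logDir = Paths.get(args.length > 0 ? args[0] : "logs");
        scan(logDir).forEach(System.out::println);
    }
}
```

An empty result only means none of the assumed markers occurred; it does not prove the daemons are healthy.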

Lou.

On Mon, Nov 17, 2014 at 1:21 AM, reshu.agarwal <re...@orkash.com>
wrote:

>
> Dear Lou,
>
> I am using the default configuration:
>
> ducc.agent.node.metrics.publish.rate=30000
> ducc.rm.node.stability = 5
>
> Reshu.

Re: DUCC-1.1.0: Machines are going down very frequently

Posted by "reshu.agarwal" <re...@orkash.com>.
Dear Lou,

I am using the default configuration:

ducc.agent.node.metrics.publish.rate=30000
ducc.rm.node.stability = 5
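
As a worked check, these two values can be plugged into the (tolerance * rate) / 1000 calculation from the quoted getAgentMillisMIA method. The Java sketch below is a standalone illustration, not DUCC code: the class and method names are hypothetical, and it only reproduces the arithmetic with the two property lines exactly as written above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical standalone sketch; it only mirrors the arithmetic of the quoted method.
final class NodeDownThresholdSketch {

    // Parses property text the same way a .properties file would be read.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // (tolerance * rate) / 1000: missed publications times publish interval in ms, as seconds.
    static long secondsUntilDown(Properties props) {
        long tolerance = Long.parseLong(props.getProperty("ducc.rm.node.stability").trim());
        long rateMillis = Long.parseLong(
                props.getProperty("ducc.agent.node.metrics.publish.rate").trim());
        return (tolerance * rateMillis) / 1000;
    }

    public static void main(String[] args) {
        String reported = "ducc.agent.node.metrics.publish.rate=30000\n"
                        + "ducc.rm.node.stability = 5\n";
        // 5 * 30000 / 1000
        System.out.println(secondsUntilDown(parse(reported)) + " seconds"); // prints "150 seconds"
    }
}
```

With these settings the formula yields 150 seconds before the Web Server would report a node as down. The spaces around "=" in the second line are harmless, because java.util.Properties skips whitespace around the separator.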

Reshu.

On 11/12/2014 05:03 PM, Lou DeGenaro wrote:
> What do you have defined in your ducc.properties for
> ducc.rm.node.stability and ducc.agent.node.metrics.publish.rate?  The
> Web Server considers a node down according to the following
> calculation:
>
> private long getAgentMillisMIA() {
>     String location = "getAgentMillisMIA";
>     long secondsMIA = DOWN_AFTER_SECONDS*SECONDS_PER_MILLI;
>     Properties properties = DuccWebProperties.get();
>     String s_tolerance = properties.getProperty("ducc.rm.node.stability");
>     String s_rate = properties.getProperty("ducc.agent.node.metrics.publish.rate");
>     try {
>         long tolerance = Long.parseLong(s_tolerance.trim());
>         long rate = Long.parseLong(s_rate.trim());
>         secondsMIA = (tolerance * rate) / 1000;
>     }
>     catch(Throwable t) {
>         logger.warn(location, jobid, t);
>     }
>     return secondsMIA;
> }
>
> The default is 65 seconds. Note that the Web Server has no effect on
> actual operations in this case. It is just a reporter of information.
>
> Lou.


Re: DUCC-1.1.0: Machines are going down very frequently

Posted by Lou DeGenaro <lo...@gmail.com>.
What do you have defined in your ducc.properties for
ducc.rm.node.stability and ducc.agent.node.metrics.publish.rate?  The
Web Server considers a node down according to the following
calculation:

private long getAgentMillisMIA() {
    String location = "getAgentMillisMIA";
    long secondsMIA = DOWN_AFTER_SECONDS*SECONDS_PER_MILLI;
    Properties properties = DuccWebProperties.get();
    String s_tolerance = properties.getProperty("ducc.rm.node.stability");
    String s_rate = properties.getProperty("ducc.agent.node.metrics.publish.rate");
    try {
        long tolerance = Long.parseLong(s_tolerance.trim());
        long rate = Long.parseLong(s_rate.trim());
        secondsMIA = (tolerance * rate) / 1000;
    }
    catch(Throwable t) {
        logger.warn(location, jobid, t);
    }
    return secondsMIA;
}

The default is 65 seconds. Note that the Web Server has no effect on
actual operations in this case. It is just a reporter of information.

Lou.

On Wed, Nov 12, 2014 at 12:45 AM, reshu.agarwal
<re...@orkash.com> wrote:
>
> Hi,
>
> When I tried DUCC 1.1.0 on multiple machines, I ran into a problem with
> the machines' up/down status. I have configured two machines, and they
> keep going down one after the other. This causes the DUCC services to be
> disabled and jobs to be initialized again and again.
>
> DUCC 1.0.0 was working fine on the same machines.
>
> How can I fix this problem? I have also compared the ducc.properties
> files for both versions. Both use the same configuration to check heartbeats.
>
> The re-initialization of jobs is increasing the processing time. Can I
> change or reconfigure this behavior?
>
> Services are getting disabled automatically, and hovering the mouse over
> the disabled status shows an excessive initialization error, but the logs
> do not show any errors.
>
> I have to use DUCC 1.0.0 instead of DUCC 1.1.0.
>
> Thanks in advance.
>
> --
> *Reshu Agarwal*
>