Posted to user@giraph.apache.org by José Luis Larroque <la...@gmail.com> on 2017/02/08 02:23:43 UTC

Re: Stop Giraph application when reach certain amount of time

In the end I wrote a bash script that waits a set amount of time before
killing any app :(
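
A minimal sketch of what such a watchdog could look like, not the actual
script from this thread. The helper name run_with_deadline is hypothetical,
and it assumes the job is started by a local client process. On YARN,
killing that client may not be enough, and the application itself may still
have to be stopped with `yarn application -kill <application id>`.

```shell
#!/usr/bin/env bash
# Sketch only: run a command and kill it if it is still alive after a deadline.
# run_with_deadline is a hypothetical helper, not part of Giraph or Hadoop.

run_with_deadline() {
  local deadline_secs="$1"
  shift
  "$@" &                        # start the job in the background
  local job_pid=$!
  local waited=0
  # Poll once per second until the job exits or the deadline passes.
  while kill -0 "$job_pid" 2>/dev/null; do
    if [ "$waited" -ge "$deadline_secs" ]; then
      kill "$job_pid" 2>/dev/null   # send SIGTERM to the local process
      wait "$job_pid" 2>/dev/null   # collect its exit status
      return 124                    # same code GNU timeout uses on expiry
    fi
    sleep 1
    waited=$((waited + 1))
  done
  wait "$job_pid"               # propagate the job's own exit status
}
```

For example, `run_with_deadline 300 ./launch_giraph_job.sh` (where the
launcher script name is an assumption) would give each job five minutes
before moving on to the next one.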

-- 
*José Luis Larroque*
University Programmer Analyst - Facultad de Informática - UNLP
Java Developer at LIFIA

2017-01-26 23:32 GMT-03:00 José Luis Larroque <la...@gmail.com>:

> I believe I found something related to this issue.
>
> The behavior when maxAllowedJobTimeMilliseconds is set is closely tied
> to the giraph.trackJobProgressOnClient option, which is *false* by
> default.
>
> For the job to be stopped once its running time reaches the
> maxAllowedJobTimeMilliseconds value, the mapperStarted() method of
> JobProgressTrackerService has to be executed.
>
> In Giraph 1.1, the giraph.trackJobProgressOnClient configuration option
> is false by default. In that case, a JobProgressTrackerClientNoOp is
> created for tracking progress on the client. This class implements
> mapperStarted(), but with an *empty* body, so nothing is done: the
> thread that should kill the job after the maximum amount of time is
> never created, and that's why I'm not seeing any log output related to
> this option.
>
> I tried running a job with giraph.trackJobProgressOnClient set to true,
> but when I did, all my containers threw this exception:
> java.lang.NoClassDefFoundError: org/apache/thrift/transport/TTransport
>
> Apparently, when giraph.trackJobProgressOnClient is set to true, a
> RetryableJobProgressTrackerClient is created instead of a
> JobProgressTrackerClientNoOp, and RetryableJobProgressTrackerClient uses
> classes that I don't have on my classpath, such as
> org/apache/thrift/transport/TTransport. Should I keep downloading
> dependency jars until the NoClassDefFoundError is resolved, or is there a
> better workaround for this problem?
>
> Any help will be greatly appreciated.
>
> bye!
> José
>
> 2017-01-25 22:51 GMT-03:00 José Luis Larroque <la...@gmail.com>:
>
>> Sorry, I forgot to attach the log files. Here they are:
>>
>> 2017-01-25 22:50 GMT-03:00 José Luis Larroque <la...@gmail.com>:
>>
>>> Hi Sergey, thanks for your answer and sorry for the delay.
>>>
>>> I'm using Hadoop 2.4.0 and Giraph 1.1. I believe this is the
>>> corresponding code in that version of Giraph:
>>> https://github.com/apache/giraph/blob/release-1.1/giraph-core/src/main/java/org/apache/giraph/job/JobProgressTrackerService.java#L136
>>>
>>> I'm using these job parameters:
>>> -w 4 -yh 5700 -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.isStaticGraph=true,giraph.maxAllowedJobTimeMilliseconds=10000
>>>
>>> I'm using a cluster of 1 master and 4 slaves in AWS.
>>>
>>> I've attached logs from three different containers. One superstep took
>>> 12 seconds, yet the Giraph application was not stopped.
>>>
>>> Thanks in advance!
>>>
>>> 2017-01-25 1:33 GMT-03:00 Sergey Edunov <ed...@gmail.com>:
>>>
>>>> Hello José,
>>>>
>>>> giraph.maxAllowedJobTimeMilliseconds is supposed to do exactly what
>>>> you want, see the code here:
>>>> https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/job/DefaultJobProgressTrackerService.java#L123
>>>>
>>>> However, I have never tested it with any hadoop distro other than
>>>> hadoop 1.0, so maybe it doesn't work in your environment.
>>>>
>>>> Can you share the exact configuration (job parameters and Hadoop
>>>> version) and the messages you see in the log?
>>>>
>>>> Regards,
>>>> Sergey Edunov
>>>>
>>>>
>>>> On Tue, Jan 24, 2017 at 7:26 PM, José Luis Larroque
>>>> <la...@gmail.com> wrote:
>>>> > I have to run several Giraph processes in AWS. To do that, I have a
>>>> > script that launches one process after another until all of them
>>>> > have finished. The problem is that sometimes a container gets
>>>> > killed, and I spend a lot of time waiting for the entire Giraph app
>>>> > to be killed so that the next one can start. I'm trying to reduce
>>>> > this time, because I know that a process that takes more than 5
>>>> > minutes isn't going to finish (I'd rather have a few Giraph
>>>> > processes killed if that significantly reduces the total time
>>>> > needed to run all of them).
>>>> >
>>>> > I already tried setting a "maximum amount of time" with the
>>>> > following options, using a really low value (1 millisecond):
>>>> >
>>>> > giraph.waitTaskDoneTimeoutMs -> This option makes the container
>>>> > throw an IllegalStateException but doesn't stop the Giraph app from
>>>> > running. I know there is a bug reported against this option, but I
>>>> > hope that is not the case here.
>>>> > giraph.maxAllowedJobTimeMilliseconds -> With the log level set to
>>>> > DEBUG, I couldn't see any effect from using this option.
>>>> >
>>>> > Even so, I'm not getting the expected result, and I have Giraph
>>>> > applications that take around 12000 seconds or more (a big waste of
>>>> > time, resources and money).
>>>> >
>>>> >
>>>> > Any help will be greatly appreciated.
>>>> >
>>>> >
>>>> > Thanks!
>>>>
>>>
>>>
>>
>