Posted to user@hama.apache.org by Steven van Beelen <sm...@gmail.com> on 2013/11/20 10:22:56 UTC

HAMA jobs failing, with no debug message - 2

I have a problem very similar to the one Anveshi Charuvaka is mailing about.

Additionally, when I set task logging to DEBUG, I found that the DEBUG logs
get interrupted at the same point and are replaced with the "INFO
bsp.BSPJobClient: Job failed." message.
My program works in local, distributed and pseudo-distributed mode, so that
is probably not the issue.

The only case in which the program does run is when I use the maximum
number of machines (i.e. 7 machines with 12 cores and 128 GB RAM each). I set
the maximum number of tasks to 12 per node, so 84 in total. But when I force
the program to run with 60 tasks, the "Job Failed" message comes up with no
additional info.

Last note: I'm running an Inverted Indexing algorithm with a data set of
approximately 17 GB.
Could someone help me with this?

Regards, Steven

Re: HAMA jobs failing, with no debug message - 2

Posted by Steven van Beelen <sm...@gmail.com>.

Thanks for the info, I'll try it out!
Too bad there is no 'Sorted Spilling Message Queue' yet ;-)

Re: HAMA jobs failing, with no debug message - 2

Posted by "Edward J. Yoon" <ed...@apache.org>.

> Can I combine the Spilling Queue with the Sorted Message Queue?

Work in progress. HAMA-723

> My program has only one super step.

That's why your program consumes so much memory. If you call sync()
periodically, you might be able to avoid the huge memory consumption.
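
A rough way to see why calling sync() more often helps is to count how many messages are waiting for delivery at once. The plain-Java sketch below is a hypothetical model, not Hama code: the class name, the 50 messages per document and the batch size of 100 are made up, it assumes each batch is consumed right after its sync() (for example merged into partial posting lists), and it ignores the memory taken by the index itself.

```java
import java.util.Arrays;

// Hypothetical plain-Java model, no Hama classes: it only counts how many
// outgoing messages are buffered before they are delivered at a sync().
final class PeriodicSyncSimulation {

    /**
     * @param messagesPerDoc   messages produced by each document, in input order
     * @param docsBetweenSyncs documents processed between two sync() calls;
     *                         messagesPerDoc.length models "one superstep"
     * @return the largest number of messages buffered at any moment
     */
    static long peakBuffered(int[] messagesPerDoc, int docsBetweenSyncs) {
        if (docsBetweenSyncs <= 0) {
            throw new IllegalArgumentException("docsBetweenSyncs must be positive");
        }
        long buffered = 0;
        long peak = 0;
        for (int doc = 0; doc < messagesPerDoc.length; doc++) {
            buffered += messagesPerDoc[doc];   // messages wait for the next sync()
            peak = Math.max(peak, buffered);
            if ((doc + 1) % docsBetweenSyncs == 0) {
                buffered = 0;                  // sync(): the batch is delivered
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int[] messagesPerDoc = new int[1000];
        Arrays.fill(messagesPerDoc, 50);       // e.g. 50 distinct words per document
        System.out.println(peakBuffered(messagesPerDoc, messagesPerDoc.length)); // 50000
        System.out.println(peakBuffered(messagesPerDoc, 100));                   // 5000
    }
}
```

With a single superstep the peak equals the total number of messages; with a sync() every N documents it is bounded by the messages of N documents, at the cost of more supersteps.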

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: HAMA jobs failing, with no debug message - 2

Posted by Steven van Beelen <sm...@gmail.com>.

Can I combine the Spilling Queue with the Sorted Message Queue (e.g.
conf.set(MessageManager.QUEUE_TYPE_CLASS,
"org.apache.hama.bsp.message.queue.SortedMessageQueue");)?
My implementation relies on the messages being received in sorted order,
hence the question.
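
For reference, a queue that both spills and sorts is essentially an external merge sort: sort bounded batches in memory, write each batch to disk as a sorted run, and k-way merge the runs when the messages are read. The Java sketch below illustrates only that general technique under simplifying assumptions (String messages without line breaks, one temp file per run, hypothetical class and method names); it is not how Hama's SortedMessageQueue or Spilling Queue are implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not a Hama class: at most maxInMemory messages stay on
// the heap; each full buffer is sorted and spilled to a temp file as one run,
// and reading performs a k-way merge over all runs (one message per line).
final class SortedSpillingSketch {
    private final int maxInMemory;
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();

    SortedSpillingSketch(int maxInMemory) {
        if (maxInMemory <= 0) {
            throw new IllegalArgumentException("maxInMemory must be positive");
        }
        this.maxInMemory = maxInMemory;
    }

    void add(String message) {
        buffer.add(message);
        if (buffer.size() >= maxInMemory) {
            spill();
        }
    }

    int spilledRuns() {
        return runs.size();
    }

    // Sorts the in-memory buffer and writes it out as one sorted run.
    private void spill() {
        Collections.sort(buffer);
        try {
            Path run = Files.createTempFile("sorted-run-", ".txt");
            Files.write(run, buffer, StandardCharsets.UTF_8);
            runs.add(run);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    /** Returns every message in sorted order, then deletes the run files. */
    List<String> drainSorted() {
        if (!buffer.isEmpty()) {
            spill();
        }
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries pair the current head of a run with that run's reader.
        PriorityQueue<Head> heap = new PriorityQueue<>();
        try {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, reader));
                }
            }
            List<String> merged = new ArrayList<>();
            while (!heap.isEmpty()) {
                Head head = heap.poll();
                merged.add(head.value);
                String next = head.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, head.reader));
                }
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (BufferedReader reader : readers) {
                try { reader.close(); } catch (IOException ignored) { }
            }
            for (Path run : runs) {
                run.toFile().delete(); // best-effort cleanup
            }
            runs.clear();
        }
    }

    private static final class Head implements Comparable<Head> {
        final String value;
        final BufferedReader reader;
        Head(String value, BufferedReader reader) { this.value = value; this.reader = reader; }
        @Override public int compareTo(Head other) { return value.compareTo(other.value); }
    }
}
```

Memory use here is bounded by maxInMemory messages while adding and by one line per run while merging; the price is extra disk I/O for every spilled message.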

My program has only one superstep. It is an implementation of inverted
indexing that first reads a SequenceFile consisting of <key, value> pairs,
where the key is a Text object and the value an IntWritable.
The program parses the Text objects and stores each distinct word and its
frequency. After each document, it sends a message to another peer for
each word, containing the word, the document ID and the frequency.
Once all the documents have been processed, sync() is called.
After that, a list is created for every word, consisting of all the
<document_id, frequency> pairs found.
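
The single-superstep flow described above can be modelled in plain Java to make the message volume concrete. This is a hypothetical simulation, not Hama's BSP API: peers are in-memory inboxes, documents are split on whitespace, and each word is assumed to be owned by peer Math.floorMod(word.hashCode(), numPeers), which may differ from the partitioning the real program uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-superstep inverted index, simulated without Hama.
final class InvertedIndexSimulation {

    /** Message sent to the peer owning `word`: <word, document id, frequency>. */
    static final class Posting {
        final String word;
        final int documentId;
        final int frequency;
        Posting(String word, int documentId, int frequency) {
            this.word = word; this.documentId = documentId; this.frequency = frequency;
        }
    }

    /** Assumed routing: each word is owned by exactly one peer. */
    static int ownerOf(String word, int numPeers) {
        return Math.floorMod(word.hashCode(), numPeers);
    }

    /** Returns, per peer, its word -> (documentId -> frequency) postings. */
    static List<Map<String, Map<Integer, Integer>>> build(Map<Integer, String> documents, int numPeers) {
        List<List<Posting>> inboxes = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            inboxes.add(new ArrayList<>());
        }
        // Before sync(): count words per document, one message per distinct word.
        for (Map.Entry<Integer, String> doc : documents.entrySet()) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String word : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                inboxes.get(ownerOf(c.getKey(), numPeers))
                       .add(new Posting(c.getKey(), doc.getKey(), c.getValue()));
            }
        }
        // sync(): every inbox now holds all of its messages at the same time.
        // After sync(): each peer groups its messages into posting lists.
        List<Map<String, Map<Integer, Integer>>> indexes = new ArrayList<>();
        for (List<Posting> inbox : inboxes) {
            Map<String, Map<Integer, Integer>> index = new TreeMap<>();
            for (Posting m : inbox) {
                index.computeIfAbsent(m.word, w -> new TreeMap<>()).put(m.documentId, m.frequency);
            }
            indexes.add(index);
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "to be or not to be");
        docs.put(2, "to do");
        System.out.println(build(docs, 3));
    }
}
```

In this model one message is produced per distinct (word, document) pair, and with a single superstep all of them sit in the receiving peers' inboxes at once before any posting list is built.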


Re: HAMA jobs failing, with no debug message - 2

Posted by "Edward J. Yoon" <ed...@apache.org>.

Why don't you use the Spilling Queue? Then it will work without any problems.

>> > Last note: I'm running an Inverted Indexing algorithm with a data set of
>> > approximately 17 GB.

How many supersteps are needed? If your job is too
communication-intensive, maybe you should consider another approach.
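
The idea behind a spilling queue can be sketched as a FIFO that keeps a bounded number of messages on the heap and appends the overflow to a temporary file. The Java below is a simplified, hypothetical illustration of that idea (String messages without line breaks, one spill file, reads only after writes as in a BSP superstep); it is not Hama's Spilling Queue implementation and uses none of its classes or configuration keys.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the spilling idea, not Hama's implementation:
// the first maxInMemory messages stay on the heap, the rest go to a temp
// file. Once reading from disk has started, add() is rejected.
final class SpillingFifoSketch implements AutoCloseable {
    private final int maxInMemory;
    private final Deque<String> memory = new ArrayDeque<>();
    private final Path spillFile;
    private BufferedWriter writer;  // open while messages may still be added
    private BufferedReader reader;  // opened on the first read from disk
    private long spilledCount;      // spilled messages not yet read back

    SpillingFifoSketch(int maxInMemory) {
        if (maxInMemory <= 0) {
            throw new IllegalArgumentException("maxInMemory must be positive");
        }
        this.maxInMemory = maxInMemory;
        try {
            spillFile = Files.createTempFile("spill-", ".txt");
            writer = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void add(String message) {
        if (reader != null) {
            throw new IllegalStateException("add() after reading from disk started");
        }
        // Keep FIFO order: once anything has spilled, later messages spill too.
        if (spilledCount == 0 && memory.size() < maxInMemory) {
            memory.addLast(message);
            return;
        }
        try {
            writer.write(message);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilledCount++;
    }

    /** Returns the oldest message, or null when the queue is empty. */
    String poll() {
        if (!memory.isEmpty()) {
            return memory.pollFirst();
        }
        if (spilledCount == 0) {
            return null;
        }
        try {
            if (reader == null) {
                writer.close();  // flush everything before reading it back
                writer = null;
                reader = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8);
            }
            spilledCount--;
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemory() { return memory.size(); }

    long onDisk() { return spilledCount; }

    @Override
    public void close() {
        try {
            if (writer != null) writer.close();
            if (reader != null) reader.close();
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The heap cost stays at maxInMemory messages no matter how many are sent in one superstep; the trade-off is that every overflow message is written to and read back from disk.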

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: HAMA jobs failing, with no debug message - 2

Posted by Steven van Beelen <sm...@gmail.com>.

Hi Edward,

That was the first issue I thought of, so I increased bsp.child.java.opts
to 8 GB and the GroomServer heap to 4 GB.
After that, the 84-task run worked, but with 60 tasks it still fails as
described above.
Should I give it more memory? I would think these amounts per
task/GroomServer should be enough.
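
The numbers in this message can be checked with a little arithmetic, written here as a small Java program. The assumptions are mine, not from the thread: 1 GB is taken as 1024 MB, the GroomServer heap is counted once per node, the ~17 GB input is assumed to be split evenly across tasks, and other processes on the node are ignored. Under those assumptions 12 child JVMs at 8 GB plus a 4 GB GroomServer commit at most 100 GB of heap on a 128 GB node, and going from 84 to 60 tasks raises each task's share of the raw input by 40% (84 / 60 = 1.4).

```java
// Back-of-the-envelope memory budget; all inputs are assumptions stated above.
final class MemoryBudget {
    static final long MB_PER_GB = 1024;

    /** Worst-case heap committed on one node, in MB. */
    static long committedHeapMb(int tasksPerNode, long childHeapMb, long groomHeapMb) {
        return tasksPerNode * childHeapMb + groomHeapMb;
    }

    /** Raw input per task in MB, assuming an even split (integer division). */
    static long inputPerTaskMb(long inputMb, int tasks) {
        return inputMb / tasks;
    }

    public static void main(String[] args) {
        long childHeapMb = 8 * MB_PER_GB;   // 8 GB per BSP child, as reported
        long groomHeapMb = 4 * MB_PER_GB;   // 4 GB GroomServer heap, as reported
        long nodeRamMb = 128 * MB_PER_GB;
        long inputMb = 17 * MB_PER_GB;      // "approximately 17 GB"

        System.out.println(committedHeapMb(12, childHeapMb, groomHeapMb) + " of " + nodeRamMb + " MB"); // 102400 of 131072 MB
        System.out.println(inputPerTaskMb(inputMb, 84) + " MB per task"); // 207 MB per task
        System.out.println(inputPerTaskMb(inputMb, 60) + " MB per task"); // 290 MB per task
    }
}
```

These are raw input sizes only; the Java objects and buffered messages built from that input can take considerably more heap than the bytes on disk.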

Regards, Steven

Re: HAMA jobs failing, with no debug message - 2

Posted by "Edward J. Yoon" <ed...@apache.org>.

> The only case the program does run, is when I use the maximum number of
> machines (i.e. 7 machines, with 12 cores, 128GB ram..). I set the maximum
> number of tasks to 12 per node, thus 84. But when I force the program to run
> with 60 tasks, the "Job Failed" comes up with no additional info.

Your case looks like a memory problem. Can you check the memory usage
during job execution? Or try increasing the maximum heap of the BSP child
JVM.
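
A generic way to check heap usage from inside a running JVM is the standard java.lang.Runtime API. The sketch below formats a one-line heap report that a task could log periodically, for instance every few thousand documents; where and how often to log it is an assumption, and nothing Hama-specific is used.

```java
import java.util.Locale;

// Generic heap report using java.lang.Runtime; not tied to any Hama API.
final class HeapReport {
    private static final long MB = 1024L * 1024;

    /** Formats used/committed/max heap, all converted to MB. */
    static String format(long usedBytes, long committedBytes, long maxBytes) {
        return String.format(Locale.ROOT, "heap used=%d MB, committed=%d MB, max=%d MB",
                usedBytes / MB, committedBytes / MB, maxBytes / MB);
    }

    /** Snapshot of the current JVM: used = committed - free, max = -Xmx limit. */
    static String current() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        return format(used, committed, rt.maxMemory());
    }

    public static void main(String[] args) {
        System.out.println(current());
    }
}
```

If "used" keeps climbing towards "max" during the single superstep, that would support the memory explanation.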

> the "Job Failed" comes up with no additional info.

Sorry for the inconvenience, I'll check it out and see what's wrong.

-- 
Best Regards, Edward J. Yoon
@eddieyoon