Posted to common-user@hadoop.apache.org by John Bergstrom <hi...@gmail.com> on 2009/03/19 19:48:45 UTC

Will Hadoop help for my application?

Hi,

Can anyone tell me if Hadoop is appropriate for the following application?

I need to perform optimization using a single, small input data set.
To get a good result I must make many independent runs of the
optimizer, where each run is initiated with a different starting
point. At completion, I just choose the best solution from all of the
runs. So my problem is not that I'm working with big data; I just want
to speed up my run time by linking several Ubuntu desktops that are
available to me. The optimizer is written in ANSI C.

Thanks,

John

Re: Will Hadoop help for my application?

Posted by John Bergstrom <hi...@gmail.com>.

Thanks to you all. You have been very helpful. I've gotten lots of
good information.

Regards,

John

Re: Will Hadoop help for my application?

Posted by Stefan Podkowinski <sp...@gmail.com>.

Using genetic algorithms may also work for your case.

See http://jgap.sourceforge.net/
This one supports grid environment execution too.


On Thu, Mar 19, 2009 at 7:48 PM, John Bergstrom <hi...@gmail.com> wrote:
> Hi,
>
> Can anyone tell me if Hadoop is appropriate for the following application.
>
> I need to perform optimization using a single, small input data set.
> To get a good result I must make many independent runs of the
> optimizer, where each run is initiated with a different starting
> point. At completion, I just choose the best solution from all of the
> runs. So my problem is not that I'm working with big data, I just want
> to speed up my run time by linking several Ubuntu desktops that are
> available to me. The optimizer is written in ANSI C.
>
> Thanks,
>
> John
>

Re: Will Hadoop help for my application?

Posted by Ted Dunning <te...@gmail.com>.

You can use a randomized reduce key to parallelize the comparison of
different runs.  Each reduce key would be in a small range of integers (say
0..99).  Each reducer would then be in charge of keeping only the best
solution for its keys.  The final output would be 100 values which could be
compared conventionally.

Whether this would help really depends on how many runs you have.  If it is
fewer than millions, this probably doesn't matter and Miles's suggestion is
fine.
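
A rough sketch of the randomized-key idea, simulated locally in Python
rather than on Hadoop. It assumes lower cost is better; NUM_KEYS, the
function names, and the (cost, solution) record layout are illustrative
assumptions, and a dictionary stands in for the shuffle between map and
reduce.

```python
import random
from collections import defaultdict

NUM_KEYS = 100  # size of the random key range (an assumed value)


def map_run(cost, solution, rng):
    """Tag one finished run with a random reduce key in 0..NUM_KEYS-1."""
    return rng.randrange(NUM_KEYS), (cost, solution)


def reduce_best(values):
    """Keep only the lowest-cost run among those that share a key."""
    return min(values, key=lambda v: v[0])


def best_of_all(runs, seed=0):
    """Return (overall best run, number of per-key candidates compared)."""
    rng = random.Random(seed)
    groups = defaultdict(list)  # stands in for Hadoop's shuffle phase
    for cost, solution in runs:
        key, value = map_run(cost, solution, rng)
        groups[key].append(value)
    per_key_best = [reduce_best(vals) for vals in groups.values()]
    # At most NUM_KEYS candidates remain; compare them conventionally.
    return min(per_key_best, key=lambda v: v[0]), len(per_key_best)
```

Calling best_of_all(runs) on a list of (cost, solution) pairs returns the
overall best run together with the number of per-key winners that had to
be compared in the final step.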

On Thu, Mar 19, 2009 at 11:54 AM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:

> you won't need any reducers.




-- 
Ted Dunning, CTO
DeepDyve

Re: Will Hadoop help for my application?

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.

yes, this is perfectly fine:  make each mapper one of your runs and
simply emit the final result, along with the conditions leading to
that result.

you won't need any reducers.
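
A minimal sketch of the mapper logic, assuming Hadoop Streaming with one
starting seed per input line. The run_optimizer function is a hypothetical
stand-in (random search on a toy objective); in practice it would invoke
the compiled C optimizer. The tab-separated output layout is also an
assumption.

```python
import random


def run_optimizer(seed):
    """Hypothetical stand-in for one optimizer run from a given seed."""
    rng = random.Random(seed)
    best_x, best_cost = None, float("inf")
    for _ in range(1000):
        x = rng.uniform(-10.0, 10.0)
        cost = (x - 3.0) ** 2  # toy objective, minimum at x = 3
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_cost, best_x


def map_lines(lines):
    """Streaming-style mapper: one seed per input line, one record out."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        seed = int(line)
        cost, x = run_optimizer(seed)
        # Emit the cost first so results are easy to sort afterwards, and
        # include the seed so the winning run can be reproduced.
        yield f"{cost:.6f}\t{seed}\t{x:.6f}"
```

Under Hadoop Streaming the script would pass sys.stdin to map_lines and
print each record. With the number of reduce tasks set to zero, the output
files simply hold one line per run, and the best line can be picked out
afterwards.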

Miles

2009/3/19 John Bergstrom <hi...@gmail.com>:
> Hi,
>
> Can anyone tell me if Hadoop is appropriate for the following application.
>
> I need to perform optimization using a single, small input data set.
> To get a good result I must make many independent runs of the
> optimizer, where each run is initiated with a different starting
> point. At completion, I just choose the best solution from all of the
> runs. So my problem is not that I'm working with big data, I just want
> to speed up my run time by linking several Ubuntu desktops that are
> available to me. The optimizer is written in ANSI C.
>
> Thanks,
>
> John
>




Re: Will Hadoop help for my application?

Posted by Mark Kerzner <ma...@gmail.com>.

My feeling is that JavaSpaces could be a good choice. Here is my plan:

   - Have one machine running JavaSpaces (using GigaSpaces free community
   version), put the data in there, with a small object to keep the starting
   point;
   - Each worker machine reads the Space (all workers can read at the same
   time, no lock), and also updates the starting point object (get it from the
   Space, update, put it back) - this is a locking operation, but fast;
   - Results go back into the Space.
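
The plan above can be sketched roughly as follows. This is a
single-process Python stand-in using threads and a lock, not the
JavaSpaces or GigaSpaces API; the Space class, the optimize function, and
the toy cost are all hypothetical and only illustrate the
take/update/put-back pattern.

```python
import threading


class Space:
    """Toy in-process stand-in for a shared space (not the JavaSpaces API)."""

    def __init__(self, data, num_starts):
        self.data = data            # read-only input shared by all workers
        self._next_start = 0        # the small "starting point" object
        self._num_starts = num_starts
        self._lock = threading.Lock()
        self.results = []

    def take_next_start(self):
        """Get the starting point, update it, and put it back under a lock."""
        with self._lock:
            if self._next_start >= self._num_starts:
                return None         # no starting points left
            start = self._next_start
            self._next_start += 1
            return start

    def write_result(self, result):
        with self._lock:
            self.results.append(result)


def optimize(data, start):
    """Hypothetical stand-in for one optimizer run; returns (cost, start)."""
    return abs(sum(data) - start), start


def worker(space):
    # Reading space.data needs no lock; only the starting point does.
    while True:
        start = space.take_next_start()
        if start is None:
            break
        space.write_result(optimize(space.data, start))


def run(num_workers=4, num_starts=50):
    space = Space(data=[3, 4, 5], num_starts=num_starts)
    threads = [threading.Thread(target=worker, args=(space,))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return min(space.results)       # lowest-cost (cost, start) pair
```

Each starting point is handed out exactly once because the counter is
only touched while the lock is held, which mirrors the short locking
update in the second step.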

I would be interested to hear what you do in the end.

Mark

On Thu, Mar 19, 2009 at 1:48 PM, John Bergstrom <hi...@gmail.com> wrote:

> Hi,
>
> Can anyone tell me if Hadoop is appropriate for the following application.
>
> I need to perform optimization using a single, small input data set.
> To get a good result I must make many independent runs of the
> optimizer, where each run is initiated with a different starting
> point. At completion, I just choose the best solution from all of the
> runs. So my problem is not that I'm working with big data, I just want
> to speed up my run time by linking several Ubuntu desktops that are
> available to me. The optimizer is written in ANSI C.
>
> Thanks,
>
> John
>

Re: Will Hadoop help for my application?

Posted by tim robertson <ti...@gmail.com>.

You might make use of the Hadoop scheduler and task management to
initiate the jobs and write the results back to the Hadoop
filesystem, but I would guess there are better ways of doing this than
using Hadoop just for scheduling (perhaps a simple web service on
each machine through which you can remotely trigger the processing?).
I am by no means a Hadoop expert though.

Cheers,

Tim

On Thu, Mar 19, 2009 at 7:48 PM, John Bergstrom <hi...@gmail.com> wrote:
> Hi,
>
> Can anyone tell me if Hadoop is appropriate for the following application.
>
> I need to perform optimization using a single, small input data set.
> To get a good result I must make many independent runs of the
> optimizer, where each run is initiated with a different starting
> point. At completion, I just choose the best solution from all of the
> runs. So my problem is not that I'm working with big data, I just want
> to speed up my run time by linking several Ubuntu desktops that are
> available to me. The optimizer is written in ANSI C.
>
> Thanks,
>
> John
>