Posted to common-user@hadoop.apache.org by Jackob Carlsson <ja...@gmail.com> on 2010/08/10 16:00:59 UTC

Hadoop and Cloud computing

Hi,

I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
computing. I need to find some open problems in cloud computing that
Hadoop can address. I would appreciate it if somebody could help me find
some topics.

Thanks in advance
Jackob

Re: Hadoop and Cloud computing

Posted by Josh Patterson <jo...@cloudera.com>.
Jackob,
Hadoop and MapReduce should not be thought of as limited to
text-based processing; projects like the openPDC have shown MapReduce
to be very good at processing TBs of high-resolution time-series data
(from binary formats). Take a look at the "powered by" page on
hadoop.apache.org for other ideas as well.
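
To illustrate the point about binary time-series data, here is a minimal single-process sketch of a map and reduce over fixed-width binary records. The record layout (`sensor id, timestamp, value`) and all function names are assumptions made up for this example; this is not the openPDC format or the Hadoop API.

```python
import struct
from collections import defaultdict

# Hypothetical record layout: uint32 sensor id, int64 timestamp (ms), float64 value.
RECORD = struct.Struct("<Iqd")

def read_records(blob):
    """Yield (sensor_id, timestamp, value) tuples from a packed byte string."""
    for offset in range(0, len(blob), RECORD.size):
        yield RECORD.unpack_from(blob, offset)

def map_record(record):
    """Map: key each measurement by its sensor id."""
    sensor_id, _timestamp, value = record
    return sensor_id, value

def reduce_sensor(values):
    """Reduce: collapse all values of one sensor into summary statistics."""
    return {"count": len(values), "max": max(values), "mean": sum(values) / len(values)}

def summarize(blob):
    grouped = defaultdict(list)  # stands in for the shuffle step
    for sensor_id, value in map(map_record, read_records(blob)):
        grouped[sensor_id].append(value)
    return {sid: reduce_sensor(vals) for sid, vals in grouped.items()}

samples = [(1, 0, 60.0), (1, 33, 59.5), (2, 0, 120.0)]
blob = b"".join(RECORD.pack(*s) for s in samples)
summary = summarize(blob)
```

Nothing in the map or reduce step depends on the input being text; only the record reader changes.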

In terms of architecture, there are some interesting discussions in
the HBase realm about dealing with hot spots and how HBase and HDFS
work together. You might ask Jonathan Gray or Michael Stack over in
the HBase IRC channel about what they are facing; I'm sure they would
provide you with an interesting discussion.
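
As one illustration of the hot-spot problem, the sketch below shows row-key salting, a commonly described way to spread monotonically increasing keys (such as timestamps) across several key ranges. It is only an assumption that this is relevant to the discussions mentioned above; the bucket count and function name are hypothetical, and no HBase client API is used.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical number of salt buckets

def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a bucket derived from its hash.

    The prefix is deterministic, so a reader can recompute it for point
    lookups, but range scans must now fan out over every bucket.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"

# Sequential timestamp keys would normally all land in one key range.
keys = [f"2010-08-11T12:{minute:02d}" for minute in range(60)]
salted = [salted_key(k) for k in keys]
prefixes = {s.split("-", 1)[0] for s in salted}
```

The trade-off is the usual one: writes are spread out, while ordered scans become more expensive.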

Josh

On Wed, Aug 11, 2010 at 12:42 PM, Jackob Carlsson
<ja...@gmail.com> wrote:
> Hi Josh,
>
> I would say the second case. As you know, MapReduce algorithms more or less
> fit text processing, but I'm looking for issues such as large-scale
> data handling. I would appreciate it if you could point me to some related
> topics.
>
> Best regards,
> Jackob
>
>
>
>
> On Wed, Aug 11, 2010 at 5:10 PM, Josh Patterson <jo...@cloudera.com> wrote:
>
>> Jackob,
>> Are you looking for problems to solve with Map Reduce on Hadoop or
>> open problems to be solved in the architecture of hadoop?
>>
>> Josh Patterson
>> Cloudera
>>
>> On Tue, Aug 10, 2010 at 10:00 AM, Jackob Carlsson
>> <ja...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
>> > computing. I need to find some open problems in cloud computing that
>> > Hadoop can address. I would appreciate it if somebody could help me find
>> > some topics.
>> >
>> > Thanks in advance
>> > Jackob
>> >
>>
>

Re: Hadoop and Cloud computing

Posted by Jackob Carlsson <ja...@gmail.com>.
Hi Josh,

I would say the second case. As you know, MapReduce algorithms more or less
fit text processing, but I'm looking for issues such as large-scale
data handling. I would appreciate it if you could point me to some related
topics.

Best regards,
Jackob




On Wed, Aug 11, 2010 at 5:10 PM, Josh Patterson <jo...@cloudera.com> wrote:

> Jackob,
> Are you looking for problems to solve with Map Reduce on Hadoop or
> open problems to be solved in the architecture of hadoop?
>
> Josh Patterson
> Cloudera
>
> On Tue, Aug 10, 2010 at 10:00 AM, Jackob Carlsson
> <ja...@gmail.com> wrote:
> > Hi,
> >
> > I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
> > computing. I need to find some open problems in cloud computing that
> > Hadoop can address. I would appreciate it if somebody could help me find
> > some topics.
> >
> > Thanks in advance
> > Jackob
> >
>

Re: Hadoop and Cloud computing

Posted by da...@ontrenet.com.
Map/Reduce was designed primarily for parallel batch processing of text.
In this respect, it's best suited to data sets that are homogeneous and can
be divided (mapped) and merged (reduced) without impacting the algorithm's
efficacy.

As such, it's not well suited to general-purpose algorithms or non-uniform
data sets, but rather to transformative batch processing, in my
opinion.
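
The divide-and-merge pattern described above can be sketched with a toy word count. This is a single-process illustration with made-up function names, not the Hadoop API; the `shuffle` function stands in for the grouping the framework performs between the two phases.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["hadoop runs map tasks", "hadoop runs reduce tasks"])
```

Each line can be mapped independently and each key reduced independently, which is why the work parallelizes well when the records are uniform.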


> On Tue, Aug 10, 2010 at 10:00 AM, Jackob Carlsson
> <ja...@gmail.com> wrote:
>> Hi,
>>
>> I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
>> computing. I need to find some open problems in cloud computing that
>> Hadoop can address. I would appreciate it if somebody could help me find
>> some topics.
>>
>> Thanks in advance
>> Jackob
>>
>


Re: Hadoop and Cloud computing

Posted by Josh Patterson <jo...@cloudera.com>.
Jackob,
Are you looking for problems to solve with Map Reduce on Hadoop or
open problems to be solved in the architecture of hadoop?

Josh Patterson
Cloudera

On Tue, Aug 10, 2010 at 10:00 AM, Jackob Carlsson
<ja...@gmail.com> wrote:
> Hi,
>
> I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
> computing. I need to find some open problems in cloud computing that
> Hadoop can address. I would appreciate it if somebody could help me find
> some topics.
>
> Thanks in advance
> Jackob
>

Re: Hadoop and Cloud computing

Posted by Jackob Carlsson <ja...@gmail.com>.
Thank you, Steve, for the useful links. Unfortunately, I'm not in the UK,
and I'll be too busy at that time to meet you at the opentech event. If I
need more help, I'll try to contact you.



On Wed, Aug 11, 2010 at 11:43 AM, Steve Loughran <st...@apache.org> wrote:

> On 10/08/10 15:00, Jackob Carlsson wrote:
>
>> Hi,
>>
>> I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
>> computing. I need to find some open problems in cloud computing that
>> Hadoop can address. I would appreciate it if somebody could help me find
>> some topics.
>>
>> Thanks in advance
>> Jackob
>>
>>
> This might be a starting point
> http://www.slideshare.net/steve_l/hadoop-and-universities
>
> * What do you mean by "cloud computing"? If it is VM-hosted code running on
> pay-as-you-go infrastructure, this is the kind of problem:
> http://www.slideshare.net/steve_l/farming-hadoop-inthecloud
>
>  -placing VMs close to the data
>  -handling failure differently (don't blacklist, kill the VM)
>  -making Hadoop and its clients more adaptive to clusters where the
> machines are moving around more.
>
> Other options
>  -running Hadoop physically, but use the spare cycles/memory for other
> work, so the tasktrackers must co-ordinate Hadoop work scheduling with other
> work
>
>  -running Hadoop directly against the underlying filesystem of the
> infrastructure, instead of HDFS.
> http://www.slideshare.net/steve_l/high-availability-hadoop
>
>
> Where are you based? If you are in the UK we could meet some time, I'll be
> at the opentech event in London next month.
>

Re: Hadoop and Cloud computing

Posted by Amandeep Khurana <am...@gmail.com>.
Jackob,

One of the things you could look at is maintaining data locality
during the expansion and contraction of an on-demand cluster. Maybe
the resizing can be done intelligently, without hurting locality
much. Of course, it would mean moving some data around while the
resizing happens, but how can that movement be minimized? Add
HBase into the equation as well if you'd like.
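
One way to make the "how can it be minimized?" question concrete is consistent hashing, sketched below. This is a swapped-in technique for illustration only: HDFS does not place blocks this way, and the class, node, and block names are all hypothetical. The sketch measures what fraction of blocks change owner when a node is added.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def owner(self, block_id: str) -> str:
        # The first ring point clockwise from the block's hash owns the block.
        idx = bisect.bisect(self._points, _hash(block_id)) % len(self._ring)
        return self._ring[idx][1]

def moved_fraction(blocks, before, after):
    """Fraction of blocks whose owner differs between two cluster layouts."""
    moved = sum(1 for b in blocks if before.owner(b) != after.owner(b))
    return moved / len(blocks)

blocks = [f"block-{i}" for i in range(2000)]
small = HashRing(["node1", "node2", "node3", "node4"])
grown = HashRing(["node1", "node2", "node3", "node4", "node5"])
fraction = moved_fraction(blocks, small, grown)
```

With this scheme, the only blocks that move during expansion are the ones taken over by the new node; the open question raised above is how to get a similar property while also respecting replication and rack placement.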

Just an idea I had while discussing this with a colleague earlier.

-Amandeep

Sent from my iPhone

On Aug 11, 2010, at 2:43 AM, Steve Loughran <st...@apache.org> wrote:

> On 10/08/10 15:00, Jackob Carlsson wrote:
>> Hi,
>>
>> I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
>> computing. I need to find some open problems in cloud computing that
>> Hadoop can address. I would appreciate it if somebody could help me find
>> some topics.
>>
>> Thanks in advance
>> Jackob
>>
>
> This might be a starting point
> http://www.slideshare.net/steve_l/hadoop-and-universities
>
> * What do you mean by "cloud computing"? If it is VM-hosted code running on pay-as-you-go infrastructure, this is the kind of problem:
> http://www.slideshare.net/steve_l/farming-hadoop-inthecloud
>
> -placing VMs close to the data
> -handling failure differently (don't blacklist, kill the VM)
> -making Hadoop and its clients more adaptive to clusters where the machines are moving around more.
>
> Other options
> -running Hadoop physically, but use the spare cycles/memory for other work, so the tasktrackers must co-ordinate Hadoop work scheduling with other work
>
> -running Hadoop directly against the underlying filesystem of the infrastructure, instead of HDFS.
> http://www.slideshare.net/steve_l/high-availability-hadoop
>
>
> Where are you based? If you are in the UK we could meet some time, I'll be at the opentech event in London next month.

Re: Hadoop and Cloud computing

Posted by Steve Loughran <st...@apache.org>.
On 10/08/10 15:00, Jackob Carlsson wrote:
> Hi,
>
> I am writing a thesis proposal for my PhD on the use of Hadoop in cloud
> computing. I need to find some open problems in cloud computing that
> Hadoop can address. I would appreciate it if somebody could help me find
> some topics.
>
> Thanks in advance
> Jackob
>

This might be a starting point
http://www.slideshare.net/steve_l/hadoop-and-universities

* What do you mean by "cloud computing"? If it is VM-hosted code running
on pay-as-you-go infrastructure, this is the kind of problem:
http://www.slideshare.net/steve_l/farming-hadoop-inthecloud

  -placing VMs close to the data
  -handling failure differently (don't blacklist, kill the VM)
  -making Hadoop and its clients more adaptive to clusters where the 
machines are moving around more.
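
The first item in the list above (placing VMs close to the data) can be sketched as a simple preference order: a host that already holds a replica, then a host in the same rack as a replica, then anything else. The function, host, and rack names are hypothetical, and this is not how any particular Hadoop scheduler or cloud provider is implemented.

```python
def place_vm(replica_hosts, free_hosts, rack_of):
    """Pick the free host closest to the data, or None if no host is free.

    Distance 0: the host stores a replica (node-local).
    Distance 1: the host shares a rack with a replica (rack-local).
    Distance 2: everything else (off-rack).
    """
    replica_racks = {rack_of[h] for h in replica_hosts}

    def distance(host):
        if host in replica_hosts:
            return 0
        if rack_of[host] in replica_racks:
            return 1
        return 2

    return min(free_hosts, key=distance) if free_hosts else None

rack_of = {"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r3"}
node_local = place_vm({"h1", "h3"}, ["h4", "h2", "h3"], rack_of)
rack_local = place_vm({"h1"}, ["h4", "h2"], rack_of)
off_rack = place_vm({"h1"}, ["h4"], rack_of)
```

In a pay-as-you-go environment the hard part is that the rack map itself is often hidden or changing, which is what makes this a research problem rather than a lookup.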

Other options
  -running Hadoop physically, but use the spare cycles/memory for other 
work, so the tasktrackers must co-ordinate Hadoop work scheduling with 
other work

  -running Hadoop directly against the underlying filesystem of the 
infrastructure, instead of HDFS.
http://www.slideshare.net/steve_l/high-availability-hadoop


Where are you based? If you are in the UK we could meet some time, I'll 
be at the opentech event in London next month.