Posted to common-user@hadoop.apache.org by Hyunsik Choi <c0...@gmail.com> on 2009/06/23 10:03:07 UTC

Is it possible? I want to group data blocks.

Hi all,

I would like to control data locality. In other words, I want to place
certain data blocks on one machine. In some problems, subsets of the
dataset depend on one another to compute an answer. Most graph problems
are good examples.

Is it possible? If not, can you offer any advice?

Thank you in advance.

- Hyunsik Choi -

Re: Is it possible? I want to group data blocks.

Posted by Tom White <to...@cloudera.com>.
You might be interested in
https://issues.apache.org/jira/browse/HDFS-385, where there is
discussion about how to add pluggable block placement to HDFS.

Cheers,
Tom

On Tue, Jun 23, 2009 at 5:50 PM, Alex Loddengaard<al...@cloudera.com> wrote:
> Hi Hyunsik,
>
> Unfortunately, you can't control which servers blocks are placed on. Hadoop
> does block allocation for you, and it tries its best to distribute data
> evenly across the cluster, while ensuring that replicas of a block reside on
> different machines and on different racks (assuming you've made Hadoop
> rack-aware).
>
> Hope this clears things up.
>
> Alex

Re: Is it possible? I want to group data blocks.

Posted by Alex Loddengaard <al...@cloudera.com>.
Hi Hyunsik,

Unfortunately, you can't control which servers blocks are placed on. Hadoop
does block allocation for you, and it tries its best to distribute data
evenly across the cluster, while ensuring that replicas of a block reside on
different machines and on different racks (assuming you've made Hadoop
rack-aware).

Hope this clears things up.

Alex
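
To illustrate the constraint described above, here is a minimal Python
sketch of rack-aware replica placement. It is not HDFS's actual placement
code: the function name place_replicas, the nodes_by_rack mapping, and the
round-robin strategy are all assumptions made for illustration. The point it
shows is that the system, not the caller, picks the nodes, subject to the rule
that replicas land on distinct machines spread over as many racks as possible.

```python
import random


def place_replicas(nodes_by_rack, replication=3, rng=None):
    """Pick `replication` distinct nodes, spread over as many racks as possible.

    Hypothetical helper for illustration only; it does not reproduce the
    real HDFS block placement policy.
    """
    rng = rng or random.Random()

    # Visit racks in random order so no rack is systematically favoured.
    racks = list(nodes_by_rack)
    rng.shuffle(racks)

    # Shuffled copy of each rack's node list, consumed as nodes are chosen.
    pools = {rack: rng.sample(nodes, len(nodes))
             for rack, nodes in nodes_by_rack.items()}

    chosen = []
    while len(chosen) < replication:
        progressed = False
        # Round-robin: take at most one node per rack on each pass.
        for rack in racks:
            if pools[rack] and len(chosen) < replication:
                chosen.append((rack, pools[rack].pop()))
                progressed = True
        if not progressed:
            raise ValueError("not enough nodes for requested replication")
    return chosen


if __name__ == "__main__":
    cluster = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
    for rack, node in place_replicas(cluster, replication=3):
        print(rack, node)
```

Note that the caller supplies only the cluster layout and the replication
factor; there is no argument for requesting a particular node, which mirrors
the limitation described in the reply above.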
