Posted to user@hadoop.apache.org by Sigurd Spieckermann <si...@gmail.com> on 2012/12/05 18:53:53 UTC

Tell Hadoop to store pairs of files at the same location(s) on HDFS

Hi guys,

I have been wondering if there's a way (hack'ish would be okay too) to tell
Hadoop that two files shall be stored together at the same location(s). It
would benefit map-side join performance if it could be done somehow because
all map tasks would be able to read data from a local copy. Does anyone
know a way?

-Sigurd

Re: Tell Hadoop to store pairs of files at the same location(s) on HDFS

Posted by "M. C. Srivas" <mc...@gmail.com>.
MapR does this already, and well beyond just 2 files. One can arrange
things so that a boatload of files have all their replicas placed on
the same set of nodes, i.e., files A ... Z will have replica1 on node1,
replica2 on node2, replica3 on node3, etc. (Nodes 1, 2 and 3 are picked by
the system based on utilization and node-fullness.)
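
To make the placement idea above concrete, here is a toy Python sketch. It is
not MapR or HDFS code and calls no real API: Node, pick_replica_nodes and
place_group are hypothetical names, and "least-full nodes first" is only an
assumed stand-in for whatever utilization-based choice a real system makes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    used_fraction: float  # share of disk already in use, 0.0 to 1.0


def pick_replica_nodes(nodes, replication=3):
    """Pick the least-full nodes (assumed policy, ties broken by name)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    ranked = sorted(nodes, key=lambda n: (n.used_fraction, n.name))
    return [n.name for n in ranked[:replication]]


def place_group(file_names, nodes, replication=3):
    """Give every file in the group the same ordered list of replica nodes."""
    targets = pick_replica_nodes(nodes, replication)
    return {f: list(targets) for f in file_names}


# Hypothetical four-node cluster; node4 is nearly full and gets skipped.
cluster = [
    Node("node1", 0.20),
    Node("node2", 0.35),
    Node("node3", 0.40),
    Node("node4", 0.90),
]
placement = place_group(["A", "B", "Z"], cluster)
for name, replicas in placement.items():
    print(name, replicas)
```

Because every file in the group shares one replica list, a map task scheduled
on any of those nodes can read all of the group's files locally, which is the
property the map-side join in the original question relies on.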




On Wed, Dec 5, 2012 at 11:26 AM, Sigurd Spieckermann <
sigurd.spieckermann@gmail.com> wrote:

> Awesome! That's exactly what I'm looking for. Hadn't seen the JIRA. I hope
> this is coming soon!
>
> On 05.12.2012 18:58, Harsh J wrote:
>
>> You are probably talking of
>> https://issues.apache.org/jira/browse/HDFS-2576 and similar JIRAs.
>> This feature isn't available in HDFS yet, but may arrive soon.
>>
>> On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
>> <sigurd.spieckermann@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have been wondering if there's a way (hack'ish would be okay too) to tell
>>> Hadoop that two files shall be stored together at the same location(s). It
>>> would benefit map-side join performance if it could be done somehow because
>>> all map tasks would be able to read data from a local copy. Does anyone know
>>> a way?
>>>
>>> -Sigurd

Re: Tell Hadoop to store pairs of files at the same location(s) on HDFS

Posted by Sigurd Spieckermann <si...@gmail.com>.
Awesome! That's exactly what I'm looking for. Hadn't seen the JIRA. I 
hope this is coming soon!

On 05.12.2012 18:58, Harsh J wrote:
> You are probably talking of
> https://issues.apache.org/jira/browse/HDFS-2576 and similar JIRAs.
> This feature isn't available in HDFS yet, but may arrive soon.
>
> On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
> <si...@gmail.com> wrote:
>> Hi guys,
>>
>> I have been wondering if there's a way (hack'ish would be okay too) to tell
>> Hadoop that two files shall be stored together at the same location(s). It
>> would benefit map-side join performance if it could be done somehow because
>> all map tasks would be able to read data from a local copy. Does anyone know
>> a way?
>>
>> -Sigurd
>
>
>

Re: Tell Hadoop to store pairs of files at the same location(s) on HDFS

Posted by Harsh J <ha...@cloudera.com>.
You are probably talking of
https://issues.apache.org/jira/browse/HDFS-2576 and similar JIRAs.
This feature isn't available in HDFS yet, but may arrive soon.

On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
<si...@gmail.com> wrote:
> Hi guys,
>
> I have been wondering if there's a way (hack'ish would be okay too) to tell
> Hadoop that two files shall be stored together at the same location(s). It
> would benefit map-side join performance if it could be done somehow because
> all map tasks would be able to read data from a local copy. Does anyone know
> a way?
>
> -Sigurd



-- 
Harsh J
