Posted to common-dev@hadoop.apache.org by Sriram Rao <sr...@gmail.com> on 2012/05/08 19:32:20 UTC
Project announcement: Sailfish (also, looking for collaborators)
Hi,
I'd like to announce the release of a new open source project, Sailfish.
http://code.google.com/p/sailfish/
Sailfish tries to improve Hadoop performance, particularly for large jobs
that process terabytes of data and run for hours. In building Sailfish, we
modify how map output is handled and transported from map to reduce.
The project pages provide more information about the project.
We are looking for collaborators who can help get some of the ideas into
Apache Hadoop. A possible step forward could be to make the "shuffle" phase of
Hadoop pluggable.
If you are interested in working with us, please get in touch with me.
Sriram
Re: Project announcement: Sailfish (also, looking for collaborators)
Posted by Andrew Purtell <ap...@apache.org>.
Sriram et al.,
Do you intend this to be a joint project with the Hadoop community or
a technology competitor?
Regrettably, KFS is not a "drop in replacement" for HDFS.
Hypothetically: I have several petabytes of data in an existing HDFS
deployment, which is the norm, and a continuous MapReduce workflow.
How do you propose I, practically, migrate to something like Sailfish
without a major capital expenditure and/or downtime and/or data loss?
However, can the Sailfish I-files implementation be plugged in as an
alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
MAPREDUCE-4049), with necessary additional plumbing for dynamic
adjustment of reduce task population? And the workbuilder could be
part of an alternate MapReduce Application Manager? The I-file concept
could possibly be implemented here in a fairly self-contained way. One
could even colocate/embed a KFS filesystem with such an alternate
shuffle, like how MR task temporary space is usually colocated with
HDFS storage.
Does this seem reasonable in any way?
Best regards,
- Andy
--
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
RE: Project announcement: Sailfish (also, looking for collaborators)
Posted by Saikat Kanjilal <sx...@hotmail.com>.
Me too, let me know as well.
> Date: Tue, 8 May 2012 10:35:34 -0700
> From: mrraghu@yahoo.com
> Subject: Re: Project announcement: Sailfish (also, looking for collaborators)
> To: common-dev@hadoop.apache.org
>
> Hi Sriram,
>
> I'm interested in getting involved. Let me know in what capacity I can get involved.
>
> Thanks,
> Raghu
Re: Project announcement: Sailfish (also, looking for collaborators)
Posted by Anil Gurnani <an...@gmail.com>.
Me too
On May 8, 2012, at 1:35 PM, Raghu Sakleshpur <mr...@yahoo.com> wrote:
> Hi Sriram,
>
> I'm interested in getting involved. Let me know in what capacity I can get involved.
>
> Thanks,
> Raghu
Re: Project announcement: Sailfish (also, looking for collaborators)
Posted by Raghu Sakleshpur <mr...@yahoo.com>.
Hi Sriram,
I'm interested in getting involved. Let me know in what capacity I can get involved.
Thanks,
Raghu
Re: Project announcement: Sailfish (also, looking for collaborators)
Posted by Eric Baldeschwieler <er...@hortonworks.com>.
This would seem like a perfect use case for YARN. Is that what you are
thinking? You could implement this as a new framework rather than
trying to incrementally change MapReduce.
This would let you move faster and demonstrate side-by-side
performance improvements.
---
E14 - typing on glass