Posted to common-dev@hadoop.apache.org by Sriram Rao <sr...@gmail.com> on 2012/05/08 19:32:20 UTC

Project announcement: Sailfish (also, looking for collaborators)

Hi,

I'd like to announce the release of a new open source project, Sailfish.

http://code.google.com/p/sailfish/

Sailfish tries to improve Hadoop performance, particularly for large jobs
that process terabytes of data and run for hours. In building Sailfish, we
modified how map output is handled and transported from map to reduce.

The project pages provide more information about the project.

We are looking for collaborators who can help get some of the ideas into
Apache Hadoop. A possible step forward could be to make the "shuffle" phase
of Hadoop pluggable.

If you are interested in working with us, please get in touch with me.

Sriram

Re: Project announcement: Sailfish (also, looking for collaborators)

Posted by Andrew Purtell <ap...@apache.org>.
Sriram et al.,

Do you intend this to be a joint project with the Hadoop community or
a technology competitor?

Regrettably, KFS is not a "drop in replacement" for HDFS.
Hypothetically: I have several petabytes of data in an existing HDFS
deployment, which is the norm, and a continuous MapReduce workflow.
How do you propose I, practically, migrate to something like Sailfish
without a major capital expenditure and/or downtime and/or data loss?

However, can the Sailfish I-files implementation be plugged in as an
alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
MAPREDUCE-4049), with necessary additional plumbing for dynamic
adjustment of reduce task population? And the workbuilder could be
part of an alternate MapReduce Application Manager? The I-file concept
could possibly be implemented here in a fairly self-contained way. One
could even colocate/embed a KFS filesystem with such an alternate
shuffle, like how MR task temporary space is usually colocated with
HDFS storage.

Does this seem reasonable in any way?

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

RE: Project announcement: Sailfish (also, looking for collaborators)

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Me too; let me know as well.

> Date: Tue, 8 May 2012 10:35:34 -0700
> From: mrraghu@yahoo.com
> Subject: Re: Project announcement: Sailfish (also, looking for collaborators)
> To: common-dev@hadoop.apache.org
> 
> Hi Sriram,
> 
> I'm interested in getting involved. Let me know in what capacity I can get involved.
> 
> Thanks,
> Raghu

Re: Project announcement: Sailfish (also, looking for collaborators)

Posted by Anil Gurnani <an...@gmail.com>.
Me too

Sent from my iPhone

On May 8, 2012, at 1:35 PM, Raghu Sakleshpur <mr...@yahoo.com> wrote:

> Hi Sriram,
> 
> I'm interested in getting involved. Let me know in what capacity I can get involved.
> 
> Thanks,
> Raghu

Re: Project announcement: Sailfish (also, looking for collaborators)

Posted by Raghu Sakleshpur <mr...@yahoo.com>.
Hi Sriram,

I'm interested in getting involved. Let me know in what capacity I can get involved.

Thanks,
Raghu


Re: Project announcement: Sailfish (also, looking for collaborators)

Posted by Eric Baldeschwieler <er...@hortonworks.com>.
This would seem like a perfect use case for YARN. Is that what you are
thinking?  You could implement this as a new framework rather than
trying to incrementally change MapReduce.

This would let you move faster and demonstrate side-by-side
performance improvements.

---
E14 - typing on glass
