Posted to dev@bigtop.apache.org by MTG dev <de...@magnatempusgroup.net> on 2012/09/24 18:31:21 UTC

Spark in-memory analytics in BigTop stack

Fellow BigTop'pers.

We have just rolled out a readily available Spark 0.5 (www.spark-project.org)
package for the Ubuntu distribution. The package is built against the current
official Apache Hadoop 1.0.3 release, so it should be compatible with
everything from 0.20.205 up to the Hadoop 1.1 release candidate. A
Redhat/CentOS version is coming in a few days (in case someone is interested).

You can find all related information at
    http://www.magnatempusgroup.net/blog/2012/09/24/incredibly-fast-in-memory-analytics-for-bigdata-technology-preview/
and download the installable package from
    http://magnatempusgroup.net/ftphost/releases/Spark-0.5-1.0.3/

I am posting this here because the package was created to the exact standards of the BigTop stack. In other words, BigTop rules!

We would love to hear your feedback and comments!

-- 
With regards,
	Alef
        MTG development team

Re: Spark in-memory analytics in BigTop stack

Posted by Konstantin Boudnik <co...@apache.org>.

On Wed, Sep 26, 2012 at 04:06PM, Wing Yew Poon wrote:
> Hi,
> please correct me if I'm wrong, but I thought Spark runs on top of
> Mesos. Does it not require Mesos to run?

Yes, you are right. That's why the Mesos library is part of Spark's dependencies.

Now, 'on top of Mesos' and 'on a Mesos cluster' are two quite different
things, apparently ;)

Cos


Re: Spark in-memory analytics in BigTop stack

Posted by Wing Yew Poon <wy...@cloudera.com>.

Hi,
please correct me if I'm wrong, but I thought Spark runs on top of
Mesos. Does it not require Mesos to run?
- Wing Yew


Re: Spark in-memory analytics in BigTop stack

Posted by MTG dev <de...@magnatempusgroup.net>.

Thanks Bruno!

I have created BIGTOP-715

Cheers,
  MTG dev
  

Re: Spark in-memory analytics in BigTop stack

Posted by Bruno Mahé <bm...@apache.org>.

On 09/25/2012 10:46 AM, MTG dev wrote:
> We have no plans to hold back the sources of the packages, but we are
> working on RPM packaging right now. Once that work is done, we should be
> able to contribute it back to the community. Should there be a JIRA ticket
> for that or something?
>
> With regards,
>    Alef
>    MTG dev team
>


Great news!


And yes, there should be a ticket. It will be helpful to organize any 
work around it.

Thanks,
Bruno

Re: Spark in-memory analytics in BigTop stack

Posted by MTG dev <de...@magnatempusgroup.net>.

Hi there.

Apparently, I am not in a position to say what role Spark can play in
Bigtop, for I am not speaking for either of those projects.

However, I can tell that Spark provides a number of advantages compared to
the traditional MapReduce model: a stateful computational model without the
need to write everything back to the file system after each step, in-memory
calculations, higher-level primitives expressed in a functional language,
etc. These advantages, combined with a low-latency planner, result in a very
significant performance improvement. I'd suggest going over spark-project.org
for more information.
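
To make the contrast a bit more concrete, here is a minimal sketch in plain
Python. It is not Spark's API and it measures nothing; it only assumes that a
MapReduce-style pipeline round-trips through the file system between steps,
while the in-memory style chains the same functional steps on data that stays
resident. The names mapreduce_style and in_memory_style are hypothetical and
used purely for illustration.

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_style(values, steps, workdir):
    """Run every step as a separate pass: read input from disk, write output back."""
    path = os.path.join(workdir, "step0.json")
    with open(path, "w") as f:
        json.dump(values, f)
    for i, step in enumerate(steps, start=1):
        with open(path) as f:          # each step starts by re-reading from disk
            data = json.load(f)
        data = [step(x) for x in data]
        path = os.path.join(workdir, f"step{i}.json")
        with open(path, "w") as f:     # ...and ends by writing everything back
            json.dump(data, f)
    with open(path) as f:
        return json.load(f)


def in_memory_style(values, steps):
    """Chain the same steps while the dataset stays in memory between them."""
    return reduce(lambda data, step: [step(x) for x in data], steps, values)


# Three toy transformation steps (hypothetical workload).
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
values = list(range(5))

with tempfile.TemporaryDirectory() as workdir:
    on_disk = mapreduce_style(values, steps, workdir)
in_memory = in_memory_style(values, steps)
```

Both variants produce the same result; the difference is only in how many
times the data goes to the file system and back along the way.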

I am not an expert on Drill, but I'd say that Spark gives immediate benefits
over Drill because it is already here and can be used by anyone ;)

As for integration with Bigtop: Spark doesn't require any special integration
with the rest of the stack - it might use HDFS as the underlying storage, but
that's about it.

Looks like there is ongoing development to allow Spark to use Hive's SerDes,
but I am not completely sure about its status.

On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> > Hi Alef,
> >
> > Great news!
> >
> > Spark developers are interested in developing Spark packages and
> > contributing them to open source. Since you already have them,
> > what would you think about contributing the source to BigTop?

We have no plans to hold back the sources of the packages, but we are
working on RPM packaging right now. Once that work is done, we should be
able to contribute it back to the community. Should there be a JIRA ticket
for that or something?

With regards,
  Alef
  MTG dev team

> This is very, very interesting indeed! I'd also like to hear a bit
> more about what role Spark can play in the Bigtop project -- from
> just skimming the web it feels like it can be seen as an
> alternative to Apache Drill (incubating) or am I completely off
> base here?
> 
> Also, what level of integration is required between Spark and
> the rest of the Hadoop ecosystem components (Hive, Pig, etc.)?
> 
> Thanks,
> Roman.

Re: Spark in-memory analytics in BigTop stack

Posted by Roman Shaposhnik <rv...@apache.org>.

On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <af...@yahoo.com> wrote:
> Hi Alef,
>
> Great news!
>
> Spark developers are interested in developing Spark packages and
> contributing them to open source. Since you already have them,
> what would you think about contributing the source to BigTop?

This is very, very interesting indeed! I'd also like to hear a bit
more about what role Spark can play in the Bigtop project -- from
just skimming the web it feels like it can be seen as an
alternative to Apache Drill (incubating) or am I completely off
base here?

Also, what level of integration is required between Spark and
the rest of the Hadoop ecosystem components (Hive, Pig, etc.)?

Thanks,
Roman.

Re: Spark in-memory analytics in BigTop stack

Posted by Anatoli Fomenko <af...@yahoo.com>.

Hi Alef,

Great news!

Spark developers are interested in developing Spark packages and contributing them to open source. Since you already have them, what would you think about contributing the source to BigTop?

Thank you,
Anatoli



