Posted to dev@bigtop.apache.org by Dmitriy Setrakyan <ds...@gridgain.com> on 2014/03/25 02:07:24 UTC

GridGain

Hi,

I am writing on behalf of the GridGain open source project (www.gridgain.org),
licensed under Apache 2.0. At GridGain we are working on an In-Memory
Computing Platform, which is becoming a one-stop place for many distributed
compute, data, and streaming needs.

One of the main pieces of our platform is our In-Memory Apache Hadoop
Accelerator, which aims to accelerate HDFS and Map/Reduce by bringing both
data and computations into memory. We do it with GGFS, our Hadoop-compliant
in-memory file system. For I/O-intensive jobs, GridGain GGFS offers
performance close to 100x faster than standard HDFS. More information can
be found here:
http://www.gridgain.org/features/hadoop-acceleration/

We would like to have an opportunity to integrate our Apache Hadoop
Accelerator with Apache Bigtop. Please let us know if this is possible and
what steps are required of us.

Thanks in advance.
-
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Anatoli Fomenko <af...@yahoo.com>.

Hi Dmitriy,

Seeing such great excitement at the Google Cloud Platform Live event,
and the numbers from the BigQuery demo, I'd say it's a good time to add
high-performance in-memory components to the Hadoop stack, and Bigtop
would be a natural place to start.

Perhaps you could point to a quick technology intro and differentiators?

Thanks,
Anatoli

On Monday, March 24, 2014 11:12 PM, Roman Shaposhnik <rv...@apache.org> wrote:

Hi Dmitriy!

Welcome to the Bigtop community!

On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <co...@apache.org> wrote:
>> One of the main pieces of our platform is our In-Memory Apache Hadoop
>> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
>> data and computations into memory. We do it with our GGFS - Hadoop
>> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
>> offers performance close to 100x faster than standard HDFS. More
>> information can be found here:
>> http://www.gridgain.org/features/hadoop-acceleration/
>>
>> We would like to have an opportunity to integrate our Apache Hadoop
>> Accelerator with Apache Bigtop. Please let us know if this is possible and
>> what steps are required of us.

I've actually been fascinated by in-memory analytics platforms lately.
Things like Apache Spark seem to be a really good addition to the
Hadoop ecosystem.

Now, I understand that you've got a piece of technology that can essentially
serve as a replacement for HDFS, but could you please elaborate on
what other integration points you have between GridGain and the rest
of the Hadoop ecosystem?

That, I think, would be a much wider discussion.

Thanks,
Roman.

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Jay Vyas <ja...@gmail.com>.

Yeah, sure, we can try a Google Hangout screencast thingy.


On Wed, Mar 26, 2014 at 5:31 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Dmitriy,
>
> I think your proposal is great, as we are just forming the agenda for the
> meetup. It would be great to have a deeper dive into the platform,
> which will let folks here get more familiar with it.
>
> Jay, do you think we can tape the meetup talks and publish them later?
>   Cos
>



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.

Dmitriy,

I think your proposal is great, as we are just forming the agenda for the
meetup. It would be great to have a deeper dive into the platform,
which will let folks here get more familiar with it.

Jay, do you think we can tape the meetup talks and publish them later?
  Cos

On Wed, Mar 26, 2014 at 02:36PM, Dmitriy Setrakyan wrote:
> I plan to be at ApacheCon on Monday, April 7th. I hear that Bigtop will
> have a meetup there in the evening. Would it be OK if I spent about
> 20 minutes there presenting GridGain GGFS and our overall approach
> to Hadoop acceleration? I think it would be interesting to go through a
> couple of architectural diagrams, and it may spur a good discussion.
> 
> -Dmitriy
> 
> On Wed, Mar 26, 2014 at 8:35 AM, Jay Vyas <ja...@gmail.com> wrote:
> 
> > I love the fact that GridGain is going to be part of Bigtop! This will
> > give us two new compute paradigms, all packaged and testable under the
> > same umbrella. And now, with our Vagrant recipes, people will be able to
> > demo GridGain by simply typing "vagrant up" into the console.
> >
> > And I'm pretty sure GridGain and Spark will drive each other forward, just
> > the same way Ceph, HDFS, and GlusterFS do.
> >
> > Dmitriy, will you be at ApacheCon? If so, why don't you come share your
> > thoughts with us at the two Bigtop meetups on the 7th and the 8th?
> >
> > On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> >
> > > Andrew,
> > >
> > > I agree with you. All I meant to say is that currently, users of Hadoop
> > > who would like to improve the performance of their deployments have to
> > > switch to Spark and code to Spark APIs. GridGain, on the other hand, will
> > > provide an option to accelerate existing Hadoop deployments without any
> > > changes in code.
> > >
> > > Regards,
> > > -Dmitriy
> > >
> > > On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <ap...@apache.org> wrote:
> > >
> > > > Thank you.
> > > >
> > > > On this part of your response:
> > > >
> > > > > GridGain is working on adding native MapReduce component which will
> > > > > provide native complete Hadoop integration without changes in API,
> > > > > like Spark currently forces you to do
> > > >
> > > > I'm not sure those flocking to Spark are doing so by force. Nor that
> > > > the Spark API should be considered a liability when compared to Hadoop
> > > > MapReduce. For your consideration.
> > > >
> > > > On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> > > >
> > > > > I think the feature set is pretty close, and GGFS would be a good
> > > > > contrast to Tachyon for performance and reliability features.
> > > > >
> > > > > I am not an expert on Tachyon, but I think the main differences are:
> > > > >
> > > > > - GGFS allows read-through and write-through to/from the underlying
> > > > > HDFS or any other Hadoop-compliant file system with zero code change.
> > > > > Essentially, GGFS entirely removes the ETL step from integration.
> > > > >
> > > > > - GGFS has the ability to pick and choose which folders stay in
> > > > > memory, which folders stay on disk, and which folders get synchronized
> > > > > with the underlying (HD)FS, either synchronously or asynchronously.
> > > > >
> > > > > - GridGain is working on adding a native MapReduce component which
> > > > > will provide complete native Hadoop integration without changes in
> > > > > API, like Spark currently forces you to do. Essentially, GridGain
> > > > > MR+GGFS will allow bringing Hadoop completely or partially in-memory
> > > > > in a plug-and-play fashion without any API changes.
> > > > >
> > > > > There are probably other differences that I am forgetting right now,
> > > > > but I think the above list covers the most significant ones.
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Dmitriy Setrakyan, EVP Engineering
> > > > > *GridGain Systems*
> > > > > www.gridgain.com
> > > > >
> > > > >
> > > > > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <apurtell@apache.org> wrote:
> > > > >
> > > > > > Dmitriy,
> > > > > >
> > > > > > Would it be possible to contrast GGFS with Tachyon
> > > > > > (http://tachyon-project.org/)?
> > > > > >
> > > > > > Also, do you have any plans for Spark integration?
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> > > > > >
> > > > > > > Hi Roman,
> > > > > > >
> > > > > > > At this point the integration is a pluggable in-memory file system,
> > > > > > > GGFS. It works just like HDFS (same API), but in reality serves as a
> > > > > > > caching layer on top of HDFS. GGFS caches the hottest file blocks and
> > > > > > > then synchronizes them with the underlying HDFS either synchronously
> > > > > > > or asynchronously, depending on configuration.
> > > > > > >
> > > > > > > Since GGFS implements the standard Hadoop File System API, it
> > > > > > > automatically integrates with other Hadoop ecosystem pieces via the
> > > > > > > File System API as well.
> > > > > > >
> > > > > > > Going forward, we are planning to add the same native API integration
> > > > > > > for the MapReduce component as well.
> > > > > > >
> > > > > > > Hope this answers your question.
> > > > > > >
> > > > > > > -Dmitriy
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <rvs@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Dmitriy!
> > > > > > > >
> > > > > > > > Welcome to the Bigtop community!
> > > > > > > >
> > > > > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <cos@apache.org> wrote:
> > > > > > > > >> One of the main pieces of our platform is our In-Memory Apache
> > > > > > > > >> Hadoop Accelerator which aims to accelerate HDFS and Map/Reduce
> > > > > > > > >> by bringing both, data and computations into memory. We do it
> > > > > > > > >> with our GGFS - Hadoop compliant in-memory file system. For I/O
> > > > > > > > >> intensive jobs GridGain GGFS offers performance close to 100x
> > > > > > > > >> faster than standard HDFS. More information can be found here:
> > > > > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > > > > >>
> > > > > > > > >> We would like to have an opportunity to integrate our Apache
> > > > > > > > >> Hadoop Accelerator with Apache Bigtop. Please let us know if this
> > > > > > > > >> is possible and what steps are required of us.
> > > > > > > >
> > > > > > > > I've been actually fascinated by the in-memory analytics platforms
> > > > > > > > lately. Things like Apache Spark seem to be a really good addition
> > > > > > > > to the Hadoop ecosystem.
> > > > > > > >
> > > > > > > > Now, I understand that you've got a piece of technology that can
> > > > > > > > essentially serve as a replacement for HDFS, but could you please
> > > > > > > > elaborate on what other integration points do you have between
> > > > > > > > GridGain and the rest of Hadoop ecosystem?
> > > > > > > >
> > > > > > > > That, I think, would be a much wider discussion.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Roman.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >
> > > > > >    - Andy
> > > > > >
> > > > > > Problems worthy of attack prove their worth by hitting back.
> > > > > > - Piet Hein (via Tom White)
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back.
> > > > - Piet Hein (via Tom White)
> > > >
> > >
> >
> >
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
> >
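
Illustrative note on the thread above: Dmitriy describes GGFS as a caching
layer that reads through to the underlying file system and synchronizes
writes either synchronously or asynchronously, with the behavior chosen per
folder. The toy Python sketch below shows that general pattern only. It is
not GridGain code and does not use any GridGain or Hadoop API; the class
names (DictStore, CachingLayer), the mode names ("memory", "sync", "async"),
and the longest-prefix rule for picking a folder's mode are all assumptions
made for illustration.

```python
import threading
from queue import Queue


class DictStore:
    """Hypothetical stand-in for the underlying file system (path -> bytes)."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer in front of a backing store.

    Modes, configured per path prefix (names are assumptions):
      "memory": data stays in memory only
      "sync":   write-through, the store is updated before write() returns
      "async":  write-behind, a background thread updates the store later
    """

    def __init__(self, backing, modes, default_mode="sync"):
        self.backing = backing
        self.modes = modes
        self.default_mode = default_mode
        self.cache = {}
        self.pending = Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _mode(self, path):
        # Longest matching prefix wins; fall back to the default mode.
        matches = [p for p in self.modes if path.startswith(p)]
        return self.modes[max(matches, key=len)] if matches else self.default_mode

    def read(self, path):
        # Read-through: on a cache miss, load from the backing store.
        if path not in self.cache:
            self.cache[path] = self.backing.read(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        mode = self._mode(path)
        if mode == "sync":
            self.backing.write(path, data)
        elif mode == "async":
            self.pending.put((path, data))
        # "memory": nothing is sent to the backing store.

    def _drain(self):
        # Background worker that applies queued asynchronous writes.
        while True:
            path, data = self.pending.get()
            self.backing.write(path, data)
            self.pending.task_done()

    def flush(self):
        """Block until all queued asynchronous writes have been applied."""
        self.pending.join()
```

A caller would construct it with something like
CachingLayer(DictStore(), {"/tmp/": "memory", "/logs/": "async"}) and then
use read() and write() without caring which mode applies to a given path,
which is the "zero code change" idea discussed in the thread.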