Posted to dev@bigtop.apache.org by Dmitriy Setrakyan <ds...@gridgain.com> on 2014/03/25 02:07:24 UTC

GridGain

Hi,

I am writing on behalf of the GridGain open source project (www.gridgain.org),
licensed under Apache 2.0. At GridGain we are working on an In-Memory
Computing Platform which is becoming a one-stop place for many distributed
compute, data, and streaming needs.

One of the main pieces of our platform is our In-Memory Apache Hadoop
Accelerator, which aims to accelerate HDFS and Map/Reduce by bringing both
data and computations into memory. We do it with GGFS, our Hadoop-compliant
in-memory file system. For I/O-intensive jobs, GridGain GGFS offers
performance close to 100x faster than standard HDFS. More information can
be found here:
http://www.gridgain.org/features/hadoop-acceleration/
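
As a rough illustration of the general idea of an in-memory layer sitting in
front of a slower file system, here is a minimal sketch. It is not GridGain
code: the class name CachingFileSystem, its methods, and the use of a plain
dict as a stand-in for the underlying file system are all hypothetical,
chosen only to show reads served from memory and writes propagated either
immediately or later.

```python
from queue import Queue


class CachingFileSystem:
    """Toy in-memory layer in front of a slower backing store.

    Hypothetical illustration only; `backing` is a dict standing in
    for an underlying file system such as HDFS.
    """

    def __init__(self, backing, sync_writes=True):
        self.backing = backing          # path -> bytes in the "slow" store
        self.cache = {}                 # path -> bytes held in memory
        self.sync_writes = sync_writes  # True: propagate writes immediately
        self._pending = Queue()         # deferred writes (async mode)

    def read(self, path):
        # Serve from memory; on a miss, load from the backing store once.
        if path not in self.cache:
            self.cache[path] = self.backing[path]
        return self.cache[path]

    def write(self, path, data):
        # Always update memory; propagate now or queue for later.
        self.cache[path] = data
        if self.sync_writes:
            self.backing[path] = data
        else:
            self._pending.put((path, data))

    def flush(self):
        # Drain deferred writes to the backing store.
        while not self._pending.empty():
            path, data = self._pending.get()
            self.backing[path] = data
```

With sync_writes=False, a written file is visible through the cache right
away but only reaches the backing store once flush() is called.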

We would like to have an opportunity to integrate our Apache Hadoop
Accelerator with Apache Bigtop. Please let us know if this is possible and
what steps are required of us.

Thanks in advance.
-
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Anatoli Fomenko <af...@yahoo.com>.
Hi Dmitriy,

Seeing such great excitement at the Google Cloud Platform Live event,
and the numbers from the BigQuery demo, I'd say it's a good time to add
high-performance in-memory components to the Hadoop stack, and Bigtop
would be a natural place to start.

Perhaps you could point to a quick technology intro and differentiators?

Thanks,
Anatoli



On Monday, March 24, 2014 11:12 PM, Roman Shaposhnik <rv...@apache.org> wrote:

Hi Dmitriy!

Welcome to the Bigtop community!

On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <co...@apache.org> wrote:
>> One of the main pieces of our platform is our In-Memory Apache Hadoop
>> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
>> data and computations into memory. We do it with our GGFS - Hadoop
>> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
>> offers performance close to 100x faster than standard HDFS. More
>> information can be found here:
>> http://www.gridgain.org/features/hadoop-acceleration/
>>
>> We would like to have an opportunity to integrate our Apache Hadoop
>> Accelerator with Apache Bigtop. Please let us know if this is possible and
>> what steps are required of us.

I've been actually fascinated by the in-memory analytics platforms lately.
Things like Apache Spark seem to be a really good addition to the
Hadoop ecosystem.

Now, I understand that you've got a piece of technology that can essentially
serve as a replacement for HDFS, but could you please elaborate on
what other integration points do you have between GridGain and the rest
of Hadoop ecosystem?

That, I think, would be a much wider discussion.


Thanks,
Roman.

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Jay Vyas <ja...@gmail.com>.
Yeah, sure, we can try a Google Hangout screencast thingy.


On Wed, Mar 26, 2014 at 5:31 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Dmitriy,
>
> I think your proposal is great as we are just forming up the agenda for the
> meetup. And it seems to be great to have an deeper dive into the platform,
> which will let folks here to get more familiar with it.
>
> Jay, do you think we can tape the meetup talks and publish it later?
>   Cos
>



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.
Dmitriy,

I think your proposal is great, as we are just forming the agenda for the
meetup. It would be great to have a deeper dive into the platform,
which will let folks here get more familiar with it.

Jay, do you think we can tape the meetup talks and publish them later?
  Cos

On Wed, Mar 26, 2014 at 02:36PM, Dmitriy Setrakyan wrote:
> I plan to be at ApacheCon on Monday, April 7th. I hear that Bigtop will
> have a meetup there in the evening. Do you think it will be OK if I could
> spend about 20 minutes there to present GridGain GGFS and overall approach
> to Hadoop acceleration? I think it would be interesting to go through a
> couple of architectural diagrams and may spur a good discussion.
> 
> -Dmitriy
> 
> On Wed, Mar 26, 2014 at 8:35 AM, Jay Vyas <ja...@gmail.com> wrote:
> 
> > I love the fact that GridGain is going to be part of Bigtop! This will
> > give us two new compute paradigms, all packaged and testable under the
> > same umbrella. And now with our Vagrant recipes, people will be able to
> > demo GridGain by simply typing "vagrant up" into the console.
> >
> > And I'm pretty sure GridGain and Spark will drive each other forward, just
> > the same way Ceph, HDFS, and GlusterFS do.
> >
> > Dmitriy, will you be at ApacheCon? If so, why don't you come share your
> > thoughts with us at the two Bigtop meetups on the 7th and the 8th?
> >
> > On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> >
> > > Andrew,
> > >
> > > I agree with you. All I meant to say is that currently users of Hadoop
> > > who would like to improve the performance of their deployments have to
> > > switch to Spark and code to Spark APIs. GridGain, on the other hand, will
> > > provide an option to accelerate existing Hadoop deployments without any
> > > changes in code.
> > >
> > > Regards,
> > > -Dmitriy
> > >
> > > On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <ap...@apache.org> wrote:
> > >
> > > > Thank you.
> > > >
> > > > On this part of your response:
> > > >
> > > > > GridGain is working on adding native MapReduce component which will
> > > > > provide native complete Hadoop integration without changes in API,
> > > > > like Spark currently forces you to do
> > > >
> > > > I'm not sure those flocking to Spark are doing so by force. Nor that the
> > > > Spark API should be considered a liability when compared to Hadoop
> > > > MapReduce. For your consideration.
> > > >
> > > > On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> > > >
> > > > > I think the feature set is pretty close and GGFS would be a good
> > > > > contrast to Tachyon for performance and reliability features.
> > > > >
> > > > > I am not an expert on Tachyon, but I think the main differences are:
> > > > >
> > > > > - GGFS allows read-through and write-through to/from underlying HDFS
> > > > > or any other Hadoop-compliant file system with zero code change.
> > > > > Essentially, GGFS entirely removes the ETL step from integration.
> > > > >
> > > > > - GGFS has the ability to pick and choose which folders stay in memory,
> > > > > which folders stay on disk, and which folders get synchronized with
> > > > > the underlying (HD)FS, either synchronously or asynchronously.
> > > > >
> > > > > - GridGain is working on adding a native MapReduce component which will
> > > > > provide native complete Hadoop integration without changes in API,
> > > > > like Spark currently forces you to do. Essentially, GridGain MR+GGFS
> > > > > will allow bringing Hadoop completely or partially in-memory in
> > > > > Plug-n-Play fashion without any API changes.
> > > > >
> > > > > There are probably other differences that I am forgetting right now,
> > > > > but I think the above set lists the most significant ones.
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Dmitriy Setrakyan, EVP Engineering
> > > > > *GridGain Systems*
> > > > > www.gridgain.com
> > > > >
> > > > > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <apurtell@apache.org> wrote:
> > > > >
> > > > > > Dmitriy,
> > > > > >
> > > > > > Would it be possible to contrast GGFS with Tachyon
> > > > > > (http://tachyon-project.org/)?
> > > > > >
> > > > > > Also, do you have any plans for Spark integration?
> > > > > >
> > > > > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> > > > > >
> > > > > > > Hi Roman,
> > > > > > >
> > > > > > > At this point the integration is a pluggable in-memory file system,
> > > > > > > GGFS. It works just like HDFS (same API), but in reality serves as a
> > > > > > > caching layer on top of HDFS. GGFS caches the hottest file blocks and
> > > > > > > then synchronizes them with the underlying HDFS either synchronously
> > > > > > > or asynchronously, depending on configuration.
> > > > > > >
> > > > > > > Since GGFS implements the standard Hadoop File System API, it
> > > > > > > automatically integrates with other Hadoop ecosystem pieces via the
> > > > > > > File System API as well.
> > > > > > >
> > > > > > > Going forward, we are planning to add the same native API integration
> > > > > > > for the MapReduce component as well.
> > > > > > >
> > > > > > > Hope this answers your question.
> > > > > > >
> > > > > > > -Dmitriy
> > > > > > >
> > > > > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <rvs@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Dmitriy!
> > > > > > > >
> > > > > > > > Welcome to the Bigtop community!
> > > > > > > >
> > > > > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <cos@apache.org> wrote:
> > > > > > > > >> One of the main pieces of our platform is our In-Memory Apache Hadoop
> > > > > > > > >> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
> > > > > > > > >> data and computations into memory. We do it with our GGFS - Hadoop
> > > > > > > > >> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> > > > > > > > >> offers performance close to 100x faster than standard HDFS. More
> > > > > > > > >> information can be found here:
> > > > > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > > > > >>
> > > > > > > > >> We would like to have an opportunity to integrate our Apache Hadoop
> > > > > > > > >> Accelerator with Apache Bigtop. Please let us know if this is possible and
> > > > > > > > >> what steps are required of us.
> > > > > > > >
> > > > > > > > I've been actually fascinated by the in-memory analytics platforms lately.
> > > > > > > > Things like Apache Spark seem to be a really good addition to the
> > > > > > > > Hadoop ecosystem.
> > > > > > > >
> > > > > > > > Now, I understand that you've got a piece of technology that can essentially
> > > > > > > > serve as a replacement for HDFS, but could you please elaborate on
> > > > > > > > what other integration points do you have between GridGain and the rest
> > > > > > > > of Hadoop ecosystem?
> > > > > > > >
> > > > > > > > That, I think, would be a much wider discussion.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Roman.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >
> > > > > >    - Andy
> > > > > >
> > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > > > (via Tom White)
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > (via Tom White)
> > > >
> > >
> >
> >
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
> >

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.
Oh, and BTW - don't forget to tweet with the #asfbigtop hashtag to RSVP for the event ;)


Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Dmitriy Setrakyan <ds...@gridgain.com>.
I plan to be at ApacheCon on Monday, April 7th. I hear that Bigtop will
have a meetup there in the evening. Do you think it would be OK if I
spent about 20 minutes there presenting GridGain GGFS and our overall approach
to Hadoop acceleration? I think it would be interesting to go through a
couple of architectural diagrams, and it may spur a good discussion.

-Dmitriy

On Wed, Mar 26, 2014 at 8:35 AM, Jay Vyas <ja...@gmail.com> wrote:

> I love the fact that GridGain is going to be part of Bigtop! This will
> give us two new compute paradigms, all packaged and testable under the
> same umbrella. And now with our Vagrant recipes, people will be able to
> demo GridGain by simply typing "vagrant up" into the console.
>
> And I'm pretty sure GridGain and Spark will drive each other forward, just
> the same way Ceph, HDFS, and GlusterFS do.
>
> Dmitriy, will you be at ApacheCon? If so, why don't you come share your
> thoughts with us at the two Bigtop meetups on the 7th and the 8th?
>
> On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
>
> > Andrew,
> >
> > I agree with you. All I meant to say is that currently users of Hadoop
> > who would like to improve the performance of their deployments have to
> > switch to Spark and code to Spark APIs. GridGain, on the other hand, will
> > provide an option to accelerate existing Hadoop deployments without any
> > changes in code.
> >
> > Regards,
> > -Dmitriy
> >
> > On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <ap...@apache.org> wrote:
> >
> > > Thank you.
> > >
> > > On this part of your response:
> > >
> > > > GridGain is working on adding native MapReduce component which will
> > > > provide native complete Hadoop integration without changes in API,
> > > > like Spark currently forces you to do
> > >
> > > I'm not sure those flocking to Spark are doing so by force. Nor that the
> > > Spark API should be considered a liability when compared to Hadoop
> > > MapReduce. For your consideration.
> > >
> > > On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com> wrote:
> > >
> > > > I think the feature set is pretty close and GGFS would be a good
> > > > contrast to Tachyon for performance and reliability features.
> > > >
> > > > I am not an expert on Tachyon, but I think the main differences are:
> > > >
> > > > - GGFS allows read-through and write-through to/from underlying HDFS
> > > > or any other Hadoop-compliant file system with zero code change.
> > > > Essentially, GGFS entirely removes the ETL step from integration.
> > > >
> > > > - GGFS has the ability to pick and choose which folders stay in memory,
> > > > which folders stay on disk, and which folders get synchronized with
> > > > the underlying (HD)FS, either synchronously or asynchronously.
> > > >
> > > > - GridGain is working on adding a native MapReduce component which will
> > > > provide native complete Hadoop integration without changes in API,
> > > > like Spark currently forces you to do. Essentially, GridGain MR+GGFS
> > > > will allow bringing Hadoop completely or partially in-memory in
> > > > Plug-n-Play fashion without any API changes.
> > > >
> > > > There are probably other differences that I am forgetting right now,
> > > > but I think the above set lists the most significant ones.
> > > >
> > > > Regards,
> > > > --
> > > > Dmitriy Setrakyan, EVP Engineering
> > > > *GridGain Systems*
> > > > www.gridgain.com
> > > >
> > > > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <apurtell@apache.org> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > Would it be possible to contrast GGFS with Tachyon (
> > > > > http://tachyon-project.org/)?
> > > > >
> > > > > Also, do you have any plans for Spark integration?
> > > > >
> > > > >
> > > > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <
> > > > > dsetrakyan@gridgain.com
> > > > > > wrote:
> > > > >
> > > > > > Hi Roman,
> > > > > >
> > > > > > At this point the integration is pluggable in memory file system,
> > > GGFS.
> > > > > It
> > > > > > works just like HDFS (same API), but in reality serves as a
> caching
> > > > layer
> > > > > > on top  of HDFS. GGFS caches the hottest file blocks and then
> > > > > synchronizes
> > > > > > them with underlying HDFS either synchronously or asynchronously,
> > > > > depending
> > > > > > on configuration.
> > > > > >
> > > > > > Since, GGFS implements standard Hadoop File System API, it
> > > > automatically
> > > > > > integrates with other Hadoop ecosystem pieces via File System API
> > as
> > > > > well.
> > > > > >
> > > > > > Going forward, we are planning to add same native API integration
> > for
> > > > > > MapReduce component as well.
> > > > > >
> > > > > > Hope this answers your question.
> > > > > >
> > > > > > -Dmitriy
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <
> rvs@apache.org
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Dmitriy!
> > > > > > >
> > > > > > > Welcome to the Bigtop community!
> > > > > > >
> > > > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <
> > > cos@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > >> One of the main pieces of our platform is our In-Memory
> Apache
> > > > > Hadoop
> > > > > > > >> Accelerator which aims to accelerate HDFS and Map/Reduce by
> > > > bringing
> > > > > > > both,
> > > > > > > >> data and computations into memory. We do it with our GGFS -
> > > Hadoop
> > > > > > > >> compliant in-memory file system. For I/O intensive jobs
> > GridGain
> > > > > GGFS
> > > > > > > >> offers performance close to 100x faster than standard HDFS.
> > More
> > > > > > > >> information can be found here:
> > > > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > > > >>
> > > > > > > >> We would like to have an opportunity to integrate our Apache
> > > > Hadoop
> > > > > > > >> Accelerator with Apache Bigtop. Please let us know if this
> is
> > > > > possible
> > > > > > > and
> > > > > > > >> what steps are required of us.
> > > > > > >
> > > > > > > I've been actually fascinated by the in-memory analytics
> > platforms
> > > > > > lately.
> > > > > > > Things like Apache Spark seem to be a really good addition to
> the
> > > > > > > Hadoop ecosystem.
> > > > > > >
> > > > > > > Now, I understand that you've got a piece of technology that
> can
> > > > > > > essentially
> > > > > > > serve as a replacement for HDFS, but could you please elaborate
> > on
> > > > > > > what other integration points do you have between GridGain and
> > the
> > > > rest
> > > > > > > of Hadoop ecosystem?
> > > > > > >
> > > > > > > That, I think, would be a much wider discussion.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roman.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > Hein
> > > > > (via Tom White)
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Jay Vyas <ja...@gmail.com>.
I love the fact that GridGain is going to be part of Bigtop! This will
give us two new compute paradigms, all packaged and testable under the
same umbrella. And now with our Vagrant recipes, people will be able to
demo GridGain by simply typing "vagrant up" into the console.

And I'm pretty sure GridGain and Spark will drive each other forward, just
the same way Ceph, HDFS, and GlusterFS do.

Dmitriy, will you be at ApacheCon? If so, why don't you come share your
thoughts with us at the two Bigtop meetups on the 7th and the 8th?





On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com
> wrote:

> Andrew,
>
> I agree with you. All I meant to say is that currently users of Hadoop that
> would like to improve performance of their deployments have to switch to
> Spark and code to Spark APIs. GridGain, on the other hand, will provide an
> option to accelerate existing Hadoop deployments without any changes in
> code.
>
> Regards,
> -Dmtiriy
>



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Dmitriy Setrakyan <ds...@gridgain.com>.
Andrew,

I agree with you. All I meant to say is that, currently, Hadoop users who
would like to improve the performance of their deployments have to switch to
Spark and code to Spark APIs. GridGain, on the other hand, will provide an
option to accelerate existing Hadoop deployments without any changes in
code.

Regards,
-Dmitriy

On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <ap...@apache.org> wrote:

> Thank you.
>
> On this part of your response:
>
> > GridGain is working on adding native MapReduce component which will provide
> > native complete Hadoop integration without changes in API, like Spark
> > currently forces you to do
>
> I'm not sure those flocking to Spark are doing so by force. Nor that the
> Spark API should be considered a liability when compared to Hadoop
> MapReduce. For your consideration.
>

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Andrew Purtell <ap...@apache.org>.
Thank you.

On this part of your response:

> GridGain is working on adding native MapReduce component which will provide
> native complete Hadoop integration without changes in API, like Spark
> currently forces you to do

I'm not sure those flocking to Spark are doing so by force. Nor that the
Spark API should be considered a liability when compared to Hadoop
MapReduce. For your consideration.



On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <dsetrakyan@gridgain.com
> wrote:

> I think the feature set is pretty close and GGFS would be a good contrast
> to Tachyon for performance and reliability features.
>
> I am not an expert on Tachyon, but I think the main differences are:
>
> - GGFS allows read-through and write-through to/from underlying HDFS or any
> other Hadoop compliant file system with zero code change. Essentially GGFS
> entirely removes ETL step from integration.
>
> - GGFS has ability to pick and choose what folders stay in memory, what
> folders stay on disc, and what folders get synchronized with underlying
> (HD)FS either synchronously or asynchronously.
>
> - GridGain is working on adding native MapReduce component which will
> provide native complete Hadoop integration without changes in API, like
> Spark currently forces you to do. Essentially GridGain MR+GGFS will allow
> to bring Hadoop completely or partially in-memory in Plug-n-Play fashion
> without any API changes.
>
> There are probably other differences that I am forgetting right now, but I
> think the above set lists the most significant ones.
>
> Regards,
> --
> Dmitriy Setrakyan, EVP Engineering
> *GridGain Systems*
> www.gridgain.com
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Dmitriy Setrakyan <ds...@gridgain.com>.
I think the feature set is pretty close and GGFS would be a good contrast
to Tachyon for performance and reliability features.

I am not an expert on Tachyon, but I think the main differences are:

- GGFS allows read-through and write-through to/from the underlying HDFS or
any other Hadoop-compliant file system with zero code change. Essentially,
GGFS entirely removes the ETL step from integration.

- GGFS has the ability to pick and choose which folders stay in memory,
which folders stay on disk, and which folders get synchronized with the
underlying (HD)FS either synchronously or asynchronously.

- GridGain is working on adding native MapReduce component which will
provide native complete Hadoop integration without changes in API, like
Spark currently forces you to do. Essentially, GridGain MR+GGFS will allow
you to bring Hadoop completely or partially in-memory in Plug-n-Play fashion
without any API changes.
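
To make the per-folder choice in the second point concrete, here is a
minimal Python sketch. It is an illustration only, not GridGain
configuration: the mode names, the `mode_for` helper, and the example
folders are all hypothetical. It assumes that the most specific (longest)
matching folder decides how a path is handled.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical mode names, chosen for this example only.
    MEMORY_ONLY = "kept in memory only"
    DISK_ONLY = "kept in the underlying file system only"
    SYNC = "in memory, written through synchronously"
    ASYNC = "in memory, written back asynchronously"


def mode_for(path, rules, default=Mode.DISK_ONLY):
    """Return the mode of the longest configured folder that contains `path`."""
    best = None
    for folder, mode in rules.items():
        bare = folder.rstrip("/")
        prefix = bare + "/"
        # Match the folder itself or anything below it, but not siblings
        # that merely share a name prefix (e.g. /data vs /database).
        if path == bare or path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, mode)
    return best[1] if best else default


# Example rule set; the folder names are made up.
rules = {
    "/tmp": Mode.MEMORY_ONLY,
    "/archive": Mode.DISK_ONLY,
    "/data": Mode.ASYNC,
    "/data/critical": Mode.SYNC,
}
```

With these rules, a file under /data/critical is synchronized
synchronously, other files under /data asynchronously, and anything that
matches no rule falls back to the default.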

There are probably other differences that I am forgetting right now, but I
think the above set lists the most significant ones.

Regards,
--
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com


On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <ap...@apache.org>wrote:

> Dmitriy,
>
> Would it be possible to contrast GGFS with Tachyon (
> http://tachyon-project.org/)?
>
> Also, do you have any plans for Spark integration?
>

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Andrew Purtell <ap...@apache.org>.
Dmitriy,

Would it be possible to contrast GGFS with Tachyon (
http://tachyon-project.org/)?

Also, do you have any plans for Spark integration?


On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <dsetrakyan@gridgain.com
> wrote:

> Hi Roman,
>
> At this point the integration is pluggable in memory file system, GGFS. It
> works just like HDFS (same API), but in reality serves as a caching layer
> on top  of HDFS. GGFS caches the hottest file blocks and then synchronizes
> them with underlying HDFS either synchronously or asynchronously, depending
> on configuration.
>
> Since, GGFS implements standard Hadoop File System API, it automatically
> integrates with other Hadoop ecosystem pieces via File System API as well.
>
> Going forward, we are planning to add same native API integration for
> MapReduce component as well.
>
> Hope this answers your question.
>
> -Dmitriy
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Dmitriy Setrakyan <ds...@gridgain.com>.
Hi Roman,

At this point the integration is a pluggable in-memory file system, GGFS. It
works just like HDFS (same API), but in reality serves as a caching layer
on top of HDFS. GGFS caches the hottest file blocks and then synchronizes
them with the underlying HDFS either synchronously or asynchronously,
depending on configuration.
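
As a rough illustration of that caching behavior, here is a minimal Python
sketch. It is not GridGain code: the `CachingLayer` class, its methods, and
the plain dict standing in for HDFS are hypothetical, and eviction of cold
blocks is left out. It only shows the difference between synchronous
(write-through) and asynchronous (write-behind) synchronization.

```python
import queue
import threading


class CachingLayer:
    """Toy block cache in front of a slower backing store (a dict here)."""

    def __init__(self, backing, synchronous=True):
        self.backing = backing            # stands in for the underlying HDFS
        self.cache = {}                   # blocks currently held in memory
        self.synchronous = synchronous
        self._pending = queue.Queue()     # writes awaiting a background flush
        if not synchronous:
            worker = threading.Thread(target=self._flush_loop, daemon=True)
            worker.start()

    def read(self, key):
        # Read-through: on a cache miss, load from the backing store.
        if key not in self.cache:
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        if self.synchronous:
            self.backing[key] = value         # caller waits for the store
        else:
            self._pending.put((key, value))   # flushed later by the worker

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self.backing[key] = value
            self._pending.task_done()

    def drain(self):
        # Block until every queued write has reached the backing store.
        self._pending.join()
```

With `synchronous=True` a write returns only after the backing store has
been updated; with `synchronous=False` it returns immediately and `drain()`
waits for the background flush to catch up.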

Since GGFS implements the standard Hadoop File System API, it automatically
integrates with other Hadoop ecosystem pieces via the File System API as well.
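
The idea of integrating through a shared API can be sketched in a few lines
of Python. This is an analogy, not the Hadoop or GridGain API: the classes,
the `fs.scheme` key, and the scheme names "disk" and "mem" are all
hypothetical. The point is that job code written against the interface does
not change when a different implementation is selected by configuration.

```python
class DictFS:
    """Minimal file-system interface: write(path, text) and read(path)."""

    def __init__(self):
        self._files = {}

    def write(self, path, text):
        self._files[path] = text

    def read(self, path):
        return self._files[path]


class DiskLikeFS(DictFS):
    """Stands in for the default, disk-backed implementation."""


class InMemoryFS(DictFS):
    """Stands in for an in-memory implementation of the same interface."""


# Hypothetical scheme names; the configuration picks the implementation.
IMPLEMENTATIONS = {"disk": DiskLikeFS, "mem": InMemoryFS}


def open_fs(config):
    # The only place that knows which implementation is in use.
    return IMPLEMENTATIONS[config["fs.scheme"]]()


def word_count(fs, path):
    """Job code: written once against the interface, unaware of the backend."""
    counts = {}
    for word in fs.read(path).split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Switching `fs.scheme` from "disk" to "mem" changes which class serves the
reads and writes, while `word_count` stays exactly the same.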

Going forward, we are planning to add the same native API integration for
the MapReduce component as well.

Hope this answers your question.

-Dmitriy



On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <rv...@apache.org> wrote:

> Hi Dmitriy!
>
> Welcome to the Bigtop community!
>
> On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
> >> One of the main pieces of our platform is our In-Memory Apache Hadoop
> >> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing
> both,
> >> data and computations into memory. We do it with our GGFS - Hadoop
> >> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> >> offers performance close to 100x faster than standard HDFS. More
> >> information can be found here:
> >> http://www.gridgain.org/features/hadoop-acceleration/
> >>
> >> We would like to have an opportunity to integrate our Apache Hadoop
> >> Accelerator with Apache Bigtop. Please let us know if this is possible
> and
> >> what steps are required of us.
>
> I've been actually fascinated by the in-memory analytics platforms lately.
> Things like Apache Spark seem to be a really good addition to the
> Hadoop ecosystem.
>
> Now, I understand that you've got a piece of technology that can
> essentially
> serve as a replacement for HDFS, but could you please elaborate on
> what other integration points do you have between GridGain and the rest
> of Hadoop ecosystem?
>
> That, I think, would be a much wider discussion.
>
> Thanks,
> Roman.
>

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Roman Shaposhnik <rv...@apache.org>.
Hi Dmitriy!

Welcome to the Bigtop community!

On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <co...@apache.org> wrote:
>> One of the main pieces of our platform is our In-Memory Apache Hadoop
>> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
>> data and computations into memory. We do it with our GGFS - Hadoop
>> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
>> offers performance close to 100x faster than standard HDFS. More
>> information can be found here:
>> http://www.gridgain.org/features/hadoop-acceleration/
>>
>> We would like to have an opportunity to integrate our Apache Hadoop
>> Accelerator with Apache Bigtop. Please let us know if this is possible and
>> what steps are required of us.

I've actually been fascinated by the in-memory analytics platforms lately.
Things like Apache Spark seem to be a really good addition to the
Hadoop ecosystem.

Now, I understand that you've got a piece of technology that can essentially
serve as a replacement for HDFS, but could you please elaborate on
what other integration points you have between GridGain and the rest
of the Hadoop ecosystem?

That, I think, would be a much wider discussion.

Thanks,
Roman.

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Dmitriy Setrakyan <ds...@gridgain.com>.
Thanks, Konstantin and the whole Bigtop team!

Happy to integrate with Bigtop. We will make sure to read the information
you provided. Let's discuss the details at the Bigtop meetup today.

Regards,
--
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com


On Mon, Apr 7, 2014 at 10:30 AM, Konstantin Boudnik <co...@apache.org> wrote:

> Looks like the discussion is winding down and I don't see any negative
> reaction to the proposal. Which is great as it seems to make sense to have
> an ability to offer a different approach to in-memory processing as well as
> performance acceleration for the tools that rely on traditional MR model.
>
> Dmitriy, as the next set of steps I'd recommend you guys to follow Apache
> process where everything is done on JIRAs and via patches. As you might
> know the integration into the bigtop includes the following parts:
>
>   - packaging
>   - tests: both integration and package
>   - deployment code (Puppet and perhaps Vagrant if embed VM is important)
>
> We have a wiki page
>     https://cwiki.apache.org/confluence/display/BIGTOP/How+to+Contribute
> that should help you to get started on this. Looking forward for your
> contributions! Once the ball is rolling there might be more people joining
> and helping you guys to feel welcome ;)!
>
> Please don't hesitate to send your questions, discussion topics to the
> dev@ list. Good luck.
>
> Cos
>
> On Mon, Mar 24, 2014 at 10:43PM, Konstantin Boudnik wrote:
> > Re-shaping the thread into a [DISCUSSION].
> >
> > I encourage community members to chime in with your stand on the proposal so
> > we can have a discussion around it!
> >
> > Thanks,
> >   Cos
> >
> > On Mon, Mar 24, 2014 at 06:07PM, Dmitriy Setrakyan wrote:
> > > Hi,
> > >
> > > I am writing on behalf of GridGain open source project (www.gridgain.org),
> > > licensed under Apache 2.0. At GridGain we are working on In-Memory
> > > Computing Platform which is becoming one-stop place for many distributed
> > > compute, data, and streaming needs.
> > >
> > > One of the main pieces of our platform is our In-Memory Apache Hadoop
> > > Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
> > > data and computations into memory. We do it with our GGFS - Hadoop
> > > compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> > > offers performance close to 100x faster than standard HDFS. More
> > > information can be found here:
> > > http://www.gridgain.org/features/hadoop-acceleration/
> > >
> > > We would like to have an opportunity to integrate our Apache Hadoop
> > > Accelerator with Apache Bigtop. Please let us know if this is possible and
> > > what steps are required of us.
> > >
> > > Thanks in advance.
> > > -
> > > Dmitriy Setrakyan, EVP Engineering
> > > *GridGain Systems*
> > > www.gridgain.com
>
>
>

Re: [DISCUSSION]: Adding GridGain component in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.
Looks like the discussion is winding down and I don't see any negative
reaction to the proposal. Which is great as it seems to make sense to have an
ability to offer a different approach to in-memory processing as well as
performance acceleration for the tools that rely on the traditional MR model.

Dmitriy, as the next set of steps I'd recommend you guys follow the Apache
process where everything is done on JIRAs and via patches. As you might know
the integration into Bigtop includes the following parts:

  - packaging
  - tests: both integration and package
  - deployment code (Puppet and perhaps Vagrant if an embedded VM is important)

We have a wiki page
    https://cwiki.apache.org/confluence/display/BIGTOP/How+to+Contribute
that should help you to get started on this. Looking forward to your
contributions! Once the ball is rolling there might be more people joining and
helping you guys to feel welcome ;)!

Please don't hesitate to send your questions and discussion topics to the dev@
list. Good luck.

Cos

On Mon, Mar 24, 2014 at 10:43PM, Konstantin Boudnik wrote:
> Re-shaping the thread into a [DISCUSSION].
> 
> I encourage community members to chime in with your stand on the proposal so
> we can have a discussion around it!
> 
> Thanks,
>   Cos
> 
> On Mon, Mar 24, 2014 at 06:07PM, Dmitriy Setrakyan wrote:
> > Hi,
> > 
> > I am writing on behalf of GridGain open source project (www.gridgain.org),
> > licensed under Apache 2.0. At GridGain we are working on In-Memory
> > Computing Platform which is becoming one-stop place for many distributed
> > compute, data, and streaming needs.
> > 
> > One of the main pieces of our platform is our In-Memory Apache Hadoop
> > Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
> > data and computations into memory. We do it with our GGFS - Hadoop
> > compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> > offers performance close to 100x faster than standard HDFS. More
> > information can be found here:
> > http://www.gridgain.org/features/hadoop-acceleration/
> > 
> > We would like to have an opportunity to integrate our Apache Hadoop
> > Accelerator with Apache Bigtop. Please let us know if this is possible and
> > what steps are required of us.
> > 
> > Thanks in advance.
> > -
> > Dmitriy Setrakyan, EVP Engineering
> > *GridGain Systems*
> > www.gridgain.com



[DISCUSSION]: Adding GridGain component in Bigtop

Posted by Konstantin Boudnik <co...@apache.org>.
Re-shaping the thread into a [DISCUSSION].

I encourage community members to chime in with your stand on the proposal so
we can have a discussion around it!

Thanks,
  Cos

On Mon, Mar 24, 2014 at 06:07PM, Dmitriy Setrakyan wrote:
> Hi,
> 
> I am writing on behalf of GridGain open source project (www.gridgain.org),
> licensed under Apache 2.0. At GridGain we are working on In-Memory
> Computing Platform which is becoming one-stop place for many distributed
> compute, data, and streaming needs.
> 
> One of the main pieces of our platform is our In-Memory Apache Hadoop
> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
> data and computations into memory. We do it with our GGFS - Hadoop
> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> offers performance close to 100x faster than standard HDFS. More
> information can be found here:
> http://www.gridgain.org/features/hadoop-acceleration/
> 
> We would like to have an opportunity to integrate our Apache Hadoop
> Accelerator with Apache Bigtop. Please let us know if this is possible and
> what steps are required of us.
> 
> Thanks in advance.
> -
> Dmitriy Setrakyan, EVP Engineering
> *GridGain Systems*
> www.gridgain.com