Posted to dev@directmemory.apache.org by Akash Ashok <th...@gmail.com> on 2011/10/19 04:44:38 UTC

Involvement as a developer

I know this is still being incubated. But I would like to get involved in
the development. What are the rules like? Can I get involved?

Cheers,
Akash A

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 8:14 AM, Akash Ashok <th...@gmail.com> wrote:
> I know this is still being incubated. But I would like to get involved in
> the development. What are the rules like? Can I get involved?
>
> Cheers,
> Akash A
>

http://apache.org/foundation/getinvolved.html

I think you are fairly active on HBase ML :)

-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Thu, Oct 20, 2011 at 8:19 PM, Raffaele P. Guidi
<ra...@gmail.com> wrote:
> With this we begin the battle against memcached/membase - weren't ehcache
> and hazelcast enough to begin with? :D
>
> Ciao,
>    R
>
> PS: of course agreed, later on in the roadmap

:) cool

Anyway, I would probably never use the REST interface myself. I would
rather use Avro or Protocol Buffers, with the upcoming MINA 3.0 :)

cheers
ashish

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
With this we begin the battle against memcached/membase - weren't ehcache
and hazelcast enough to begin with? :D

Ciao,
    R

PS: of course agreed, later on in the roadmap

On Thu, Oct 20, 2011 at 4:57 AM, Ashish <pa...@gmail.com> wrote:

> On Wed, Oct 19, 2011 at 9:28 PM, Raffaele P. Guidi
> <ra...@gmail.com> wrote:
> > Yep. This will be a good challenge for our second release ;)
>
> How about adding RESTful API access to the roadmap?
>
> There may be use cases where people want to use an off-heap cache
> server. The use case would mostly be a read-only cache, where folks may
> want to dedicate a large server with 512 GB of RAM, and may not be
> interested in having lots of servers with 128 GB or more of RAM.
>
> cheers
> ashish
>

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 9:28 PM, Raffaele P. Guidi
<ra...@gmail.com> wrote:
> Yep. This will be a good challenge for our second release ;)

How about adding RESTful API access to the roadmap?

There may be use cases where people want to use an off-heap cache
server. The use case would mostly be a read-only cache, where folks may
want to dedicate a large server with 512 GB of RAM, and may not be
interested in having lots of servers with 128 GB or more of RAM.
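A rough sketch of what such a facade could look like, using the JDK's
built-in HTTP server (the /cache/<key> URL scheme is made up for
illustration, and the in-memory map stands in for the real off-heap store):

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: PUT /cache/<key> stores the request body, GET reads it.
    public class RestCacheSketch {
        public static void main(String[] args) throws IOException {
            final Map<String, byte[]> store = new ConcurrentHashMap<String, byte[]>();
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/cache/", new HttpHandler() {
                public void handle(HttpExchange ex) throws IOException {
                    String key = ex.getRequestURI().getPath().substring("/cache/".length());
                    if ("PUT".equals(ex.getRequestMethod())) {
                        store.put(key, readAll(ex.getRequestBody()));
                        ex.sendResponseHeaders(204, -1);   // stored, no body
                    } else {
                        byte[] value = store.get(key);
                        if (value == null) { ex.sendResponseHeaders(404, -1); return; }
                        ex.sendResponseHeaders(200, value.length);
                        OutputStream os = ex.getResponseBody();
                        os.write(value);
                        os.close();
                    }
                }
            });
            server.start();
        }

        static byte[] readAll(InputStream in) throws IOException {
            java.io.ByteArrayOutputStream buf = new java.io.ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = in.read(chunk)) != -1; ) buf.write(chunk, 0, n);
            return buf.toByteArray();
        }
    }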

cheers
ashish

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
Yep. This will be a good challenge for our second release ;)

On Wed, Oct 19, 2011 at 5:33 PM, Akash Ashok <th...@gmail.com> wrote:

> On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi <
> raffaele.p.guidi@gmail.com> wrote:
>
> > Sorry, Ashish, but I think there must be a misunderstanding: the map
> > doesn't contain the actual data, it is just the index to the data
> > itself, which is in the off-heap memory. In fact it is a collection of
> > Pointer objects, which contain the offset and the length of the
> > DirectBuffer that contains the actual byte array. So: replicating the
> > map (which is natively offered by both hc and terracotta) means
> > replicating the INDEX of the data, not the data itself.
> >
> > Ah, this is a good point; it would eliminate the SPOF of the entire cache.
>
>
> > Again: replication of the map (index) is one matter; distribution of
> > the data is a different question. I'm not proposing to use terracotta
> > or hazelcast for their caching features but for their *clustering*
> > features.
> >
> Well, distribution of data, taken at face value and without replication,
> I presume wouldn't be highly complicated. Assume a cluster of
> DirectMemory nodes: we have a load balancer which routes to a particular
> system, which stores the data off-heap and adds the Pointer to Hazelcast.
>
> So Hazelcast would be acting as a meta-store. But this would require 2
> roundtrips to fetch some data. Better than a SPOF for the cache.
>
> But replicating the data is where the real challenge would be; group
> membership is pretty complex.
>
>
> On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:
> >
> > > On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
> > > <ra...@gmail.com> wrote:
> > > > Also, on replication/distribution, we have two distinct aspects:
> > > >
> > > >
> > > >   1. *map replication* - the pointers map has to be replicated to
> > > >   all nodes, and each pointer also has to contain a reference to
> > > >   the node that "owns" the real data
> > > >   2. *communication between nodes* - once a node knows that an
> > > >   entry is contained in node "n", it has to ask that node for it
> > > >
> > > >
> > > > The first point is easily covered by terracotta or hazelcast, while
> > > > the second one should be implemented using an RPC mechanism (Thrift
> > > > or Avro are both good choices). Another option is to also cover
> > > > point 1 with a custom replication built on top of the chosen RPC
> > > > framework - of course this would lead to another (do we really need
> > > > it?) distributed map implementation.
> > >
> > > Disagree on this. Be it TC or Hazelcast, they will cover both points.
> > > Let's take the example of Terracotta. It's a client-server
> > > architecture with striping on the server side.
> > > Now if you choose TC (short for Terracotta), you've got 3 options:
> > > 1. Use DSO or Distributed Shared Object mode - needs instrumentation
> > > and other stuff, not recommended
> > > 2. Use Ehcache at the back, and TC takes care of distributing the data
> > > 3. Use a Map via the TC Toolkit
> > >
> > > TC will not let you know where it's storing the keys (which in fact
> > > are stored in an HA manner on the server stripe). That's the beauty
> > > of TC. It does the faulting/flushing transparently to the user code.
> > >
> > > On the Hazelcast side, it does let you know where the key is, but the
> > > moment you use its client, it becomes transparent to you.
> > >
> > > IMHO, using any existing cache solution would complicate the user
> > > story.
> > >
> > > Distribution is a nice-to-have feature, and in fact would lead to
> > > wider adoption :)
> > >
> > > >
> > > > Keeping things like this is easy - of course making it
> > > > efficient/performant is a different story (e.g., should I keep a
> > > > local cache of frequently accessed items stored in other nodes?
> > > > etc.).
> > > >
> > > > Ciao,
> > > >    R
> > > >
> > >
> > > thanks
> > > ashish
> > >
> >
>

Re: Involvement as a developer

Posted by Akash Ashok <th...@gmail.com>.
On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi <
raffaele.p.guidi@gmail.com> wrote:

> Sorry, Ashish, but I think there must be a misunderstanding: the map
> doesn't contain the actual data, it is just the index to the data itself,
> which is in the off-heap memory. In fact it is a collection of Pointer
> objects, which contain the offset and the length of the DirectBuffer that
> contains the actual byte array. So: replicating the map (which is
> natively offered by both hc and terracotta) means replicating the INDEX
> of the data, not the data itself.
>
> Ah, this is a good point; it would eliminate the SPOF of the entire cache.


> Again: replication of the map (index) is one matter; distribution of the
> data is a different question. I'm not proposing to use terracotta or
> hazelcast for their caching features but for their *clustering* features.
>
Well, distribution of data, taken at face value and without replication, I
presume wouldn't be highly complicated. Assume a cluster of DirectMemory
nodes: we have a load balancer which routes to a particular system, which
stores the data off-heap and adds the Pointer to Hazelcast.

So Hazelcast would be acting as a meta-store. But this would require 2
roundtrips to fetch some data. Better than a SPOF for the cache.

But replicating the data is where the real challenge would be; group
membership is pretty complex.
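To spell out those 2 roundtrips, here is a minimal sketch (all names are
hypothetical; fetchFromNode stands in for whatever RPC the nodes would
expose):

    import java.util.Map;

    // Sketch only: roundtrip 1 asks the clustered meta-store where the
    // bytes live; roundtrip 2 asks the owning node for the bytes.
    public class MetaStoreLookupSketch {
        // Roundtrip 1: the meta-store (e.g. a Hazelcast map) maps a key to
        // its location, e.g. "node-2:offset=1024:length=512".
        static String locate(Map<String, String> metaStore, String key) {
            return metaStore.get(key);
        }

        // Roundtrip 2: fetch the actual bytes from the owning node.
        static byte[] fetch(Map<String, String> metaStore, String key) {
            String location = locate(metaStore, key);
            if (location == null) return null;       // not cached anywhere
            String node = location.split(":")[0];
            return fetchFromNode(node, location);    // RPC to the owner
        }

        // Placeholder for the RPC call (Avro/Thrift/etc. would go here).
        static byte[] fetchFromNode(String node, String location) {
            throw new UnsupportedOperationException("RPC layer not built yet");
        }
    }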


On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:
>
> > On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
> > <ra...@gmail.com> wrote:
> > > Also, on replication/distribution, we have two distinct aspects:
> > >
> > >
> > >   1. *map replication* - the pointers map has to be replicated to all
> > >   nodes, and each pointer also has to contain a reference to the node
> > >   that "owns" the real data
> > >   2. *communication between nodes* - once a node knows that an entry
> > >   is contained in node "n", it has to ask that node for it
> > >
> > >
> > > The first point is easily covered by terracotta or hazelcast, while
> > > the second one should be implemented using an RPC mechanism (Thrift or
> > > Avro are both good choices). Another option is to also cover point 1
> > > with a custom replication built on top of the chosen RPC framework -
> > > of course this would lead to another (do we really need it?)
> > > distributed map implementation.
> >
> > Disagree on this. Be it TC or Hazelcast, they will cover both points.
> > Let's take the example of Terracotta. It's a client-server architecture
> > with striping on the server side.
> > Now if you choose TC (short for Terracotta), you've got 3 options:
> > 1. Use DSO or Distributed Shared Object mode - needs instrumentation
> > and other stuff, not recommended
> > 2. Use Ehcache at the back, and TC takes care of distributing the data
> > 3. Use a Map via the TC Toolkit
> >
> > TC will not let you know where it's storing the keys (which in fact are
> > stored in an HA manner on the server stripe). That's the beauty of TC.
> > It does the faulting/flushing transparently to the user code.
> >
> > On the Hazelcast side, it does let you know where the key is, but the
> > moment you use its client, it becomes transparent to you.
> >
> > IMHO, using any existing cache solution would complicate the user
> > story.
> >
> > Distribution is a nice-to-have feature, and in fact would lead to
> > wider adoption :)
> >
> > >
> > > Keeping things like this is easy - of course making it
> > > efficient/performant is a different story (e.g., should I keep a
> > > local cache of frequently accessed items stored in other nodes?
> > > etc.).
> > >
> > > Ciao,
> > >    R
> > >
> >
> > thanks
> > ashish
> >
>

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
totally agreed :)

On Wed, Oct 19, 2011 at 5:14 PM, Ashish <pa...@gmail.com> wrote:

> On Wed, Oct 19, 2011 at 8:26 PM, Raffaele P. Guidi
> <ra...@gmail.com> wrote:
> > It's probably my fault: as a former basketball player I believe that
> > when an assist is missed you have to blame the one who throws the ball
> > :) Also, after having worked alone on DirectMemory for a long time,
> > discussing these matters with skilled professionals like you is a real
> > pleasure.
>
> Well, having worked as a Terracotta Field Engineer, I got to know a few
> things, and that's what I discuss here.
>
> At ASF, it's "we" - the community - and we are all in this together :) so
> it's no one's fault.
>
> Well, during conversations, and while I watched some commits at github, I
> got a bit confused about DirectMemory's objective. That's why I asked the
> question on the other thread about its core objective.
>
> I feel we should focus on getting a release out with known features, and
> then slowly add features. Adding documentation would be vital to wide
> adoption.
>
> >
> > Thanks,
> >    Raffaele
> >
> > On Wed, Oct 19, 2011 at 3:33 PM, Ashish <pa...@gmail.com> wrote:
> >
> >> On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi
> >> <ra...@gmail.com> wrote:
> >> > Sorry, Ashish, but I think there must be a misunderstanding: the map
> >> > doesn't contain the actual data, it is just the index to the data
> >> > itself, which is in the off-heap memory. In fact it is a collection
> >> > of Pointer objects, which contain the offset and the length of the
> >> > DirectBuffer that contains the actual byte array. So: replicating
> >> > the map (which is natively offered by both hc and terracotta) means
> >> > replicating the INDEX of the data, not the data itself.
> >>
> >> Aha, got it. I thought it was about the data; that's why I brought up
> >> the point :)
> >>
> >> >
> >> > Again: replication of the map (index) is one matter; distribution of
> >> > the data is a different question. I'm not proposing to use terracotta
> >> > or hazelcast for their caching features but for their *clustering*
> >> > features.
> >>
> >> Got it now :)
> >>
> >> >
> >> > On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com>
> wrote:
> >> >
> >> >> On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
> >> >> <ra...@gmail.com> wrote:
> >> >> > Also, on replication/distribution, we have two distinct aspects:
> >> >> >
> >> >> >
> >> >> >   1. *map replication* - the pointers map has to be replicated to
> >> >> >   all nodes, and each pointer also has to contain a reference to
> >> >> >   the node that "owns" the real data
> >> >> >   2. *communication between nodes* - once a node knows that an
> >> >> >   entry is contained in node "n", it has to ask that node for it
> >> >> >
> >> >> >
> >> >> > The first point is easily covered by terracotta or hazelcast,
> >> >> > while the second one should be implemented using an RPC mechanism
> >> >> > (Thrift or Avro are both good choices). Another option is to also
> >> >> > cover point 1 with a custom replication built on top of the chosen
> >> >> > RPC framework - of course this would lead to another (do we really
> >> >> > need it?) distributed map implementation.
> >> >>
> >> >> Disagree on this. Be it TC or Hazelcast, they will cover both
> >> >> points.
> >> >> Let's take the example of Terracotta. It's a client-server
> >> >> architecture with striping on the server side.
> >> >> Now if you choose TC (short for Terracotta), you've got 3 options:
> >> >> 1. Use DSO or Distributed Shared Object mode - needs instrumentation
> >> >> and other stuff, not recommended
> >> >> 2. Use Ehcache at the back, and TC takes care of distributing the
> >> >> data
> >> >> 3. Use a Map via the TC Toolkit
> >> >>
> >> >> TC will not let you know where it's storing the keys (which in fact
> >> >> are stored in an HA manner on the server stripe). That's the beauty
> >> >> of TC. It does the faulting/flushing transparently to the user code.
> >> >>
> >> >> On the Hazelcast side, it does let you know where the key is, but
> >> >> the moment you use its client, it becomes transparent to you.
> >> >>
> >> >> IMHO, using any existing cache solution would complicate the user
> >> >> story.
> >> >>
> >> >> Distribution is a nice-to-have feature, and in fact would lead to
> >> >> wider adoption :)
> >> >>
> >> >> >
> >> >> > Keeping things like this is easy - of course making it
> >> >> > efficient/performant is a different story (e.g., should I keep a
> >> >> > local cache of frequently accessed items stored in other nodes?
> >> >> > etc.).
> >> >> >
> >> >> > Ciao,
> >> >> >    R
> >> >> >
> >> >>
> >> >> thanks
> >> >> ashish
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> thanks
> >> ashish
> >>
> >> Blog: http://www.ashishpaliwal.com/blog
> >> My Photo Galleries: http://www.pbase.com/ashishpaliwal
> >>
> >
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 8:26 PM, Raffaele P. Guidi
<ra...@gmail.com> wrote:
> It's probably my fault: as a former basketball player I believe that when an
> assist is missed you have to blame the one who throws the ball :) Also,
> after having worked alone on DirectMemory for a long time, discussing these
> matters with skilled professionals like you is a real pleasure.

Well, having worked as a Terracotta Field Engineer, I got to know a few
things, and that's what I discuss here.

At ASF, it's "we" - the community - and we are all in this together :) so
it's no one's fault.

Well, during conversations, and while I watched some commits at github, I
got a bit confused about DirectMemory's objective. That's why I asked the
question on the other thread about its core objective.

I feel we should focus on getting a release out with known features, and
then slowly add features. Adding documentation would be vital to wide
adoption.

>
> Thanks,
>    Raffaele
>
> On Wed, Oct 19, 2011 at 3:33 PM, Ashish <pa...@gmail.com> wrote:
>
>> On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi
>> <ra...@gmail.com> wrote:
>> > Sorry, Ashish, but I think there must be a misunderstanding: the map
>> > doesn't contain the actual data, it is just the index to the data
>> > itself, which is in the off-heap memory. In fact it is a collection of
>> > Pointer objects, which contain the offset and the length of the
>> > DirectBuffer that contains the actual byte array. So: replicating the
>> > map (which is natively offered by both hc and terracotta) means
>> > replicating the INDEX of the data, not the data itself.
>>
>> Aha, got it. I thought it was about the data; that's why I brought up
>> the point :)
>>
>> >
>> > Again: replication of the map (index) is one matter; distribution of
>> > the data is a different question. I'm not proposing to use terracotta
>> > or hazelcast for their caching features but for their *clustering*
>> > features.
>>
>> Got it now :)
>>
>> >
>> > On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:
>> >
>> >> On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
>> >> <ra...@gmail.com> wrote:
>> >> > Also, on replication/distribution, we have two distinct aspects:
>> >> >
>> >> >
>> >> >   1. *map replication* - the pointers map has to be replicated to
>> >> >   all nodes, and each pointer also has to contain a reference to
>> >> >   the node that "owns" the real data
>> >> >   2. *communication between nodes* - once a node knows that an
>> >> >   entry is contained in node "n", it has to ask that node for it
>> >> >
>> >> >
>> >> > The first point is easily covered by terracotta or hazelcast, while
>> >> > the second one should be implemented using an RPC mechanism (Thrift
>> >> > or Avro are both good choices). Another option is to also cover
>> >> > point 1 with a custom replication built on top of the chosen RPC
>> >> > framework - of course this would lead to another (do we really need
>> >> > it?) distributed map implementation.
>> >>
>> >> Disagree on this. Be it TC or Hazelcast, they will cover both points.
>> >> Let's take the example of Terracotta. It's a client-server
>> >> architecture with striping on the server side.
>> >> Now if you choose TC (short for Terracotta), you've got 3 options:
>> >> 1. Use DSO or Distributed Shared Object mode - needs instrumentation
>> >> and other stuff, not recommended
>> >> 2. Use Ehcache at the back, and TC takes care of distributing the data
>> >> 3. Use a Map via the TC Toolkit
>> >>
>> >> TC will not let you know where it's storing the keys (which in fact
>> >> are stored in an HA manner on the server stripe). That's the beauty
>> >> of TC. It does the faulting/flushing transparently to the user code.
>> >>
>> >> On the Hazelcast side, it does let you know where the key is, but the
>> >> moment you use its client, it becomes transparent to you.
>> >>
>> >> IMHO, using any existing cache solution would complicate the user
>> >> story.
>> >>
>> >> Distribution is a nice-to-have feature, and in fact would lead to
>> >> wider adoption :)
>> >>
>> >> >
>> >> > Keeping things like this is easy - of course making it
>> >> > efficient/performant is a different story (e.g., should I keep a
>> >> > local cache of frequently accessed items stored in other nodes?
>> >> > etc.).
>> >> >
>> >> > Ciao,
>> >> >    R
>> >> >
>> >>
>> >> thanks
>> >> ashish
>> >>
>> >
>>
>>
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
It's probably my fault: as a former basketball player I believe that when an
assist is missed you have to blame the one who throws the ball :) Also,
after having worked alone on DirectMemory for a long time, discussing these
matters with skilled professionals like you is a real pleasure.

Thanks,
    Raffaele

On Wed, Oct 19, 2011 at 3:33 PM, Ashish <pa...@gmail.com> wrote:

> On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi
> <ra...@gmail.com> wrote:
> > Sorry, Ashish, but I think there must be a misunderstanding: the map
> > doesn't contain the actual data, it is just the index to the data
> > itself, which is in the off-heap memory. In fact it is a collection of
> > Pointer objects, which contain the offset and the length of the
> > DirectBuffer that contains the actual byte array. So: replicating the
> > map (which is natively offered by both hc and terracotta) means
> > replicating the INDEX of the data, not the data itself.
>
> Aha, got it. I thought it was about the data; that's why I brought up the
> point :)
>
> >
> > Again: replication of the map (index) is one matter; distribution of
> > the data is a different question. I'm not proposing to use terracotta
> > or hazelcast for their caching features but for their *clustering*
> > features.
>
> Got it now :)
>
> >
> > On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:
> >
> >> On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
> >> <ra...@gmail.com> wrote:
> >> > Also, on replication/distribution, we have two distinct aspects:
> >> >
> >> >
> >> >   1. *map replication* - the pointers map has to be replicated to
> >> >   all nodes, and each pointer also has to contain a reference to the
> >> >   node that "owns" the real data
> >> >   2. *communication between nodes* - once a node knows that an entry
> >> >   is contained in node "n", it has to ask that node for it
> >> >
> >> >
> >> > The first point is easily covered by terracotta or hazelcast, while
> >> > the second one should be implemented using an RPC mechanism (Thrift
> >> > or Avro are both good choices). Another option is to also cover point
> >> > 1 with a custom replication built on top of the chosen RPC framework
> >> > - of course this would lead to another (do we really need it?)
> >> > distributed map implementation.
> >>
> >> Disagree on this. Be it TC or Hazelcast, they will cover both points.
> >> Let's take the example of Terracotta. It's a client-server
> >> architecture with striping on the server side.
> >> Now if you choose TC (short for Terracotta), you've got 3 options:
> >> 1. Use DSO or Distributed Shared Object mode - needs instrumentation
> >> and other stuff, not recommended
> >> 2. Use Ehcache at the back, and TC takes care of distributing the data
> >> 3. Use a Map via the TC Toolkit
> >>
> >> TC will not let you know where it's storing the keys (which in fact
> >> are stored in an HA manner on the server stripe). That's the beauty of
> >> TC. It does the faulting/flushing transparently to the user code.
> >>
> >> On the Hazelcast side, it does let you know where the key is, but the
> >> moment you use its client, it becomes transparent to you.
> >>
> >> IMHO, using any existing cache solution would complicate the user
> >> story.
> >>
> >> Distribution is a nice-to-have feature, and in fact would lead to
> >> wider adoption :)
> >>
> >> >
> >> > Keeping things like this is easy - of course making it
> >> > efficient/performant is a different story (e.g., should I keep a
> >> > local cache of frequently accessed items stored in other nodes?
> >> > etc.).
> >> >
> >> > Ciao,
> >> >    R
> >> >
> >>
> >> thanks
> >> ashish
> >>
> >
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 7:01 PM, Raffaele P. Guidi
<ra...@gmail.com> wrote:
> Sorry, Ashish, but I think there must be a misunderstanding: the map
> doesn't contain the actual data, it is just the index to the data itself,
> which is in the off-heap memory. In fact it is a collection of Pointer
> objects, which contain the offset and the length of the DirectBuffer that
> contains the actual byte array. So: replicating the map (which is
> natively offered by both hc and terracotta) means replicating the INDEX
> of the data, not the data itself.

Aha, got it. I thought it was about the data; that's why I brought up the point :)

>
> Again: replication of the map (index) is one matter; distribution of the
> data is a different question. I'm not proposing to use terracotta or
> hazelcast for their caching features but for their *clustering* features.

Got it now :)

>
> On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:
>
>> On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
>> <ra...@gmail.com> wrote:
>> > Also, on replication/distribution, we have two distinct aspects:
>> >
>> >
>> >   1. *map replication* - the pointers map has to be replicated to all
>> >   nodes, and each pointer also has to contain a reference to the node
>> >   that "owns" the real data
>> >   2. *communication between nodes* - once a node knows that an entry
>> >   is contained in node "n", it has to ask that node for it
>> >
>> >
>> > The first point is easily covered by terracotta or hazelcast, while
>> > the second one should be implemented using an RPC mechanism (Thrift or
>> > Avro are both good choices). Another option is to also cover point 1
>> > with a custom replication built on top of the chosen RPC framework -
>> > of course this would lead to another (do we really need it?)
>> > distributed map implementation.
>>
>> Disagree on this. Be it TC or Hazelcast, they will cover both points.
>> Let's take the example of Terracotta. It's a client-server architecture
>> with striping on the server side.
>> Now if you choose TC (short for Terracotta), you've got 3 options:
>> 1. Use DSO or Distributed Shared Object mode - needs instrumentation
>> and other stuff, not recommended
>> 2. Use Ehcache at the back, and TC takes care of distributing the data
>> 3. Use a Map via the TC Toolkit
>>
>> TC will not let you know where it's storing the keys (which in fact are
>> stored in an HA manner on the server stripe). That's the beauty of TC.
>> It does the faulting/flushing transparently to the user code.
>>
>> On the Hazelcast side, it does let you know where the key is, but the
>> moment you use its client, it becomes transparent to you.
>>
>> IMHO, using any existing cache solution would complicate the user story.
>>
>> Distribution is a nice-to-have feature, and in fact would lead to
>> wider adoption :)
>>
>> >
>> > Keeping things like this is easy - of course making it
>> > efficient/performant is a different story (e.g., should I keep a local
>> > cache of frequently accessed items stored in other nodes? etc.).
>> >
>> > Ciao,
>> >    R
>> >
>>
>> thanks
>> ashish
>>
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
Sorry, Ashish, but I think there must be a misunderstanding: the map doesn't
contain the actual data, it is just the index to the data itself, which is
in the off-heap memory. In fact it is a collection of Pointer objects, which
contain the offset and the length of the DirectBuffer that contains the
actual byte array. So: replicating the map (which is natively offered by
both hc and terracotta) means replicating the INDEX of the data, not the
data itself.

Again: replication of the map (index) is one matter; distribution of the
data is a different question. I'm not proposing to use terracotta or
hazelcast for their caching features but for their *clustering* features.
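To picture what that means in code, here is a minimal sketch of the
structure just described (the class and field names are illustrative, not
necessarily the actual DirectMemory source):

    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: the map holds small Pointer objects (the index); the
    // payloads live off-heap in a single pre-allocated DirectBuffer.
    public class OffHeapIndexSketch {
        static class Pointer {
            final int offset;   // where the entry starts in the buffer
            final int length;   // how many bytes it occupies
            Pointer(int offset, int length) { this.offset = offset; this.length = length; }
        }

        private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        private final Map<String, Pointer> index = new ConcurrentHashMap<String, Pointer>();

        // Append the payload off-heap and index it. Replicating 'index'
        // across nodes replicates only these tiny Pointers, never the bytes.
        public synchronized void put(String key, byte[] payload) {
            int offset = buffer.position();
            buffer.put(payload);
            index.put(key, new Pointer(offset, payload.length));
        }

        public byte[] get(String key) {
            Pointer p = index.get(key);
            if (p == null) return null;
            byte[] out = new byte[p.length];
            ByteBuffer view = buffer.duplicate();   // independent position/limit
            view.position(p.offset);
            view.limit(p.offset + p.length);
            view.get(out);
            return out;
        }
    }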

On Wed, Oct 19, 2011 at 2:46 PM, Ashish <pa...@gmail.com> wrote:

> On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
> <ra...@gmail.com> wrote:
> > Also, on replication/distribution, we have two distinct aspects:
> >
> >
> >   1. *map replication* - the pointers map has to be replicated to all
> >   nodes, and each pointer also has to contain a reference to the node
> >   that "owns" the real data
> >   2. *communication between nodes* - once a node knows that an entry is
> >   contained in node "n", it has to ask that node for it
> >
> >
> > The first point is easily covered by terracotta or hazelcast, while the
> > second one should be implemented using an RPC mechanism (Thrift or Avro
> > are both good choices). Another option is to also cover point 1 with a
> > custom replication built on top of the chosen RPC framework - of course
> > this would lead to another (do we really need it?) distributed map
> > implementation.
>
> Disagree on this. Be it TC or Hazelcast, they will cover both points.
> Let's take the example of Terracotta. It's a client-server architecture
> with striping on the server side.
> Now if you choose TC (short for Terracotta), you've got 3 options:
> 1. Use DSO or Distributed Shared Object mode - needs instrumentation
> and other stuff, not recommended
> 2. Use Ehcache at the back, and TC takes care of distributing the data
> 3. Use a Map via the TC Toolkit
>
> TC will not let you know where it's storing the keys (which in fact are
> stored in an HA manner on the server stripe). That's the beauty of TC. It
> does the faulting/flushing transparently to the user code.
>
> On the Hazelcast side, it does let you know where the key is, but the
> moment you use its client, it becomes transparent to you.
>
> IMHO, using any existing cache solution would complicate the user story.
>
> Distribution is a nice-to-have feature, and in fact would lead to
> wider adoption :)
>
> >
> > Keeping things like this is easy - of course making it
> > efficient/performant is a different story (e.g., should I keep a local
> > cache of frequently accessed items stored in other nodes? etc.).
> >
> > Ciao,
> >    R
> >
>
> thanks
> ashish
>

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 5:41 PM, Raffaele P. Guidi
<ra...@gmail.com> wrote:
> Also, on replication/distribution, we have two distinct aspects:
>
>
>   1. *map replication* - the pointers map has to be replicated to all
>   nodes, and each pointer also has to contain a reference to the node
>   that "owns" the real data
>   2. *communication between nodes* - once a node knows that an entry is
>   contained in node "n", it has to ask that node for it
>
>
> The first point is easily covered by terracotta or hazelcast, while the
> second one should be implemented using an RPC mechanism (Thrift or Avro are
> both good choices). Another option is to also cover point 1 with a custom
> replication built on top of the chosen RPC framework - of course this would
> lead to another (do we really need it?) distributed map implementation.

Disagree on this. Be it TC or Hazelcast, they will cover both points.
Let's take the example of Terracotta. It's a client-server architecture
with striping on the server side.
Now if you choose TC (short for Terracotta), you've got 3 options:
1. Use DSO or Distributed Shared Object mode - needs instrumentation
and other stuff, not recommended
2. Use Ehcache at the back, and TC takes care of distributing the data
3. Use a Map via the TC Toolkit

TC will not let you know where it's storing the keys (which in fact are
stored in an HA manner on the server stripe). That's the beauty of TC. It
does the faulting/flushing transparently to the user code.

On the Hazelcast side, it does let you know where the key is, but the
moment you use its client, it becomes transparent to you.

IMHO, using any existing cache solution would complicate the user story.

Distribution is a nice-to-have feature, and in fact would lead to
wider adoption :)
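For example, getting a clustered map out of Hazelcast takes only a couple
of lines; a minimal sketch (the "pointers" map name and the string value
format are made up for illustration):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.Map;

    // Sketch only: every node that asks for the "pointers" map sees the
    // same entries; Hazelcast decides where keys live and moves them
    // around transparently.
    public class ClusteredMapSketch {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(null);
            Map<String, String> pointers = hz.getMap("pointers");
            pointers.put("user:42", "node-2:offset=1024:length=512");
            System.out.println(pointers.get("user:42"));
            Hazelcast.shutdownAll();
        }
    }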

>
> Keeping things like this is easy - of course making it
> efficient/performant is a different story (e.g., should I keep a local
> cache of frequently accessed items stored in other nodes? etc.).
>
> Ciao,
>    R
>

thanks
ashish

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
Also, on replication/distribution, we have two distinct aspects:


   1. *map replication* - the pointers map has to be replicated to all
   nodes, and each pointer also has to contain a reference to the node that
   "owns" the real data
   2. *communication between nodes* - once a node knows that an entry is
   contained in node "n", it has to ask that node for it


The first point is easily covered by terracotta or hazelcast, while the
second one should be implemented using an RPC mechanism (Thrift or Avro are
both good choices). Another option is to also cover point 1 with a custom
replication built on top of the chosen RPC framework - of course this would
lead to another (do we really need it?) distributed map implementation.
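To make point 1 a bit more concrete, here is a sketch of what a
cluster-aware pointer might carry so that the RPC layer of point 2 knows
which node to ask (all names here are hypothetical, not actual DirectMemory
classes):

    import java.io.Serializable;

    // Sketch only: a pointer that also records which node "owns" the bytes.
    public class ClusterPointer implements Serializable {
        private static final long serialVersionUID = 1L;

        public final String ownerNode; // node holding the real bytes, e.g. "node-3"
        public final int offset;       // where the entry starts in that node's buffer
        public final int length;       // how many bytes the entry occupies

        public ClusterPointer(String ownerNode, int offset, int length) {
            this.ownerNode = ownerNode;
            this.offset = offset;
            this.length = length;
        }

        // True when the data can be read locally instead of via RPC.
        public boolean isLocalTo(String thisNode) {
            return ownerNode.equals(thisNode);
        }
    }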

Keeping things like this is easy - of course making it efficient/performant
is a different story (e.g., should I keep a local cache of frequently
accessed items stored in other nodes? etc.).

Ciao,
    R

On Wed, Oct 19, 2011 at 8:43 AM, Raffaele P. Guidi <
raffaele.p.guidi@gmail.com> wrote:

> I used to have two file storage implementations: the simplest one, which
> wrote every entry on disk as a separate file - terribly slow - and the
> second one based on OrientDB, a NoSQL solution - fast, but it used all
> the heap on its own - you can see my blog for reference. Maybe the JCS
> file storage could fit?
>
> On Wednesday, October 19, 2011, Ashish <pa...@gmail.com> wrote:
> > On Wed, Oct 19, 2011 at 11:02 AM, Akash Ashok <th...@gmail.com>
> wrote:
> >> Thanks, guys, for the quick replies. HBase is indeed my first love :)
> >>
> >> I was really interested in what DirectMemory could offer and am
> >> excited to see that it's pretty awesome. I posed that question because
> >> I wasn't really sure what the rules were on the incubator :) Great to
> >> know I could get started right away. I shall start digging through the
> >> code.
> >
> > All ASF projects work the same way :)
> >
> >>
> >> I went through the road map and have a few questions:
> >>
> >> 1. File Storage -
> >> This was my main concern. This moves more towards a database approach,
> >> or to be more specific, a key-value store approach. Aren't there
> >> existing file-based solutions which can be used instead of developing
> >> another one?
> >
> > Yes, there are. We can use any of the key-value stores or object DBs,
> > like BerkeleyDB or others.
> > It would be good to have a pluggable layer where a persistence
> > provider can be chosen.
> > However, it's off-heap, i.e. stored in RAM, so I don't feel it's a
> > priority; persistence would make sense when we evolve into a Cache
> > framework.
> >
> >>
> >> 2. Hazelcast for replication -
> >> If I get this right, the idea is to use it more like a replication
> >> framework and then use that to cache entries in DirectMemory on the
> >> respective systems?
> >
> > Hazelcast has already released off-heap storage. Moreover, it's a
> > complete Cache framework with provision for adding persistent stores.
> > So I am very much against using it. Please note that I use Hazelcast
> > myself, so I'm not implying that it's not a good framework.
> >
> > Going distributed is a different story, and if you look at the past
> > conversations, our current focus is on the off-heap store.
> > To go distributed, we have to move scope from an off-heap store to a
> > Cache solution. Then come the challenges: distributed or replicated.
> > Replicated is easy, but the distributed world would bring in a lot more
> > design choices, like server or peer-to-peer architecture, what
> > concurrency levels to support, etc.
> >
> > My take:
> > 1. Implement the OffHeap store
> > 2. Benchmark it, refine it to give the latencies that we need in
> > production (would love sub-millisecond, or keep them under 2 ms)
> >
> > Once we reach here, we can think about becoming a complete caching
> > solution, followed by going distributed :)
> >
> >>
> >> Lastly, "I recently rewrote DM entirely for simplification." +1 on the
> >> simplification.
> >> As Einstein once said, "Make things as simple as possible, but not
> >> simpler" :)
> >>
> >> Cheers,
> >> Akash A
> >>
> >>
> >>
> >> On Wed, Oct 19, 2011 at 8:33 AM, Raffaele P. Guidi <
> >> raffaele.p.guidi@gmail.com> wrote:
> >>
> >>> Uhm, I'm new to the ASF, but I guess you just need to read some docs
> >>> (ok, we don't have many, just something in the old wiki @github and
> >>> an initial roadmap proposal [1]), take a look at the code and the
> >>> issues, and get started :) Googling around, I see that you have
> >>> interests in hadoop and hbase as well, so maybe you could start
> >>> investigating synergies with those products (just an idea, it is in
> >>> the roadmap).
> >>>
> >>> In any case thanks for your interest and welcome aboard.
> >>>
> >>> Ciao,
> >>>    R
> >>>
> >>> [1] Roadmap proposal:
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/DIRECTMEMORY/2011/10/18/Apache+DirectMemory+-+initial+roadmap+discussion
> >>>
> >>> On Wed, Oct 19, 2011 at 4:44 AM, Akash Ashok <th...@gmail.com>
> >>> wrote:
> >>>
> >>> > I know this is still being incubated. But I would like to get
> >>> > involved in the development. What are the rules like? Can I get
> >>> > involved?
> >>> >
> >>> > Cheers,
> >>> > Akash A
> >>> >
> >>>
> >>
> >
> >
> >
> > --
> > thanks
> > ashish
> >
> > Blog: http://www.ashishpaliwal.com/blog
> > My Photo Galleries: http://www.pbase.com/ashishpaliwal
> >
>

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
I used to have two file storage implementations: the simplest one, which
wrote every entry on disk as a separate file - terribly slow - and the
second one based on OrientDB, a NoSQL solution - fast, but it used all the
heap on its own - you can see my blog for reference. Maybe the JCS file
storage could fit?
On Wednesday, October 19, 2011, Ashish <pa...@gmail.com> wrote:
> On Wed, Oct 19, 2011 at 11:02 AM, Akash Ashok <th...@gmail.com>
wrote:
>> Thanks, guys, for the quick replies. HBase is indeed my first love :)
>>
>> I was really interested in what DirectMemory could offer and am excited
>> to see that it's pretty awesome. I posed that question because I wasn't
>> really sure what the rules were on the incubator :) Great to know I
>> could get started right away. I shall start digging through the code.
>
> All ASF projects work the same way :)
>
>>
>> I went through the road map and have a few questions:
>>
>> 1. File Storage -
>> This was my main concern. This moves more towards a database approach,
>> or to be more specific, a key-value store approach. Aren't there
>> existing file-based solutions which can be used instead of developing
>> another one?
>
> Yes, there are. We can use any of the key-value stores or object DBs,
> like BerkeleyDB or others.
> It would be good to have a pluggable layer where a persistence
> provider can be chosen.
> However, it's off-heap, i.e. stored in RAM, so I don't feel it's a
> priority; persistence would make sense when we evolve into a Cache
> framework.
>
>>
>> 2. Hazelcast for replication -
>> If I get this right, the idea is to use it more like a replication
>> framework and then use that to cache entries in DirectMemory on the
>> respective systems?
>
> Hazelcast has already released off-heap storage. Moreover, it's a
> complete Cache framework with provision for adding persistent stores.
> So I am very much against using it. Please note that I use Hazelcast
> myself, so I'm not implying that it's not a good framework.
>
> Going distributed is a different story, and if you look at the past
> conversations, our current focus is on the off-heap store.
> To go distributed, we have to move scope from an off-heap store to a
> Cache solution. Then come the challenges: distributed or replicated.
> Replicated is easy, but the distributed world would bring in a lot more
> design choices, like server or peer-to-peer architecture, what
> concurrency levels to support, etc.
>
> My take:
> 1. Implement the OffHeap store
> 2. Benchmark it, refine it to give the latencies that we need in
> production (would love sub-millisecond, or keep them under 2 ms)
>
> Once we reach here, we can think about becoming a complete caching
> solution, followed by going distributed :)
>
>>
>> Lastly, "I recently rewrote DM entirely for simplification." +1 on the
>> simplification.
>> As Einstein once said, "Make things as simple as possible, but not
>> simpler" :)
>>
>> Cheers,
>> Akash A
>>
>>
>>
>> On Wed, Oct 19, 2011 at 8:33 AM, Raffaele P. Guidi <
>> raffaele.p.guidi@gmail.com> wrote:
>>
>>> Uhm, I'm new to the ASF, but I guess you just need to read some docs
>>> (ok, we don't have many, just something in the old wiki @github and an
>>> initial roadmap proposal [1]), take a look at the code and the issues,
>>> and get started :) Googling around, I see that you have interests in
>>> hadoop and hbase as well, so maybe you could start investigating
>>> synergies with those products (just an idea, it is in the roadmap).
>>>
>>> In any case thanks for your interest and welcome aboard.
>>>
>>> Ciao,
>>>    R
>>>
>>> [1] Roadmap proposal:
>>>
>>>
https://cwiki.apache.org/confluence/display/DIRECTMEMORY/2011/10/18/Apache+DirectMemory+-+initial+roadmap+discussion
>>>
>>> On Wed, Oct 19, 2011 at 4:44 AM, Akash Ashok <th...@gmail.com>
>>> wrote:
>>>
>>> > I know this is still being incubated. But I would like to get
>>> > involved in the development. What are the rules like? Can I get
>>> > involved?
>>> >
>>> > Cheers,
>>> > Akash A
>>> >
>>>
>>
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>

Re: Involvement as a developer

Posted by Ashish <pa...@gmail.com>.
On Wed, Oct 19, 2011 at 11:02 AM, Akash Ashok <th...@gmail.com> wrote:
> Thanks, guys, for the quick replies. HBase is indeed my first love :)
>
> I was really interested in what DirectMemory could offer and am excited to
> see that it's pretty awesome. I posed that question because I wasn't really
> sure what the rules were on the incubator :) Great to know I could
> get started right away. I shall start digging through the code.

All ASF projects work the same way :)

>
> I went through the road map and have a few questions:
>
> 1. File Storage -
> This was my main concern. This moves more towards a database approach, or
> to be more specific, a key-value store approach. Aren't there existing
> file-based solutions which can be used instead of developing another one?

Yes, there are. We can use any of the key-value stores or object DBs, like
BerkeleyDB or others.
It would be good to have a pluggable layer where a persistence provider can
be chosen.
However, it's off-heap, i.e. stored in RAM, so I don't feel it's a
priority; persistence would make sense when we evolve into a Cache
framework.
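That pluggable layer could be as small as a single interface; a minimal
sketch (the interface and method names are hypothetical, not an agreed
design):

    // Sketch only: the off-heap store stays the hot path; any key-value
    // backend (BerkeleyDB, plain files, ...) can sit behind this interface.
    public interface PersistentStore {
        void store(String key, byte[] value);   // write-through or write-behind
        byte[] retrieve(String key);            // fault an entry back in on a miss
        void remove(String key);
        void close();
    }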

>
> 2. Hazelcast for replication -
> If I get this right, the idea is to use it more like a replication
> framework and then use that to cache entries in DirectMemory on the
> respective systems?

Hazelcast has already released off-heap storage. Moreover, it's a
complete Cache framework with provision for adding persistent stores.
So I am very much against using it. Please note that I use Hazelcast
myself, so I'm not implying that it's not a good framework.

Going distributed is a different story, and if you look at the past
conversations, our current focus is on the off-heap store.
To go distributed, we have to move scope from an off-heap store to a Cache
solution. Then come the challenges: distributed or replicated.
Replicated is easy, but the distributed world would bring in a lot more
design choices, like server or peer-to-peer architecture, what concurrency
levels to support, etc.

My take:
1. Implement the OffHeap store
2. Benchmark it, refine it to give the latencies that we need in
production (would love sub-millisecond, or keep them under 2 ms)

Once we reach here, we can think about becoming a complete caching
solution, followed by going distributed :)
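For step 2, a rough latency probe over a direct buffer could be a starting
point (sketch only; a real benchmark would need warm-up, many samples, and
proper tooling):

    import java.nio.ByteBuffer;

    // Sketch only: times raw writes into off-heap memory against the
    // ~2 ms budget mentioned above.
    public class LatencyProbeSketch {
        public static void main(String[] args) {
            ByteBuffer offHeap = ByteBuffer.allocateDirect(16 * 1024 * 1024);
            byte[] payload = new byte[4096];
            int writes = 1000;
            long start = System.nanoTime();
            for (int i = 0; i < writes; i++) {
                offHeap.position((i * payload.length) % (offHeap.capacity() - payload.length));
                offHeap.put(payload);
            }
            long elapsed = System.nanoTime() - start;
            System.out.println("avg write: " + (elapsed / writes) + " ns");
        }
    }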

>
> Lastly, "I recently rewrote DM entirely for simplification." +1 on the
> simplification.
> As Einstein once said, "Make things as simple as possible, but not
> simpler" :)
>
> Cheers,
> Akash A
>
>
>
> On Wed, Oct 19, 2011 at 8:33 AM, Raffaele P. Guidi <
> raffaele.p.guidi@gmail.com> wrote:
>
>> Uhm, I'm new to the ASF, but I guess you just need to read some docs
>> (ok, we don't have many, just something in the old wiki @github and an
>> initial roadmap proposal [1]), take a look at the code and the issues,
>> and get started :) Googling around, I see that you have interests in
>> hadoop and hbase as well, so maybe you could start investigating
>> synergies with those products (just an idea, it is in the roadmap).
>>
>> In any case thanks for your interest and welcome aboard.
>>
>> Ciao,
>>    R
>>
>> [1] Roadmap proposal:
>>
>> https://cwiki.apache.org/confluence/display/DIRECTMEMORY/2011/10/18/Apache+DirectMemory+-+initial+roadmap+discussion
>>
>> On Wed, Oct 19, 2011 at 4:44 AM, Akash Ashok <th...@gmail.com>
>> wrote:
>>
>> > I know this is still being incubated. But I would like to get involved
>> > in the development. What are the rules like? Can I get involved?
>> >
>> > Cheers,
>> > Akash A
>> >
>>
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Involvement as a developer

Posted by Akash Ashok <th...@gmail.com>.
Thanks, guys, for the quick replies. HBase is indeed my first love :)

I was really interested in what DirectMemory could offer and am excited to
see that it's pretty awesome. I posed that question because I wasn't really
sure what the rules were on the incubator :) Great to know I could get
started right away. I shall start digging through the code.

I went through the road map and have a few questions:

1. File Storage -
This was my main concern. This moves more towards a database approach, or to
be more specific, a key-value store approach. Aren't there existing
file-based solutions which can be used instead of developing another one?

2. Hazelcast for replication -
If I get this right, the idea is to use it more like a replication framework
and then use that to cache entries in DirectMemory on the respective
systems?

Lastly, "I recently rewrote DM entirely for simplification." +1 on the
simplification.
As Einstein once said, "Make things as simple as possible, but not simpler" :)

Cheers,
Akash A



On Wed, Oct 19, 2011 at 8:33 AM, Raffaele P. Guidi <
raffaele.p.guidi@gmail.com> wrote:

> Uhm, I'm new to the ASF, but I guess you just need to read some docs (ok,
> we don't have many, just something in the old wiki @github and an initial
> roadmap proposal [1]), take a look at the code and the issues, and get
> started :) Googling around, I see that you have interests in hadoop and
> hbase as well, so maybe you could start investigating synergies with
> those products (just an idea, it is in the roadmap).
>
> In any case thanks for your interest and welcome aboard.
>
> Ciao,
>    R
>
> [1] Roadmap proposal:
>
> https://cwiki.apache.org/confluence/display/DIRECTMEMORY/2011/10/18/Apache+DirectMemory+-+initial+roadmap+discussion
>
> On Wed, Oct 19, 2011 at 4:44 AM, Akash Ashok <th...@gmail.com>
> wrote:
>
> > I know this is still being incubated. But I would like to get involved
> > in the development. What are the rules like? Can I get involved?
> >
> > Cheers,
> > Akash A
> >
>

Re: Involvement as a developer

Posted by "Raffaele P. Guidi" <ra...@gmail.com>.
Uhm, I'm new to the ASF, but I guess you just need to read some docs (ok, we
don't have many, just something in the old wiki @github and an initial
roadmap proposal [1]), take a look at the code and the issues, and get
started :) Googling around, I see that you have interests in hadoop and
hbase as well, so maybe you could start investigating synergies with those
products (just an idea, it is in the roadmap).

In any case thanks for your interest and welcome aboard.

Ciao,
    R

[1] Roadmap proposal:
https://cwiki.apache.org/confluence/display/DIRECTMEMORY/2011/10/18/Apache+DirectMemory+-+initial+roadmap+discussion

On Wed, Oct 19, 2011 at 4:44 AM, Akash Ashok <th...@gmail.com> wrote:

> I know this is still being incubated. But I would like to get involved in
> the development. What are the rules like? Can I get involved?
>
> Cheers,
> Akash A
>