Posted to user@gora.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/02/27 00:55:31 UTC

Pluggable Client Architecture for Apache Gora

Hi,
Renato and I are currently only discussing a pluggable client architecture
for Gora.
The motivation for pluggable clients is performance across Gora and
the fact that so far we (Renato and myself) have little or no
indication of how Gora currently performs.
To put this into context, consider the gora-cassandra module: we use the
Hector client as the underlying means of access to Apache Cassandra. We are
interested in finding out whether there are benefits (for Gora) to be
gained from comparing the performance of different clients.
Are there any comments on this from anyone?
Thank you
Lewis

-- 
*Lewis*

Re: Pluggable Client Architecture for Apache Gora

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Sorry, I forgot to write a NOT in

 This is only true for data stores with several different clients, but for
others it might NOT be such a great idea because, for example, Amazon
DynamoDB has only a single client, provided by Amazon itself (and that
will probably remain the case).

And as Lewis said, we need to think about benchmarking Gora at some
point in the future so we can make it more appealing to other people
and engage a wider community (:


Renato M.


Re: Pluggable Client Architecture for Apache Gora

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Thanks for the support and ideas for this. Lewis and I have been
talking about this for a while and we've been trying to shape the
ideas to the point where we could write to the list. There are many,
many things to be done before we even decide to start coding this, but
having the idea is, IMHO, the first step (:

I don't think it would work as a Hive storage handler, because the Hive
storage handler is more of a "pluggable format" enabler. So I think Hive's
storage handler is a little Gora for Hive, because it allows Hive
to access HBase, MySQL, and S3 data. The goal of this idea is to make
Gora independent of clients (at least Cassandra's clients, as there are
too many in the world and we don't know which one is the best),
so Gora can use the most suitable one for each case. This is only true
for data stores with several different clients, but for others it might be
such a great idea because, for example, Amazon DynamoDB has only a
single client, provided by Amazon itself (and that will probably remain
the case).
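
As a rough illustration of what "independent of clients" could look like, here is a minimal sketch. Every name in it (StoreClient, InMemoryClient, ClientRegistry) is hypothetical and not part of Gora; the in-memory class merely stands in for a binding to a real driver so that the example runs on its own. It shows the data store code asking a registry for a client by name rather than naming a driver directly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical abstraction; nothing here is part of Gora's actual API.
interface StoreClient {
    String name();
    void put(String key, String value);
    Optional<String> get(String key);
}

// In-memory stand-in for a binding to a real driver, so the sketch is runnable.
final class InMemoryClient implements StoreClient {
    private final String name;
    private final Map<String, String> rows = new HashMap<>();

    InMemoryClient(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public void put(String key, String value) { rows.put(key, value); }
    @Override public Optional<String> get(String key) { return Optional.ofNullable(rows.get(key)); }
}

// Maps a configured client name to a factory, so store code never names a driver.
final class ClientRegistry {
    private final Map<String, Supplier<StoreClient>> factories = new HashMap<>();

    void register(String name, Supplier<StoreClient> factory) { factories.put(name, factory); }

    StoreClient create(String name) {
        Supplier<StoreClient> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No client registered as: " + name);
        }
        return factory.get();
    }
}

final class PluggableClientDemo {
    public static void main(String[] args) {
        ClientRegistry registry = new ClientRegistry();
        registry.register("hector", () -> new InMemoryClient("hector"));
        registry.register("datastax", () -> new InMemoryClient("datastax"));

        // Assumption: in practice the name would come from configuration.
        StoreClient client = registry.create("datastax");
        client.put("row1", "value1");
        System.out.println(client.name() + " -> " + client.get("row1").orElse("<missing>"));
    }
}
```

For a store with a single vendor client, such as DynamoDB, the registry would hold one entry and add little, which is the caveat raised above.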


Renato M.


Re: Pluggable Client Architecture for Apache Gora

Posted by Henry Saputra <he...@gmail.com>.
Agree. We have good reason to make it happen.

+1

- Henry



Re: Pluggable Client Architecture for Apache Gora

Posted by Lewis John Mcgibbney <le...@gmail.com>.
I need to look at the Hive storage handler, Henry, but I imagine that the
motivation and implementation are, or would be, similar in principle.
With regard to your second point, this is very true and it is not going to
be a simple thing to achieve. What works in our favour is
that Gora is still a young project at heart. We're still actively finding
new improvements, suggestions to improve the core API etc. For example, we
don't even support delete or deleteByQuery in gora-cassandra yet (TODO) ;)

Our motivation is really to discover what degree of overhead Gora puts on
operations between the user and the datastore. In Cassandra there are many
possible clients we could build on, e.g. the Hector client, the DataStax
Java driver, intravert-ug etc., so we want to find this out if we are to
make the best code available.
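
To make "degree of overhead" concrete, a naive timing sketch follows. It is not a proper benchmark (a harness such as JMH would be the better tool), and all names are hypothetical. Two HashMaps stand in for the datastore, and the lowercasing in the layered path is an arbitrary placeholder for whatever mapping work a framework layer performs. The point is only the shape of the comparison: time the same operation directly and through the layer, then report the relative difference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

final class OverheadSketch {
    // Runs op `warmup` times untimed, then `iterations` times timed;
    // returns the mean wall-clock nanoseconds per timed call.
    static double meanNanos(IntConsumer op, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) op.accept(i);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) op.accept(i);
        return (System.nanoTime() - start) / (double) iterations;
    }

    // Overhead of the layered path relative to the direct path; 0.5 means 50% slower.
    static double relativeOverhead(double directNanos, double layeredNanos) {
        return (layeredNanos - directNanos) / directNanos;
    }

    public static void main(String[] args) {
        Map<String, String> direct = new HashMap<>();
        Map<String, String> behindLayer = new HashMap<>();

        // "Direct" path: call the stand-in client with no extra work.
        double directNs = meanNanos(i -> direct.put("k" + i, "v"), 10_000, 100_000);
        // "Layered" path: same call plus placeholder mapping work (an assumption).
        double layeredNs = meanNanos(
            i -> behindLayer.put(("k" + i).toLowerCase(), "v"), 10_000, 100_000);

        System.out.printf("direct=%.1f ns, layered=%.1f ns, overhead=%.2f%n",
            directNs, layeredNs, relativeOverhead(directNs, layeredNs));
    }
}
```

Repeating the same measurement with each candidate client behind the layer would give the per-client comparison discussed in this thread.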



-- 
*Lewis*

Re: Pluggable Client Architecture for Apache Gora

Posted by Henry Saputra <he...@gmail.com>.
Would this be similar to the storage handler architecture approach in Hive?

Looks like a good idea.
The problem/difficulty I can think of off the top of my head is the vast
differences between operations in the origin data sources. So the API has
to be very generic and at the same time complete enough to cover all scenarios.
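
One possible way to reconcile "generic" with "complete", offered only as an assumption and not as anything Gora does, is to let each client declare which operations it supports so that callers can check before use. The names below (Operation, CapableClient, ReadWriteOnlyClient) are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; this is not Gora's API.
enum Operation { GET, PUT, DELETE, DELETE_BY_QUERY }

interface CapableClient {
    // The operations this particular client implements.
    Set<Operation> supported();

    default boolean supports(Operation op) { return supported().contains(op); }

    // Fails fast when a caller relies on an operation the client lacks.
    default void require(Operation op) {
        if (!supports(op)) {
            throw new UnsupportedOperationException(op + " is not supported");
        }
    }
}

// Example client that can read and write but has no delete support.
final class ReadWriteOnlyClient implements CapableClient {
    @Override public Set<Operation> supported() {
        return EnumSet.of(Operation.GET, Operation.PUT);
    }
}

final class CapabilityDemo {
    public static void main(String[] args) {
        CapableClient client = new ReadWriteOnlyClient();
        for (Operation op : Operation.values()) {
            System.out.println(op + " supported: " + client.supports(op));
        }
    }
}
```

The trade-off is that callers must handle missing capabilities explicitly, which keeps the common interface small at the cost of extra checks.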

- Henry



Re: Pluggable Client Architecture for Apache Gora

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Woot!



Re: Pluggable Client Architecture for Apache Gora

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Chris,
We intend to work on this and submit a proposal to Cassandra Summit
which, legend has it, is being held this year in San Francisco. We will keep
this thread (and presumably a JIRA ticket) lively.
Thanks for your input.
Lewis

http://www.datastax.com/events/cassandrasummit2012


-- 
*Lewis*

Re: Pluggable Client Architecture for Apache Gora

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
My comment is that I'd love to help you guys write it up for a research conference when you do the benchmarking! :)

Cheers,
Chris

