
Posted to user@bigtop.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/06/07 08:08:12 UTC

Bigtop for Goraci

Hi Bigtopers,

I am over here asking questions as I am a bit lost right now.
Over in Gora we have a real nice test suite called Goraci [0]. Basically
the test runs many ingest clients that continually create linked lists
containing 25 million nodes. At some point the clients are stopped and a
map reduce job is run to ensure no linked list has a hole. A hole indicates
data was lost. Generally speaking, the more nodes in the cluster, the
better the chance of us finding that data has been lost.
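
To make the hole check concrete, here is a minimal sketch in plain Python. It
is not the actual Goraci MapReduce code; the function name find_holes and the
(node_id, prev_id) record shape are assumptions made for illustration. The idea
is that every node points at a previous node, and a hole is a node that is
referenced by some other node but was never stored itself.

```python
from collections import defaultdict


def find_holes(nodes):
    """Return {missing_id: [ids that reference it]} for a set of list nodes.

    `nodes` is an iterable of (node_id, prev_id) pairs; prev_id is None for
    the head of a list. This record shape is hypothetical, chosen only to
    illustrate the check.
    """
    defined = set()
    referenced_by = defaultdict(list)
    for node_id, prev_id in nodes:
        defined.add(node_id)
        if prev_id is not None:
            referenced_by[prev_id].append(node_id)
    # A hole is any id that something points at but that was never written.
    return {
        missing: sorted(refs)
        for missing, refs in referenced_by.items()
        if missing not in defined
    }


intact = [(1, None), (2, 1), (3, 2)]
damaged = [(1, None), (3, 2)]  # node 2 was lost by the datastore

print(find_holes(intact))
print(find_holes(damaged))
```

In a MapReduce setting the same grouping would presumably happen by node id in
the shuffle phase, with the reducer flagging ids that are referenced but not
defined.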

Now for part two... in Gora we currently have datastore implementations for
Accumulo, Avro, Cassandra, HBase and Amazon DynamoDB... what we do not
have is a mechanism to run the ingestion test against each datastore as a
controlled job meaning that we can subsequently gather metrics and infer
behaviour across Gora datastores.

I have not gone to our friends @Infra yet as I would rather do my homework
first and exhaust the avenues where I could contribute to getting this off
of the ground.

My questions are therefore very very simple... does anyone have an idea
about how we can get this working in tandem? Is this prime territory for
Bigtop?

Thanks very much in advance.
Best
Lewis

[0] https://github.com/keith-turner/goraci

-- 
*Lewis*

Re: Bigtop for Goraci

Posted by Roman Shaposhnik <rv...@apache.org>.

Hi Lewis!

On Thu, Jun 6, 2013 at 11:08 PM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> My questions are therefore very very simple... does anyone have an idea
> about how we can get this working in tandem? Is this prime territory for
> Bigtop?

I think it would be awesome to make Gora part of the Bigtop distribution.
What it would accomplish is a much tighter integration between Gora and
HBase/Hadoop and ready-made availability in the Bigtop binary distro (and
potentially commercial Hadoop vendors).

As a more day-to-day task it would also enable both of our communities
to rely on the results of integration tests that we can run on Bigtop's
Jenkins infrastructure.

Now, I'd be more than happy to help you guys out with basic understanding
of integrating a component into Bigtop, but I'm afraid that personally
I won't be able to spend many cycles doing the actual work.

If you're still interested -- do let us know!

Thanks,
Roman.

Re: Bigtop for Goraci

Posted by Lewis John Mcgibbney <le...@gmail.com>.

Hi Cos,

On Friday, June 7, 2013, Konstantin Boudnik <co...@apache.org> wrote:
> Very interesting stuff, indeed. I am a little bit confused though: when you
> are saying "does anyone have an idea about how we can get this working in
> tandem" what parts need to be coupled here? The suite and DB implementations?

The suite and DB implementation... you got it. The steps for running the
suite against a given datastore are as follows:

1. Add some trivial configuration to a gora.properties file specifying
which datastore you wish to use
2. Generate a trivial XML mapping file which maps an Avro JSON schema to
persistent beans (we are in the process of automating this).
3. Run a mvn profile to download the required dependencies for whichever
datastore the test has to run against
4. Execute commands from a bash script which run the test(s).
5. Aggregate the results from the MR jobs.
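
The steps above could be driven per datastore by something like the following
dry-run sketch. Everything specific in it is an assumption rather than
something Gora or Goraci is known to ship: the Maven profile names, the
run-goraci.sh script and its arguments are hypothetical placeholders, and the
store class names and the gora.datastore.default key should be checked against
the Gora version in use. It only writes gora.properties and returns the
commands it would run; it does not execute anything.

```python
import tempfile
from pathlib import Path

# Assumed store classes; verify against the Gora release being tested.
DATASTORES = {
    "hbase": "org.apache.gora.hbase.store.HBaseStore",
    "cassandra": "org.apache.gora.cassandra.store.CassandraStore",
    "accumulo": "org.apache.gora.accumulo.store.AccumuloStore",
}


def write_gora_properties(workdir, store_class):
    """Step 1: write a gora.properties file selecting the datastore."""
    path = Path(workdir) / "gora.properties"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"gora.datastore.default={store_class}\n")
    return path


def plan_run(name, workdir):
    """Return the (hypothetical) commands for steps 3 and 4 as a dry run."""
    write_gora_properties(Path(workdir) / name, DATASTORES[name])
    return [
        ["mvn", "package", f"-P{name}"],   # hypothetical profile per datastore
        ["./run-goraci.sh", "generate"],   # hypothetical ingest step
        ["./run-goraci.sh", "verify"],     # hypothetical MR verification step
    ]


with tempfile.TemporaryDirectory() as tmp:
    plans = {name: plan_run(name, tmp) for name in DATASTORES}
    hbase_props = (Path(tmp) / "hbase" / "gora.properties").read_text()

print(hbase_props.strip())
for name, commands in plans.items():
    print(name, [" ".join(cmd) for cmd in commands])
```

Steps 2 and 5 (generating the mapping file and aggregating the MR results) are
left out here because their inputs and outputs are not described in enough
detail in this thread.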

> Or is it something else? Appreciate more info.

Well yeah, there kinda is 'something else': we need to provision clusters of
machines kitted out with the specific databases up and running prior to the
execution of my (most likely) rather simplified step-by-step above ;)

I don't imagine automating the step-by-step to be much of a problem, but I
don't know where in god's name we could setUp() clusters of machines for
each datastore in a uniform manner.
Thanks so much for your input, it is really valuable.

-- 
*Lewis*

Re: Bigtop for Goraci

Posted by Konstantin Boudnik <co...@apache.org>.

Hi Lewis.

Very interesting stuff, indeed. I am a little bit confused though: when you
are saying "does anyone have an idea about how we can get this working in
tandem" what parts need to be coupled here? The suite and DB implementations?

Or is it something else? Appreciate more info.

Regards,
  Cos

On Thu, Jun 06, 2013 at 11:08PM, Lewis John Mcgibbney wrote:
> Hi Bigtopers,
> 
> I am over here asking questions as I am a bit lost right now.
> Over in Gora we have a real nice test suite called Goraci [0]. Basically
> the test runs many ingest clients that continually create linked lists
> containing 25 million nodes. At some point the clients are stopped and a
> map reduce job is run to ensure no linked list has a hole. A hole indicates
> data was lost. Generally speaking, the more nodes in the cluster, the
> better the chance of us finding that data has been lost.
> 
> Now for part two... in Gora we currently have datastore implementations for
> Accumulo, Avro, Cassandra, HBase and Amazon DynamoDB... what we do not
> have is a mechanism to run the ingestion test against each datastore as a
> controlled job meaning that we can subsequently gather metrics and infer
> behaviour across Gora datastores.
> 
> I have not gone to our friends @Infra yet as I would rather do my homework
> first and exhaust the avenues where I could contribute to getting this off
> of the ground.
> 
> My questions are therefore very very simple... does anyone have an idea
> about how we can get this working in tandem? Is this prime territory for
> Bigtop?
> 
> Thanks very much in advance.
> Best
> Lewis
> 
> [0] https://github.com/keith-turner/goraci
> 
> -- 
> *Lewis*