Posted to dev@bigtop.apache.org by RJ Nowling <rn...@gmail.com> on 2015/03/26 21:22:14 UTC

Hosting Data Generators in BigTop

Hi all,

Most of you are aware of my work with Jay on BigPetStore, particularly the
data generator and Spark pipelines.  Data generators are a great way to
load test systems, as Jay has recently done for Kubernetes using the BPS
data generator.

We think they're generally useful to the big data community. Would BigTop
be interested in hosting these data generator / load testing tools as
released artifacts in their own right?

For example, we'd like to set up a web page on the BigTop site with links
to:

* BPS Data Generator
* BPS Spark
* BPS Transaction Queue for using the data generator to test streaming
services

and we'd like to release these as source tarballs, uber JARs, Maven-hosted
JARs, and Docker containers (as appropriate).

Would this be okay or should everything be released as part of BigTop
itself?

Secondly, I've been working on a model for simulating customer movements at
a conference.  It's designed for development and testing for a real-time
streaming analytics application where we didn't have access to data ahead
of time.  You can read about it here:

http://rnowling.github.io/math/2015/03/24/bigtop-bazaar-model.html

I'd like to call it "BigTop Bazaar" and release it through BigTop.  Is the
BigTop community interested in having multiple data generators?

Thanks,
RJ

Re: Hosting Data Generators in BigTop

Posted by Andrew Purtell <ap...@apache.org>.
Sure, we can take this to a JIRA and discuss by hangout. Maybe we can set
up something one day next week in the afternoon in the Pacific timezone?

Implementing a data generator in Go will be an integration challenge with
most of the Apache ecosystem if the generator is supposed to interact
directly with the system(s) under test. Given this is "big data" I don't
think we want to generate into an intermediate place, i.e. really big
files, and then replay those with some other Java-native utility. Therefore
you'll have to use GoJVM or something similar to interface with the client
Java code via JNI. JNI interactions aren't efficient unless the client API
of whatever component you are generating data for can accept data via
direct mapped buffers, and most don't. (Maybe none of them?) This will
limit the peak throughput of the generator.

Of course it would be great if someone builds a Go client for Apache $FOO
and contributes it. To do it right (which, in my opinion, avoids JNI), this
would involve reverse engineering wire formats and going up from there,
like what asynchbase (https://github.com/OpenTSDB/asynchbase) did for HBase
RPC. I don't expect this will happen any time soon but who knows, this is
open source.


On Mon, Mar 30, 2015 at 6:14 PM, jay vyas <ja...@gmail.com>
wrote:

> Also, guys, shall we carry this discussion on in the JIRA
> https://issues.apache.org/jira/browse/BIGTOP-1782 ?
> A hangout conversation is a great idea if Andy has time :)
>
> On Mon, Mar 30, 2015 at 8:05 PM, RJ Nowling <rn...@gmail.com> wrote:
>
> > My current model simulates the conference attendees as particles and
> > reports their X,Y positions at specified intervals. We could modify the
> > output to compute distances to scanners.
> >
> > I currently have half the model implemented in Golang. I could rewrite it
> > in Java or another JVM language, commit to BigTop, and continue
> > development through BigTop so BigTop can track my progress.
> >
> > Andrew, would you be willing to talk on the phone or via Google Hangouts
> > to work out details for a plan on integration with HBase / Phoenix?
> >
> > > On Mar 30, 2015, at 5:07 PM, Andrew Purtell <ap...@apache.org> wrote:
> > >
> > > For IoT, some low-hanging fruit is a sensor network use case. The
> > > particulars of the use case can vary, but I can see stressing HBase on
> > > the write side by deploying sensors over a simulated 2-dimensional space,
> > > keying in part by location, and then having telemetry timeseries data
> > > arrive by time and location in irregular patterns. (Sensors would only
> > > report changes. The generator could model duty cycles in addition to
> > > modeling the physical process under measurement.) We could scale up and
> > > down the data bulk and arrival rate by varying the size of the simulated
> > > space and the rate of measurement change notices produced by the model.
> > > On the read side, having compound keys with geolocation in the leading
> > > edge, followed by a time component, would be natural for interactive
> > > visualization of the data as heat maps. They could be animated or
> > > summarized over varying time ranges. This would produce short and long
> > > scanning access patterns with wide variation in selectivity of
> > > server-side filtering depending on query. If using Phoenix, it would
> > > parallelize the scanning activity and put load through the roof.
> > >
> > > On Fri, Mar 27, 2015 at 11:58 AM, jay vyas <jayunit100.apache@gmail.com>
> > > wrote:
> > >
> > >> It will definitely be awesome if Andrew can help us craft an idiomatic
> > >> and meaningful way to stress HBase at scale with IoT data.
> > >>
> > >>> On Fri, Mar 27, 2015 at 2:48 PM, RJ Nowling <rn...@gmail.com> wrote:
> > >>>
> > >>> Jay and Andrew, thanks for the feedback! I'd be happy to discuss ways
> > >>> to connect BigTop Bazaar to HBase.
> > >>>
> > >>> It would be great to work with the BigBench project to see if our data
> > >>> generators would be of interest.
> > >>>
> > >>> On Fri, Mar 27, 2015 at 1:17 PM, Andrew Purtell <apurtell@apache.org>
> > >>> wrote:
> > >>>
> > >>>> I agree the proposal sounds very interesting.
> > >>>>
> > >>>> I can also help with the HBase side of things.
> > >>>>
> > >>>> On the general subject of data generators, you may want to reach out
> > >>>> to the people behind the "BigBench" project
> > >>>> (https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues
> > >>>> of mine from Intel. When I was there they were interested in
> > >>>> contributing to Apache, but had significant problems in that the data
> > >>>> generator itself was licensed under non-free terms incompatible with
> > >>>> the ASL. I think they wanted to move past that but weren't sure
> > >>>> exactly how (including having the bandwidth to do so). I see
> > >>>> occasional updates to the repo so they are still active in some way.
> > >>>>
> > >>>> On Fri, Mar 27, 2015 at 6:42 AM, jay vyas <jayunit100.apache@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks for proposing, RJ.
> > >>>>>
> > >>>>> I'm in favor, so long as it comes with a BigTop-supported use case,
> > >>>>> and indeed BigTop Bazaar is a lovely use case for HBase!
> > >>>>>
> > >>>>> I'm happy to help you with the HBase side of things; maybe Andrew can
> > >>>>> collaborate on a reference architecture with us for scale testing of
> > >>>>> HBase via BigTop Bazaar's real-time IoT style of data generation.
> > >>>>>
> > >>>>> That will be a great blueprint complement to the MapReduce and Spark
> > >>>>> blueprints which we already have.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Hosting Data Generators in BigTop

Posted by jay vyas <ja...@gmail.com>.
Also, guys, shall we carry this discussion on in the JIRA
https://issues.apache.org/jira/browse/BIGTOP-1782 ?
A hangout conversation is a great idea if Andy has time :)




-- 
jay vyas

Re: Hosting Data Generators in BigTop

Posted by Andrew Purtell <ap...@apache.org>.
> My current model simulates the conference attendees as particles and
> reports their X,Y positions at specified intervals. We could modify the
> output to compute distances to scanners.

Tangent: This reminds me of MERLSense:
https://sites.google.com/a/drwren.com/wmd/home




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Hosting Data Generators in BigTop

Posted by RJ Nowling <rn...@gmail.com>.
My current model simulates the conference attendees as particles and reports their X,Y positions at specified intervals. We could modify the output to compute distances to scanners.  

I currently have half the model implemented in Golang. I could rewrite it in Java or another JVM language, commit to BigTop, and continue development through BigTop so BigTop can track my progress.

Andrew, would you be willing to talk on the phone or via Google Hangouts to work out details for a plan on integration with HBase / Phoenix?
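
To make the proposed output change concrete, here is a rough sketch of reporting distances to scanners instead of raw X,Y positions. It is an illustration only and not the model from the linked post: it assumes attendees follow a bounded random walk on a 100 x 100 floor, and the function names, floor size, step size, and scanner coordinates are all hypothetical.

```python
import math
import random

FLOOR_WIDTH = 100.0   # hypothetical floor size, arbitrary units
FLOOR_HEIGHT = 100.0


def step(positions, rng, max_step=1.0):
    """Move every particle by a small random offset, clamped to the floor."""
    moved = []
    for x, y in positions:
        nx = min(max(x + rng.uniform(-max_step, max_step), 0.0), FLOOR_WIDTH)
        ny = min(max(y + rng.uniform(-max_step, max_step), 0.0), FLOOR_HEIGHT)
        moved.append((nx, ny))
    return moved


def distances_to_scanners(position, scanners):
    """Euclidean distance from one attendee to each scanner."""
    x, y = position
    return [math.hypot(x - sx, y - sy) for sx, sy in scanners]


def simulate(n_attendees, scanners, n_steps, report_every, seed=0):
    """Return (time step, attendee id, distances) tuples at fixed intervals."""
    rng = random.Random(seed)
    positions = [
        (rng.uniform(0.0, FLOOR_WIDTH), rng.uniform(0.0, FLOOR_HEIGHT))
        for _ in range(n_attendees)
    ]
    reports = []
    for t in range(1, n_steps + 1):
        positions = step(positions, rng)
        if t % report_every == 0:
            for attendee_id, pos in enumerate(positions):
                reports.append(
                    (t, attendee_id, distances_to_scanners(pos, scanners))
                )
    return reports
```

Calling simulate(3, [(0.0, 0.0), (100.0, 100.0)], 10, 5) would emit one record per attendee at steps 5 and 10, each carrying one distance per scanner. A real generator would replace the random walk with the actual movement model.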




Re: Hosting Data Generators in BigTop

Posted by Andrew Purtell <ap...@apache.org>.
For IoT, some low-hanging fruit is a sensor network use case. The
particulars of the use case can vary, but I can see stressing HBase on the
write side by deploying sensors over a simulated 2-dimensional space,
keying in part by location, and then having telemetry timeseries data
arrive by time and location in irregular patterns. (Sensors would only
report changes. The generator could model duty cycles in addition to
modeling the physical process under measurement.) We could scale up and
down the data bulk and arrival rate by varying the size of the simulated
space and the rate of measurement change notices produced by the model. On
the read side, having compound keys with geolocation in the leading edge,
followed by a time component, would be natural for interactive
visualization of the data as heat maps. They could be animated or
summarized over varying time ranges. This would produce short and long
scanning access patterns with wide variation in selectivity of server-side
filtering depending on query. If using Phoenix, it would parallelize the
scanning activity and put load through the roof.
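
As a rough illustration of the compound key idea above (location in the leading edge, then a time component), here is a minimal sketch. It does not use any HBase or Phoenix API; it only relies on the fact that HBase orders rows lexicographically by key bytes. The field widths, the grid-cell encoding of location, and the function names are all assumptions made for the example.

```python
import struct

# Hypothetical layout: 2-byte grid column, 2-byte grid row, 8-byte timestamp
# in milliseconds. Big-endian unsigned fields make byte-wise (lexicographic)
# order agree with numeric order, field by field.
KEY_FORMAT = ">HHQ"


def row_key(cell_x, cell_y, timestamp_ms):
    """Encode a compound key: location first, then time."""
    return struct.pack(KEY_FORMAT, cell_x, cell_y, timestamp_ms)


def scan_range(cell_x, cell_y, start_ms, stop_ms):
    """Start (inclusive) and stop (exclusive) keys for one cell and window."""
    return row_key(cell_x, cell_y, start_ms), row_key(cell_x, cell_y, stop_ms)


def scan(sorted_keys, start, stop):
    """Stand-in for a range scan over keys kept in sorted byte order."""
    return [k for k in sorted_keys if start <= k < stop]
```

With keys sorted as bytes, all readings for one grid cell sit together in time order, so a heat-map query for one cell over a time window becomes a single contiguous range, e.g. scan(keys, *scan_range(1, 7, 0, 1000)). A short window gives a short scan and a long window a long one, which is the variation in access pattern described above.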


On Fri, Mar 27, 2015 at 11:58 AM, jay vyas <ja...@gmail.com>
wrote:

> It will definitely be awesome if Andrew can help us craft an idiomatic and
> meaningful way to stress HBase at scale with IoT data.
>
> On Fri, Mar 27, 2015 at 2:48 PM, RJ Nowling <rn...@gmail.com> wrote:
>
> > Jay and Andrew, thanks for the feedback! I'd be happy to discuss ways
> > to connect BigTop Bazaar to HBase.
> >
> > It would be great to work with the BigBench project to see if our data
> > generators would be of interest.
> >
> > On Fri, Mar 27, 2015 at 1:17 PM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > I agree the proposal sounds very interesting.
> > >
> > > I can also help with the HBase side of things.
> > >
> > > On the general subject of data generators, you may want to reach out
> > > to the people behind the "BigBench" project
> > > (https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues
> > > of mine from Intel. When I was there they were interested in
> > > contributing to Apache, but had significant problems in that the data
> > > generator itself was licensed under non-free terms incompatible with
> > > the ASL. I think they wanted to move past that but weren't sure
> > > exactly how (including having the bandwidth to do so). I see
> > > occasional updates to the repo so they are still active in some way.
> > >
> > > On Fri, Mar 27, 2015 at 6:42 AM, jay vyas <jayunit100.apache@gmail.com>
> > > wrote:
> > >
> > > > Thanks for proposing, RJ.
> > > >
> > > > I'm in favor, so long as it comes with a BigTop-supported use case,
> > > > and indeed BigTop Bazaar is a lovely use case for HBase!
> > > >
> > > > I'm happy to help you with the HBase side of things; maybe Andrew can
> > > > collaborate on a reference architecture with us for scale testing of
> > > > HBase via BigTop Bazaar's real-time IoT style of data generation.
> > > >
> > > > That will be a great blueprint complement to the MapReduce and Spark
> > > > blueprints which we already have.
> > > >
> > > >
> > > >
> > > > On Thu, Mar 26, 2015 at 4:22 PM, RJ Nowling <rn...@gmail.com>
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Most of you are aware of my work with Jay on BigPetStore,
> > particularly
> > > > the
> > > > > data generator and Spark pipelines.  Data generators are a great
> way
> > to
> > > > > load test systems, as Jay has recently done for kubernetes using
> the
> > > BPS
> > > > > data generator.
> > > > >
> > > > > We think they're generally useful to the big data community. Would
> > > BigTop
> > > > > be interested in hosting these data generator / load testing tools
> as
> > > > > released artifacts in their own right?
> > > > >
> > > > > For example, we'd like to set up a web page on the BigTop site with
> > > links
> > > > > to:
> > > > >
> > > > > * BPS Data Generator
> > > > > * BPS Spark
> > > > > * BPS Transaction Queue for using the data generator to test
> > streaming
> > > > > services
> > > > >
> > > > > and we'd like to release these as source tarballs, uber JARs,
> > > > Maven-hosted
> > > > > JARs, and Docker containers (as appropriate).
> > > > >
> > > > > Would this be okay or should everything be released as part of
> BigTop
> > > > > itself?
> > > > >
> > > > > Secondly, I've been working on a model for simulating customer
> > > movements
> > > > at
> > > > > a conference.  It's designed for development and testing for a
> > > real-time
> > > > > streaming analytics application where we didn't have access to data
> > > ahead
> > > > > of time.  You can read about it here:
> > > > >
> > > > > http://rnowling.github.io/math/2015/03/24/bigtop-bazaar-model.html
> > > > >
> > > > > I'd like to call it "BigTop Bazaar" and release it through BigTop.
> > Is
> > > > the
> > > > > BigTop community interested in having multiple data generators?
> > > > >
> > > > > Thanks,
> > > > > RJ
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > jay vyas
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> jay vyas
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Hosting Data Generators in BigTop

Posted by jay vyas <ja...@gmail.com>.
It will definitely be awesome if Andrew can help us craft an idiomatic and
meaningful way to stress HBase at scale w/ IoT data.




-- 
jay vyas

Re: Hosting Data Generators in BigTop

Posted by RJ Nowling <rn...@gmail.com>.
Jay and Andrew, thanks for the feedback!  I'd be happy to discuss ways to
connect BigTop Bazaar to HBase.

It would be great to work with the BigBench project to see if our data
generators would be of interest.


Re: Hosting Data Generators in BigTop

Posted by Andrew Purtell <ap...@apache.org>.
I agree the proposal sounds very interesting.

I can also help with the HBase side of things.

On the general subject of data generators, you may want to reach out to the
people behind the "BigBench" project (
https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues of mine
from Intel. When I was there they were interested in contributing to
Apache, but had significant problems in that the data generator itself was
licensed under non-free terms incompatible with the ASL. I think they
wanted to move past that but weren't sure exactly how (including having the
bandwidth to do so). I see occasional updates to the repo so they are still
active in some way.






-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Hosting Data Generators in BigTop

Posted by jay vyas <ja...@gmail.com>.
Thanks for proposing, RJ.

I'm in favor, so long as it comes w/ a BigTop-supported use case, and indeed
BigTop Bazaar is a lovely use case for HBase!

I'm happy to help you with the HBase side of things. Maybe Andrew can
collaborate on a reference architecture with us for scale testing of HBase
via BigTop Bazaar's realtime IoT style of data generation.

That will be a great blueprint complement to the MapReduce and Spark
blueprints which we already have.






-- 
jay vyas