Posted to user@accumulo.apache.org by Josh Elser <jo...@gmail.com> on 2013/09/03 04:30:34 UTC

Cosmos - Accumulo-backed sorting, filtering and grouping of columnar data sets

Since this is the community that's likely to be interested, I wanted to
spread the word about a project I've been working on in my spare time:
Cosmos.

https://github.com/joshelser/cosmos

The point of Cosmos is to provide an efficient, easy-to-use interface
around Accumulo for counting and filtering the records of a data set.
At a glance, it accepts Multimaps of data, and provides mechanisms to
fetch records by column, fetch records by column with value filtering,
and count unique values across records in a column (groupBy). It also
contains a very simple internal timing/tracing API (much less granular
than Accumulo's tracing library), and a (very) rough web interface for
viewing those traces. Additionally, Cosmos contains a simple example of
its API using a public dataset of ~350K records provided by the City of
Chicago (https://data.cityofchicago.org/).
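The following is a minimal in-memory sketch of the three read operations described above, for readers who want their semantics spelled out. It is not Cosmos' actual API and does not talk to Accumulo. The class `ColumnarSketch`, the `Rec` type, and the method names are hypothetical. A plain `Map<String, List<String>>` stands in for Guava's `Multimap`. The sketch also assumes that "count unique values ... (groupBy)" means a count of occurrences per distinct value. The sample data is made up.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical illustration only: NOT Cosmos' API, no Accumulo involved.
public class ColumnarSketch {

    // A record: an id plus a column -> values "multimap" (stand-in for Guava's Multimap).
    record Rec(String id, Map<String, List<String>> columns) {}

    // Fetch by column: records that have at least one value in the column.
    static List<Rec> fetchByColumn(List<Rec> data, String column) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : data) {
            if (!r.columns().getOrDefault(column, List.of()).isEmpty()) {
                out.add(r);
            }
        }
        return out;
    }

    // Fetch by column with value filtering: records with at least one value
    // in the column that satisfies the filter.
    static List<Rec> fetchByColumn(List<Rec> data, String column, Predicate<String> filter) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : data) {
            if (r.columns().getOrDefault(column, List.of()).stream().anyMatch(filter)) {
                out.add(r);
            }
        }
        return out;
    }

    // groupBy (assumed semantics): occurrences of each distinct value in the
    // column across all records, kept in sorted order by value.
    static SortedMap<String, Long> groupBy(List<Rec> data, String column) {
        SortedMap<String, Long> counts = new TreeMap<>();
        for (Rec r : data) {
            for (String v : r.columns().getOrDefault(column, List.of())) {
                counts.merge(v, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data; record "3" has two values in the COLOR column.
        List<Rec> data = List.of(
            new Rec("1", Map.of("COLOR", List.of("red"), "SIZE", List.of("L"))),
            new Rec("2", Map.of("COLOR", List.of("blue"))),
            new Rec("3", Map.of("COLOR", List.of("red", "green"), "SIZE", List.of("M"))));

        System.out.println(fetchByColumn(data, "SIZE").size());                   // 2
        System.out.println(fetchByColumn(data, "COLOR", "green"::equals).size()); // 1
        System.out.println(groupBy(data, "COLOR"));  // {blue=1, green=1, red=2}
    }
}
```

In Accumulo, a groupBy like this would more likely be computed server-side, for example with iterators or combiners, rather than by scanning every record into the client. The sketch only pins down what the result should look like.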

Cosmos' design lends itself well to multiple users accessing the same
Accumulo instance, deferring to Accumulo or ZooKeeper for
synchronization/persistence when necessary. It aims to abstract away
some of the difficulty of using Accumulo to make the application
developer's life a bit easier.

And, as you'd expect, it's Apache licensed and compatible with Apache
Accumulo 1.4.4 and 1.5.0.

I'd love to hear what people think. Any feedback is welcome.

- Josh

Re: Cosmos - Accumulo-backed sorting, filtering and grouping of columnar data sets

Posted by Miguel Pereira <mi...@gmail.com>.
+1 for making things simple :D