
Posted to user@hbase.apache.org by Jesse Yates <je...@gmail.com> on 2011/12/22 20:44:20 UTC

(Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Culvert was originally introduced at Hadoop Summit 2011, but recent updates
have made it very applicable to current systems. Recently, we added support
for Accumulo and upgraded HBase support to 0.92. Since Hadoop Summit, there
has also been significant code cleanup, and we have added some small
features. However, we found that most people hadn't heard of Culvert, so we
wanted to re-release the framework.

For an introduction to using Culvert, check out the blog post here:
http://jyates.github.com/2011/11/17/intro-to-culvert.html

Also, the original presentation (where we discuss the internals) is
available on slideshare:
http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data

There is a Culvert hackathon in the middle of January:
http://culverthackathon2012.eventbrite.com/

Oh, and you can find the code on github:
https://github.com/booz-allen-hamilton/culvert

Below is an overview of why we wrote Culvert and what it does.

Secondary indexing is a common design pattern in BigTable-like databases
that allows users to index one or more columns in a table. This technique
enables fast search of records in a database based on a particular column
instead of the row id, thus enabling relational-style semantics in a NoSQL
environment. Frequently, the index is stored either in a reserved namespace
in the table or in a separate index table.
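
As a rough illustration of the pattern described above (this is not
Culvert's actual API), the sketch below keeps a data table keyed by row id
and a separate index table keyed by column value. Plain Python dicts stand
in for the BigTable-like tables, and the names put and lookup_by_column are
hypothetical.

```python
from collections import defaultdict

# Data table: row id -> {column: value}. A dict stands in for a BigTable-like table.
data_table = {}
# Index table: (column, value) -> set of row ids that hold that value.
index_table = defaultdict(set)


def put(row_id, columns, indexed_columns=("city",)):
    """Write a row and keep the index entries for the indexed columns in sync."""
    old = data_table.get(row_id, {})
    for col in indexed_columns:
        # Drop the stale index entry if the indexed value is changing.
        if col in old and old[col] != columns.get(col, old[col]):
            index_table[(col, old[col])].discard(row_id)
        if col in columns:
            index_table[(col, columns[col])].add(row_id)
    data_table[row_id] = {**old, **columns}


def lookup_by_column(column, value):
    """Find row ids by column value via the index instead of scanning every row."""
    return sorted(index_table.get((column, value), set()))


put("row1", {"name": "alice", "city": "Boston"})
put("row2", {"name": "bob", "city": "Denver"})
put("row3", {"name": "carol", "city": "Boston"})
put("row2", {"city": "Boston"})  # the update moves row2 to a new index entry

print(lookup_by_column("city", "Boston"))
print(lookup_by_column("city", "Denver"))
```

The point of the design is that a lookup by city touches one index entry
rather than every row; the cost is that every write also has to maintain
the index.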

Despite the fact that this is a common design pattern in BigTable-based
applications, most implementations of this practice to date have been
tightly coupled with a particular application. As a result, few
general-purpose frameworks for secondary indexing on BigTable-like
databases exist, and those that do are tied to a particular implementation
of the BigTable model.

There are several existing tools (Solr, Lily), but these focus on
text-based search and are restricted to indexes created through their own
frameworks. What if you want to use your existing indexes? Or leverage the
indexes to do complex queries?

We developed a solution to this problem called Culvert that supports online
index updates as well as a variation of the Hive query language. In
designing Culvert, we sought to make the solution pluggable so that it can
be used on any of the many BigTable-like databases (HBase, Cassandra,
etc.). Furthermore, it is easily extended to work with existing,
hand-rolled indexes.

As well as being a secondary indexing framework, it is also a query
execution mechanism - think pig/hive minus the fancy command line. We
support a subset of SQL, but are able to take full advantage of home-rolled
and built-in indexes, leading to query execution times potentially orders
of magnitude smaller than existing approaches, and to queries that are far
easier to write.

-- Jesse
-------------------
Jesse Yates
240-888-2200
@jesse_yates

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Stack <st...@duboce.net>.

On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <je...@gmail.com> wrote:
> Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> have made it very applicable to current systems. Recently, we added support
> for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> Summit, there have also been significant code cleanup and added some small
> features. However, we found that most people hadn't heard of Culvert, so we
> wanted to re-release the framework.
>
> For an introduction to using Culvert, check out the blog post here:
> http://jyates.github.com/2011/11/17/intro-to-culvert.html
>

Nice one Jesse.  Would suggest you add to
http://wiki.apache.org/hadoop/SupportingProjects -- after you get it
compiling again (smile). Add it at top so the projects that rot tend
to fall down in the list.

St.Ack

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Jesse Yates <je...@gmail.com>.

On Fri, Dec 23, 2011 at 9:28 AM, Mohit Anchlia <mo...@gmail.com> wrote:

> I briefly looked at the presentation. May I ask how is it much
> different than using elasticsearch or solr? As I understand terms are
> being indexed which is also done by search engines. Just trying to
> understand the main benefit. We currently use Cassandra.
>
> Thanks
>

Culvert is designed not just to do search over documents, but to also do
general indexing over all your keyvalues. Chances are the things you are
storing are more than just unstructured text with some special key. If
that's not the case, then some general, text-based indexing is really all you
need. Right now, Culvert only supports a built-in text-based index, but
it is pretty easy to write new ones. The power in Culvert comes from the fact
that it can integrate really easily with existing indexes (legacy systems)
and do indexing with some of its built-in indexes. If you want to look up
by something that is not the row key (primary key), then you will need to
have an index on that value - this is usually taken care of for you in
'traditional' SQL systems.

On top of just doing the indexing for you, Culvert does a lot of complex
query execution with a subset of SQL combined with a decorator design
pattern to make it really natural to build up queries. Because this
execution is built into the core of Culvert, it leverages all the
information you have indexed - this means potentially orders of magnitude
faster queries. There is also a lot of potential work here, under the hood,
in query optimization (Culvert is pretty young).
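
To make the decorator idea concrete, here is a minimal sketch of how query
operators could wrap one another. The class names (IndexLookup, Filter,
Project) and the rows() method are hypothetical and are not taken from
Culvert's code; dicts stand in for the table and the index.

```python
class IndexLookup:
    """Leaf operator: yields only the rows an index returns for a value."""

    def __init__(self, table, index, value):
        self.table, self.index, self.value = table, index, value

    def rows(self):
        for row_id in sorted(self.index.get(self.value, ())):
            yield row_id, self.table[row_id]


class Filter:
    """Decorator: wraps any operator and keeps rows matching a predicate."""

    def __init__(self, source, predicate):
        self.source, self.predicate = source, predicate

    def rows(self):
        for row_id, cols in self.source.rows():
            if self.predicate(cols):
                yield row_id, cols


class Project:
    """Decorator: wraps any operator and keeps only the named columns."""

    def __init__(self, source, columns):
        self.source, self.columns = source, columns

    def rows(self):
        for row_id, cols in self.source.rows():
            yield row_id, {c: cols[c] for c in self.columns if c in cols}


people = {
    "r1": {"name": "alice", "city": "Boston", "age": 34},
    "r2": {"name": "bob", "city": "Denver", "age": 41},
    "r3": {"name": "carol", "city": "Boston", "age": 27},
}
city_index = {"Boston": {"r1", "r3"}, "Denver": {"r2"}}

# Roughly: SELECT name FROM people WHERE city = 'Boston' AND age > 30
query = Project(
    Filter(IndexLookup(people, city_index, "Boston"), lambda cols: cols["age"] > 30),
    ["name"],
)
result = list(query.rows())
print(result)
```

Because every operator exposes the same rows() method, a query is built by
nesting operators, and the innermost one decides whether an index or a scan
feeds the rest of the pipeline.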

We also can potentially do server-side joins. I don't know what Cassandra
supports in this field, but it would need to be something equivalent to
coprocessors in hbase (or a modified iterator for accumulo). Even without
server-side joins, we can still leverage the indexes when doing the joins,
making them much more efficient.
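
A small sketch of what an index-assisted, client-side join might look like,
assuming an index on the join column already exists. The table contents and
the helper names build_index and index_join are made up for illustration and
do not come from Culvert.

```python
from collections import defaultdict

customers = {"c1": {"name": "alice"}, "c2": {"name": "bob"}, "c3": {"name": "carol"}}
orders = {
    "o1": {"customer_id": "c1", "total": 30},
    "o2": {"customer_id": "c2", "total": 12},
    "o3": {"customer_id": "c1", "total": 7},
}


def build_index(table, column):
    """Build an index: column value -> list of row ids holding that value."""
    index = defaultdict(list)
    for row_id, cols in sorted(table.items()):
        index[cols[column]].append(row_id)
    return index


def index_join(left, right, right_index):
    """Join each left row to its matching right rows by probing the index,
    rather than scanning the whole right table for every left row."""
    for left_id, left_cols in sorted(left.items()):
        for right_id in right_index.get(left_id, []):
            yield left_id, right_id, {**left_cols, **right[right_id]}


orders_by_customer = build_index(orders, "customer_id")
joined = list(index_join(customers, orders, orders_by_customer))
for row in joined:
    print(row)
```

With the index, each customer costs one index probe plus one read per
matching order, instead of a pass over every order.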

The Hive adapter is about 90% of the way there as well, which would give
you full index support on top of the ease with which Hive lets you write
HQL for your tables.

Finally, culvert allows you to be entirely cross-platform with other
BigTable style databases. All the queries and indexes are developed
entirely agnostically to the underlying datastore. So, if you wanted to
switch to HBase tomorrow, all you would need to do is copy your data over
to the database (through the culvert client, though we've discussed adding
batch indexing) and then point culvert at the new install. All your queries
stay the same, leveraging the same indexes. The only work you would need to
redo is any indexes you wrote by hand.
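
The datastore-agnostic idea can be sketched as an adapter interface that the
indexing client codes against. Everything below is hypothetical:
TableAdapter, the two toy adapters, and IndexedClient are stand-ins invented
for this example, not Culvert classes, and the toy stores are in-memory
rather than real HBase or Accumulo clients.

```python
import bisect
from abc import ABC, abstractmethod


class TableAdapter(ABC):
    """Hypothetical minimal interface a datastore adapter would implement."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class DictAdapter(TableAdapter):
    """Toy store backed by a hash map."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class SortedListAdapter(TableAdapter):
    """Toy store with a different internal layout (sorted keys)."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class IndexedClient:
    """Writes data plus an index entry using only the adapter interface.
    Stale index entries on overwrite are not handled in this sketch."""

    def __init__(self, data, index):
        self.data, self.index = data, index

    def put(self, row_id, name):
        self.data.put(row_id, name)
        ids = self.index.get(name) or []
        if row_id not in ids:
            self.index.put(name, ids + [row_id])

    def find(self, name):
        return self.index.get(name) or []


def run(adapter_cls):
    client = IndexedClient(adapter_cls(), adapter_cls())
    client.put("r1", "alice")
    client.put("r2", "bob")
    client.put("r3", "alice")
    return client.find("alice")


print(run(DictAdapter) == run(SortedListAdapter))
```

The client code in run() is identical for both stores; only the adapter
class passed in changes, which is the property described above.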

The adapter for Cassandra really wouldn't be that hard to write - there are
pretty good examples for how it works with hbase and accumulo, so I don't
expect the cassandra part to be that much different.

-Jesse



>
> On Fri, Dec 23, 2011 at 6:23 AM, John W Vines <jo...@ugov.gov> wrote:
> > We have yet to release accumulo-1.4, so that was all you working out of
> > your local repo.
> >
> > As for Accumulo-1.3.5, we are currently working on making the
> > appropriate changes to make it kosher for a maven release, but we're
> > not there yet.
> >
> > John
> >
> > ----- Original Message -----
> > | From: "Jesse Yates" <je...@gmail.com>
> > | To: user@hbase.apache.org
> > | Cc: dev@hbase.apache.org, accumulo-dev@incubator.apache.org,
> > | accumulo-user@incubator.apache.org
> > | Sent: Thursday, December 22, 2011 5:22:46 PM
> > | Subject: Re: (Re)Introducing Culvert - A secondary indexing framework
> > | for BigTable like systems
> > |
> > | Wow, that's embarrassing - project not building...
> > |
> > | It's because accumulo's release is no longer deployed into the
> > | standard apache maven repository. Maybe one of the accumulo committers
> > | can shed some light on where to find it?
> > |
> > | I'll make some changes and have it at least compiling from the raw
> > | tonight :)
> > |
> > | The alternative is to download accumulo source (
> > | https://github.com/apache/accumulo ) and "mvn clean install" to get it
> > | working on your local machine.
> > |
> > | Thanks Ted!
> > |
> > | -Jesse
> > |
> > |
> > | On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu < yuzhihong@gmail.com > wrote:
> > |
> > | Thanks for the update, Jesse.
> > | Let us know of any feature Culvert needs from HBase.
> > |
> > | After cloning Culvert, I got:
> > |
> > | [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > | [INFO] ------------------------------------------------------------------------
> > | [INFO] BUILD FAILURE
> > | [INFO] ------------------------------------------------------------------------
> > | [INFO] Total time: 1:06.638s
> > | [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > | [INFO] Final Memory: 20M/81M
> > | [INFO] ------------------------------------------------------------------------
> > | [ERROR] Failed to execute goal on project culvert-accumulo: Could not resolve dependencies for project com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in apache-snapshots ( http://repository.apache.org/snapshots/ ) -> [Help 1]
> > |
> > | Can someone provide hint ?




Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Mohit Anchlia <mo...@gmail.com>.

I briefly looked at the presentation. May I ask how it is different from
using elasticsearch or solr? As I understand it, terms are being indexed,
which is also done by search engines. Just trying to understand the main
benefit. We currently use Cassandra.

Thanks


Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by John W Vines <jo...@ugov.gov>.

It's not a problem of deciding which versions to release to maven, etc. It's an issue of making our deployed jars and poms Apache compliant.

But once we get ourselves in order, that's a pretty good idea.

John

----- Original Message -----
| From: "Jesse Yates" <je...@gmail.com>
| To: accumulo-dev@incubator.apache.org
| Sent: Friday, December 23, 2011 1:48:45 PM
| Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
| What about doing -SNAPSHOT releases of dev branches? I know other projects
| tend to do that, so people can easily pull in current(ish) dev branches for
| local development against upcoming features.
| 
| Thanks!
| 
| -Jesse

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Jesse Yates <je...@gmail.com>.

What about doing -SNAPSHOT releases of dev branches? I know other projects
tend to do that, so people can easily pull in current(ish) dev branches for
local development against upcoming features.

Thanks!

-Jesse


Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by David Medinets <da...@gmail.com>.

+1 to getting Accumulo into a Maven repository.

On Thu, Dec 22, 2011 at 5:46 PM, Jesse Yates <je...@gmail.com> wrote:
> Hopefully, in the near future, we can start hosting the accumulo snapshots
> in a publicly accessible maven repository, and we can merge the accumulo
> branch back into trunk.

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Jesse Yates <je...@gmail.com>.

I just updated trunk so that we don't build the Accumulo package by default.

If you want to build with Accumulo, right now we are supporting the
"accumulo-1.3.5-incubating" branch, which targets the currently released
version of Accumulo
(accumulo-1.3.5 <http://incubator.apache.org/accumulo/downloads/downloads.html>).

Hopefully, in the near future, we can start hosting the Accumulo snapshots
in a publicly accessible Maven repository, and we can merge the Accumulo
branch back into trunk.

On Thu, Dec 22, 2011 at 2:35 PM, Ted Yu <yu...@gmail.com> wrote:

> Thanks for the hint. That works.
>
> I had to modify culvert-accumulo/pom.xml so that it looks for
> 1.5.0-incubating-SNAPSHOT which was built by accumulo TRUNK.
>
> On Thu, Dec 22, 2011 at 2:22 PM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
>
> > > Wow, that's embarrassing - project not building...
> > >
> > > It's because accumulo's release is no longer deployed into the standard
> > > apache maven repository. Maybe one of the accumulo committers can shed some
> > > light on where to find it?
> > >
> > > I'll make some changes and have it at least compiling from the raw tonight
> > > :)
> >
> > The alternative is to download accumulo source (
> > https://github.com/apache/accumulo) and "mvn clean install" to get it
> > working on your local machine.
> >
> > Thanks Ted!
> >
> > -Jesse
> >
> > On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Thanks for the update, Jesse.
> > > Let us know of any feature Culvert needs from HBase.
> > >
> > > After cloning Culvert, I got:
> > >
> > > > [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > > > [INFO] ------------------------------------------------------------------------
> > > > [INFO] BUILD FAILURE
> > > > [INFO] ------------------------------------------------------------------------
> > > > [INFO] Total time: 1:06.638s
> > > > [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > > > [INFO] Final Memory: 20M/81M
> > > > [INFO] ------------------------------------------------------------------------
> > > > [ERROR] Failed to execute goal on project culvert-accumulo: Could not
> > > > resolve dependencies for project
> > > > com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
> > > > artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
> > > > apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
> > >
> > > Can someone provide hint ?
> > >
> > > > On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> > > >
> > > > > Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> > > > > have made it very applicable to current systems. Recently, we added support
> > > > > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> > > > > Summit, there have also been significant code cleanup and added some small
> > > > > features. However, we found that most people hadn't heard of Culvert, so we
> > > > > wanted to re-release the framework.
> > > > >
> > > > > For an introduction to using Culvert, check out the blog post here:
> > > > > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> > > > >
> > > > > Also, the original presentation (where we discuss the internals) is
> > > > > available on slideshare<http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>
> > > > > .
> > > > >
> > > > > There is a Culvert hackathon in the middle of January:
> > > > > http://culverthackathon2012.eventbrite.com/
> > > > >
> > > > > Oh, and you can find the code on
> > > > > github<https://github.com/booz-allen-hamilton/culvert>
> > > > > .
> > > > >
> > > > > Below is an overview of why we wrote Culvert and what it does.
> > > > >
> > > > > Secondary indexing is a common design pattern in BigTable-like databases
> > > > > that allows users to index one or more columns in a table. This technique
> > > > > enables fast search of records in a database based on a particular column
> > > > > instead of the row id, thus enabling relational-style semantics in a NoSQL
> > > > > environment. Frequently, the index is stored either in a reserved namespace
> > > > > in the table or another index table.
> > > > >
> > > > > Despite the fact that this is a common design pattern in BigTable-based
> > > > > applications, most implementations of this practice to date have been
> > > > > tightly coupled with a particular application. As a result, few
> > > > > general-purpose frameworks for secondary indexing on BigTable-like
> > > > > databases exist, and those that do are tied to a particular implementation
> > > > > of the BigTable model.
> > > > >
> > > > > There are several existing tools (Solr, Lily), but these are focused on
> > > > > doing text based search and are highly restrictive to indexes created
> > > > > through their framework. What if you want to use your existing indexes? Or
> > > > > leverage the indexes to do complex queries?
> > > > >
> > > > > We developed a solution to this problem called Culvert that supports online
> > > > > index updates as well as a variation of the HIVE query language. In
> > > > > designing Culvert, we sought to make the solution pluggable so that it can
> > > > > be used on any of the many BigTable-like databases (HBase, Cassandra,
> > > > > etc.). Furthermore, it is also easily extensible to existing, hand rolled
> > > > > indexes.
> > > > >
> > > > > As well as being a secondary indexing framework, it is also a query
> > > > > execution mechanism - think pig/hive minus the fancy command line. We
> > > > > support a subset of SQL, but are able to take full advantage of home-rolled
> > > > > and built-in indexes, leading to query execution times potentially orders
> > > > > of magnitude smaller than existing approaches and certainly orders of
> > > > > magnitude more easily.
> > > >
> > > > -- Jesse
> > > > -------------------
> > > > Jesse Yates
> > > > 240-888-2200
> > > > @jesse_yates
> > > >
> > >
> >
> >
> >
> > --
> > -------------------
> > Jesse Yates
> > 240-888-2200
> > @jesse_yates
> >
>



-- 
-------------------
Jesse Yates
240-888-2200
@jesse_yates

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Ted Yu <yu...@gmail.com>.

Thanks for the hint. That works.

I had to modify culvert-accumulo/pom.xml so that it looks for
1.5.0-incubating-SNAPSHOT, which was built from Accumulo trunk.
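
For anyone hitting the same error, the change amounts to pointing the
Accumulo dependency at the locally built snapshot version. A rough sketch of
what the relevant entry in culvert-accumulo/pom.xml might look like is shown
here. The groupId and artifactId are taken from the error message quoted
below; whether the version is declared inline like this or through a property
elsewhere in the build is an assumption, so adjust it to match the actual file.

```xml
<!-- Sketch only: the exact layout of culvert-accumulo/pom.xml is assumed. -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <!-- The build originally asked for 1.4.0-incubating-SNAPSHOT, which could
       not be resolved; this is the version produced by building trunk. -->
  <version>1.5.0-incubating-SNAPSHOT</version>
</dependency>
```

This only resolves if Accumulo trunk has first been installed into the local
Maven repository with "mvn clean install", as suggested in the quoted message.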

On Thu, Dec 22, 2011 at 2:22 PM, Jesse Yates <je...@gmail.com> wrote:

> Wow, that's embarrassing - project not building...
>
> It's because accumulo's release is no longer deployed into the standard
> apache maven repository. Maybe one of the accumulo committers can shed some
> light on where to find it?
>
> I'll make some changes and have it at least compiling from the raw tonight
> :)
>
> The alternative is to download accumulo source (
> https://github.com/apache/accumulo) and "mvn clean install" to get it
> working on your local machine.
>
> Thanks Ted!
>
> -Jesse
>
> On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Thanks for the update, Jesse.
> > Let us know of any feature Culvert needs from HBase.
> >
> > After cloning Culvert, I got:
> >
> > [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 1:06.638s
> > [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > [INFO] Final Memory: 20M/81M
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal on project culvert-accumulo: Could not
> > resolve dependencies for project
> > com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
> > artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
> > apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
> >
> > Can someone provide hint ?
> >
> > On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> >
> > > Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> > > have made it very applicable to current systems. Recently, we added support
> > > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> > > Summit, there have also been significant code cleanup and added some small
> > > features. However, we found that most people hadn't heard of Culvert, so we
> > > wanted to re-release the framework.
> > >
> > > For an introduction to using Culvert, check out the blog post here:
> > > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> > >
> > > Also, the original presentation (where we discuss the internals) is
> > > available on slideshare<http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>
> > > .
> > >
> > > There is a Culvert hackathon in the middle of January:
> > > http://culverthackathon2012.eventbrite.com/
> > >
> > > Oh, and you can find the code on
> > > github<https://github.com/booz-allen-hamilton/culvert>
> > > .
> > >
> > > Below is an overview of why we wrote Culvert and what it does.
> > >
> > > Secondary indexing is a common design pattern in BigTable-like databases
> > > that allows users to index one or more columns in a table. This technique
> > > enables fast search of records in a database based on a particular column
> > > instead of the row id, thus enabling relational-style semantics in a NoSQL
> > > environment. Frequently, the index is stored either in a reserved namespace
> > > in the table or another index table.
> > >
> > > Despite the fact that this is a common design pattern in BigTable-based
> > > applications, most implementations of this practice to date have been
> > > tightly coupled with a particular application. As a result, few
> > > general-purpose frameworks for secondary indexing on BigTable-like
> > > databases exist, and those that do are tied to a particular implementation
> > > of the BigTable model.
> > >
> > > There are several existing tools (Solr, Lily), but these are focused on
> > > doing text based search and are highly restrictive to indexes created
> > > through their framework. What if you want to use your existing indexes? Or
> > > leverage the indexes to do complex queries?
> > >
> > > We developed a solution to this problem called Culvert that supports online
> > > index updates as well as a variation of the HIVE query language. In
> > > designing Culvert, we sought to make the solution pluggable so that it can
> > > be used on any of the many BigTable-like databases (HBase, Cassandra,
> > > etc.). Furthermore, it is also easily extensible to existing, hand rolled
> > > indexes.
> > >
> > > As well as being a secondary indexing framework, it is also a query
> > > execution mechanism - think pig/hive minus the fancy command line. We
> > > support a subset of SQL, but are able to take full advantage of home-rolled
> > > and built-in indexes, leading to query execution times potentially orders
> > > of magnitude smaller than existing approaches and certainly orders of
> > > magnitude more easily.
> > >
> > > -- Jesse
> > > -------------------
> > > Jesse Yates
> > > 240-888-2200
> > > @jesse_yates
> > >
> >
>
>
>
> --
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
>

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Ted Yu <yu...@gmail.com>.

Thanks for the hint. That works.

I had to modify culvert-accumulo/pom.xml so that it looks for
1.5.0-incubating-SNAPSHOT which was built by accumulo TRUNK.
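
For anyone hitting the same failure, the edit is a one-line version change. A
rough sketch, assuming the dependency is declared directly in
culvert-accumulo/pom.xml; the groupId and artifactId are copied from the Maven
error quoted below, and the surrounding layout is a guess (if the version is
managed by a property or a parent pom, the change belongs there instead):

```xml
<!-- culvert-accumulo/pom.xml (sketch; surrounding elements are assumed) -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <!-- was 1.4.0-incubating-SNAPSHOT; the local trunk build produced 1.5.0 -->
  <version>1.5.0-incubating-SNAPSHOT</version>
</dependency>
```

This only resolves once Accumulo trunk has been installed into the local Maven
repository with "mvn clean install", as suggested in the quoted reply.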

On Thu, Dec 22, 2011 at 2:22 PM, Jesse Yates <je...@gmail.com> wrote:

> Wow, that's embarrassing - project not building...
>
> It's because accumulo's release is no longer deployed into the standard
> apache maven repository. Maybe one of the accumulo committers can shed some
> light on where to find it?
>
> I'll make some changes and have it at least compiling from the raw tonight
> :)
>
> The alternative is to download accumulo source (
> https://github.com/apache/accumulo) and "mvn clean install" to get it
> working on your local machine.
>
> Thanks Ted!
>
> -Jesse
>
> On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Thanks for the update, Jesse.
> > Let us know of any feature Culvert needs from HBase.
> >
> > After cloning Culvert, I got:
> >
> > [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 1:06.638s
> > [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> > [INFO] Final Memory: 20M/81M
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal on project culvert-accumulo: Could not resolve dependencies for project com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
> >
> > Can someone provide a hint?
> >
> > On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> >
> > > Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> > > have made it very applicable to current systems. Recently, we added support
> > > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> > > Summit, there have also been significant code cleanup and added some small
> > > features. However, we found that most people hadn't heard of Culvert, so we
> > > wanted to re-release the framework.
> > >
> > > For an introduction to using Culvert, check out the blog post here:
> > > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> > >
> > > Also, the original presentation (where we discuss the internals) is
> > > available on slideshare<http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>.
> > >
> > > There is a Culvert hackathon in the middle of January:
> > > http://culverthackathon2012.eventbrite.com/
> > >
> > > Oh, and you can find the code on
> > > github<https://github.com/booz-allen-hamilton/culvert>.
> > >
> > > Below is an overview of why we wrote Culvert and what it does.
> > >
> > > Secondary indexing is a common design pattern in BigTable-like databases
> > > that allows users to index one or more columns in a table. This technique
> > > enables fast search of records in a database based on a particular column
> > > instead of the row id, thus enabling relational-style semantics in a NoSQL
> > > environment. Frequently, the index is stored either in a reserved namespace
> > > in the table or another index table.
> > >
> > > Despite the fact that this is a common design pattern in BigTable-based
> > > applications, most implementations of this practice to date have been
> > > tightly coupled with a particular application. As a result, few
> > > general-purpose frameworks for secondary indexing on BigTable-like
> > > databases exist, and those that do are tied to a particular implementation
> > > of the BigTable model.
> > >
> > > There are several existing tools (Solr, Lily), but these are focused on
> > > doing text based search and are highly restrictive to indexes created
> > > through their framework. What if you want to use your existing indexes? Or
> > > leverage the indexes to do complex queries?
> > >
> > > We developed a solution to this problem called Culvert that supports online
> > > index updates as well as a variation of the HIVE query language. In
> > > designing Culvert, we sought to make the solution pluggable so that it can
> > > be used on any of the many BigTable-like databases (HBase, Cassandra,
> > > etc.). Furthermore, it is also easily extensible to existing, hand rolled
> > > indexes.
> > >
> > > As well as being a secondary indexing framework, it is also a query
> > > execution mechanism - think pig/hive minus the fancy command line. We
> > > support a subset of SQL, but are able to take full advantage of home-rolled
> > > and built-in indexes, leading to query execution times potentially orders
> > > of magnitude smaller than existing approaches and certainly orders of
> > > magnitude more easily.
> > >
> > > -- Jesse
> > > -------------------
> > > Jesse Yates
> > > 240-888-2200
> > > @jesse_yates
> > >
> >
>
>
>
> --
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
>

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by John W Vines <jo...@ugov.gov>.

We have yet to release accumulo-1.4, so you must have been working entirely out of your local repo.

As for Accumulo-1.3.5, we are currently working on making the appropriate changes to make it kosher for a maven release, but we're not there yet.

John

----- Original Message -----
| From: "Jesse Yates" <je...@gmail.com>
| To: user@hbase.apache.org
| Cc: dev@hbase.apache.org, accumulo-dev@incubator.apache.org, accumulo-user@incubator.apache.org
| Sent: Thursday, December 22, 2011 5:22:46 PM
| Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
| Wow, that's embarrassing - project not building...
| 
| It's because accumulo's release is no longer deployed into the
| standard apache maven repository. Maybe one of the accumulo committers
| can shed some light on where to find it?
| 
| I'll make some changes and have it at least compiling from the raw
| tonight :)
| 
| The alternative is to download accumulo source (
| https://github.com/apache/accumulo ) and "mvn clean install" to get it
| working on your local machine.
| 
| Thanks Ted!
| 
| -Jesse
| 
| 
| On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu < yuzhihong@gmail.com > wrote:
| 
| 
| Thanks for the update, Jesse.
| Let us know of any feature Culvert needs from HBase.
| 
| After cloning Culvert, I got:
| 
| [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
| [INFO] ------------------------------------------------------------------------
| [INFO] BUILD FAILURE
| [INFO] ------------------------------------------------------------------------
| [INFO] Total time: 1:06.638s
| [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
| [INFO] Final Memory: 20M/81M
| [INFO] ------------------------------------------------------------------------
| [ERROR] Failed to execute goal on project culvert-accumulo: Could not resolve dependencies for project com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in apache-snapshots ( http://repository.apache.org/snapshots/ ) -> [Help 1]
| 
| Can someone provide hint ?
| 
| On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
| 
| > Culvert was originally introduced at Hadoop Summit 2011, but recent updates
| > have made it very applicable to current systems. Recently, we added support
| > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
| > Summit, there have also been significant code cleanup and added some small
| > features. However, we found that most people hadn't heard of Culvert, so we
| > wanted to re-release the framework.
| >
| > For an introduction to using Culvert, check out the blog post here:
| > http://jyates.github.com/2011/11/17/intro-to-culvert.html
| >
| > Also, the original presentation (where we discuss the internals) is
| > available on slideshare<http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>.
| >
| > There is a Culvert hackathon in the middle of January:
| > http://culverthackathon2012.eventbrite.com/
| >
| > Oh, and you can find the code on
| > github<https://github.com/booz-allen-hamilton/culvert>.
| >
| > Below is an overview of why we wrote Culvert and what it does.
| >
| > Secondary indexing is a common design pattern in BigTable-like databases
| > that allows users to index one or more columns in a table. This technique
| > enables fast search of records in a database based on a particular column
| > instead of the row id, thus enabling relational-style semantics in a NoSQL
| > environment. Frequently, the index is stored either in a reserved namespace
| > in the table or another index table.
| >
| > Despite the fact that this is a common design pattern in BigTable-based
| > applications, most implementations of this practice to date have been
| > tightly coupled with a particular application. As a result, few
| > general-purpose frameworks for secondary indexing on BigTable-like
| > databases exist, and those that do are tied to a particular implementation
| > of the BigTable model.
| >
| > There are several existing tools (Solr, Lily), but these are focused on
| > doing text based search and are highly restrictive to indexes created
| > through their framework. What if you want to use your existing indexes? Or
| > leverage the indexes to do complex queries?
| >
| > We developed a solution to this problem called Culvert that supports online
| > index updates as well as a variation of the HIVE query language. In
| > designing Culvert, we sought to make the solution pluggable so that it can
| > be used on any of the many BigTable-like databases (HBase, Cassandra,
| > etc.). Furthermore, it is also easily extensible to existing, hand rolled
| > indexes.
| >
| > As well as being a secondary indexing framework, it is also a query
| > execution mechanism - think pig/hive minus the fancy command line. We
| > support a subset of SQL, but are able to take full advantage of home-rolled
| > and built-in indexes, leading to query execution times potentially orders
| > of magnitude smaller than existing approaches and certainly orders of
| > magnitude more easily.
| >
| > -- Jesse
| > -------------------
| > Jesse Yates
| > 240-888-2200
| > @jesse_yates
| >
| 
| 
| 
| --
| -------------------
| Jesse Yates
| 240-888-2200
| @jesse_yates

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by John W Vines <jo...@ugov.gov>.

We have yet to release accumulo-1.4, so you must have been working entirely out of your local repo.

As for Accumulo-1.3.5, we are currently working on making the appropriate changes to make it kosher for a maven release, but we're not there yet.

John

----- Original Message -----
| From: "Jesse Yates" <je...@gmail.com>
| To: user@hbase.apache.org
| Cc: dev@hbase.apache.org, accumulo-dev@incubator.apache.org, accumulo-user@incubator.apache.org
| Sent: Thursday, December 22, 2011 5:22:46 PM
| Subject: Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems
| Wow, that's embarrassing - project not building...
| 
| It's because accumulo's release is no longer deployed into the
| standard apache maven repository. Maybe one of the accumulo committers
| can shed some light on where to find it?
| 
| I'll make some changes and have it at least compiling from the raw
| tonight :)
| 
| The alternative is to download accumulo source (
| https://github.com/apache/accumulo ) and "mvn clean install" to get it
| working on your local machine.
| 
| Thanks Ted!
| 
| -Jesse
| 
| 
| On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu < yuzhihong@gmail.com > wrote:
| 
| 
| Thanks for the update, Jesse.
| Let us know of any feature Culvert needs from HBase.
| 
| After cloning Culvert, I got:
| 
| [INFO] Culvert - Accumulo Integration .................... FAILURE
| [0.431s]
| [INFO]
| ------------------------------------------------------------------------
| [INFO] BUILD FAILURE
| [INFO]
| ------------------------------------------------------------------------
| [INFO] Total time: 1:06.638s
| [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
| [INFO] Final Memory: 20M/81M
| [INFO]
| ------------------------------------------------------------------------
| [ERROR] Failed to execute goal on project culvert-accumulo: Could not
| resolve dependencies for project
| com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
| artifact
| org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
| apache-snapshots ( http://repository.apache.org/snapshots/ ) -> [Help
| 1]
| 
| Can someone provide hint ?
| 
| On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <
| jesse.k.yates@gmail.com >wrote:
| 
| 
| > Culvert was originally introduced at Hadoop Summit 2011, but recent
| > updates
| > have made it very applicable to current systems. Recently, we added
| > support
| > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
| > Summit, there have also been significant code cleanup and added some
| > small
| > features. However, we found that most people hadn't heard of
| > Culvert, so we
| > wanted to re-release the framework.
| >
| > For an introduction to using Culvert, check out the blog post here:
| > http://jyates.github.com/2011/11/17/intro-to-culvert.html
| >
| > Also, the original presentation (where we discuss the internals) is
| > available on slideshare<
| > http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
| 
| > >
| > .
| >
| > There is a Culvert hackathon in the middle of January:
| > http://culverthackathon2012.eventbrite.com/
| >
| > Oh, and you can find the code on
| > github< https://github.com/booz-allen-hamilton/culvert >
| 
| 
| > .
| >
| > Below is an overview of why we wrote Culvert and what it does.
| >
| > Secondary indexing is a common design pattern in BigTable-like
| > databases
| > that allows users to index one or more columns in a table. This
| > technique
| > enables fast search of records in a database based on a particular
| > column
| > instead of the row id, thus enabling relational-style semantics in a
| > NoSQL
| > environment. Frequently, the index is stored either in a reserved
| > namespace
| > in the table or another index table.
| >
| > Despite the fact that this is a common design pattern in
| > BigTable-based
| > applications, most implementations of this practice to date have
| > been
| > tightly coupled with a particular application. As a result, few
| > general-purpose frameworks for secondary indexing on BigTable-like
| > databases exist, and those that do are tied to a particular
| > implementation
| > of the BigTable model.
| >
| > There are several existing tools (Solr, Lily), but these are focused
| > on
| > doing text based search and are highly restrictive to indexes
| > created
| > through their framework. What if you want to use your existing
| > indexes? Or
| > leverage the indexes to do complex queries?
| >
| > We developed a solution to this problem called Culvert that supports
| > online
| > index updates as well as a variation of the HIVE query language. In
| > designing Culvert, we sought to make the solution pluggable so that
| > it can
| > be used on any of the many BigTable-like databases (HBase,
| > Cassandra,
| > etc.). Furthermore, it is also easily extensible to existing, hand
| > rolled
| > indexes.
| >
| > As well as being a secondary indexing framework, it is also a query
| > execution mechanism - think pig/hive minus the fancy command line.
| > We
| > support a subset of SQL, but are able to take full advantage of
| > home-rolled
| > and built-in indexes, leading to query execution times potentially
| > orders
| > of magnitude smaller than existing approaches and certainly orders
| > of
| > magnitude more easily.
| >
| > -- Jesse
| > -------------------
| > Jesse Yates
| > 240-888-2200
| > @jesse_yates
| >
| 
| 
| 
| --
| -------------------
| Jesse Yates
| 240-888-2200
| @jesse_yates

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Jesse Yates <je...@gmail.com>.

Wow, that's embarrassing - project not building...

It's because Accumulo's release is no longer deployed to the standard
Apache Maven repository. Maybe one of the Accumulo committers can shed some
light on where to find it?

I'll make some changes and have it at least compiling from the raw tonight
:)

The alternative is to download the Accumulo source
(https://github.com/apache/accumulo) and run "mvn clean install" to install
it into the local Maven repository on your machine.
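
The workaround above can be sketched as a short script. This is only an illustration: the repository URL and the "mvn clean install" command come from the message, while the checkout directory (WORK_DIR), the DRY_RUN switch, and the run helper are assumptions added for the example. The thread does not say which branch produces 1.4.0-incubating-SNAPSHOT, so no branch is selected here.

```shell
#!/bin/sh
# Sketch of the local-install workaround. By default it only prints the
# commands; run with DRY_RUN=0 to execute them (needs git, Maven, network).
set -e

ACCUMULO_REPO="https://github.com/apache/accumulo"  # URL given in the message
WORK_DIR="${WORK_DIR:-accumulo}"                     # assumed checkout directory

# run: hypothetical helper that echoes a command in dry-run mode,
# and executes it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run git clone "$ACCUMULO_REPO" "$WORK_DIR"
run cd "$WORK_DIR"
run mvn clean install   # installs the built artifacts into the local Maven repository
```

Once the install finishes, rerunning the Culvert build should resolve accumulo-core from the local repository, provided the checked-out source actually builds the version Culvert asks for.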

Thanks Ted!

-Jesse

On Thu, Dec 22, 2011 at 1:54 PM, Ted Yu <yu...@gmail.com> wrote:

> Thanks for the update, Jesse.
> Let us know of any feature Culvert needs from HBase.
>
> After cloning Culvert, I got:
>
> [INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 1:06.638s
> [INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
> [INFO] Final Memory: 20M/81M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal on project culvert-accumulo: Could not
> resolve dependencies for project
> com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
> artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
> apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]
>
> Can someone provide hint ?
>
> On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.yates@gmail.com
> >wrote:
>
> > Culvert was originally introduced at Hadoop Summit 2011, but recent
> updates
> > have made it very applicable to current systems. Recently, we added
> support
> > for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> > Summit, there have also been significant code cleanup and added some
> small
> > features. However, we found that most people hadn't heard of Culvert, so
> we
> > wanted to re-release the framework.
> >
> > For an introduction to using Culvert, check out the blog post here:
> > http://jyates.github.com/2011/11/17/intro-to-culvert.html
> >
> > Also, the original presentation (where we discuss the internals) is
> > available on slideshare<
> >
> http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
> > >
> > .
> >
> > There is a Culvert hackathon in the middle of January:
> > http://culverthackathon2012.eventbrite.com/
> >
> > Oh, and you can find the code on
> > github<https://github.com/booz-allen-hamilton/culvert>
> > .
> >
> > Below is an overview of why we wrote Culvert and what it does.
> >
> > Secondary indexing is a common design pattern in BigTable-like databases
> > that allows users to index one or more columns in a table. This technique
> > enables fast search of records in a database based on a particular column
> > instead of the row id, thus enabling relational-style semantics in a
> NoSQL
> > environment. Frequently, the index is stored either in a reserved
> namespace
> > in the table or another index table.
> >
> > Despite the fact that this is a common design pattern in BigTable-based
> > applications, most implementations of this practice to date have been
> > tightly coupled with a particular application. As a result, few
> > general-purpose frameworks for secondary indexing on BigTable-like
> > databases exist, and those that do are tied to a particular
> implementation
> > of the BigTable model.
> >
> > There are several existing tools (Solr, Lily), but these are focused on
> > doing text based search and are highly restrictive to indexes created
> > through their framework. What if you want to use your existing indexes?
> Or
> > leverage the indexes to do complex queries?
> >
> > We developed a solution to this problem called Culvert that supports
> online
> > index updates as well as a variation of the HIVE query language. In
> > designing Culvert, we sought to make the solution pluggable so that it
> can
> > be used on any of the many BigTable-like databases (HBase, Cassandra,
> > etc.). Furthermore, it is also easily extensible to existing, hand rolled
> > indexes.
> >
> > As well as being a secondary indexing framework, it is also a query
> > execution mechanism - think pig/hive minus the fancy command line. We
> > support a subset of SQL, but are able to take full advantage of
> home-rolled
> > and built-in indexes, leading to query execution times potentially orders
> > of magnitude smaller than existing approaches and certainly orders of
> > magnitude more easily.
> >
> > -- Jesse
> > -------------------
> > Jesse Yates
> > 240-888-2200
> > @jesse_yates
> >
>



-- 
-------------------
Jesse Yates
240-888-2200
@jesse_yates

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Ted Yu <yu...@gmail.com>.

Thanks for the update, Jesse.
Let us know of any feature Culvert needs from HBase.

After cloning Culvert, I got:

[INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 1:06.638s
[INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
[INFO] Final Memory: 20M/81M
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal on project culvert-accumulo: Could not
resolve dependencies for project
com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]

Can someone provide a hint?

On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <je...@gmail.com> wrote:

> Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> have made it very applicable to current systems. Recently, we added support
> for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> Summit, there have also been significant code cleanup and added some small
> features. However, we found that most people hadn't heard of Culvert, so we
> wanted to re-release the framework.
>
> For an introduction to using Culvert, check out the blog post here:
> http://jyates.github.com/2011/11/17/intro-to-culvert.html
>
> Also, the original presentation (where we discuss the internals) is
> available on slideshare<
> http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data
> >
> .
>
> There is a Culvert hackathon in the middle of January:
> http://culverthackathon2012.eventbrite.com/
>
> Oh, and you can find the code on
> github<https://github.com/booz-allen-hamilton/culvert>
> .
>
> Below is an overview of why we wrote Culvert and what it does.
>
> Secondary indexing is a common design pattern in BigTable-like databases
> that allows users to index one or more columns in a table. This technique
> enables fast search of records in a database based on a particular column
> instead of the row id, thus enabling relational-style semantics in a NoSQL
> environment. Frequently, the index is stored either in a reserved namespace
> in the table or another index table.
>
> Despite the fact that this is a common design pattern in BigTable-based
> applications, most implementations of this practice to date have been
> tightly coupled with a particular application. As a result, few
> general-purpose frameworks for secondary indexing on BigTable-like
> databases exist, and those that do are tied to a particular implementation
> of the BigTable model.
>
> There are several existing tools (Solr, Lily), but these are focused on
> doing text based search and are highly restrictive to indexes created
> through their framework. What if you want to use your existing indexes? Or
> leverage the indexes to do complex queries?
>
> We developed a solution to this problem called Culvert that supports online
> index updates as well as a variation of the HIVE query language. In
> designing Culvert, we sought to make the solution pluggable so that it can
> be used on any of the many BigTable-like databases (HBase, Cassandra,
> etc.). Furthermore, it is also easily extensible to existing, hand rolled
> indexes.
>
> As well as being a secondary indexing framework, it is also a query
> execution mechanism - think pig/hive minus the fancy command line. We
> support a subset of SQL, but are able to take full advantage of home-rolled
> and built-in indexes, leading to query execution times potentially orders
> of magnitude smaller than existing approaches and certainly orders of
> magnitude more easily.
>
> -- Jesse
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
>

Re: (Re)Introducing Culvert - A secondary indexing framework for BigTable like systems

Posted by Ted Yu <yu...@gmail.com>.

Thanks for the update, Jesse.
Let us know of any feature Culvert needs from HBase.

After cloning Culvert, I got:

[INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 1:06.638s
[INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
[INFO] Final Memory: 20M/81M
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal on project culvert-accumulo: Could not
resolve dependencies for project
com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find
artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in
apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]

Can someone provide a hint?

On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <je...@gmail.com> wrote:

> Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> have made it very applicable to current systems. Recently, we added support
> for Accumulo as well as upgraded HBase support to 0.92. Since Hadoop
> Summit, there have also been significant code cleanup and added some small
> features. However, we found that most people hadn't heard of Culvert, so we
> wanted to re-release the framework.
>
> For an introduction to using Culvert, check out the blog post here:
> http://jyates.github.com/2011/11/17/intro-to-culvert.html
>
> Also, the original presentation (where we discuss the internals) is
> available on slideshare
> <http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>.
>
> There is a Culvert hackathon in the middle of January:
> http://culverthackathon2012.eventbrite.com/
>
> Oh, and you can find the code on
> github <https://github.com/booz-allen-hamilton/culvert>.
>
> Below is an overview of why we wrote Culvert and what it does.
>
> Secondary indexing is a common design pattern in BigTable-like databases
> that allows users to index one or more columns in a table. This technique
> enables fast search of records in a database based on a particular column
> instead of the row id, thus enabling relational-style semantics in a NoSQL
> environment. Frequently, the index is stored either in a reserved namespace
> in the table or another index table.
>
> Despite the fact that this is a common design pattern in BigTable-based
> applications, most implementations of this practice to date have been
> tightly coupled with a particular application. As a result, few
> general-purpose frameworks for secondary indexing on BigTable-like
> databases exist, and those that do are tied to a particular implementation
> of the BigTable model.
>
> There are several existing tools (Solr, Lily), but these are focused on
> doing text based search and are highly restrictive to indexes created
> through their framework. What if you want to use your existing indexes? Or
> leverage the indexes to do complex queries?
>
> We developed a solution to this problem called Culvert that supports online
> index updates as well as a variation of the HIVE query language. In
> designing Culvert, we sought to make the solution pluggable so that it can
> be used on any of the many BigTable-like databases (HBase, Cassandra,
> etc.). Furthermore, it is also easily extensible to existing, hand rolled
> indexes.
>
> As well as being a secondary indexing framework, it is also a query
> execution mechanism - think pig/hive minus the fancy command line. We
> support a subset of SQL, but are able to take full advantage of home-rolled
> and built-in indexes, leading to query execution times potentially orders
> of magnitude smaller than existing approaches and certainly orders of
> magnitude more easily.
>
> -- Jesse
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
>