Posted to dev@accumulo.apache.org by mv...@comcast.net on 2011/10/26 06:17:04 UTC

Camel Accumulo


Hi there. 



I recently read about your project and like the direction it is taking.  Currently, I am a committer to another incubator project, Kalumet, and have been a long-time contributor to the Karaf and Camel projects. After talking with a few of the Accumulo project members, it looks like the most immediate hurdle is growing the user community.  I believe I can help with that.  



Making associations between incubator projects and top-level projects has been a proven mechanism to pique developer interest and garner more dedicated contributors and committers.  Given how widely Camel is integrated with NoSQL databases, creating a Camel component for interacting with Accumulo seems like a no-brainer.  



In order to help grow the Accumulo community, tonight I began writing a camel-accumulo component.  This will allow Camel users to route files to Accumulo in the same way they currently route files to HDFS.  





For some background, Camel is an open-source implementation of the Enterprise Integration Patterns. Many modern ESBs use OSGi and Camel to route and orchestrate data through endpoints. Camel is written so that various technologies can provide Camel components, which folks then use when they define the route a given file or piece of data will be processed through. In this model, users would define a "route" in Camel that contains various Accumulo endpoints for reading, writing, or mutating data persisted in Accumulo. To make this work, I need to define a URI that folks will use.  Would you folks be able to help me define the URI and its parameters? 



Right now, I'm using the URI "accum".  For the first iteration of this component, I'm thinking it would be simplest to create an endpoint folks could write a single record to, then follow it up with scan and mutate components. Once these are done, I'd like to do the batch versions of these operations. 



In Camel, an endpoint usually looks similar to a web-service endpoint: URI://location/service?[arg1=value1][&arg<x>=value<x>] 
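To make that shape concrete, here is a minimal Java sketch that splits such an endpoint into scheme, location, service and options with java.net.URI. It assumes the location is the URI authority and the service is the first path segment. The EndpointParts class is hypothetical: in a real component, Camel's own component framework does this parsing, so this only illustrates the structure.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Camel: splits scheme://location/service?k=v&...
final class EndpointParts {
    final String scheme;
    final String location;
    final String service;
    final Map<String, String> options;

    private EndpointParts(String scheme, String location, String service, Map<String, String> options) {
        this.scheme = scheme;
        this.location = location;
        this.service = service;
        this.options = options;
    }

    static EndpointParts parse(String endpoint) {
        URI uri = URI.create(endpoint);
        String path = uri.getPath() == null ? "" : uri.getPath();
        // "/write" -> "write"
        String service = path.startsWith("/") ? path.substring(1) : path;
        Map<String, String> options = new LinkedHashMap<>();
        // Split the raw (still percent-encoded) query first, then decode each piece,
        // so an encoded '&' inside a value such as a password is not a separator.
        String rawQuery = uri.getRawQuery();
        if (rawQuery != null && !rawQuery.isEmpty()) {
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                options.put(decode(key), decode(value));
            }
        }
        return new EndpointParts(uri.getScheme(), uri.getAuthority(), service, options);
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        EndpointParts p = parse("accumulo://instance/write?zookeeper=zoohost&userName=root&password=secret");
        System.out.println(p.scheme + " | " + p.location + " | " + p.service + " | " + p.options);
    }
}
```

Splitting before decoding is the design choice that matters here: decoding the whole query first would turn "a%26b" back into "a&b" and split it in two.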



With this in mind, I'm thinking the following would be the minimally acceptable Camel-Accumulo endpoint for simple write operations: 

accum://location/write?zookeeperName=value&tableName=value&userName=value&password=value&userPrivs=value&scanAuths=value&debug=value 
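Whatever the final option names turn out to be, the endpoint is ultimately one string, and values such as passwords may contain '&' or spaces that would break the query. A minimal sketch, assuming form-style percent-encoding of every key and value. The EndpointBuilder class and the sample table name "events" are hypothetical, and Camel has its own rules for raw option values, so treat this as illustrative only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: assembles scheme://location/service?k=v&... with every key and
// value form-encoded, so a password like "p&ss w0rd" cannot split the query string.
final class EndpointBuilder {
    static String build(String scheme, String location, String service, Map<String, String> options) {
        StringBuilder sb = new StringBuilder()
                .append(scheme).append("://").append(location).append('/').append(service);
        if (!options.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> e : options.entrySet()) {
                query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
            }
            sb.append(query.toString());
        }
        return sb.toString();
    }

    // application/x-www-form-urlencoded: '&' becomes %26 and a space becomes '+'.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps insertion order
        opts.put("zookeeperName", "zoohost");
        opts.put("tableName", "events");
        opts.put("userName", "root");
        opts.put("password", "p&ss w0rd");
        System.out.println(build("accum", "location", "write", opts));
    }
}
```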



Does this contain a complete listing of the properties? Am I missing anything, did I put in something that isn't needed, or are there other options a user should be able to pass in? Also, is the URI "accum" OK for this Camel component?  



Because Camel is written to play nicely inside OSGi containers (like Karaf/Felix), the .jar files this Camel component relies on should be bundle-ized. This shouldn't be too hard to do; as a Karaf contributor, I've done this with hundreds of third-party .jar files. Basically, we would replace the maven-jar-plugin with a light configuration of the maven-bundle-plugin along with some fairly generic attributes.  If you folks would like, I can do this for you on a separate branch so that you can test it.  
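A small Java sketch of what "bundle-ized" means at the file level. It reads a manifest with java.util.jar.Manifest and treats it as an OSGi bundle when it declares Bundle-ManifestVersion: 2 and a Bundle-SymbolicName, the headers OSGi R4 and later use to identify a bundle. The BundleCheck class and the symbolic name org.example.camel.accumulo are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// A bundle is a .jar whose MANIFEST.MF carries extra OSGi headers; a plain jar lacks them.
final class BundleCheck {
    static boolean looksLikeOsgiBundle(String manifestText) {
        try {
            Manifest mf = new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            Attributes main = mf.getMainAttributes();
            // Bundle-ManifestVersion 2 (R4+ syntax) requires a Bundle-SymbolicName.
            return "2".equals(main.getValue("Bundle-ManifestVersion"))
                    && main.getValue("Bundle-SymbolicName") != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String plainJar = "Manifest-Version: 1.0\n\n";
        String bundle = "Manifest-Version: 1.0\n"
                + "Bundle-ManifestVersion: 2\n"
                + "Bundle-SymbolicName: org.example.camel.accumulo\n" // hypothetical name
                + "Import-Package: org.apache.camel\n"
                + "\n";
        System.out.println(looksLikeOsgiBundle(plainJar) + " " + looksLikeOsgiBundle(bundle));
    }
}
```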



Mike Van 

Committer - ASF Kalumet 

Contributor - ASF Karaf, Camel

Re: Camel Accumulo

Posted by Eric Newton <er...@gmail.com>.
I'm not a maven/packaging expert... so we'll have a discussion/vote in the
ticket.  A patch would be awesome.

I did see your ticket, and it should be fixed.  If you find it works for
you, please close the ticket.  We only know of one other person using
accumulo under cygwin: expect other glitches.

Is it common to do a sandbox in an apache project?

-Eric

On Wed, Oct 26, 2011 at 11:08 AM, <mv...@comcast.net> wrote:

>
>
> Jesse,
>
>
>
> The only difference between a bundle and a .jar is the inclusion of
> additional fields in the MANIFEST.MF file. I think there's a way to allow
> the maven-bundle-plugin to create this file, and then to include it into a
> .jar file created by the maven-jar-plugin. Basically, after the code is
> compiled, the maven-bundle-plugin goes through all of the compiled .class
> files, and creates a set of Import-Package statements. Then, you can either
> have the bundle plugin export everything automatically, or specificy the
> packages you want to export.
>
>
>
> If you'd like, tonight I can implement this in the parent pom.xml file and
> submit a patch for your review.
>
>
>
> Also, did anyone take a look at the Jira ticket I created last night
> about the exec plugin attempting to find /bin/bash?  I'd really like to be
> able to compile accumulo and get it working in a stand-alone mode with a
> single hadoop and zookeeper instance to allow me to test the camel-accumulo
> component once it is complete.
>
>
>
> Lastly, do you folks envision creating a "sandbox" area in your svn
> repository?  This would allow me to commit the camel component along with
> examples and tests to allow you folks to review it.
>
>
>
> ----- Original Message -----
>
>
> From: "Jesse Yates" <je...@gmail.com>
> To: accumulo-dev@incubator.apache.org
> Sent: Wednesday, October 26, 2011 10:37:16 AM
> Subject: Re: Camel Accumulo
>
> Comments inline.
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
>
> On Wed, Oct 26, 2011 at 6:39 AM, <mv...@comcast.net> wrote:
>
> >
> >
> > Eric,
> >
> >
> >
> > I'm confused about the need for a username/password. Does Accumulo validate
> > against that, or does it accept a set of privileges for a user?
>
>
> It checks that user and password against an internal db of users, giving the
> user ability to read from a subset of the permissions on the system
> ('Authorizations' in Accumulo parlance).
>
>
> > Also, in your discussion of zookeeper, what does the "zookeepers" attribute
> > represent?
> >
>
> The address of the zookeeper servers keeping track of the accumulo instance.
> This is the 'root' truth of where things are for clients.
>
>
> >
> >
> >
> > For my first stab at this, I'm using the ReadandWrite example to show me
> > how to connect to Accumulo and perform reads/writes.  Is there a better one
> > that I should be using?  This example class seems pretty heavy-weight...
> >
> >
> It might be a little bit heavy, but I would rather have a complete example,
> than a bunch of scattered sub-pieces. I think once you have the connection
> setup, the actual read/write shouldn't be too bad.
>
>
> >
> >
> > For the table name, I'd like to keep the camel component as generic as
> > possible. So, to allow for this, I'm going to require folks to pass in the
> > table name on their accumulate url. Is this ok?
> >
> >
> >
> > Full Ack on the "accumulate" versus "accum" uri.  The only reason I
> > attempted to shorten it is that it looks like the trend for other
> > camel-components.  That said, I can live with a longer name if it makes it
> > easier to use.  Along these lines, I've thought that having a different uri
> > for reads, writes, and mutates would be an easy way to handle interactions
> > through the camel component. To help with this, I'm thinking that the
> > following uris would be useful (developed in this order):
> >
> > accumulate-read
> >
> > accumulate-write
> >
> > accumulate-mutate
> >
> > accumulate-batch-read
> >
> > accumulate-batch-write
> >
> >
> >
> > How does that sound?
> >
>
> shouldn't this be 'accumulo' or is this for implying accumulation of
> information via Camel?
>
>
> >
> >
> >
> > Also, how do you feel about changing the build so that it creates bundles
> > instead of .jar files?  All a bundle is, is a .jar file with a MANIFEST.MF
> > file that has some extra attributes.  The maven-bundle-plugin can take care
> > of that pretty easily, but it will change the packaging type from jar to
> > bundle.  This would really only impact folks who are looking for a packaging
> > type in their .pom file dependencies, which is not common.
> >
> >
> I would argue vehemently against changing the core build for an external
> process (no offense, Mike). I think it would be okay if we have a profile in
> the build that builds a bundle, but since that is a pretty uncommon distro,
> people will easily get confused, leading to lower adoption, lots of
> questions on user@, etc. Yes, it's a low likelihood, but I would rather make
> it as easy as possible to 'do the right thing'.
>
> Camel (and other systems that need a bundle) are a special case, so already
> people are going to be a little more advanced and expect slightly
> 'special' versions.
>
>
>
> >
> >
> > Mike Van
> >
> >
> >
> > ----- Original Message -----
> >
> >
> > From: "Eric Newton" <er...@gmail.com>
> > To: accumulo-dev@incubator.apache.org
> > Sent: Wednesday, October 26, 2011 9:06:19 AM
> > Subject: Re: Camel Accumulo
> >
> > Hi Mike,
> >
> > Scan auths are needed for reading, not for writing.
> >
> > Looking at the constructors and factory methods, it looks like we'll need:
> >
> > ZooKeeperInstance:
> >     zookeepers
> >     instance name
> >
> > getConnector():
> >     username
> >     password
> >
> > getMultiTableBatchWriter():
> >      memory
> >      latency
> >      write threads
> >
> > This would allow you to stream in tuples of (table, row, cf, cq, visibility,
> > timestamp).  If you already have a connector for HBase, we would probably
> > want something compatible with that, and that is probably table oriented
> > (just guessing, being unfamiliar with Camel).  So, perhaps if you specify a
> > table you would accept tuples of (row, cf, cq, visibility, timestamp).  You
> > could make visibility and timestamp optional, too.
> >
> > A minimal uri would look something like this?  You could provide reasonable
> > defaults for the other arguments.
> >
> >
> > accumulo://instance/write?zookeeper=zoohost&userName=root&password=secret
> >
> > Unfortunately, "accumulo" doesn't have a nice short abbreviation.  I would
> > lean towards using the whole word, but that's just a personal preference.
> >
> > -Eric
> >
> >
>
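Eric's list maps each option onto the Accumulo call it would feed. As a hedged sketch, the values from his minimal URI could be collected like this, with the batch-writer settings falling back to defaults, plus a holder for his per-table tuple with optional visibility and timestamp. The option names memory, latency and writeThreads, the default values, and both class names are assumptions for illustration, not Accumulo's or Camel's API; the Accumulo calls themselves are only named in comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical settings for accumulo://instance/write?zookeeper=...&userName=...&password=...
final class WriteEndpointSettings {
    // Placeholder defaults for illustration only; they are NOT Accumulo's defaults.
    static final long DEFAULT_MAX_MEMORY_BYTES = 1_000_000L;
    static final long DEFAULT_MAX_LATENCY_MS = 1_000L;
    static final int DEFAULT_WRITE_THREADS = 1;

    final String instanceName;  // ZooKeeperInstance: instance name (the URI's location)
    final String zookeepers;    // ZooKeeperInstance: zookeepers
    final String userName;      // getConnector(): username
    final String password;      // getConnector(): password
    final long maxMemoryBytes;  // getMultiTableBatchWriter(): memory
    final long maxLatencyMs;    // getMultiTableBatchWriter(): latency
    final int writeThreads;     // getMultiTableBatchWriter(): write threads

    WriteEndpointSettings(String instanceName, Map<String, String> options) {
        this.instanceName = Objects.requireNonNull(instanceName, "instance name");
        this.zookeepers = required(options, "zookeeper");
        this.userName = required(options, "userName");
        this.password = required(options, "password");
        // Option names "memory", "latency" and "writeThreads" are hypothetical.
        this.maxMemoryBytes = Long.parseLong(options.getOrDefault("memory", String.valueOf(DEFAULT_MAX_MEMORY_BYTES)));
        this.maxLatencyMs = Long.parseLong(options.getOrDefault("latency", String.valueOf(DEFAULT_MAX_LATENCY_MS)));
        this.writeThreads = Integer.parseInt(options.getOrDefault("writeThreads", String.valueOf(DEFAULT_WRITE_THREADS)));
    }

    private static String required(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required option: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("zookeeper", "zoohost");
        opts.put("userName", "root");
        opts.put("password", "secret");
        WriteEndpointSettings s = new WriteEndpointSettings("instance", opts);
        System.out.println(s.zookeepers + " threads=" + s.writeThreads);
        CellTuple cell = new CellTuple("row1", "cf", "cq", null, null);
        System.out.println("visibility given: " + cell.visibility.isPresent());
    }
}

// Eric's per-table tuple (row, cf, cq, visibility, timestamp); the last two are optional.
// The value to be stored is not modeled here.
final class CellTuple {
    final String row;
    final String columnFamily;
    final String columnQualifier;
    final Optional<String> visibility;
    final Optional<Long> timestamp;

    CellTuple(String row, String columnFamily, String columnQualifier, String visibility, Long timestamp) {
        this.row = Objects.requireNonNull(row, "row");
        this.columnFamily = Objects.requireNonNull(columnFamily, "cf");
        this.columnQualifier = Objects.requireNonNull(columnQualifier, "cq");
        this.visibility = Optional.ofNullable(visibility);
        this.timestamp = Optional.ofNullable(timestamp);
    }
}
```

Required options fail fast with a clear message, while tuning knobs silently fall back, which matches Eric's "reasonable defaults for the other arguments".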

Re: Camel Accumulo

Posted by mv...@comcast.net.

Jesse, 



The only difference between a bundle and a .jar is the inclusion of additional fields in the MANIFEST.MF file. I think there's a way to allow the maven-bundle-plugin to create this file, and then to include it in a .jar file created by the maven-jar-plugin. Basically, after the code is compiled, the maven-bundle-plugin goes through all of the compiled .class files and creates a set of Import-Package statements. Then, you can either have the bundle plugin export everything automatically, or specify the packages you want to export. 
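The Import-Package/Export-Package step described above can be modeled with plain set arithmetic. This is a deliberately simplified sketch, not the maven-bundle-plugin's actual algorithm: the real plugin, built on bnd, analyzes bytecode, adds version ranges and applies many more rules. The ManifestModel class and the package names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of deriving the two headers from package information.
final class ManifestModel {
    // Import-Package: packages the classes reference that the jar does not itself contain.
    static Set<String> importPackages(Collection<String> referenced, Collection<String> contained) {
        Set<String> imports = new TreeSet<>(referenced);
        imports.removeAll(contained);
        // This sketch leaves out java.* because those classes come from the JVM itself.
        imports.removeIf(pkg -> pkg.startsWith("java."));
        return imports;
    }

    // Export-Package: everything the jar contains, unless an explicit list is given.
    static Set<String> exportPackages(Collection<String> contained, Collection<String> explicitExports) {
        return new TreeSet<String>(explicitExports.isEmpty() ? contained : explicitExports);
    }

    public static void main(String[] args) {
        List<String> referenced = Arrays.asList("java.util", "org.apache.camel", "org.example.accumulo.impl");
        List<String> contained = Arrays.asList("org.example.accumulo", "org.example.accumulo.impl");
        System.out.println("Import-Package: " + String.join(",", importPackages(referenced, contained)));
        System.out.println("Export-Package: " + String.join(",", exportPackages(contained, List.of("org.example.accumulo"))));
    }
}
```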



If you'd like, tonight I can implement this in the parent pom.xml file and submit a patch for your review. 



Also, did anyone take a look at the Jira ticket I created last night about the exec plugin attempting to find /bin/bash?  I'd really like to be able to compile accumulo and get it working in a stand-alone mode with a single hadoop and zookeeper instance to allow me to test the camel-accumulo component once it is complete. 



Lastly, do you folks envision creating a "sandbox" area in your svn repository?  This would allow me to commit the camel component along with examples and tests to allow you folks to review it. 




Re: Camel Accumulo

Posted by mv...@comcast.net.

Oops, I used "accumulate" instead of "accumulo".  My intent is to create accumulo processors, not accumulate. Thank you for pointing out my error. :-) 
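With the name settled on "accumulo", the two URI styles discussed in this thread can sit side by side. A hypothetical sketch: an enum holding the operations in the proposed development order, resolvable from the path segment in Eric's accumulo://instance/write style, or rendered as one-scheme-per-operation names such as accumulo-write. The enum and its names are illustrative only.

```java
import java.util.Locale;

// Hypothetical: the operations proposed in the thread, in the proposed development order.
enum AccumuloOperation {
    READ("read"),
    WRITE("write"),
    MUTATE("mutate"),
    BATCH_READ("batch-read"),
    BATCH_WRITE("batch-write");

    final String pathSegment;

    AccumuloOperation(String pathSegment) {
        this.pathSegment = pathSegment;
    }

    // Single "accumulo" scheme, operation taken from the path, e.g. "/write".
    static AccumuloOperation fromPath(String path) {
        String segment = (path.startsWith("/") ? path.substring(1) : path).toLowerCase(Locale.ROOT);
        for (AccumuloOperation op : values()) {
            if (op.pathSegment.equals(segment)) {
                return op;
            }
        }
        throw new IllegalArgumentException("unknown operation: " + segment);
    }

    // One scheme per operation, e.g. "accumulo-batch-write".
    String schemeName() {
        return "accumulo-" + pathSegment;
    }

    public static void main(String[] args) {
        for (AccumuloOperation op : values()) {
            System.out.println(op.schemeName() + "  vs  accumulo://instance/" + op.pathSegment);
        }
    }
}
```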








Re: Camel Accumulo

Posted by Jesse Yates <je...@gmail.com>.
Comments inline.
-------------------
Jesse Yates
240-888-2200
@jesse_yates

On Wed, Oct 26, 2011 at 6:39 AM, <mv...@comcast.net> wrote:

>
>
> Eric,
>
>
>
> I'm confused about the need for a username/password. Does Accumulo validate
> against that, or does it accept a set of privileges for a user?


It checks the username and password against an internal database of users,
which also determines what that user is allowed to read (the user's
'Authorizations', in Accumulo parlance).


> Also, in your discussion of zookeeper, what does the "zookeepers" attribute
> represent?
>

The address of the ZooKeeper servers keeping track of the Accumulo instance.
This is the clients' 'root' source of truth for where things are.


>
>
>
> For my first stab at this, I'm using the ReadandWrite example to show me
> how to connect to Accumulo and perform reads/writes.  Is there a better one
> that I should be using?  This example class seems pretty heavyweight...
>
>
It might be a little bit heavy, but I would rather have a complete example
than a bunch of scattered sub-pieces. I think once you have the connection
set up, the actual read/write shouldn't be too bad.


>
>
> For the table name, I'd like to keep the camel component as generic as
> possible. So, to allow for this, I'm going to require folks to pass in the
> table name on their accumulate URL. Is this OK?
>
>
>
> Full ack on "accumulate" versus "accum" for the URI.  The only reason I
> attempted to shorten it is that it looks like the trend for other
> Camel components.  That said, I can live with a longer name if it makes it
> easier to use.  Along these lines, I've thought that having a different URI
> for reads, writes, and mutates would be an easy way to handle interactions
> through the Camel component. To help with this, I'm thinking that the
> following URIs would be useful (developed in this order):
>
> accumulate-read
>
> accumulate-write
>
> accumulate-mutate
>
> accumulate-batch-read
>
> accumulate-batch-write
>
>
>
> How does that sound?
>

Shouldn't this be 'accumulo'? Or is 'accumulate' meant to imply the
accumulation of information via Camel?


>
>
>
> Also, how do you feel about changing the build so that it creates bundles
> instead of .jar files?  A bundle is just a .jar file whose MANIFEST.MF
> has some extra attributes.  The maven-bundle-plugin can take care
> of that pretty easily, but it will change the packaging type from jar to
> bundle.  This would really only impact folks who are looking for a packaging
> type in their .pom file dependencies, which is not common.
>
>
I would argue vehemently against changing the core build for an external
process (no offense, Mike). I think it would be okay to have a profile in
the build that produces a bundle, but since that is a pretty uncommon
distribution, making it the default would easily confuse people, leading to
lower adoption, lots of questions on user@, etc. Yes, it's a low likelihood,
but I would rather make it as easy as possible to 'do the right thing'.

Camel (and other systems that need a bundle) are a special case, so people
are already going to be a little more advanced and expect slightly
'special' versions.



>
>
> Mike Van
>
>
>
> ----- Original Message -----
>
>
> From: "Eric Newton" <er...@gmail.com>
> To: accumulo-dev@incubator.apache.org
> Sent: Wednesday, October 26, 2011 9:06:19 AM
> Subject: Re: Camel Accumulo
>
> Hi Mike,
>
> Scan auths are needed for reading, not for writing.
>
> Looking at the constructors and factory methods, it looks like we'll need:
>
> ZooKeeperInstance:
>     zookeepers
>     instance name
>
> getConnector():
>     username
>     password
>
> getMultiTableBatchWriter():
>      memory
>      latency
>      write threads
>
> This would allow you to stream in tuples of (table, row, cf, cq, visibility,
> timestamp).  If you already have a connector for HBase, we would probably
> want something compatible with that, and that is probably table oriented
> (just guessing, being unfamiliar with Camel).  So, perhaps if you specify a
> table you would accept tuples of (row, cf, cq, visibility, timestamp).  You
> could make visibility and timestamp optional, too.
>
> A minimal uri would look something like this?  You could provide reasonable
> defaults for the other arguments.
>
>
> accumulo://instance/write?zookeeper=zoohost&userName=root&password=secret
>
> Unfortunately, "accumulo" doesn't have a nice short abbreviation.  I would
> lean towards using the whole word, but that's just a personal preference.
>
> -Eric
>

Re: Camel Accumulo

Posted by Eric Newton <er...@gmail.com>.
Hi Mike,

The username/password is validated, and there are privileges stored in
accumulo for each user.  For example, you may be able to read a table, but
not write to it.
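The two checks described here, validating the username/password and then consulting that user's stored privileges (for example, read but not write on a table), can be illustrated with a toy in-memory store. `ToyUserStore` and `TablePermission` are hypothetical names. This is not how Accumulo stores users, and a real system would never keep plaintext passwords:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy illustration only; not Accumulo's user store or API.
final class ToyUserStore {
    enum TablePermission { READ, WRITE }

    private final Map<String, String> passwords = new HashMap<>();
    private final Map<String, Map<String, Set<TablePermission>>> grants = new HashMap<>();

    void addUser(String user, String password) {
        passwords.put(user, password);
    }

    // Records that a user holds one permission on one table.
    void grant(String user, String table, TablePermission permission) {
        grants.computeIfAbsent(user, u -> new HashMap<>())
              .computeIfAbsent(table, t -> EnumSet.noneOf(TablePermission.class))
              .add(permission);
    }

    // Step 1: validate the username/password pair.
    boolean authenticate(String user, String password) {
        return password != null && password.equals(passwords.get(user));
    }

    // Step 2: only an authenticated user is checked against the stored grants.
    boolean isAllowed(String user, String password, String table, TablePermission permission) {
        if (!authenticate(user, password)) {
            return false;
        }
        Map<String, Set<TablePermission>> byTable =
                grants.getOrDefault(user, Collections.emptyMap());
        return byTable.getOrDefault(table, Collections.emptySet()).contains(permission);
    }
}
```

With only READ granted on a table, `isAllowed(..., WRITE)` returns false even for valid credentials. This matches the "read a table, but not write to it" case.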

ZooKeeper maintains a consistent, replicated set of data.  Accumulo stores
configuration information in ZooKeeper, including an all-important pointer
to the root tablet, which allows clients to find out where every tablet is
located in the cluster.

We need a reference to at least one zookeeper replica in order to get this
information.  You can provide a comma separated list for redundancy:
"zoohost1,zoohost2,zoohost3"
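A camel-accumulo endpoint would probably accept that comma-separated list as a single option. Below is a minimal sketch of normalizing it and requiring at least one host. The `ZooKeeperHosts` helper is hypothetical and is not part of Accumulo or Camel:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a "zookeeper" endpoint option; not an Accumulo API.
final class ZooKeeperHosts {
    private ZooKeeperHosts() {}

    // Splits "zoohost1,zoohost2,zoohost3" into trimmed entries and skips
    // empty entries left behind by stray commas.
    static List<String> parse(String option) {
        if (option == null) {
            throw new IllegalArgumentException("zookeeper option is required");
        }
        List<String> hosts = new ArrayList<>();
        for (String part : option.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        // At least one replica is needed to locate the instance.
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one zookeeper host is required");
        }
        return hosts;
    }

    // Rebuilds the comma-separated form from the cleaned-up list.
    static String join(List<String> hosts) {
        return String.join(",", hosts);
    }
}
```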

I don't know where "accumulate" comes from... I was just +1 for using the
full project name "accumulo".

The InsertWithBatchWriter example is a pretty simple start:

src/examples/src/main/java/org/apache/accumulo/examples/helloworld/InsertWithBatchWriter.java

Sure, you can require folks to pass the table in the URI; I just wanted to
point out that the table could also be passed as data.

There isn't any semantic difference between write and batch-write.  There is
a bulk-load, which requires the input data to be sorted. The difference
between read and batch-read is that the order of results is not guaranteed
in batch-read.  Accumulo does not have a mutate, in the traditional sense,
so let's ignore that for now.  There is a difference between read and
isolated-read, but perhaps that should just be an option for the read URI?

write
bulk-load
read
batch-read
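The operation list above could be modelled as an enum. It can also carry the earlier points that scan auths only matter for reads and that isolation might be an option of the plain read. The names and the `acceptsOption` rule are assumptions drawn from this thread, not an existing API:

```java
import java.util.Locale;

// Hypothetical operation names for the endpoint; not a Camel or Accumulo API.
enum AccumuloOperation {
    WRITE("write"),
    BULK_LOAD("bulk-load"),
    READ("read"),
    BATCH_READ("batch-read");

    private final String uriName;

    AccumuloOperation(String uriName) {
        this.uriName = uriName;
    }

    String uriName() {
        return uriName;
    }

    boolean isRead() {
        return this == READ || this == BATCH_READ;
    }

    // Bulk-load expects its input to be sorted already.
    boolean requiresSortedInput() {
        return this == BULK_LOAD;
    }

    // A plain read returns results in order; batch-read does not guarantee it.
    boolean returnsOrderedResults() {
        return this == READ;
    }

    // Scan auths apply only to reads; "isolated" is treated as an option of
    // the plain read rather than as a separate operation.
    boolean acceptsOption(String option) {
        switch (option) {
            case "scanAuths":
                return isRead();
            case "isolated":
                return this == READ;
            default:
                return true; // other options are not checked by this sketch
        }
    }

    static AccumuloOperation fromUri(String name) {
        String wanted = name.trim().toLowerCase(Locale.ROOT);
        for (AccumuloOperation op : values()) {
            if (op.uriName.equals(wanted)) {
                return op;
            }
        }
        throw new IllegalArgumentException("unknown operation: " + name);
    }
}
```

A request for "mutate" is rejected outright, which follows the suggestion to leave it out for now.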


What would make the most sense for the URI form:

accumulo://instance/write?table=mytable&...

or
accumulo-write://instance?table=mytable&

or
accumulo://write?instance=myinstance&table=mytable&

or
accumulo://write/instance/mytable?...
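One way to compare these forms is to see how a standard URI parser splits each one. The sketch below uses `java.net.URI`. Camel does its own endpoint parsing, so this only approximates what a component would receive. `UriForms` is a hypothetical helper, and the trailing `&...` from the candidates is dropped:

```java
import java.net.URI;

// Shows how java.net.URI splits each candidate endpoint form.
final class UriForms {
    private UriForms() {}

    static String describe(String endpoint) {
        URI uri = URI.create(endpoint);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " path=" + uri.getPath()
                + " query=" + uri.getQuery();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "accumulo://instance/write?table=mytable",
            "accumulo-write://instance?table=mytable",
            "accumulo://write?instance=myinstance&table=mytable",
            "accumulo://write/instance/mytable",
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + describe(candidate));
        }
    }
}
```

In the first form the instance is the host and the operation is the path. The second moves the operation into the scheme. The third leaves only the operation in the host and pushes everything else into the query. The fourth carries the instance and table in the path.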

Can you create a ticket for switching to bundles?  Since we have no releases
of accumulo, now is an ideal time to change the packaging.

-Eric



Re: Camel Accumulo

Posted by mv...@comcast.net.

Eric, 



I'm confused about the need for a username/password. Does Accumulo validate against that, or does it accept a set of privileges for a user?  Also, in your discussion of zookeeper, what does the "zookeepers" attribute represent? 



For my first stab at this, I'm using the ReadandWrite example to show me how to connect to Accumulo and perform reads/writes.  Is there a better one that I should be using?  This example class seems pretty heavyweight... 



For the table name, I'd like to keep the camel component as generic as possible. So, to allow for this, I'm going to require folks to pass in the table name on their accumulate URL. Is this OK? 



Full ack on "accumulate" versus "accum" for the URI.  The only reason I attempted to shorten it is that it looks like the trend for other Camel components.  That said, I can live with a longer name if it makes it easier to use.  Along these lines, I've thought that having a different URI for reads, writes, and mutates would be an easy way to handle interactions through the Camel component. To help with this, I'm thinking that the following URIs would be useful (developed in this order): 

accumulate-read 

accumulate-write 

accumulate-mutate 

accumulate-batch-read 

accumulate-batch-write 



How does that sound? 



Also, how do you feel about changing the build so that it creates bundles instead of .jar files?  A bundle is just a .jar file whose MANIFEST.MF has some extra attributes.  The maven-bundle-plugin can take care of that pretty easily, but it will change the packaging type from jar to bundle.  This would really only impact folks who are looking for a packaging type in their .pom file dependencies, which is not common. 
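A bundle is a jar whose manifest carries extra attributes. To illustrate that, the sketch below reads a MANIFEST.MF with `java.util.jar.Manifest` and checks for two standard OSGi headers, `Bundle-ManifestVersion` and `Bundle-SymbolicName`. `BundleCheck` is a hypothetical helper. A manifest generated by the maven-bundle-plugin typically carries more headers, such as package imports and exports:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Checks manifest text for two standard OSGi headers; illustration only.
final class BundleCheck {
    private BundleCheck() {}

    // Parses manifest text; every line, including the last, must end in a newline.
    static Manifest read(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bundle-ManifestVersion: 2 marks an OSGi R4-style bundle, for which
    // Bundle-SymbolicName is mandatory. A plain jar manifest has neither.
    static boolean looksLikeBundle(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        return "2".equals(main.getValue("Bundle-ManifestVersion"))
                && main.getValue("Bundle-SymbolicName") != null;
    }
}
```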



Mike Van 




Re: Camel Accumulo

Posted by Eric Newton <er...@gmail.com>.
Hi Mike,

Scan auths are needed for reading, not for writing.

Looking at the constructors and factory methods, it looks like we'll need:

ZooKeeperInstance:
    zookeepers
    instance name

getConnector():
    username
    password

getMultiTableBatchWriter():
     memory
     latency
     write threads

This would allow you to stream in tuples of (table, row, cf, cq, visibility,
timestamp).  If you already have a connector for HBase, we would probably
want something compatible with that, and that is probably table oriented
(just guessing, being unfamiliar with Camel).  So, perhaps if you specify a
table you would accept tuples of (row, cf, cq, visibility, timestamp).  You
could make visibility and timestamp optional, too.
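Here is a sketch of the per-entry message such an endpoint might accept, using a hypothetical `EntryTuple` class. It follows the fields listed above and makes visibility and timestamp optional. It also carries a value, which every Accumulo entry has but which the tuple above leaves implicit. The table may come from either the URI or the message. Rejecting a mismatch between the two is an assumed policy, not something settled in this thread:

```java
import java.util.Objects;

// Hypothetical message body for a camel-accumulo write endpoint.
final class EntryTuple {
    final String table;       // optional: null if the endpoint URI names the table
    final String row;
    final String cf;          // column family
    final String cq;          // column qualifier
    final String visibility;  // optional: null means no visibility label
    final Long timestamp;     // optional: null means no explicit timestamp
    final String value;       // not in the tuple above, but every entry has one

    EntryTuple(String table, String row, String cf, String cq,
               String visibility, Long timestamp, String value) {
        this.table = table;
        this.row = Objects.requireNonNull(row, "row");
        this.cf = Objects.requireNonNull(cf, "cf");
        this.cq = Objects.requireNonNull(cq, "cq");
        this.visibility = visibility;
        this.timestamp = timestamp;
        this.value = Objects.requireNonNull(value, "value");
    }

    // Assumed policy: use whichever of the URI table and the message table is
    // present, and reject the message if both are present and differ.
    String resolveTable(String uriTable) {
        if (uriTable != null && table != null && !uriTable.equals(table)) {
            throw new IllegalArgumentException(
                    "message table " + table + " conflicts with endpoint table " + uriTable);
        }
        String resolved = (uriTable != null) ? uriTable : table;
        if (resolved == null) {
            throw new IllegalArgumentException("no table in the endpoint URI or the message");
        }
        return resolved;
    }
}
```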

A minimal uri would look something like this?  You could provide reasonable
defaults for the other arguments.

   accumulo://instance/write?zookeeper=zoohost&userName=root&password=secret
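Below is a minimal sketch of turning that URI into the three groups of parameters listed above. The option names `maxMemory`, `maxLatency` and `writeThreads` are hypothetical, and the defaults are placeholders, not Accumulo's. The sketch uses only `java.net.URI` and `URLDecoder`, not Camel's own parameter handling:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parsed form of accumulo://<instance>/<operation>?<options>.
final class AccumuloEndpointConfig {
    // Placeholder defaults for illustration only; not Accumulo's defaults.
    static final long DEFAULT_MAX_MEMORY_BYTES = 1_000_000L;
    static final long DEFAULT_MAX_LATENCY_MS = 1_000L;
    static final int DEFAULT_WRITE_THREADS = 4;

    final String instanceName;  // ZooKeeperInstance: instance name
    final String zookeepers;    // ZooKeeperInstance: zookeepers
    final String userName;      // getConnector(): username
    final String password;      // getConnector(): password
    final String operation;     // e.g. "write"
    final long maxMemoryBytes;  // getMultiTableBatchWriter(): memory
    final long maxLatencyMs;    // getMultiTableBatchWriter(): latency
    final int writeThreads;     // getMultiTableBatchWriter(): write threads

    private AccumuloEndpointConfig(String instanceName, String operation, Map<String, String> opts) {
        this.instanceName = instanceName;
        this.operation = operation;
        this.zookeepers = required(opts, "zookeeper");
        this.userName = required(opts, "userName");
        this.password = required(opts, "password");
        this.maxMemoryBytes = Long.parseLong(
                opts.getOrDefault("maxMemory", Long.toString(DEFAULT_MAX_MEMORY_BYTES)));
        this.maxLatencyMs = Long.parseLong(
                opts.getOrDefault("maxLatency", Long.toString(DEFAULT_MAX_LATENCY_MS)));
        this.writeThreads = Integer.parseInt(
                opts.getOrDefault("writeThreads", Integer.toString(DEFAULT_WRITE_THREADS)));
    }

    static AccumuloEndpointConfig parse(String endpoint) {
        URI uri = URI.create(endpoint);
        if (!"accumulo".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected accumulo:// but got " + endpoint);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("missing instance name in " + endpoint);
        }
        String path = uri.getPath();
        if (path == null || path.length() <= 1) {
            throw new IllegalArgumentException("missing operation in " + endpoint);
        }
        // Host is the instance name; the path minus its leading '/' is the operation.
        return new AccumuloEndpointConfig(uri.getHost(), path.substring(1), parseQuery(uri.getRawQuery()));
    }

    private static String required(Map<String, String> opts, String key) {
        String value = opts.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required option " + key);
        }
        return value;
    }

    private static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> opts = new HashMap<>();
        if (rawQuery == null) {
            return opts;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate a trailing '&'
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            opts.put(decode(key), decode(value));
        }
        return opts;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Only the connection settings are required here. The batch writer settings fall back to defaults, in line with the suggestion to provide reasonable defaults for the other arguments.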

Unfortunately, "accumulo" doesn't have a nice short abbreviation.  I would
lean towards using the whole word, but that's just a personal preference.

-Eric
