Posted to dev@accumulo.apache.org by Edmon Begoli <eb...@gmail.com> on 2012/07/27 05:15:28 UTC

Re: Python client lib for Accumulo?

Hi folks,

I have just joined the list with the purpose of volunteering ideas,
design and development (and whatever else in lifecycle)
related to development of the Python client for accumulo.

I have developed several RESTful clients and libraries before using
web.py and I am about to write another in Tornado
(http://www.tornadoweb.org/).

I think that we could have a very nice, scalable and fast RESTful API
for Accumulo through Tornado.

I would also like to develop a pure Python library for Accumulo similar
to HappyBase for HBase (https://github.com/wbolster/happybase).

I work at Oak Ridge National Lab as a software engineer and tech lead
on "big data" projects. I can devote time and possibly bring in more
team members, and I would be happy to collaborate.

I could certainly start a small wiki outlining the ideas and open them
for discussion.

Regards and please advise,
Edmon


On Wed, May 2, 2012 at 11:31 AM, Jason Trost <ja...@gmail.com> wrote:
> I noticed that there are no JIRAs for a python client
> interface/lib/API for Accumulo.  How involved would it be to develop
> AND maintain a python client for Accumulo?
>
> I realize that Jython can be used, but I am interested in a native
> python lib that can be used more broadly with systems that don't work
> with Jython.
>
> In order to do this, it seems like we would need to:
> 1. generate the python thrift bindings code (this is trivial)
> 2. develop and maintain the python glue code to use the thrift code
> and python zookeeper code to interact with the various accumulo
> components.  The current Java "glue" code looks quite long.  How often
> does this code change (in terms of new features or changes in
> protocol, not bug fixes)?
>

I would advise against rewriting the accumulo client code in python.
The code that finds tablets, retries in case of failure, parallelizes
read/writes, etc is fairly complex.  I think the proxy option is best.
 David and Eric mentioned REST and Thrift proxies.

If we were to go down the route of writing the client code in
another language, I think C++ with a C API would be the best option
because many languages can easily bind to a C API.
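
To illustrate the point about binding to a C API, here is a minimal sketch of
Python calling a C function through the standard ctypes module. It is not an
Accumulo client: it binds to strlen from the C library purely as a stand-in,
and it assumes a POSIX system where ctypes.CDLL(None) exposes the symbols
already loaded into the process. The wrapper name c_strlen is made up for
this example.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) gives access to symbols already
# loaded into the current process, which includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: call the C strlen on a bytes object."""
    return libc.strlen(data)
```

A C API for a client library could presumably be wrapped the same way, one
declared signature per exported function.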

> Ideally the python API would be very similar to the Java interface
> (Connector, Instance, Scanner, BatchScanner, BatchWriter, Key, Value,
> Mutation, etc).
>
> I guess what I am trying to get at is, does the Accumulo dev community
> think it's worth the time and effort to develop and maintain a python
> API?  I personally think it is in order to help with adoption and
> integration with other systems (Django is the primary system I want to
> be able to use with it).  I have some time to help this along, but I
> don't think I have enough time to take this on alone.  Is anyone else
> interested in working together on this?
>
> Thanks,
>
> --Jason

Re: Python client lib for Accumulo?

Posted by Keith Turner <ke...@deenlo.com>.
Does anyone know anything about Py4J?

http://py4j.sourceforge.net/index.html

I have never used it, but I am wondering if it would fit the bill?

On Thu, Jul 26, 2012 at 11:15 PM, Edmon Begoli <eb...@gmail.com> wrote:
> Hi folks,
>
> I have just joined the list with the purpose of volunteering ideas,
> design and development (and whatever else in lifecycle)
> related to development of the Python client for accumulo.
>
> I have developed several RESTful clients and libraries before using
> web.py and I am about to write another in Tornado
> (http://www.tornadoweb.org/).
>
> I think that we could have a very nice, scalable and fast RESTful API
> for Accumulo through Tornado.
>
> I would also like to develop a pure Python library for Accumulo similar
> to HappyBase for HBase (https://github.com/wbolster/happybase).
>
> I work at Oak Ridge National Lab as a software engineer and tech lead
> on "big data" projects. I can devote time and possibly bring in more
> team members, and I would be happy to collaborate.
>
> I could certainly start a small wiki outlining the ideas and open them
> for discussion.
>
> Regards and please advise,
> Edmon

Re: Python client lib for Accumulo?

Posted by Adam Fuchs <af...@apache.org>.
One of the big challenges of connecting directly to the existing thrift
services is that there is a lot of logic embedded in the Java client
libraries that would have to be recreated. This includes things like
finding tablets, managing multiple connections, handling tablet migration,
handling read and write threads, etc. Sapan Shah was working on building a
thrift proxy that would make native python bindings a lot simpler: see
ACCUMULO-482. Maybe we can encourage him to continue to work on that if we
all ask nicely.
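
As a toy illustration of two of the pieces listed above (finding the tablet
for a row, and retrying when a connection fails), here is a rough sketch. It
is not the Java client's logic. It assumes tablets are described only by a
sorted list of end rows, with a final tablet that has no end row, and the
names locate_tablet and with_retries are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, List, TypeVar

T = TypeVar("T")


def locate_tablet(end_rows: List[str], row: str) -> int:
    """Return the index of the tablet whose range contains `row`.

    Assumption: tablet i covers rows up to and including end_rows[i];
    the last tablet (index len(end_rows)) has no upper bound.
    """
    return bisect_left(end_rows, row)


def with_retries(op: Callable[[], T], attempts: int = 3) -> T:
    """Run `op`, retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return op()
        except ConnectionError as err:
            last_error = err  # e.g. the tablet moved; try again
    raise last_error
```

A real client also has to refresh its tablet locations after a failure and
fan requests out across servers, which is where most of the complexity
would come from.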

Adam


On Fri, Jul 27, 2012 at 9:01 AM, Jim Klucar <kl...@gmail.com> wrote:

> I have a small proof of concept going. I'm still not sure what the
> best way to do results paging is (i.e. your scan has a billion results
> and won't fit in memory). My initial work is moving towards opening up
> an HTTP/1.1 chunked-encoded stream like Twitter does for its streaming
> API. The other thing I've been playing with is websockets, but
> that may restrict you to using JavaScript, though I'm sure more
> client-side websocket libraries are coming.

Re: Python client lib for Accumulo?

Posted by Edmon Begoli <eb...@gmail.com>.
Just let me know how and if we want to collaborate on this.

As for RESTful API and paging, I think we could also look into
OData-like protocol
conventions that specify an API to scroll through the result set using
'skip' and 'top' in addition to opening the stream.
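
A minimal sketch of what 'skip' and 'top' paging could look like on the
server side, assuming the scan results are available as a plain Python
iterator. The function name page and the sample rows are hypothetical and
nothing here is an Accumulo API.

```python
from itertools import islice
from typing import Iterable, List, Tuple

Row = Tuple[str, int]


def page(results: Iterable[Row], skip: int, top: int) -> List[Row]:
    """Return at most `top` results after skipping the first `skip`."""
    if skip < 0 or top < 0:
        raise ValueError("skip and top must be non-negative")
    # islice consumes the iterator lazily, so only skip + top results
    # are ever pulled from the underlying scan.
    return list(islice(results, skip, skip + top))


# Hypothetical scan output: ten (row id, value) pairs.
rows = [("row%03d" % i, i) for i in range(10)]
second_page = page(iter(rows), skip=3, top=3)
```

One caveat with this naive form: every request re-reads the skipped results
from the start of the scan, so deep pages get slower unless the server keeps
some kind of cursor.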

Edmon

On Fri, Jul 27, 2012 at 9:01 AM, Jim Klucar <kl...@gmail.com> wrote:
> I have a small proof of concept going. I'm still not sure what the
> best way to do results paging is (i.e. your scan has a billion results
> and won't fit in memory). My initial work is moving towards opening up
> an HTTP/1.1 chunked-encoded stream like Twitter does for its streaming
> API. The other thing I've been playing with is websockets, but
> that may restrict you to using JavaScript, though I'm sure more
> client-side websocket libraries are coming.

Re: Python client lib for Accumulo?

Posted by Jim Klucar <kl...@gmail.com>.
I have a small proof of concept going. I'm still not sure what the
best way to do results paging is (i.e. your scan has a billion results
and won't fit in memory). My initial work is moving towards opening up
an HTTP/1.1 chunked-encoded stream like Twitter does for its streaming
API. The other thing I've been playing with is websockets, but
that may restrict you to using JavaScript, though I'm sure more
client-side websocket libraries are coming.
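
For reference, a rough sketch of the chunked framing itself, independent of
any web framework. In HTTP/1.1 chunked transfer encoding each chunk is its
length in hex, CRLF, the data, CRLF, and the stream ends with a zero-length
chunk. The function names and the idea of sending one scan result per chunk
are assumptions made for this example.

```python
from typing import Iterable, Iterator, List


def chunked_frames(records: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each record as one HTTP/1.1 chunk, then end the stream."""
    for record in records:
        if not record:
            continue  # a zero-length chunk would end the stream early
        yield b"%x\r\n" % len(record) + record + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk, no trailers


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a complete chunked body back into its records."""
    records, pos = [], 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return records
        start = eol + 2
        records.append(body[start:start + size])
        pos = start + size + 2  # skip the data and its trailing CRLF
```

Because chunked_frames is a generator, the server never has to hold the
whole result set in memory; it can write each frame to the socket as the
scan produces it.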

On Fri, Jul 27, 2012 at 8:50 AM, David Medinets
<da...@gmail.com> wrote:
> Which reminds me. There was a discussion of using a REST interface on
> this list. Several people liked that approach because it would provide
> loose coupling between client and server. Also the client could use
> any language. At the time, nobody could spare the time to implement
> it.

Re: Python client lib for Accumulo?

Posted by David Medinets <da...@gmail.com>.
Which reminds me. There was a discussion of using a REST interface on
this list. Several people liked that approach because it would provide
loose coupling between client and server. Also the client could use
any language. At the time, nobody could spare the time to implement
it.

On Fri, Jul 27, 2012 at 7:37 AM, Jim Klucar <kl...@gmail.com> wrote:
> Welcome Edmon. I think as far as a pure Python library goes, you would
> have to interface with the Thrift protocols. My sense is that would be
> discouraged at this point by the devs. I do have some experience with
> it, though: I made an attempt to interface to Accumulo with Node.js. It
> turned into me writing the JavaScript version of TCompactProtocol, but
> it's still incomplete at this point. I would vote for either
> developing an officially supported Thrift interface, or an officially
> supported REST interface using a JVM language. Then the language
> barrier would be easier to overcome.
>
> Jim

Re: Python client lib for Accumulo?

Posted by Jim Klucar <kl...@gmail.com>.
Welcome Edmon. I think as far as a pure Python library goes, you would
have to interface with the Thrift protocols. My sense is that would be
discouraged at this point by the devs. I do have some experience with
it, though: I made an attempt to interface to Accumulo with Node.js. It
turned into me writing the JavaScript version of TCompactProtocol, but
it's still incomplete at this point. I would vote for either
developing an officially supported Thrift interface, or an officially
supported REST interface using a JVM language. Then the language
barrier would be easier to overcome.
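
To give a sense of what reimplementing TCompactProtocol involves, here is a
small sketch of one piece of it: the compact protocol writes integers as
zigzag-encoded varints. This is only the integer encoding, not the field
headers, strings, or containers, and the function names are made up for
this example rather than taken from any Thrift library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def varint(u: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_i64(n: int) -> bytes:
    """Hypothetical helper: a signed integer as a zigzag varint."""
    return varint(zigzag(n))
```

Every other type in the protocol needs the same care on both the read and
the write side, which is how a quick experiment can turn into a partial
protocol implementation.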

Jim

On Jul 27, 2012, at 7:19 AM, Edmon Begoli <eb...@gmail.com> wrote:

> Hi David,
>
> I think that Jython is a good idea as at least a prototype or as a bridge
> towards a full-blown Python library.
>
> It is probably not a good end state because most Python developers do not
> want a JVM and Java environment, and there is also performance overhead.
>
> Personally, I program in both languages, so I am good.
>
> Is there a particular protocol for contributing to the Accumulo project?

Re: Python client lib for Accumulo?

Posted by Edmon Begoli <eb...@gmail.com>.
Hi David,

I think that Jython is a good idea as at least a prototype or as a bridge
towards a full-blown Python library.

It is probably not a good end state because most Python developers do not
want a JVM and Java environment, and there is also performance overhead.

Personally, I program in both languages, so I am good.

Is there a particular protocol for contributing to the Accumulo project?
 On Jul 27, 2012 5:27 AM, "David Medinets" <da...@gmail.com> wrote:

> On Thu, Jul 26, 2012 at 11:15 PM, Edmon Begoli <eb...@gmail.com> wrote:
> > I have just joined the list with the purpose of volunteering ideas,
> > design and development (and whatever else in lifecycle)
> > related to development of the Python client for accumulo.
>
> Welcome to the list. There are a lot of Python developers and I'm sure
> that your client would be well received by the community. My own
> advice is to write whatever is simplest (fastest to develop) and
> iterate towards a more complex complete solution.
>
> Would jython be any use to provide python access to the existing Java
> API without any rewrite or plumbing needed?
>

Re: Python client lib for Accumulo?

Posted by David Medinets <da...@gmail.com>.
On Thu, Jul 26, 2012 at 11:15 PM, Edmon Begoli <eb...@gmail.com> wrote:
> I have just joined the list with the purpose of volunteering ideas,
> design and development (and whatever else in lifecycle)
> related to development of the Python client for accumulo.

Welcome to the list. There are a lot of Python developers and I'm sure
that your client would be well received by the community. My own
advice is to write whatever is simplest (fastest to develop) and
iterate towards a more complex complete solution.

Would jython be any use to provide python access to the existing Java
API without any rewrite or plumbing needed?