Posted to user@drill.apache.org by Wojciech Nowak <ma...@pythonic.ninja> on 2015/12/28 09:07:44 UTC

Python Driver Contribution Idea

Dear Drill developers,

Recently I was trying to use Drill from Python through the ODBC interface, based on this blog post: https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl It worked as expected, but what struck me was that it is a lot of hassle to configure.

That is why, based on the Contribution Ideas page on your site (https://drill.apache.org/docs/apache-drill-contribution-ideas/), I decided to create a simpler solution for the Python community.

My contribution would have two phases:
1. A client/driver for interacting with Drill.
2. A DSL that provides an easier, more idiomatic way to write and manipulate queries using defined query set expressions.


1.
Similarly to the official client for Elasticsearch (https://github.com/elastic/elasticsearch-py), I would like to use Drill's REST API, for which I found documentation at https://drill.apache.org/docs/rest-api/
Sketch of usage:
https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py

Questions:
1.1 Could a Python driver for Drill be based on the REST API? Do you see any problems with that?
1.2 Do you have any ideas or suggestions for this project?
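
To make the REST-based idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from the gist above. It assumes a Drillbit whose web interface listens on the default port 8047 without authentication, and it posts to the /query.json endpoint described in the REST API documentation. The function names build_query_request and run_query are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Drillbit with the REST API on the default web port.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL, timeout=30):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_query("SELECT * FROM sys.version") against a running Drillbit should return the parsed response body. Building the request separately from sending it keeps the driver easy to test without a server.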

2.
The DSL would be a separate package from the driver, installable as an optional package with:
pip install pydrill-dsl
so that it would have its own releases, separate from the driver package in 1.
It would improve interaction with Drill through query-set-like expressions.
Sketch of usage:
https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py

Questions:
2.1 Should it be separated from the Python Drill driver package?
2.2 Do you have any ideas or suggestions for this project?
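
As an illustration of what "query set like expressions" could mean, the following is a rough sketch of a chainable builder that only produces a SQL string. It is not the pydrill-dsl API from the gist. The class QuerySet, its methods, and the helper _literal are all hypothetical names, and only equality filters are handled.

```python
class QuerySet:
    """Hypothetical immutable query builder; every method returns a new object."""

    def __init__(self, table, columns=("*",), conditions=(), limit=None):
        self._table = table
        self._columns = tuple(columns)
        self._conditions = tuple(conditions)
        self._limit = limit

    def _clone(self, **changes):
        state = {
            "table": self._table,
            "columns": self._columns,
            "conditions": self._conditions,
            "limit": self._limit,
        }
        state.update(changes)
        return QuerySet(**state)

    def values(self, *columns):
        """Select specific columns instead of *."""
        return self._clone(columns=columns)

    def filter(self, **equals):
        """Add column = value conditions, combined with AND."""
        new = tuple(
            "{} = {}".format(column, _literal(value))
            for column, value in sorted(equals.items())
        )
        return self._clone(conditions=self._conditions + new)

    def limit(self, count):
        return self._clone(limit=int(count))

    def to_sql(self):
        sql = "SELECT {} FROM {}".format(", ".join(self._columns), self._table)
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        if self._limit is not None:
            sql += " LIMIT {}".format(self._limit)
        return sql


def _literal(value):
    """Render a Python value as a SQL literal; strings get quotes doubled."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"
```

For example, QuerySet("cp.`employee.json`").values("full_name").filter(position_id=2).limit(5).to_sql() yields a SELECT statement that the driver from 1. could submit. Table and column names are passed through unchanged here, so a real package would still need to validate them.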

This contribution would be part of my Master's thesis, so any ideas are welcome. My thesis supervisor suggested contacting you to get the Drill core developers' perspective.

I would be very grateful if you could share your thoughts.

kind regards,
Wojtek Nowak

Re: Python Driver Contribution Idea

Posted by Josh Elser <jo...@gmail.com>.
Ted Dunning wrote:
> On Tue, Dec 29, 2015 at 1:24 AM, Dhruv Gohil<yo...@gmail.com>
> wrote:
>
>> Just my 2cents:
>> I think its into greater good that J/ODBC/Native/REST connectivity
>> features (for SQL operations) are pushed under "Apache Calcite" Project
>> instead of Drill kind of projects.
>> Just like "Apache Phoenix<https://phoenix.apache.org/server.html>" guys
>> started with Query Server<
>> https://issues.apache.org/jira/browse/PHOENIX-971>
>>
>> Drill can/should use Calcite+Avatica<
>> http://calcite.apache.org/docs/avatica_overview.html>  combination to
>> support native language interfaces.
>> AFAIK ODBC/Native client development is in TODO and seeking contribution
>> effort, so if Wojtek selects to work with Calcite Community on this, All
>> dependent project including DRILL will get the benefit of Python
>> Connectivity.
>>
>
> Doing the development first in Drill and then porting it back to Calcite if
> (when) Calcite gets the rest interface going seems like a better path to
> success since it gets results sooner.
>

FWIW, I think I'll have some cycles to start looking into a Python 
implementation in Avatica itself "soonish". I haven't really looked into 
Drill's current relationship with Calcite, but would be amenable to 
having discussions on what a Python API would look like (the ES project 
Wojciech linked and Cassandra are two places I plan to look for 
inspiration).

Re: Python Driver Contribution Idea

Posted by Dhruv Gohil <yo...@gmail.com>.
+1 on "gets results sooner" path!

On Tuesday 29 December 2015 08:07 PM, Ted Dunning wrote:
> On Tue, Dec 29, 2015 at 1:24 AM, Dhruv Gohil <yo...@gmail.com>
> wrote:
>
>> Just my 2cents:
>> I think its into greater good that J/ODBC/Native/REST connectivity
>> features (for SQL operations) are pushed under "Apache Calcite" Project
>> instead of Drill kind of projects.
>> Just like "Apache Phoenix <https://phoenix.apache.org/server.html>" guys
>> started with Query Server <
>> https://issues.apache.org/jira/browse/PHOENIX-971>
>>
>> Drill can/should use Calcite+Avatica <
>> http://calcite.apache.org/docs/avatica_overview.html> combination to
>> support native language interfaces.
>> AFAIK ODBC/Native client development is in TODO and seeking contribution
>> effort, so if Wojtek selects to work with Calcite Community on this, All
>> dependent project including DRILL will get the benefit of Python
>> Connectivity.
>>
> Doing the development first in Drill and then porting it back to Calcite if
> (when) Calcite gets the rest interface going seems like a better path to
> success since it gets results sooner.
>


Re: Python Driver Contribution Idea

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Dec 29, 2015 at 1:24 AM, Dhruv Gohil <yo...@gmail.com>
wrote:

> Just my 2cents:
> I think its into greater good that J/ODBC/Native/REST connectivity
> features (for SQL operations) are pushed under "Apache Calcite" Project
> instead of Drill kind of projects.
> Just like "Apache Phoenix <https://phoenix.apache.org/server.html>" guys
> started with Query Server <
> https://issues.apache.org/jira/browse/PHOENIX-971>
>
> Drill can/should use Calcite+Avatica <
> http://calcite.apache.org/docs/avatica_overview.html> combination to
> support native language interfaces.
> AFAIK ODBC/Native client development is in TODO and seeking contribution
> effort, so if Wojtek selects to work with Calcite Community on this, All
> dependent project including DRILL will get the benefit of Python
> Connectivity.
>

Doing the development first in Drill and then porting it back to Calcite if
(when) Calcite gets the REST interface going seems like a better path to
success, since it gets results sooner.

Re: Python Driver Contribution Idea

Posted by Dhruv Gohil <yo...@gmail.com>.
Just my 2 cents:
I think it is for the greater good that J/ODBC/Native/REST connectivity
features (for SQL operations) be pushed under the "Apache Calcite" project
rather than under projects like Drill,
just as the "Apache Phoenix <https://phoenix.apache.org/server.html>" guys
did when they started the Query Server
<https://issues.apache.org/jira/browse/PHOENIX-971>.

Drill can/should use the Calcite+Avatica
<http://calcite.apache.org/docs/avatica_overview.html> combination to
support native language interfaces.
AFAIK ODBC/native client development is on the TODO list and seeking
contributors, so if Wojtek chooses to work with the Calcite community on
this, all dependent projects, including Drill, will get the benefit of
Python connectivity.

cc:dev@calcite.apache.org

On Monday 28 December 2015 01:37 PM, Wojciech Nowak wrote:
> Dear Drill developers,
>
> Recently I was trying to use Drill from Python through ODBC interface based on blog post from https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl It worked as expected, but what struck to me was that It’s a lot of hassle to configure it.
>
> That’s why based on Your site under Contribution Ideas (https://drill.apache.org/docs/apache-drill-contribution-ideas/) I decided to create simpler solution for Python community.
>
> My Contribution would have two phases:
> client/driver for interacting with Drill
> dsl which will provide a easier and idiomatic way to write and manipulate queries using defined query set expressions.
>
>
> 1.
> Similarly to official client for Elastic Search (https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api of Drill for which i found documentation under https://drill.apache.org/docs/rest-api/
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>
> questions:
> 1.1 I was wondering if Python driver for Drill could be based on Rest-Api, do you see any problems?
> 1.2 Do you have any ideas or suggestions for that project?
>
> 2.
> It would be separate package from driver, you can install as an optional package via command:
> pip install pydrill-dsl
> so that it would have separate releases from 1 package.
> It would enhance way of interacting with Drill via query set like expressions.
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>
> questions:
> 2.1 Should it be separated from Python Drill Driver package?
> 2.2 Do you have any ideas or suggestions for that project?
>
> This contribution would be part of my Master Thesis, so any ideas are welcome. My thesis supervisor suggested to contact You to get Drill core developers perspective.
>
> I would be very grateful if You could provide me with your thoughts.
>
> kind regards,
> Wojtek Nowak
>


Re: Python Driver Contribution Idea

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
This is a great idea. Thanks for the initiative.
Looking forward to more detailed discussions on the API design.
-Neeraja

On Mon, Dec 28, 2015 at 12:07 AM, Wojciech Nowak <ma...@pythonic.ninja>
wrote:

> Dear Drill developers,
>
> Recently I was trying to use Drill from Python through ODBC interface
> based on blog post from
> https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl
> It worked as expected, but what struck to me was that It’s a lot of hassle
> to configure it.
>
> That’s why based on Your site under Contribution Ideas (
> https://drill.apache.org/docs/apache-drill-contribution-ideas/) I decided
> to create simpler solution for Python community.
>
> My Contribution would have two phases:
> client/driver for interacting with Drill
> dsl which will provide a easier and idiomatic way to write and manipulate
> queries using defined query set expressions.
>
>
> 1.
> Similarly to official client for Elastic Search (
> https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api
> of Drill for which i found documentation under
> https://drill.apache.org/docs/rest-api/
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>
> questions:
> 1.1 I was wondering if Python driver for Drill could be based on Rest-Api,
> do you see any problems?
> 1.2 Do you have any ideas or suggestions for that project?
>
> 2.
> It would be separate package from driver, you can install as an optional
> package via command:
> pip install pydrill-dsl
> so that it would have separate releases from 1 package.
> It would enhance way of interacting with Drill via query set like
> expressions.
> sketch of usage:
>
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>
> questions:
> 2.1 Should it be separated from Python Drill Driver package?
> 2.2 Do you have any ideas or suggestions for that project?
>
> This contribution would be part of my Master Thesis, so any ideas are
> welcome. My thesis supervisor suggested to contact You to get Drill core
> developers perspective.
>
> I would be very grateful if You could provide me with your thoughts.
>
> kind regards,
> Wojtek Nowak
>

Re: Python Driver Contribution Idea

Posted by Jason Altekruse <al...@gmail.com>.
There is already a JIRA opened for this issue with some work done by Adam
Gilmore. Looks like we never merged it though.

I'll try to take a look at this soon and get it merged.

https://issues.apache.org/jira/browse/DRILL-2373

On Mon, Dec 28, 2015 at 11:03 AM, Jason Altekruse <al...@gmail.com>
wrote:

> One thing we should fix to make this easier is to provide properly typed
> data through the rest API. This result listener is transforming the native
> drill record format into a simple hashmap with both the keys and values
> provided as strings. This list of hashmaps is serialized by jackson into
> the result set returned by the Rest API in response to a query request.
>
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java#L122
>
> I assume that there should be reasonable libraries in python for parsing
> extended JSON, which would probably be the easiest way to get fully typed
> data back to your new client.
>
> I think it would be best to deprecate the current behavior of returning
> all strings entirely and just create two modes. Extended JSON for full
> typing and simple JSON with types like date, time and binary converted to
> strings appropriately. We have had discussions on the list in the past from
> users that had to work around the fact that the numeric data was coming
> back as strings. We should just make the behavior intuitive to get started
> with and allow the option to turn on full typing with extended JSON if
> needed.
>
> - Jason
>
> On Mon, Dec 28, 2015 at 10:47 AM, Wojciech Nowak <ma...@pythonic.ninja>
> wrote:
>
>> Hello!
>>
>> Great to read such enthusiastic feedback.
>>
>> I have created git repo -> https://github.com/PythonicNinja/pydrill
>>
>> Enabled testing via travis.
>> Enabled creation of docs on
>> http://pydrill.readthedocs.org/en/latest/readme.html
>>
>> Discussions related to Python Driver can move there.
>>
>> kind regards,
>> Wojtek Nowak
>>
>> On Monday, 28 December 2015 at 17:12, Tomer Shiran wrote:
>>
>> > +1
>> >
>> > Having a Python client would be super valuable
>> >
>> >
>> >
>> > > On Dec 28, 2015, at 9:45 AM, Peder Jakobsen | gmail <pjakobsen@gmail.com> wrote:
>> > >
>> > > Two thumbs up for this project. An immediate benefit is the ability to
>> > > take advantage of the enhanced interactive features of the iPython shell.
>> > >
>> > > Perhaps the next step is to model the design after a similar Rest API
>> > > wrapper, for example, python-twitter:
>> > > https://github.com/bear/python-twitter
>> > >
>> > > > On Mon, Dec 28, 2015 at 8:45 AM, Charles Givre <cgivre@gmail.com> wrote:
>> > > >
>> > > > I’d second that and be willing to help.
>> > > >
>> > > > > On Dec 28, 2015, at 07:59, John Omernik <john@omernik.com> wrote:
>> > > > >
>> > > > > I think a Pythonic module with Drill could be a great contribution. Using
>> > > > > the Rest API makes the most sense, wrapping it, and interfacing with it
>> > > > > using requests or something similar. Since everything is done via JSON in
>> > > > > the rest API, there could be nice interaction with the API, doing things
>> > > > > such as authentication (it's form based, so you have to use a requests
>> > > > > session or similar), query submission, results, error handling,etc. You
>> > > > > will want to determine what you want your driver to do, do you want an
>> > > > > interface to support submitting new storage plugins? Do you want to expose
>> > > > > query time settings (such as the JSON read number as double) via the
>> > > > > driver, or just via a statement submitted by the user? (one requires much
>> > > > > more work, the other requires a eye towards security). Security in another
>> > > > > thing, you want to ensure that if something is using your module, say a
>> > > > > Python Flask App, that there is validation of SQL, and other such concerns.
>> > > > > Drill seems to be pretty good about it, but any module you would write
>> > > > > should be explicit about what it is and what it isn't doing related to
>> > > > > input sanitization/security
>> > > > >
>> > > > > Other things to think about would be something that would allow result set
>> > > > > objects in your Python driver to be easily moved to a pandas data frame. I
>> > > > > think the Data Science folks out there would love this, and you would have
>> > > > > a core setup of users and other contributions very quickly with that. The
>> > > > > key to something like this would be ensuring it's as Pythonic as possible
>> > > > > and is trying to bridge the gap between the Python language and Rest API.
>> > > > > This allows you, the author, the most flexibility to focus on your code,
>> > > > > and not have to worry much about the Drill code base as everything is using
>> > > > > the Rest API (which is really well designed having used it myself in Python
>> > > > > scripts).
>> > > > >
>> > > > > This is a great idea and I would be happy to contribute/assist!
>> > > > >
>> > > > > John
>> > > > >
>> > > > > > On Mon, Dec 28, 2015 at 2:07 AM, Wojciech Nowak <mail@pythonic.ninja> wrote:
>> > > > > >
>> > > > > > Dear Drill developers,
>> > > > > >
>> > > > > > Recently I was trying to use Drill from Python through ODBC interface
>> > > > > > based on blog post from
>> > > > > > https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl
>> > > > > > It worked as expected, but what struck to me was that It’s a lot of hassle
>> > > > > > to configure it.
>> > > > > >
>> > > > > > That’s why based on Your site under Contribution Ideas (
>> > > > > > https://drill.apache.org/docs/apache-drill-contribution-ideas/) I decided
>> > > > > > to create simpler solution for Python community.
>> > > > > >
>> > > > > > My Contribution would have two phases:
>> > > > > > client/driver for interacting with Drill
>> > > > > > dsl which will provide a easier and idiomatic way to write and manipulate
>> > > > > > queries using defined query set expressions.
>> > > > > >
>> > > > > >
>> > > > > > 1.
>> > > > > > Similarly to official client for Elastic Search (
>> > > > > > https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api
>> > > > > > of Drill for which i found documentation under
>> > > > > > https://drill.apache.org/docs/rest-api/
>> > > > > > sketch of usage:
>> > > > > > https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>> > > > > >
>> > > > > > questions:
>> > > > > > 1.1 I was wondering if Python driver for Drill could be based on Rest-Api,
>> > > > > > do you see any problems?
>> > > > > > 1.2 Do you have any ideas or suggestions for that project?
>> > > > > >
>> > > > > > 2.
>> > > > > > It would be separate package from driver, you can install as an optional
>> > > > > > package via command:
>> > > > > > pip install pydrill-dsl
>> > > > > > so that it would have separate releases from 1 package.
>> > > > > > It would enhance way of interacting with Drill via query set like
>> > > > > > expressions.
>> > > > > > sketch of usage:
>> > > > > > https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>> > > > > >
>> > > > > > questions:
>> > > > > > 2.1 Should it be separated from Python Drill Driver package?
>> > > > > > 2.2 Do you have any ideas or suggestions for that project?
>> > > > > >
>> > > > > > This contribution would be part of my Master Thesis, so any ideas are
>> > > > > > welcome. My thesis supervisor suggested to contact You to get Drill core
>> > > > > > developers perspective.
>> > > > > >
>> > > > > > I would be very grateful if You could provide me with your thoughts.
>> > > > > >
>> > > > > > kind regards,
>> > > > > > Wojtek Nowak
>> > > > > >
>>
>

Re: Python Driver Contribution Idea

Posted by Jason Altekruse <al...@gmail.com>.
One thing we should fix to make this easier is to provide properly typed
data through the REST API. The result listener linked below transforms the
native Drill record format into a simple hashmap with both the keys and
values provided as strings. This list of hashmaps is serialized by Jackson
into the result set returned by the REST API in response to a query request.

https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java#L122

I assume that there are reasonable libraries in Python for parsing
extended JSON, which would probably be the easiest way to get fully typed
data back to your new client.

I think it would be best to deprecate the current behavior of returning all
strings entirely and just create two modes: extended JSON for full typing,
and simple JSON with types like date, time and binary converted to strings
appropriately. We have had discussions on the list in the past from users
who had to work around the fact that numeric data was coming back as
strings. We should just make the behavior intuitive to get started with and
allow the option to turn on full typing with extended JSON if needed.
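
Until typed results are available, a client has to guess types from the strings it receives. The sketch below shows one possible client-side workaround of that kind. It is an assumption about how a driver might cope, not existing Drill or pydrill code, and the names coerce_value and coerce_rows are hypothetical. Because it guesses, it would also turn a string column that merely looks numeric (for example a ZIP code) into a number, which is exactly why typed output from the server would be preferable.

```python
def coerce_value(text):
    """Best-effort conversion of a string cell to bool, int, or float."""
    if not isinstance(text, str):
        return text  # leave None and already-typed values untouched
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text  # not numeric: keep the original string


def coerce_rows(rows):
    """Apply coerce_value to every cell of a list of row dicts."""
    return [
        {key: coerce_value(value) for key, value in row.items()}
        for row in rows
    ]
```

A row such as {"id": "7", "salary": "1234.5", "name": "Ann"} would come back with an int, a float, and an unchanged string.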

- Jason

Re: Python Driver Contribution Idea

Posted by Wojciech Nowak <ma...@pythonic.ninja>.
Hello!

Great to read such enthusiastic feedback.

I have created a git repo -> https://github.com/PythonicNinja/pydrill

Testing is enabled via Travis.
Docs are built at http://pydrill.readthedocs.org/en/latest/readme.html

Discussions related to the Python driver can move there.

kind regards,
Wojtek Nowak

Re: Python Driver Contribution Idea

Posted by Tomer Shiran <ts...@dremio.com>.
+1

Having a Python client would be super valuable



Re: Python Driver Contribution Idea

Posted by Charles Givre <cg...@gmail.com>.
Wojtek, Peder,
If I can help, please add me to the GitHub account. I think it would be a huge boost for Drill.
V/R,

Re: Python Driver Contribution Idea

Posted by Peder Jakobsen | gmail <pj...@gmail.com>.
Hi Wojtek, if you want to kick-start this project quickly, I would
suggest you set up a project with a README.md in your GitHub account and
share the link with us. Then we can move the detailed discussion about
features over there and start collaborating.

Personally, I would then start by selecting a Python test framework
(some are better than others), then simply write some documented tests.
These will fail at first, and we can then work on making them pass if they
feel like they embody API calls that have the correct design.

Personally, I would start using such an API immediately, so I'm quite
motivated to help. And yes, it would make for a very nice master's thesis:
good API design is a seriously useful thing to become an expert at :)

Cheers,

Peder Jakobsen

Re: Python Driver Contribution Idea

Posted by Peder Jakobsen | gmail <pj...@gmail.com>.
Two thumbs up for this project. An immediate benefit is the ability to
take advantage of the enhanced interactive features of the IPython shell.

Perhaps the next step is to model the design after a similar REST API
wrapper, for example python-twitter:
https://github.com/bear/python-twitter

On Mon, Dec 28, 2015 at 8:45 AM, Charles Givre <cg...@gmail.com> wrote:

> I’d second that and be willing to help.

Re: Python Driver Contribution Idea

Posted by Charles Givre <cg...@gmail.com>.
I’d second that and be willing to help.  


> On Dec 28, 2015, at 07:59, John Omernik <jo...@omernik.com> wrote:
> 
> I think a Pythonic module with Drill could be a great contribution.  Using
> the Rest API makes the most sense, wrapping it, and interfacing with it
> using requests or something similar. Since everything is done via JSON in
> the rest API, there could be nice interaction with the API, doing things
> such as authentication (it's form based, so you have to use a requests
> session or similar), query submission, results, error handling,etc. You
> will want to determine what you want your driver to do, do you want an
> interface to support submitting new storage plugins?  Do you want to expose
> query time settings (such as the JSON read number as double) via the
> driver, or just via a statement submitted by the user? (one requires much
> more work, the other requires a eye towards security).  Security in another
> thing, you want to ensure that if something is using your module, say a
> Python Flask App, that there is validation of SQL, and other such concerns.
> Drill seems to be pretty good about it, but any module you would write
> should be explicit about what it is and what it isn't doing related to
> input sanitization/security
> 
> Other things to think about would be something that would allow result set
> objects in your Python driver to be easily moved to a pandas data frame. I
> think the Data Science folks out there would love this, and you would have
> a core setup of users and other contributions very quickly with that.  The
> key to something like this would be ensuring it's as Pythonic as possible
> and is trying to bridge the gap between the Python language and Rest API.
> This allows you, the author, the most flexibility to focus on your code,
> and not have to worry much about the Drill code base as everything is using
> the Rest API (which is really well designed having used it myself in Python
> scripts).
> 
> This is a great idea and I would be happy to contribute/assist!
> 
> John


Re: Python Driver Contribution Idea

Posted by John Omernik <jo...@omernik.com>.
I think a Pythonic module for Drill could be a great contribution.  Using
the REST API makes the most sense: wrap it and talk to it using requests
or something similar. Since everything in the REST API is done via JSON,
the module could interact with it nicely, handling things such as
authentication (it's form based, so you have to use a requests session or
similar), query submission, results, error handling, etc. You will want to
determine what you want your driver to do. Do you want an interface that
supports submitting new storage plugins?  Do you want to expose query-time
settings (such as the JSON "read numbers as double" option) via the
driver, or just via a statement submitted by the user? (One requires much
more work, the other requires an eye towards security.)  Security is
another thing: if something is using your module, say a Python Flask app,
you want to ensure that there is validation of SQL and other such
concerns. Drill seems to be pretty good about it, but any module you write
should be explicit about what it is and isn't doing related to input
sanitization/security.
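
To make the wrapping idea concrete, here is a rough sketch, not a
finished design. It uses only the standard library (urllib with a cookie
jar standing in for a requests session). The class and method names are
made up for illustration, and the endpoints (/j_security_check for the
form login, /query.json for queries), the payload shape and the default
port 8047 are assumptions that should be checked against the REST API
documentation for the Drill version in use.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


class DrillClient:
    """Hypothetical thin wrapper around Drill's REST API (illustrative names)."""

    def __init__(self, base_url="http://localhost:8047"):
        self.base_url = base_url.rstrip("/")
        # The opener keeps cookies between calls, playing the role that a
        # requests session would play for form-based login.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def build_login_request(self, username, password):
        # Form-encoded POST; the endpoint and field names are assumptions.
        body = urllib.parse.urlencode(
            {"j_username": username, "j_password": password}
        ).encode("utf-8")
        return urllib.request.Request(
            self.base_url + "/j_security_check", data=body, method="POST"
        )

    def build_query_request(self, sql):
        # JSON POST carrying the SQL text; the payload shape is an assumption.
        body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
        return urllib.request.Request(
            self.base_url + "/query.json",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def login(self, username, password):
        # Sends the login form; the session cookie stays in the opener's jar.
        with self._opener.open(self.build_login_request(username, password)) as resp:
            return resp.status

    def query(self, sql):
        # Sends the SQL and decodes the JSON response body into a dict.
        with self._opener.open(self.build_query_request(sql)) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

With a Drillbit running, usage would look like
client = DrillClient(); client.query("SELECT 1"). Note that the sketch
passes the SQL through untouched, so any validation is still up to the
caller, which is exactly the kind of thing the module should state
explicitly.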

Another thing to think about would be a way to move result set objects
from your Python driver into a pandas DataFrame easily. I think the data
science folks out there would love this, and you would gain a core set of
users and other contributions very quickly with that.  The key to
something like this is making it as Pythonic as possible, so that it
bridges the gap between the Python language and the REST API. This gives
you, the author, the most flexibility to focus on your code without having
to worry much about the Drill code base, as everything goes through the
REST API (which is really well designed; I have used it myself in Python
scripts).
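
The pandas hand-off could be as small as the sketch below. It assumes
the query response is a JSON object with a "columns" list and a "rows"
list of dicts. That shape and both function names are assumptions made
for illustration. pandas is imported lazily so the driver itself would
not have to depend on it.

```python
def result_to_columns(result):
    """Reshape an assumed {"columns": [...], "rows": [{...}]} payload
    into a dict mapping each column name to its list of values."""
    columns = result.get("columns", [])
    rows = result.get("rows", [])
    return {name: [row.get(name) for row in rows] for name in columns}


def result_to_dataframe(result):
    """Build a pandas DataFrame from the same assumed payload shape."""
    # Lazy import: pandas is only required when this helper is called.
    import pandas as pd

    return pd.DataFrame(result.get("rows", []), columns=result.get("columns"))
```

Passing the column list explicitly keeps the column order that the
server reported instead of whatever order the row dicts happen to have.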

This is a great idea and I would be happy to contribute/assist!

John

On Mon, Dec 28, 2015 at 2:07 AM, Wojciech Nowak <ma...@pythonic.ninja> wrote:

> Dear Drill developers,
>
> Recently I was trying to use Drill from Python through ODBC interface
> based on blog post from
> https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl
> It worked as expected, but what struck to me was that It’s a lot of hassle
> to configure it.
>
> That’s why based on Your site under Contribution Ideas (
> https://drill.apache.org/docs/apache-drill-contribution-ideas/) I decided
> to create simpler solution for Python community.
>
> My Contribution would have two phases:
> client/driver for interacting with Drill
> dsl which will provide a easier and idiomatic way to write and manipulate
> queries using defined query set expressions.
>
>
> 1.
> Similarly to official client for Elastic Search (
> https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api
> of Drill for which i found documentation under
> https://drill.apache.org/docs/rest-api/
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>
> questions:
> 1.1 I was wondering if Python driver for Drill could be based on Rest-Api,
> do you see any problems?
> 1.2 Do you have any ideas or suggestions for that project?
>
> 2.
> It would be separate package from driver, you can install as an optional
> package via command:
> pip install pydrill-dsl
> so that it would have separate releases from 1 package.
> It would enhance way of interacting with Drill via query set like
> expressions.
> sketch of usage:
>
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>
> questions:
> 2.1 Should it be separated from Python Drill Driver package?
> 2.2 Do you have any ideas or suggestions for that project?
>
> This contribution would be part of my Master Thesis, so any ideas are
> welcome. My thesis supervisor suggested to contact You to get Drill core
> developers perspective.
>
> I would be very grateful if You could provide me with your thoughts.
>
> kind regards,
> Wojtek Nowak
>