Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2016/05/19 14:21:56 UTC

Caravel and Drill Integration Update

Hey all,

As you may see from the uber thread that evolved from Neeraja bringing
Caravel to the Drill community, there has been lots of back and forth on
the subject.  I thought I'd give a little update, as well as a call to arms
so to speak, for anyone who wants to play with Drill/Caravel and find
issues.

First of all, mostly due to the hard work of PythonicNinja (Wojtek Nowak),
there is a working demo of Drill being used as the backend for Caravel
visualizations.  To me this is a great sign that the work being done is not
only worth it, but something that could really benefit the Drill
community. Thanks again, Wojtek, for your great work.

*Current Status*

We have two repos we are maintaining. One is a "dev" environment that
includes a Dockerfile to show how to get things set up and working with a
number of components: Caravel, PyODBC, UnixODBC, MapR ODBC Connector, etc.
This is:

https://github.com/JohnOmernik/caraveldrill

This basically has everything you may need to connect to Drill.  It does
NOT include Drill; it assumes you have a Drill instance to connect to.
It lets you run Caravel with the test data, and includes a test
Python script for making a connection to Drill. If the script succeeds in
showing schemas, its output shows the exact connection string you'd need
for Caravel. (Thanks to Chris Matta from MapR for sharing his repo using
PyODBC; I used that quite a bit in proving this out.  Source
repo: https://github.com/cjmatta/DrillPandasReddit)
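
For anyone who wants a feel for what such a connection test looks like, here
is a minimal sketch. It is not the script from the repo. The helper names and
the DSN name "drill64" are made up for illustration, and it assumes pyodbc is
installed and an ODBC DSN for Drill is already configured.

```python
def build_odbc_connection_string(dsn, user=None, password=None):
    """Assemble a 'KEY=value;...' ODBC connection string for a named DSN."""
    parts = ["DSN=" + dsn]
    if user is not None:
        parts.append("UID=" + user)
    if password is not None:
        parts.append("PWD=" + password)
    return ";".join(parts)


def list_drill_schemas(connection_string):
    """Connect through pyodbc and return the schema names Drill reports."""
    import pyodbc  # imported lazily so the helper above works without it

    # Assumption: autocommit is enabled because Drill is normally queried
    # without transactions.
    conn = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute("SHOW SCHEMAS")
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


def main():
    # "drill64" is a hypothetical DSN name; use whatever your odbc.ini defines.
    conn_str = build_odbc_connection_string("drill64")
    print("Connection string:", conn_str)
    for schema in list_drill_schemas(conn_str):
        print(schema)
```

Calling main() against a reachable Drill instance should print the connection
string followed by one schema name per line.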

The other component is the actual SQLAlchemy dialect for Drill.  This is
in rougher shape, and it will need a crowdsourced effort.  Basically, it
works. To install it, take a running container of the caraveldrill repo and
connect to it via docker exec -it %containerid% /bin/bash.  From
within the container, you can then install the dialect.  Once you run the
install on the dialect, it will work and you can connect via Caravel!

This is sorta a hack: we started with an MS Access dialect and are slowly
removing obviously Access-only parts (functions related to primary/foreign
keys, indexes, etc.) and adding/replacing parts that have to do with Drill.
It's a learning process for us.  If you think you can address the issues in
Drill, please submit PRs; this will be highly iterative and things will move
fast. The end result should be a nice dialect for Drill that handles
all the Caravel functionality and could potentially be used in other
projects.

The dialect repo is located here:

https://github.com/JohnOmernik/sqlalchemy-drill/

Once again, thanks to all who have contributed thus far, especially Wojtek,
and I am looking forward to seeing this evolve!

John Omernik

Re: Caravel and Drill Integration Update

Posted by Ted Dunning <te...@gmail.com>.
Updates are AWEsome here.



On Wed, May 25, 2016 at 9:08 AM, John Omernik <jo...@omernik.com> wrote:

> Update!
>
> (Note: please tell me if updates here are inappropriate.  I like updating
> here because I think it's relevant to the Drill community, and I want
> people to see what is happening so they can help. However, I do know there
> are others who may see this as a separate project from Drill, and thus see
> updates like this as just noise.  I want to be respectful of the
> community's wishes here.)
>
> I've made some updates to the Drill dialect, removing some of the MS Access
> cruft and, more importantly, working on the join function so that I can
> get some other types of charts working. There was an issue where Caravel
> would issue a join but not put aliases in the ON clause, like so:
>
> select field1, field2 from table1 JOIN (select field2 as __field2,
> field3 from table1) as anon_1 on field2 = __field2
>
> Even though the field names were different, Drill wouldn't allow it. I
> needed to do on field1 = anon_1.__field2 instead, which was allowed. That's
> what I attempted to do in my join clause in SQLAlchemy. It's ugly, but it
> has some things working; now I need to do more testing.
>
> Things I am still working on:
>
> Date issues on time series charts: potentially some issues with Caravel
> itself, but I am looking into it.
>
> John

Re: Caravel and Drill Integration Update

Posted by John Omernik <jo...@omernik.com>.
Update!

(Note: please tell me if updates here are inappropriate.  I like updating
here because I think it's relevant to the Drill community, and I want
people to see what is happening so they can help. However, I do know there
are others who may see this as a separate project from Drill, and thus see
updates like this as just noise.  I want to be respectful of the community's
wishes here.)

I've made some updates to the Drill dialect, removing some of the MS Access
cruft and, more importantly, working on the join function so that I can
get some other types of charts working. There was an issue where Caravel
would issue a join but not put aliases in the ON clause, like so:

select field1, field2 from table1 JOIN (select field2 as __field2,
field3 from table1) as anon_1 on field2 = __field2

Even though the field names were different, Drill wouldn't allow it. I needed
to do on field1 = anon_1.__field2 instead, which was allowed. That's what I
attempted to do in my join clause in SQLAlchemy. It's ugly, but it has some
things working; now I need to do more testing.
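
To make the alias fix concrete, here is a small plain-string sketch of the
idea. It is not the dialect's actual join code, and build_join is a
hypothetical helper. It reuses the join columns from the failing query above
and only shows the subquery column being qualified with its alias in the ON
clause.

```python
def build_join(select_cols, table, subquery, alias, left_col, sub_col):
    """Build a JOIN whose ON clause qualifies the subquery column by alias."""
    # The right-hand side is written as alias.column rather than a bare column.
    on_clause = "{} = {}.{}".format(left_col, alias, sub_col)
    return "SELECT {} FROM {} JOIN ({}) AS {} ON {}".format(
        ", ".join(select_cols), table, subquery, alias, on_clause
    )


sql = build_join(
    select_cols=["field1", "field2"],
    table="table1",
    subquery="SELECT field2 AS __field2, field3 FROM table1",
    alias="anon_1",
    left_col="field2",
    sub_col="__field2",
)
print(sql)
```

The printed statement ends with "ON field2 = anon_1.__field2", which is the
qualified form described above.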

Things I am still working on:

Date issues on time series charts: potentially some issues with Caravel
itself, but I am looking into it.

John


