You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by pmccaffrey6 <gi...@git.apache.org> on 2016/12/16 09:31:45 UTC
[GitHub] zeppelin pull request #1776: [ZEPPELIN-1824] Add MetaData exploration to JDB...
GitHub user pmccaffrey6 opened a pull request:
https://github.com/apache/zeppelin/pull/1776
[ZEPPELIN-1824] Add MetaData exploration to JDBC Interpreter
### What is this PR for?
Zeppelin currently has little functionality for data source exploration. This PR exists to build a small feature for the JDBC interpreter that would allow users to explore metadata for databases and database objects.
With this PR, the JDBC interpreter accepts the "explore" keyword. When run in isolation, this fetches metadata about the database as a whole (tables, views etc...). When the explore keyword is followed by the name of a table or view, this fetches metadata about that table or view (column names, data types etc...).
A video of this feature in action can be found here (https://s3.amazonaws.com/screenshots-mockups/embedvid.html).
### What type of PR is it?
Improvement | Feature
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1824
### How should this be tested?
1. Run `explore` in a `%jdbc` paragraph to get a list of tables, views, system tables, global and local temporary tables, aliases and synonyms.
2. Run `explore` followed by the name of a database table or view in order to get a list of column names, data types etc...
Additionally, this PR adds two new unit tests to `JDBCInterpreterTest` which test fetching of database metadata as well as table and view metadata.
### Screenshots (if appropriate)
https://s3.amazonaws.com/screenshots-mockups/embedvid.html
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes. This would benefit from a small addition to the JDBC Interpreter documentation.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pmccaffrey6/zeppelin jdbc-metadata
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1776.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1776
----
commit 4c0c18249de5cb77f1fb038c251801bac3365fe7
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-15T21:55:28Z
add getmetadata method
commit 161511656a948c3fb6edb5a7590c23256b46528e
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-15T22:05:08Z
added more table types
commit af09f3d96fcb10d74c89223cfca55a026a75463b
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-15T22:16:14Z
checkstyle
commit d1c8a2861c4108848d6ae1907c4de6ce6a1d7da0
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-15T22:18:49Z
checkstyle
commit 375604c70271eebd8099be71b49fe93ef61d0287
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-15T22:41:03Z
removed comments
commit efb6a4f98de2748f065dc673405a5c413635fd36
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T00:04:48Z
added tests
commit c9451892f9287e7bd466527158c5cf0838174fd4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T00:15:40Z
test
commit e1df106e4b42dd733d557f9de39f26e955a65e04
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T06:43:13Z
check cmd length
commit ea177b3fe26342a42a02cda122df07a26b481bfd
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T06:49:51Z
test table content
commit 9e88fbe32aa9e5dbe0193f4f44335b078bb6a944
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T06:56:03Z
test table content
commit 8308a613af039cf32dbccaaf7b9f79b127a149d0
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T06:59:17Z
test table content
commit 85764037ab6d0351e587c88513523387b7ffa52f
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T07:04:01Z
test table content
commit 54980d37adfa4ba038d534b555653d80ac8b4753
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T07:29:45Z
test table content
commit 804ab176d143ffca9dec7f5f4bc074a3f3c53ec3
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T07:48:22Z
test table content
commit 60765cc2f176fe9a5bb59ca427996230bf7989e4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T07:49:39Z
test table content
commit 3047bd7e7f345d43b59d3d9bdecfc3a8a8562ef4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T07:54:56Z
test table content
commit 5f360a5f868c4b973809611d79163a4c9e4751ff
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T08:00:10Z
test table content
commit 122e777bd2b47e55de573fbc880e326faa0021d5
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T08:02:22Z
test table content
commit 854931a4d06525726f41b87272fae1727c987fcb
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T08:05:43Z
test table content
commit a8a337beb0289355fe2d2dad567dc8685c1b0d3d
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T08:25:19Z
add comment
commit 57d14383bd62352f4bb7be9b1cbee756aa0cd96c
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T08:50:53Z
dont optimize imports
commit 81506499442bb0061dfd35f86bcda6b854ba788b
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T09:12:21Z
dont optimize imports
commit 8d2d8dbd89cccac3edd8a88ea891bbe90e768983
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date: 2016-12-16T09:19:27Z
organize
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by FireArrow <gi...@git.apache.org>.
Github user FireArrow commented on the issue:
https://github.com/apache/zeppelin/pull/1776
LGTM. Test failure seems unrelated.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin pull request #1776: [ZEPPELIN-1824] Add MetaData exploration to JDB...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/zeppelin/pull/1776
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:
https://github.com/apache/zeppelin/pull/1776
Hey guys,
Thanks so much for the review! I removed the unused variable (great catch).
As for the other issues, I think they make a ton of sense. I wrote the `getMetaData()` method so that it pairs closely with the `executeSql()` method and so these issues are present in both.
Perhaps a good option would be to open a second PR to refactor the JDBC interpreter and make these changes in both `getMetaData()` and `executeSql()` methods? What do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:
https://github.com/apache/zeppelin/pull/1776
I went ahead and changed parsing of the "explore" keyword to use `split(" +")` instead of substring so that it wasn't tied to any particular word length etc.. and the `METADATA_KEYWORD` can be set to any single case-insensitive word.
As an additional detail, if you look up a table that doesn't exist you'll just get an empty table in the paragraph as opposed to an explicit error message. I experimented a bit with testing for an empty result set using `resultSet.next()` and `resultSet.first()` so that I could output an error message like "Database object not found".
However, it seems that when getting metadata, since you're not using `Statement`, this kind of thing causes problems because re-setting cursors on the resultset using methods like `resultSet.beforeFirst()` isn't supported for all data sources. I don't want to use `Statement` because I want to have this abstracted away from SQL and simply use the JDBC api to get metadata. So, in the interest of making this as general of a feature as possible, and therefore not using `Statement`, as well as not having a good way to test for `resultSet` emptiness that is adequately vendor-agnostic, there currently isn't testing for empty result sets, they just return as empty tables.
I don't imagine this is an issue really but if anyone has a suggestion of a good way to do this, I'd be very interested to hear it!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:
https://github.com/apache/zeppelin/pull/1776
Thank you @pmccaffrey6 , it looks great to me, modulo few things noted above.
Could you just double check that all feedback from reviews was addressed?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:
https://github.com/apache/zeppelin/pull/1776
Hey @bzz,
Great catch! Sorry about that. I removed the unnecessary import. The error seems unrelated. It comes from:
```
1470 21:03:56,259 ERROR org.apache.zeppelin.AbstractZeppelinIT:136 - Exception in ZeppelinIT while testSparkInterpreterDependencyLoading
1471 org.openqa.selenium.TimeoutException: Timed out after 60 seconds waiting for org.apache.zeppelin.AbstractZeppelinIT$1@7918cba6
... Stack Trace ...
1507 Caused by: org.openqa.selenium.NoSuchElementException: Unable to locate element: {"method":"xpath","selector":"(//div[@ng-controller=\"ParagraphCtrl\"])[1]//div[contains(@class, 'control')]//span[1][contains(.,'FINISHED')]"}
```
As for the other excellent points that @DrIgor noted (including the great point about using logger), those issues are present both in `getMetaData()` as well as `executeSql()` methods. Would you prefer that I make those changes to the `getMetaData()` method in this PR or refactor both `getMetaData()` and `executeSql()` in a separate PR?
As always, thanks for your review!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...
Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:
https://github.com/apache/zeppelin/pull/1776
The failed test looks to be from `testSparkSQLInterpreter` from `LivyInterpreterIT.java`. More specifically, it comes from line 227 (and 226) in that file:
```
InterpreterResult result = sqlInterpreter.interpret("show tables", context);
assertEquals(InterpreterResult.Code.SUCCESS, result.code());
```
`sqlInterpreter` being an instance of `LivySparkSQLInterpreter`, it seems unrelated.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---