You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by pmccaffrey6 <gi...@git.apache.org> on 2016/12/16 09:31:45 UTC

[GitHub] zeppelin pull request #1776: [ZEPPELIN-1824] Add MetaData exploration to JDB...

GitHub user pmccaffrey6 opened a pull request:

    https://github.com/apache/zeppelin/pull/1776

    [ZEPPELIN-1824] Add MetaData exploration to JDBC Interpreter

    ### What is this PR for?
    Zeppelin currently has little functionality for data source exploration. This PR exists to build a small feature for the JDBC interpreter that would allow users to explore metadata for databases and database objects.
    
    With this PR, the JDBC interpreter accepts the "explore" keyword. When run in isolation, this fetches metadata about the database as a whole (tables, views etc...). When the explore keyword is followed by the name of a table or view, this fetches metadata about that table or view (column names, data types etc...).
    
    A video of this feature in action can be found here (https://s3.amazonaws.com/screenshots-mockups/embedvid.html).
    
    ### What type of PR is it?
    Improvement | Feature
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/ZEPPELIN-1824
    
    ### How should this be tested?
    1. Run `explore` in a `%jdbc` paragraph to get a list of tables, views, system tables, global and local temporary tables, aliases and synonyms.
    
    2. Run `explore` followed by the name of a database table or view in order to get a list of column names, data types etc...
    
    Additionally, this PR adds two new unit tests to `JDBCInterpreterTest` which test fetching of database metadata as well as table and view metadata.
    
    ### Screenshots (if appropriate)
    https://s3.amazonaws.com/screenshots-mockups/embedvid.html
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? Yes. This would benefit from a small addition to the JDBC Interpreter documentation.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pmccaffrey6/zeppelin jdbc-metadata

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1776
    
----
commit 4c0c18249de5cb77f1fb038c251801bac3365fe7
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-15T21:55:28Z

    add getmetadata method

commit 161511656a948c3fb6edb5a7590c23256b46528e
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-15T22:05:08Z

    added more table types

commit af09f3d96fcb10d74c89223cfca55a026a75463b
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-15T22:16:14Z

    checkstyle

commit d1c8a2861c4108848d6ae1907c4de6ce6a1d7da0
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-15T22:18:49Z

    checkstyle

commit 375604c70271eebd8099be71b49fe93ef61d0287
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-15T22:41:03Z

    removed comments

commit efb6a4f98de2748f065dc673405a5c413635fd36
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T00:04:48Z

    added tests

commit c9451892f9287e7bd466527158c5cf0838174fd4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T00:15:40Z

    test

commit e1df106e4b42dd733d557f9de39f26e955a65e04
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T06:43:13Z

    check cmd length

commit ea177b3fe26342a42a02cda122df07a26b481bfd
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T06:49:51Z

    test table content

commit 9e88fbe32aa9e5dbe0193f4f44335b078bb6a944
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T06:56:03Z

    test table content

commit 8308a613af039cf32dbccaaf7b9f79b127a149d0
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T06:59:17Z

    test table content

commit 85764037ab6d0351e587c88513523387b7ffa52f
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T07:04:01Z

    test table content

commit 54980d37adfa4ba038d534b555653d80ac8b4753
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T07:29:45Z

    test table content

commit 804ab176d143ffca9dec7f5f4bc074a3f3c53ec3
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T07:48:22Z

    test table content

commit 60765cc2f176fe9a5bb59ca427996230bf7989e4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T07:49:39Z

    test table content

commit 3047bd7e7f345d43b59d3d9bdecfc3a8a8562ef4
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T07:54:56Z

    test table content

commit 5f360a5f868c4b973809611d79163a4c9e4751ff
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T08:00:10Z

    test table content

commit 122e777bd2b47e55de573fbc880e326faa0021d5
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T08:02:22Z

    test table content

commit 854931a4d06525726f41b87272fae1727c987fcb
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T08:05:43Z

    test table content

commit a8a337beb0289355fe2d2dad567dc8685c1b0d3d
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T08:25:19Z

    add comment

commit 57d14383bd62352f4bb7be9b1cbee756aa0cd96c
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T08:50:53Z

    dont optimize imports

commit 81506499442bb0061dfd35f86bcda6b854ba788b
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T09:12:21Z

    dont optimize imports

commit 8d2d8dbd89cccac3edd8a88ea891bbe90e768983
Author: Peter McCaffrey <pm...@tests-macbook-pro-2.local>
Date:   2016-12-16T09:19:27Z

    organize

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by FireArrow <gi...@git.apache.org>.
Github user FireArrow commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    LGTM. Test failure seems unrelated. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1776: [ZEPPELIN-1824] Add MetaData exploration to JDB...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1776


---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    Hey guys,
         Thanks so much for the review! I removed the unused variable (great catch). 
    
    As for the other issues, I think they make a ton of sense. I wrote the `getMetaData()` method so that it pairs closely with the `executeSql()` method and so these issues are present in both. 
    
    Perhaps a good option would be to open a second PR to refactor the JDBC interpreter and make these changes in both `getMetaData()` and `executeSql()` methods? What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    I went ahead and changed parsing of the "explore" keyword to use `split(" +")` instead of substring so that it wasn't tied to any particular word length etc.. and the `METADATA_KEYWORD` can be set to any single case-insensitive word.
    
    As an additional detail, if you look up a table that doesn't exist you'll just get an empty table in the paragraph as opposed to an explicit error message. I experimented a bit with testing for an empty result set using `resultSet.next()` and `resultSet.first()` so that I could output an error message like "Database object not found". 
    
    However, it seems that when getting metadata, since you're not using `Statement`, this kind of thing causes problems because re-setting cursors on the resultset using methods like `resultSet.beforeFirst()` isn't supported for all data sources. I don't want to use `Statement` because I want to have this abstracted away from SQL and simply use the JDBC api to get metadata. So, in the interest of making this as general of a feature as possible, and therefore not using `Statement`, as well as not having a good way to test for `resultSet` emptiness that is adequately vendor-agnostic, there currently isn't testing for empty result sets, they just return as empty tables.
    
    I don't imagine this is an issue really but if anyone has a suggestion of a good way to do this, I'd be very interested to hear it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    Thank you @pmccaffrey6 , it looks great to me, modulo few things noted above.
    
    Could you just double check that all feedback from reviews was addressed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    Hey @bzz,
         Great catch! Sorry about that. I removed the unnecessary import. The error seems unrelated. It comes from:
    
    ```
    1470 21:03:56,259 ERROR org.apache.zeppelin.AbstractZeppelinIT:136 - Exception in ZeppelinIT while testSparkInterpreterDependencyLoading 
    1471 org.openqa.selenium.TimeoutException: Timed out after 60 seconds waiting for org.apache.zeppelin.AbstractZeppelinIT$1@7918cba6
    ... Stack Trace ...
    1507 Caused by: org.openqa.selenium.NoSuchElementException: Unable to locate element: {"method":"xpath","selector":"(//div[@ng-controller=\"ParagraphCtrl\"])[1]//div[contains(@class, 'control')]//span[1][contains(.,'FINISHED')]"}
    ```
    As for the other excellent points that @DrIgor noted (including the great point about using logger), those issues are present both in `getMetaData()` as well as `executeSql()` methods. Would you prefer that I make those changes to the `getMetaData()` method in this PR or refactor both `getMetaData()` and `executeSql()` in a separate PR?
    
    As always, thanks for your review!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1776: [ZEPPELIN-1824] Add MetaData exploration to JDBC Inter...

Posted by pmccaffrey6 <gi...@git.apache.org>.
Github user pmccaffrey6 commented on the issue:

    https://github.com/apache/zeppelin/pull/1776
  
    The failed test looks to be from `testSparkSQLInterpreter` from `LivyInterpreterIT.java`. More specifically, it comes from line 227 (and 226) in that file: 
    
    ```
    InterpreterResult result = sqlInterpreter.interpret("show tables", context);
    assertEquals(InterpreterResult.Code.SUCCESS, result.code());
    ```
    
    `sqlInterpreter` being an instance of `LivySparkSQLInterpreter`, it seems unrelated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---