You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by babupe <gi...@git.apache.org> on 2016/07/12 22:13:59 UTC

[GitHub] zeppelin pull request #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPEL...

GitHub user babupe opened a pull request:

    https://github.com/apache/zeppelin/pull/1170

    BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153]

    ### What is this PR for?
    Google BigQuery is a popular no-ops datawarehouse. This commit will enable Apache Zeppelin users to perform BI and Analytics on their datasets in BigQuery.
    
    ### What type of PR is it?
    Feature
    
    ### Todos
    * Make bigquery interpreter appear in the interpreters section in the UI
    * Build SQL completion 
    * Authorization of non-gcp 
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/ZEPPELIN-1153
    
    ### How should this be tested?
    copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
    Add org.apache.zeppelin.bigquery.bigQueryInterpreter to property zeppelin.interpreters in zeppelin-site.xml
    Start Zeppelin
    Add BigQuery Interpreter with your project ID
    Create new note with %bsql.sql and run your SQL against public datasets in bigquery.
    
    ### Screenshots (if appropriate)
    ![screenshot from 2016-07-12 14 27 30](https://cloud.githubusercontent.com/assets/4242273/16785302/31b104e2-4842-11e6-87c0-b79763dd85c0.png)
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? No
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/babupe/zeppelin babupe-bigquery

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1170
    
----
commit 2a2bedcf3f90459f2fae5d7efe4798cd6058748f
Author: Babu Prasad Elumalai <ba...@google.com>
Date:   2016-07-12T21:37:10Z

    BigQuery Interpreter for Apazhe Zeppelin

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Awesome! Merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    So, I asked around and it looks like these are apiary generated apis and the generation is not external. I was guided to the discovery doc here - https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Looks great to me, thank you!
    I'll do a final pass and play with it more today. 
    
    Will merge it right after, if there is no further discussion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Looks great to me, thank you for taking care!
    
    I think now the only thing that is left, is to determine the status of dependency:
    
    ```
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-bigquery</artifactId>
    <version>v2-rev265-1.21.0</version>
    ```
    
    >Got it. These packages are licensed under Apache 2.0. I have asked around to see if the code is publicly available.
    
    Is that an open source library? If so, let's link the sources to README.md



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    I have skipped the failing test and modified licensing and modified the notice at the project root.
    Let me know if it looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Please clarify regarding the LICENSE file.
    Also, if you have any other feedback, would be happy to take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Thanks @babupe I think we are almost there! Did another pass and add few last things above.
    
    One more thing I noticed is - right now this PR fails the project's CI build and we heavily rely on this CI automation in the overall progress of the project. So we need to make it green again, in order to be able to merge this. 
    
    From [the logs](https://s3.amazonaws.com/archive.travis-ci.org/jobs/145380794/log.txt) it looks like tests are failing
    
    ```
    [INFO] Zeppelin: BigQuery interpreter ..................... FAILURE [  2.834 s]
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project zeppelin-bigquery: Compilation failure: Compilation failure:
    [ERROR] /home/travis/build/apache/zeppelin/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [ERROR] /home/travis/build/apache/zeppelin/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [ERROR] -> [Help 1]
    [ERROR] 
    ```
    
    I tried running `mvn test -pl bigquery` on my machine and got 
    ```
    [INFO] -------------------------------------------------------------
    [ERROR] COMPILATION ERROR :
    [INFO] -------------------------------------------------------------
    [ERROR] /Users/..../zeppelin-incubator/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [ERROR] /Users/..../zeppelin-incubator/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [INFO] 2 errors
    [INFO] -------------------------------------------------------------
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 20.704 s
    [INFO] Finished at: 2016-07-18T11:34:19+09:00
    [INFO] Final Memory: 37M/363M
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project zeppelin-bigquery: Compilation failure: Compilation failure:
    [ERROR] /Users/..../zeppelin-incubator/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [ERROR] /Users/..../zeppelin-incubator/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java:[41,41] package com.google.cloud.bigquery.testing does not exist
    [ERROR] -> [Help 1]
    ```
    
    I'm not 100% sure why is that, may be you could point me to the documentation on some pre-requests that I'm missing?
    
    But in case some useful tests can not work without external service dependency - the simplest way to fix the CI is to:
     - exclude them from the default build and 
     - document the way to run it manually
    
    You can see an example of how the same was recently done for the python interpreter in #1164, in particular regarding [PythonInterpreterWithPythonInstalledTest](https://github.com/apache/zeppelin/pull/1164/files#diff-177dacc81c4b28b15b73263d3540968cL33) and it's excludion in [pom.xml](https://github.com/apache/zeppelin/pull/1164/files#diff-db5740cf5cd90cf11935b92eb80d7908R38)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Hi Alexander,
    I spoke to our OSS team in Google and it seems they wanted to keep the source headers.
    Since I have added that to the NOTICE file earlier, I have modified the license headers for the sources to refer to Google license.
    Is this OK?
    
    I referred to your earlier comment for this,
    "In this case, if you choose to keep it, it makes sense to be more specific and include in NOTICE a location as well: either ./bigquery/ or full paths to the source code files for which Google holds copyright i.e bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java, etc"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    No problem. Pushed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Great stuff! Thank you for taking care. 
    I have added babupe/zeppelin#1 please review and merge it in.
    
    Will merge this contribution to **master** and **branch-0.6**, if there is no further discussion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Sorry about the delay. Merged it now. Thank you so much for documenting this!
    I am looking at surfacing the errors out to the interpreter. Will have an update on it.
    Will also look at exposing creds for non GCE instances through configuration.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    > So, you may have to run the tests manually from a GCE instance after setting a project id in resources/constants.json.
    >After this, running the test with -Dbigquery.test.exclude='' will start passing.
    Do you want me to document this in README?
    
    Got it, yes, I think this is exactly what needs to be documented, like:
    ```
    Tests XXXX are excluded as they depend on BigQuery service which does not have mock at the moment. So in order to run them manually you need:
     - create a new project in GCE (?with some permissions?)
     -  copy project id from YYYY to `resources/constants.json`
     -  run them from GCE instance using `mvn ....`
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Made the changes and pushed them. Hope its good now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Got it, makes perfect sense! Thank you for updating error handling and the plan to address external auth cfg later sounds good.
    
    Looks great to me, merging to *master* and *branch-0.6* if there is not further duscussion!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Yes, the code is failing because it has to call the bigquery external service with a project id that it owns. Currently there is no emulator.
    So, you may have to run the tests manually from a GCE instance after setting a project id in resources/constants.json.
    After this, running the test with -Dbigquery.test.exclude='' will start passing.
    Do you want me to document this in README?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    @babupe thank you for taking care. @AhyoungRyu thank you for review. 
    
    Let me do another pass on it and get back to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    While running locally I got:
    
    ```
    ERROR [2016-07-29 15:05:09,591] ({pool-2-thread-2} BigQueryInterpreter.java[open]:142) - Cannot open connection
    java.io.IOException: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
            at com.google.api.client.googleapis.auth.oauth2.DefaultCredentialProvider.getDefaultCredential(DefaultCredentialProvider.java:95)
            at com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault(GoogleCredential.java:213)
            at org.apache.zeppelin.bigquery.BigQueryInterpreter.createAuthorizedClient(BigQueryInterpreter.java:155)
            at org.apache.zeppelin.bigquery.BigQueryInterpreter.open(BigQueryInterpreter.java:138)
            at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
            at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
            at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:383)
            at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
            at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
     INFO [2016-07-29 15:05:09,593] ({pool-2-thread-2} BigQueryInterpreter.java[close]:281) - Close bqsql connection!
    ```
    
    @babupe do you think it would make sense to expose this though interpreter configuration properties, same as `project id`, so people could override it? Or may be it would be easier to use OAuth service account?
    
    In any way, we need to document this properly for people to know that there are [pre-requests](https://developers.google.com/identity/protocols/application-default-credentials), before using this interpreter.
    
    I will submit the PR with docs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Awesome! Thank you!
    I have added that to the documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    @babupe Also you need to add
    ```
    <li><a href="{{BASE_PATH}}/interpreter/bigquery.html">BigQuery</a></li>
    ```
    
    this line to [here](https://github.com/apache/zeppelin/blob/master/docs/_includes/themes/zeppelin/_navigation.html#L51). Then you can find `bigquery.md` in the dropdown menu on docs site.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Thank you! Sounds awesome to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Added babupe/zeppelin#2 
    
    After configuring credentials I got strange behavior: query `show tables` results is `ERROR` paragraph status with 
     - no output in GUI
     - and nothing in the logs
    
    ![screen shot 2016-07-29 at 15 40 08](https://cloud.githubusercontent.com/assets/5582506/17239974/3869e042-55a3-11e6-9861-ecc4cc9f6247.png)
    
    ```
      INFO [2016-07-29 15:39:49,759] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1469774389758 started by scheduler org.apache.zeppelin.bigquery.BigQueryInterpreter745463803
     INFO [2016-07-29 15:39:49,759] ({pool-2-thread-2} BigQueryInterpreter.java[interpret]:288) - Run SQL command 'show tables'
     INFO [2016-07-29 15:39:50,531] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1469774389758 finished by scheduler org.apache.zeppelin.bigquery.BigQueryInterpreter745463803
    ```
    
    Which is a bit frustrating experience, since it is `ERROR` but user does not know why.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    > Sorry about that, I have removed the bad package import from the class
    
    No problem! Thanks for taking care. According to [CI logs](https://s3.amazonaws.com/archive.travis-ci.org/jobs/145450671/log.txt) there is still some issue with the tests 
    
    ```
    [INFO] Zeppelin: BigQuery interpreter ..................... FAILURE [  1.003 s]
    
    -------------------------------------------------------
     T E S T S
    -------------------------------------------------------
    Running org.apache.zeppelin.bigquery.BigQueryInterpreterTest
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/travis/build/apache/zeppelin/zeppelin-interpreter/target/zeppelin-interpreter-0.7.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/travis/build/apache/zeppelin/zeppelin-interpreter/target/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/travis/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    log4j:WARN No appenders could be found for logger (org.apache.zeppelin.interpreter.Interpreter).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.3 sec <<< FAILURE! - in org.apache.zeppelin.bigquery.BigQueryInterpreterTest
    badSqlSyntaxFails(org.apache.zeppelin.bigquery.BigQueryInterpreterTest)  Time elapsed: 0.156 sec  <<< ERROR!
    java.lang.NullPointerException: null
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.run(BigQueryInterpreter.java:263)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.executeSql(BigQueryInterpreter.java:248)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.interpret(BigQueryInterpreter.java:289)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreterTest.badSqlSyntaxFails(BigQueryInterpreterTest.java:112)
    
    sqlSuccess(org.apache.zeppelin.bigquery.BigQueryInterpreterTest)  Time elapsed: 0 sec  <<< ERROR!
    java.lang.NullPointerException: null
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.run(BigQueryInterpreter.java:263)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.executeSql(BigQueryInterpreter.java:248)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreter.interpret(BigQueryInterpreter.java:289)
    	at org.apache.zeppelin.bigquery.BigQueryInterpreterTest.sqlSuccess(BigQueryInterpreterTest.java:103)
    
    
    Results :
    
    Tests in error: 
      BigQueryInterpreterTest.badSqlSyntaxFails:112 » NullPointer
      BigQueryInterpreterTest.sqlSuccess:103 » NullPointer
    
    Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
    ``` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    I have now pushed the change to capture and show bigquery exceptions with bad statements both in logs and interpreter. Will look into the credentials.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Do you think the external auth could be a blocker for the initial merge?
    If not, lets table it for now and I will pick this up later?
    It may be a little bit more involved than I thought.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Got it, pushed all the changes in now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    CI build failed due to networking issues
    
    ```
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.4:process (default) on project zeppelin: Error downloading resources archive. Could not transfer artifact org.apache.apache.resources:apache-jar-resource-bundle:jar:1.5-SNAPSHOT from/to apache-snapshots (https://repository.apache.org/snapshots/): Connect to repository.apache.org:443 [repository.apache.org/207.244.88.143] failed: Connection timed out
    [ERROR] org.apache.apache.resources:apache-jar-resource-bundle:jar:1.5-SNAPSHOT
    ```
    which are not related.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    I see, generated source code is not available, thank you for clarification! So we should assume that we are including in the Apache Zeppelin release convenience binary a non-opensource third-party binary, distributed under permissive license.
    
    We need a proper attribution then, but to be honest with you -  I think we might need a hand from LEGAL branch of ASF here, as it's the first time we deal with such matter.
    
    I think everything is fine and it should be possible, will just require a propper attribution. I will bet back to you ASAP here, as we triage it with ASF.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Hi Babupe, in my understanding this change makes ASF treat current patch\PR as a ["third-party work"](http://www.apache.org/legal/src-headers.html#3party) VS ["work submitted directly to the ASF by the copyright owner or owner's agent"](http://www.apache.org/legal/src-headers.html#headers), wich just happen to have Apache 2.0 license.
    
    So according to the policy I reference above
    > Do ensure that every third-party work includes its associated license, even if that requires adding a copy of the license from the third-party download site into the distribution.
    
    We need to:
     - add the full text of the license as a separate file under the [./licenses/](https://github.com/apache/zeppelin/tree/master/licenses)
     - remove the line that was added before from NOTICE and 
     - add mention of the `./bigquery` to the Apache2.0 part of the root LICENSE file
    
    Sorry for the bureaucracy again, hope it makes sense to you and please do not hesitate to ask questions in case you have them.
    
    One more thing - there is a trivial conflict with current master now, but it blocks the merge, could you please `git rebase origin master` it?
    
    I plan to merge it to *master*, as well as *branch-0.6* ASAP, so it will become part of the coming 0.6.1 release later this week.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    @babupe great job, I have highlighted few issues inline in code above.
    
    I would really appreciate if, for the sake of maintainability, you could please add `./bigquery/README.md` with some _technical_ information on interpreter implementation i.e
    
     - point to the sources of current BigQuery library that is beeing used
    
      ```
    com.google.apis:google-api-services-bigquery:v2-rev265-1.21.0
      ```
      I had no luck locating it by searching online :\
    
     - small rationale why it was used or how it is compared to [GoogleCloudPlatform/gcloud-java](https://github.com/GoogleCloudPlatform/gcloud-java/)
    
     - mention authentication mechanism used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Great, thank you! Please, let me make a final pass and I'll get back to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    As transitive dependecies in compile scope got included in final release convenience binary, all licenses need to be included to [zeppelin-distribution/src/bin_license/LICENSE](https://github.com/apache/zeppelin/blob/master/zeppelin-distribution/src/bin_license/LICENSE)
    
    I double-checked and have marked the included ones
    
      - [x] +- com.google.apis:google-api-services-bigquery:jar:v2-rev265-1.21.0:compile
      - [ ] |  \- com.google.api-client:google-api-client:jar:1.21.0:compile
      - [ ] |     \- com.google.guava:guava-jdk5:jar:17.0:compile
      - [ ] +- com.google.oauth-client:google-oauth-client:jar:1.21.0:compile
      - [ ] |  +- com.google.http-client:google-http-client:jar:1.21.0:compile
      - [x] |  |  \- org.apache.httpcomponents:httpclient:jar:4.3.6:compile
      - [x] |  |     +- org.apache.httpcomponents:httpcore:jar:4.3.3:compile
      - [x] |  |     +- commons-logging:commons-logging:jar:1.1.1:compile
      - [x] |  |     \- commons-codec:commons-codec:jar:1.5:compile
      - [x] |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
      - [ ] +- com.google.http-client:google-http-client-jackson2:jar:1.21.0:compile
      - [x] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.1.3:compile
      - [ ] +- com.google.oauth-client:google-oauth-client-jetty:jar:1.21.0:compile
      - [ ] |  +- com.google.oauth-client:google-oauth-client-java6:jar:1.21.0:compile
      - [ ] |  \- org.mortbay.jetty:jetty:jar:6.1.26:compile
      - [ ] |     +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
      - [ ] |     \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile
      - [x] +- com.google.code.gson:gson:jar:2.6:compile
      - [x] +- org.slf4j:slf4j-api:jar:1.7.10:compile
      - [x] +- org.slf4j:slf4j-log4j12:jar:1.7.10:compile
      - [x] |  \- log4j:log4j:jar:1.2.17:compile



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Thanks! I will work on incorporating the feedback.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    https://github.com/GoogleCloudPlatform/gcloud-java/blob/master/TESTING.md#testing-code-that-uses-bigquery
    There is no emulator for BigQuery today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Great contribution, thank you!
    
    Couple of questions on top of few one already raised above:
     - will you be willing to add some tests as well? Looking at other interpreters impl like #1164 one can see how to test I.e syntax errors and other conditions, etc
     - would it be hard or do you have plans to implement `cancel` and `getProgress` as part of this contribution? They are very usefull in case of big queries and improve user experience a lot on UI
     - in current code, you use static interpreter registration mechanism, which is now [deprecated](https://issues.apache.org/jira/browse/ZEPPELIN-804) and all interpreters are in the process of migration. While we are here, in order not to create one more migration sub-tesk for later, could you please switch to `interpreter-settings.json` one instead? Its a dime change, here is an example of how it is done for another interpreter #1063 
     - every interpreter and it's supported/unsupported features and cobfiguration options are usually documented under `/docs/` which becomes part of the [website](http://zeppelin.apache.org/docs/0.6.0/development/howtocontributewebsite.html). Would you care to document this one as well?
     - for every new library introduced by this module, which Jar will end up in the release convenience binary (all, with default scope) we need to have a record in `https://github.com/apache/zeppelin/blob/master/zeppelin-distribution/src/bin_license/LICENSE` specifying it's license
    
    Feel free to ask here, if something is not clearly or in case you have any further questions!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Hi,
    
    1. I have added unit tests, but the tests will work with a valid google project id on google compute, else it will fail. This is because the test has to run against the bigquery service which does not have a local build.
    2. I have added a server side cancel operation. However I could not find a viable method to report progress. Will implement this if this becomes available as part of big query API in the future.
    3. Implemented dynamic interpreter
    4. Added BigQuery API license to the license file
    5. Docs updated with BigQuery information.
    6. Exceptions implemented at appropriate places.
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPEL...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1170


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    +1 for technical docs `bigquery/README.md` that @bzz suggested and it looks great that you added! 
    How about adding below one sentence in the end of `bigquery.md` so that users can know the existence of this file?
    
    ```
    ## Technical description
    
    For in-depth technical details on current implementation please refer to [bigquery/README.md](https://github.com/apache/zeppelin/blob/master/bigquery/README.md).
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    I think excluding constants.json helped.
    Let me know if you need anything else from my side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by babupe <gi...@git.apache.org>.
Github user babupe commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Sorry about that, I have removed the bad package import from the class


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Regarding 
    >Is it ok if I do the following,
    > 1. Create a NOTICE file in the root of bigquery project. i.e ./bigquery/NOTICE and add something like https://github.com/apache/incubator-beam/blob/master/NOTICE
    
    [ASF policy](http://www.apache.org/legal/src-headers.html#notice) requires "Every Apache distribution should include a NOTICE file in the top directory, along with the standard LICENSE file.", so I think we need to keep it in https://github.com/apache/zeppelin/blob/master/NOTICE  
    
    > 2. Change all copyright headers to standard ASF text.
    
    This sounds great


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1170
  
    Great, thank you for understanding, we are almost there. I have marked above ones that I was able to verify, could you please double-check it?
    
    On the `com.google.apis:google-api-services-bigquery:v2-rev265-1.21.0` - good news! I managed to find source code of this library, presumably generated from the [endpoint](https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest) by [google/apis-client-generator](https://github.com/google/apis-client-generator) available under ASL2.0 on Maven Central http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar
    
    To avoid this confusion further down the road, could you please add those links to `README.md` as part of the interpreter implementation details and, if possible, a brief note why is it preferred over [GoogleCloudPlatform/gcloud-java/#google-cloud-bigquery-alpha](https://github.com/GoogleCloudPlatform/gcloud-java/#google-cloud-bigquery-alpha) ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---