Posted to dev@zeppelin.apache.org by rolmovel <gi...@git.apache.org> on 2015/09/23 12:36:17 UTC

[GitHub] incubator-zeppelin pull request: ZEPPELIN-289: User can now enter ...

GitHub user rolmovel opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/320

    ZEPPELIN-289: User can now enter custom expressions in notebooks' input fields

    Currently, Spark SQL UDFs work perfectly well in Zeppelin.
    
    We developed a custom UDF library that parses absolute and relative dates. Registering this library with Spark SQL through the standard UDF mechanism is suboptimal, because the UDF is invoked once for every row of the queried table.
    
    Example:
    ```
    select * from my_table where agg_date >= parseDate("-5d")
    ```
    This repeats the call to parseDate(...) for every single row of 'my_table'.
    
    Even worse, if we filter on a date range, as in:
    ```
    select * from my_table where agg_date >= parseDate("-5d") and agg_date <= parseDate("now")
    ```
    the call to parseDate(...) is performed twice for each row in the table.
    
    Since Spark's UDFs have no concept of an 'execution context', we were not able to overcome the problem.
    
    We implemented a mechanism in Zeppelin that evaluates UDF expressions before the query parameters are sent to the interpreter. With queries parametrized as usual, you can now enter expressions like the following in Zeppelin's input forms:
    ```
    eval:parseDate("-5d")
    ```
    or:
    ```
    eval:com.company.custom.udf.UDFUtility.parseDate("-5d")
    ```
    This is similar to how standard SQL works, where parameters are evaluated before being sent to the execution engine.
    
    You can find more info in the org.apache.zeppelin.display.Evaluator javadoc.
    
    The above-mentioned query, run over a table of 1 million records, takes about 1 minute. With this PR applied, the execution time drops to 15 seconds.
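
The pre-evaluation idea described in this pull request can be illustrated with a minimal Python sketch. It is not the code of org.apache.zeppelin.display.Evaluator: the names `parse_date`, `FUNCTIONS`, `resolve_param` and `calls`, the fixed reference date, and the `name("arg")` expression syntax accepted by the regular expression are all assumptions made for illustration. Only the `eval:` prefix and the `parseDate("-5d")` examples come from the description above. The sketch compares calling the date parser once per row with resolving the form value once and filtering against the resulting constant.

```python
import re
from datetime import datetime, timedelta

EVAL_PREFIX = "eval:"  # prefix used in the pull request's examples
# Assumed expression syntax: an optionally qualified name with one quoted argument.
CALL_PATTERN = re.compile(r'^([\w.]+)\("([^"]*)"\)$')

calls = {"parse_date": 0}  # counts invocations of the stand-in UDF


def parse_date(spec, now=datetime(2015, 9, 23)):
    """Hypothetical stand-in for parseDate: supports only "now" and "-<N>d".

    The reference date is fixed so that the example is deterministic.
    """
    calls["parse_date"] += 1
    if spec == "now":
        return now
    match = re.fullmatch(r"-(\d+)d", spec)
    if match is None:
        raise ValueError(f"unsupported date spec: {spec!r}")
    return now - timedelta(days=int(match.group(1)))


FUNCTIONS = {"parseDate": parse_date}  # hypothetical registry of expressions


def resolve_param(value):
    """Evaluate an 'eval:'-prefixed form value once; return other values unchanged."""
    if not value.startswith(EVAL_PREFIX):
        return value
    match = CALL_PATTERN.match(value[len(EVAL_PREFIX):].strip())
    if match is None:
        raise ValueError(f"cannot parse expression: {value!r}")
    name = match.group(1).rsplit(".", 1)[-1]  # accept a fully qualified name too
    return FUNCTIONS[name](match.group(2))


rows = [datetime(2015, 9, day) for day in range(1, 24)]  # 23 toy rows

# Per-row evaluation: the parser runs once for every row.
calls["parse_date"] = 0
per_row = [r for r in rows if r >= parse_date("-5d")]
per_row_calls = calls["parse_date"]

# Pre-evaluation: the form value is resolved once, then used as a constant.
calls["parse_date"] = 0
cutoff = resolve_param('eval:parseDate("-5d")')
pre_evaluated = [r for r in rows if r >= cutoff]
pre_eval_calls = calls["parse_date"]
```

Both filters select the same rows, but the first one invokes the parser once per row while the second invokes it once in total, which is the effect the pull request aims for when the resolved value is substituted into the query text before it reaches the interpreter.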
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keedio/incubator-zeppelin eval-notebook-expression

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #320
    
----
commit 1c8a97f38061808ed5221b1de568f7ef7487a34d
Author: Rodrigo Olmo Velasco <ro...@macbook-pro-de-rodrigo.local>
Date:   2015-09-07T09:44:38Z

    ZEPPELIN-289: User can now enter custom expressions in notebooks' input fields. Expression will be evaluated server-side by Zeppelin before being sent to the interpreter.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---


Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/320#issuecomment-168946727
  
    @lucarosellini thanks for the explanation! 
    @rolmovel Could you merge the latest master to resolve the conflicts, and also update `zeppelin-distribution/src/bin_license/LICENSE` with the newly added dependencies?




Posted by lucarosellini <gi...@git.apache.org>.
Github user lucarosellini commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/320#issuecomment-145923135
  
    Hi @bzz,
    This feature is unaware of the underlying interpreter the code is being sent to; no interpreter-specific code has been changed.
    We've tested it successfully with the Spark SQL, Hive and Markdown interpreters, and it should work with any other interpreter as well.





Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/320#issuecomment-145819977
  
    Looks interesting, thank you for contributing!
    
    Please help me understand: am I right that these changes potentially affect the syntax of all interpreters and, code-wise, are not localised to the particular use case with Spark SQL?

