You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by herval <gi...@git.apache.org> on 2017/06/22 00:00:15 UTC

[GitHub] zeppelin pull request #2428: [ZEPPELIN-1470] limiting results from jdbc

GitHub user herval opened a pull request:

    https://github.com/apache/zeppelin/pull/2428

    [ZEPPELIN-1470] limiting results from jdbc

    ### What is this PR for?
    
    One thing we tracked down is that if you issue a large query on a very large table, it will simply try to load all results (and then cap them on Zeppelin's side), which seems suboptimal (and will freeze the server). Setting this on the JDBC level seems to solve the problem.
    
    ### What type of PR is it?
    Bug Fix
    
    ### Todos
    * [ ] - Task
    
    ### How should this be tested?
    
    - Create or use a table with a very large number of rows. In our tests, I simply created a:
    
    ```
    createdb zeppelin_test
    
    psql zeppelin_test
    
    create table too_many_rows(n int)
    ```
    
    And added 5m rows to it.
    
    Making a paragraph like this will hang without this:
    ```
    %zeppelin_test
    
    select * from too_many_rows
    ```
    
    ### Questions:
    * Does the licenses files need update?
    No
    * Is there breaking changes for older versions?
    No
    * Does this needs documentation?
    No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/herval/zeppelin hfreire/limit-row-count

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2428
    
----
commit f574f84706e4b345b62bebd456964928c4b787e8
Author: Herval Freire <hf...@twitter.com>
Date:   2017-06-21T23:54:29Z

    limiting results from jdbc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    @tinkoff-dwh build #3 is failing in both my master & zeppelin master: https://travis-ci.org/apache/zeppelin - restarting didn't make it pass: https://travis-ci.org/apache/zeppelin/jobs/246113686


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    @tinkoff-dwh you're right, picked the wrong method - using setFetchSize now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    user will not know that the table has more rows and the result is not complete. Maybe set fetchSize?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    yep, getMaxResult is correct. i thought the condition <= 
    ```while (displayRowCount < getMaxResult() && resultSet.next())```
    and last next() will request the next part (fetchSize)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    I am using the original setting (getMaxResults). I had to set it in two calls because setFetchSize will determine that there is a “next” page, but if I only do “setMaxResults”, it will not display the truncation warning (as it translates to a “limit”)
    
    ________________________________
    From: Felix Cheung <no...@github.com>
    Sent: Thursday, June 29, 2017 9:30:44 PM
    To: apache/zeppelin
    Cc: Herval Freire; Mention
    Subject: Re: [apache/zeppelin] [ZEPPELIN-1470] limiting results from jdbc (#2428)
    
    
    got it. I agree it's generally better to restrict the job at the source
    perhaps you can use the existing settings but apply it at the JDBC instead? it would be easier to switch existing user over the new behavior this way - actually, let's see what other thinks about this approach.
    
    —
    You are receiving this because you were mentioned.
    Reply to this email directly, view it on GitHub<https://github.com/apache/zeppelin/pull/2428#issuecomment-312172202>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAAV6lwqItW73HbpQPIjLnKd6M4nnvryks5sJHn0gaJpZM4OBqFf>.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    https://travis-ci.org/herval/zeppelin/builds/245889847
    try restart 3 and 5 job


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    Rebased to latest master - please advise


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    @herval 
    reopen pull request. and wait another reviewers or committers)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    failed test wich load dependencies, it is possible too long or problems with the network. try restart again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    got it. I agree it's generally better to restrict the job at the source
    perhaps you can use the existing settings but apply it at the JDBC instead? it would be easier to switch existing user over the new behavior this way - actually, let's see what other thinks about this approach.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    I'm also not exactly sure how to write a test for this, as the JDBCInterpreterTest doesn't use anything in the line of mocks (and the outside behavior - limiting the number of returned rows - is already guaranteed by some filtering code. Any ideas @tinkoff-dwh ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    It seems apache/zeppelin master (which is what this branch is based on) is failing the build - should I rebase it to 0.7 instead, or wait for master to be fixed? Please let me know what should I do next here (this is my first contribution to the project, pardon the ignorance)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    ping - please advise


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    Why +1?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    merging if no more comment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    Build's passing: https://travis-ci.org/herval/zeppelin/builds/247193502
    
    Let me know if this is mergeable :-) @tinkoff-dwh 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    it would be better getMaxResult() + 1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    any additional thoughts on this? Is it mergeable? @khalidhuseynov @tinkoff-dwh @felixcheung 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    @tinkoff-dwh  it's not duplicate. The bug here is that if you don't do `statement.setMaxRows`, some drivers (eg postgres) will try to load everything when you issue a `select *`. Using `setFetchSize` won't affect that (I tested it). So the solution I found is to set the limit to (fetch size + 1). There's a function that will prune the results *after they're back from JDBC* (`getResults`) - I'm guessing we could remove that, if all it did was checking that limit, but don't think it's worth pursuing at this time.
    
    TLDR - the current limit check is only applied _after_ you get results. If you query a big enough table, you'll never get the results (and will kill Zeppelin in the process)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    hmm.. It actually seems that setFetchSize does not limit the query execution (so it takes forever & hangs Zeppelin)
    
    Figuring out if there's a way to show results are truncated + use setMaxRows


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/2428


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    ![screenshot 2017-06-29 12 37 41](https://user-images.githubusercontent.com/5610/27706856-07f38206-5cc8-11e7-9e57-03f5d6d4cffd.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    we already have a check to limit. setMaxRows duplicate it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    I think it's impossible to test. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by astroshim <gi...@git.apache.org>.
Github user astroshim commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    LGTM~
    I think this PR can be merged. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc

Posted by masyan <gi...@git.apache.org>.
Github user masyan commented on the issue:

    https://github.com/apache/zeppelin/pull/2428
  
    we already have a check to limit. setMaxRows duplicate it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---