You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by herval <gi...@git.apache.org> on 2017/06/22 00:00:15 UTC
[GitHub] zeppelin pull request #2428: [ZEPPELIN-1470] limiting results from jdbc
GitHub user herval opened a pull request:
https://github.com/apache/zeppelin/pull/2428
[ZEPPELIN-1470] limiting results from jdbc
### What is this PR for?
One thing we tracked down is that if you issue a large query on a very large table, it will simply try to load all results (and then cap them on Zeppelin's side), which seems suboptimal (and will freeze the server). Setting this on the JDBC level seems to solve the problem.
### What type of PR is it?
Bug Fix
### Todos
* [ ] - Task
### How should this be tested?
- Create or use a table with a very large number of rows. In our tests, I simply created a:
```
createdb zeppelin_test
psql zeppelin_test
create table too_many_rows(n int)
```
And added 5m rows to it.
Making a paragraph like this will hang without this:
```
%zeppelin_test
select * from too_many_rows
```
### Questions:
* Does the licenses files need update?
No
* Is there breaking changes for older versions?
No
* Does this needs documentation?
No
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/herval/zeppelin hfreire/limit-row-count
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/2428.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2428
----
commit f574f84706e4b345b62bebd456964928c4b787e8
Author: Herval Freire <hf...@twitter.com>
Date: 2017-06-21T23:54:29Z
limiting results from jdbc
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
@tinkoff-dwh build #3 is failing in both my master & zeppelin master: https://travis-ci.org/apache/zeppelin - restarting didn't make it pass: https://travis-ci.org/apache/zeppelin/jobs/246113686
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
@tinkoff-dwh you're right, picked the wrong method - using setFetchSize now
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
user will not know that the table has more rows and the result is not complete. Maybe set fetchSize?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
yep, getMaxResult is correct. i thought the condition <=
```while (displayRowCount < getMaxResult() && resultSet.next())```
and last next() will request the next part (fetchSize)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
I am using the original setting (getMaxResults). I had to set it in two calls because setFetchSize will determine that there is a “next” page, but if I only do “setMaxResults”, it will not display the truncation warning (as it translates to a “limit”)
________________________________
From: Felix Cheung <no...@github.com>
Sent: Thursday, June 29, 2017 9:30:44 PM
To: apache/zeppelin
Cc: Herval Freire; Mention
Subject: Re: [apache/zeppelin] [ZEPPELIN-1470] limiting results from jdbc (#2428)
got it. I agree it's generally better to restrict the job at the source
perhaps you can use the existing settings but apply it at the JDBC instead? it would be easier to switch existing user over the new behavior this way - actually, let's see what other thinks about this approach.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<https://github.com/apache/zeppelin/pull/2428#issuecomment-312172202>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAAV6lwqItW73HbpQPIjLnKd6M4nnvryks5sJHn0gaJpZM4OBqFf>.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
https://travis-ci.org/herval/zeppelin/builds/245889847
try restart 3 and 5 job
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
Rebased to latest master - please advise
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
@herval
reopen pull request. and wait another reviewers or committers)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
failed test wich load dependencies, it is possible too long or problems with the network. try restart again
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/zeppelin/pull/2428
got it. I agree it's generally better to restrict the job at the source
perhaps you can use the existing settings but apply it at the JDBC instead? it would be easier to switch existing user over the new behavior this way - actually, let's see what other thinks about this approach.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
I'm also not exactly sure how to write a test for this, as the JDBCInterpreterTest doesn't use anything in the line of mocks (and the outside behavior - limiting the number of returned rows - is already guaranteed by some filtering code. Any ideas @tinkoff-dwh ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
It seems apache/zeppelin master (which is what this branch is based on) is failing the build - should I rebase it to 0.7 instead, or wait for master to be fixed? Please let me know what should I do next here (this is my first contribution to the project, pardon the ignorance)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
ping - please advise
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
Why +1?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/zeppelin/pull/2428
merging if no more comment
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
Build's passing: https://travis-ci.org/herval/zeppelin/builds/247193502
Let me know if this is mergeable :-) @tinkoff-dwh
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
it would be better getMaxResult() + 1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
any additional thoughts on this? Is it mergeable? @khalidhuseynov @tinkoff-dwh @felixcheung
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
@tinkoff-dwh it's not duplicate. The bug here is that if you don't do `statement.setMaxRows`, some drivers (eg postgres) will try to load everything when you issue a `select *`. Using `setFetchSize` won't affect that (I tested it). So the solution I found is to set the limit to (fetch size + 1). There's a function that will prune the results *after they're back from JDBC* (`getResults`) - I'm guessing we could remove that, if all it did was checking that limit, but don't think it's worth pursuing at this time.
TLDR - the current limit check is only applied _after_ you get results. If you query a big enough table, you'll never get the results (and will kill Zeppelin in the process)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
hmm.. It actually seems that setFetchSize does not limit the query execution (so it takes forever & hangs Zeppelin)
Figuring out if there's a way to show results are truncated + use setMaxRows
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin pull request #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/zeppelin/pull/2428
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by herval <gi...@git.apache.org>.
Github user herval commented on the issue:
https://github.com/apache/zeppelin/pull/2428
![screenshot 2017-06-29 12 37 41](https://user-images.githubusercontent.com/5610/27706856-07f38206-5cc8-11e7-9e57-03f5d6d4cffd.png)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
we already have a check to limit. setMaxRows duplicate it
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by tinkoff-dwh <gi...@git.apache.org>.
Github user tinkoff-dwh commented on the issue:
https://github.com/apache/zeppelin/pull/2428
I think it's impossible to test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by astroshim <gi...@git.apache.org>.
Github user astroshim commented on the issue:
https://github.com/apache/zeppelin/pull/2428
LGTM~
I think this PR can be merged.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #2428: [ZEPPELIN-1470] limiting results from jdbc
Posted by masyan <gi...@git.apache.org>.
Github user masyan commented on the issue:
https://github.com/apache/zeppelin/pull/2428
we already have a check to limit. setMaxRows duplicate it
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---