Posted to user@drill.apache.org by Vitalii Diravka <vi...@gmail.com> on 2018/09/15 17:31:02 UTC

Re: Long running query succeeds but UI times out?

Hi James,

This is the user mailing list, and your attachment did not come through.
Please upload it somewhere, Google Drive for instance, and send us the link.

Have you tried running the query through Drill SqlLine?


Kind regards
Vitalii


On Sat, Sep 15, 2018 at 7:45 PM James Barney <ja...@gmail.com>
wrote:

> Hey,
> I've had pretty great success using drill on top of S3 but I'm hitting one
> big issue: a "long running" query (more than 4.5 minutes) will succeed
> after submitting but the UI times out with 'network error (tcp error):
> ""'. See attachment.
>
> Basics:
> Running Drill 1.14 on Amazon Linux. Only modification I made is this
> parameter at runtime to drill-env.sh for reading encrypted files from S3:
> export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> -Dcom.amazonaws.services.s3.enableV4"
>
> To simplify things I'm just on one drill node with this query:
> select distinct(column_name) from s3.`/path/to/files/year/month/day/hour/`
>
> All the files are well-formed parquet files and querying any single file
> returns fine in a few seconds. When I scale the cluster up to 50+ nodes,
> the query obviously returns much faster and no time out occurs. However,
> more complicated/higher data volume queries (i.e., querying a whole day's
> worth of data instead of one hour) suffer the same timeout.
>
> Are there settings I can tweak to prevent this timeout from occurring? Can
> I save the results of the query somewhere since it's succeeding in the
> background?
>
> Drill demolishes our current solution with its performance and we really
> want to use it but this bug is making it tricky to sell.
>
> Thanks,
> James
>

Re: Long running query succeeds but UI times out?

Posted by Kunal Khatua <ku...@apache.org>.

Hi James

The problem you're describing could be due to multiple factors. Typically, browsers don't specify a timeout for the requests they send to the server (Amazon, in this case), but it is safe to assume that any default is reasonably long.

You said that things work great for small-scale data or simpler queries, but that more complex ones hit this 'timeout'. Drill itself will not time out a query that is running via the WebUI. It is more likely that AWS is terminating the connection because it does not see any activity on the wire.

The WebUI is mostly an exploratory tool to help users navigate around and sample their data without the hassle of setting up more powerful tools like SQuirreL or DBeaver. Hence, using the WebUI to run queries that return large result sets is impractical and risks slowing (or, until Drill 1.13, crashing) the Drillbit.

If you cannot use a third-party tool like SQuirreL for such scenarios, you can follow Nitin's suggestion of using a CTAS or C(Temp)TAS command to write your result set to a new table for deeper analysis. This is akin to doing a small ETL step, and you don't risk losing the results if your browser connection breaks. You can simply connect back and query the new table (if it is ready).
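
As a rough sketch of the CTAS route (not a tested recipe), the small Python helper below wraps the query from the original post in a CREATE TABLE AS statement and builds the follow-up query against the new table. The target `dfs.tmp` workspace and the table name `distinct_column_name` are assumptions for illustration: `dfs.tmp` is writable in a default Drill install, which suits the single-node case, whereas writing back to S3 would need a writable workspace defined in the s3 storage plugin.

```python
def build_ctas(target_table: str, select_sql: str) -> str:
    """Wrap a SELECT in a Drill CREATE TABLE AS (CTAS) statement."""
    # Drop surrounding whitespace and any trailing semicolon from the SELECT.
    body = select_sql.strip().rstrip(";")
    return f"CREATE TABLE {target_table} AS\n{body}"


# Query from the original post; the path is the poster's own placeholder.
select_sql = (
    "SELECT DISTINCT column_name "
    "FROM s3.`/path/to/files/year/month/day/hour/`"
)

# Hypothetical target table in the dfs.tmp workspace (an assumption).
target_table = "dfs.tmp.`distinct_column_name`"

ctas_sql = build_ctas(target_table, select_sql)

# Once the CTAS has finished, the stored result can be read back cheaply.
readback_sql = f"SELECT * FROM {target_table}"

print(ctas_sql)
print(readback_sql)
```

The two printed statements can be pasted into the WebUI or SqlLine one after the other. If the browser connection drops during the first one, the second can still be run later, once the table has been written.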

Hope this helps!
~ Kunal

On 9/15/2018 12:23:48 PM, Nitin Pawar <ni...@gmail.com> wrote:
Hi James

You can try running a CTAS query to write the results back to S3, and then
query the data from the resulting table.

--
Nitin Pawar

Re: Long running query succeeds but UI times out?

Posted by Nitin Pawar <ni...@gmail.com>.

Hi James

You can try running a CTAS query to write the results back to S3, and then
query the data from the resulting table.


-- 
Nitin Pawar