Posted to dev@drill.apache.org by imbar marinescu <im...@gmail.com> on 2016/08/11 16:22:41 UTC

Performance question

Hi,

I'm looking into Drill, to use it as an in-memory database.
I wanted to work with data that I have in a SQL Server database.
I connected through a SQL Server JDBC storage plugin, and my test query
ran for about 2 sec.
Running the same query directly on SQL Server took 0.15 sec.

I ran a "create table" to write the data as Parquet files and then queried
them through the dfs plugin.
The query ran for 0.5 sec (after caching; the first run is about 3 sec).
I also tried "REFRESH TABLE METADATA", but it didn't change anything.
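
For reference, the Parquet conversion described above might look roughly
like the sketch below. The storage plugin name `sqlserver` and the source
tables `dbo.Facts` and `dbo.Product` are hypothetical placeholders (the
actual names are not given in this thread); only the dfs.tmp target paths
come from the test query.

```sql
-- Parquet is Drill's default CTAS output format; set explicitly for clarity.
ALTER SESSION SET `store.format` = 'parquet';

-- Hypothetical source names: `sqlserver` plugin, dbo.Facts / dbo.Product.
CREATE TABLE dfs.tmp.`/Demo/Facts/` AS
SELECT * FROM sqlserver.dbo.Facts;

CREATE TABLE dfs.tmp.`/Demo/Product/` AS
SELECT * FROM sqlserver.dbo.Product;

-- Build the Parquet metadata cache for each table directory.
REFRESH TABLE METADATA dfs.tmp.`/Demo/Facts/`;
REFRESH TABLE METADATA dfs.tmp.`/Demo/Product/`;
```

The metadata cache mainly saves planning time when a table is made up of
many Parquet files, which may be why it made no visible difference here.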

My test query is:
select sum(f.Sales), p.`Product Category`
from dfs.tmp.`/Demo/Facts/` f
join dfs.tmp.`/Demo/Product/` p on p.productKey = f.productKey
group by p.`Product Category`;

The Facts table has 422,833 rows; Product has 606.
The result set is 4 rows.

This was done running Drill locally (embedded) on a Windows machine.
I tried a Linux machine, but the results were even slower.

I didn't configure anything; I just used the install as-is.

Am I doing something wrong? Is an RDBMS going to be faster anyway?
I read about Drill's performance and I feel I'm not getting there.

SQL Server: 0.15 sec.
SQL Server in Drill: 2 sec.
Parquet in Drill: 0.5 sec.

Thank you,
Imbar

Re: Performance question

Posted by Zelaine Fong <zf...@maprtech.com>.
What does the query plan look like when you're using SQL Server with Drill?
I'm guessing that the join isn't being pushed down to SQL Server.  If so,
you've hit DRILL-4818.  There are known limitations with the JDBC storage
plugin that prevent it from generating the optimal query plan in cases like
this.

-- Zelaine
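
One way to check the plan, as a sketch: prefix the query with EXPLAIN PLAN
FOR. The plugin name `sqlserver` and the `dbo` schema are hypothetical
placeholders, since the thread does not give the actual storage plugin
configuration; the column names are taken from the test query.

```sql
-- Hypothetical plugin/schema names: sqlserver.dbo.*
EXPLAIN PLAN FOR
SELECT SUM(f.Sales), p.`Product Category`
FROM sqlserver.dbo.Facts f
JOIN sqlserver.dbo.Product p ON p.productKey = f.productKey
GROUP BY p.`Product Category`;
```

If the join is pushed down, the plan would be expected to show a single
JDBC scan carrying the joined SQL. If it shows two separate JDBC scans
feeding a Drill join operator, the join is running inside Drill and all
rows of Facts are being pulled over JDBC.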

Re: Performance question

Posted by imbar marinescu <im...@gmail.com>.
I also ran the same query on Microsoft Tabular, and it came back within
0.01 sec.
That is amazing!