Posted to user@spark.apache.org by Wes Peng <we...@freenetMail.de> on 2022/04/07 11:05:23 UTC

query time comparison to several SQL engines

I ran a simple test comparing query times for several SQL engines, including 
MySQL, Hive, Drill and Spark. The report:

https://cloudcache.net/data/query-time-mysql-hive-drill-spark.pdf

It may not have any special meaning; it's just for fun. :)
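
For anyone who wants to repeat this kind of comparison, here is a minimal 
timing sketch in Python. It is only an illustration, not the method used in 
the report: `run_query` is a hypothetical callable that you would write per 
engine (submit the SQL statement and fetch all rows), and `time_query` and 
`summarize` are names made up for this example.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call `run_query` `repeats` times and return wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: submits the SQL and fetches the full result
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (fastest, median) so one slow warm-up run does not dominate."""
    return min(timings), statistics.median(timings)
```

Running the same statement several times per engine and reporting the median 
makes the numbers less sensitive to caching and startup effects.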

regards.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: query time comparison to several SQL engines

Posted by James Turton <ja...@somecomputer.xyz.INVALID>.

The biggest factor affecting running time here might be that 
Drill's query execution is not fault tolerant while Spark's is. The 
philosophies differ. Drill's says, "when you're doing interactive 
analytics and a node dies, killing your query as it goes, just run the 
query again."
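
The "just run the query again" approach can be sketched on the client side 
as follows. This is an illustration under assumptions, not Drill's or 
Spark's API: `run_query` is a hypothetical callable that submits the query 
and raises an exception if it is killed, and `run_with_rerun` is a name made 
up for this example.

```python
def run_with_rerun(run_query, max_attempts=3):
    """Rerun the whole query from scratch if it fails, up to `max_attempts` times.

    There is no recovery of partial work here: each attempt starts over,
    which is the trade-off an engine without fault tolerance pushes to the client.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return run_query()  # hypothetical: raises if a node dies mid-query
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error
```

For short interactive queries a full rerun is cheap, which is why skipping 
fault-tolerance bookkeeping can pay off in running time.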

On 2022/04/07 16:11, Wes Peng wrote:
>
> Hi Jacek,
>
> Spark and Drill are not directly related, but they have a similar 
> architecture.
>
> If you read the book "Learning Apache Drill" (I guess it's free 
> online), chapter 3 describes Drill's SQL engine architecture.
>
> It's quite similar to Spark's.
>
> The distributed implementation architecture is also almost the same as 
> Spark's.
>
> Although they are separate products, they have a similar 
> implementation, IMO.
>
> No, I didn't use a statement optimized for Drill. It's just a common 
> SQL statement.
>
> I think Drill is faster because of its direct mmap technology. It 
> consumes more memory than Spark, which makes it faster.
>
> Thanks.
>
>
> Jacek Laskowski wrote:
>> Is it true that Drill is Spark or vice versa under the hood? If so, 
>> how is it possible that Drill is faster? What does Drill do to make 
>> the query faster? Could it be that you used a type of query Drill 
>> is optimized for? Just guessing and am really curious (not implying 
>> that one is better or worse than the other(s)).


Re: query time comparison to several SQL engines

Posted by Wes Peng <we...@freenetMail.de>.

Hi Jacek,

Spark and Drill are not directly related, but they have a similar 
architecture.

If you read the book "Learning Apache Drill" (I guess it's free online), 
chapter 3 describes Drill's SQL engine architecture.

It's quite similar to Spark's.

The distributed implementation architecture is also almost the same as Spark's.

Although they are separate products, they have a similar implementation, IMO.

No, I didn't use a statement optimized for Drill. It's just a common SQL 
statement.

I think Drill is faster because of its direct mmap technology. It 
consumes more memory than Spark, which makes it faster.

Thanks.

Jacek Laskowski wrote:
> Is it true that Drill is Spark or vice versa under the hood? If so, 
> how is it possible that Drill is faster? What does Drill do to make 
> the query faster? Could it be that you used a type of query Drill is 
> optimized for? Just guessing and am really curious (not implying that 
> one is better or worse than the other(s)).

Re: query time comparison to several SQL engines

Posted by Jacek Laskowski <ja...@japila.pl>.

Hi Wes,

Thanks for the report! I like it (mostly because it's short and concise).
Thank you.

I know nothing about Drill and am curious about the similar execution times
and this sentence in the report: "Spark is the second fastest, that should
be reasonable, since both Spark and Drill have almost the same
implementation architecture.".

Is it true that Drill is Spark or vice versa under the hood? If so, how
is it possible that Drill is faster? What does Drill do to make the query
faster? Could it be that you used a type of query Drill is optimized for?
Just guessing and am really curious (not implying that one is better or
worse than the other(s)).

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski


On Thu, Apr 7, 2022 at 1:05 PM Wes Peng <we...@freenetmail.de> wrote:

> I ran a simple test comparing query times for several SQL engines, including
> MySQL, Hive, Drill and Spark. The report:
>
> https://cloudcache.net/data/query-time-mysql-hive-drill-spark.pdf
>
> It may not have any special meaning; it's just for fun. :)
>
> regards.