Posted to dev@drill.apache.org by si...@bt.com on 2013/11/07 08:55:26 UTC

Presto -> SQL engines for HDFS ?

Hi there,

The appearance of Presto prompts the thought that a list and overview of engines of this sort would be jolly useful for general understanding and for managing FUD in the enterprise.

I think these engines are designed to respond quickly to queries that can be written to generate relatively little interconnect traffic on a cluster.

I know of:

Drill (doh) 
Impala
Presto
Gryphon (uses HBase?)
HAWQ
BlinkDB (slightly different, uses bootstrapping for aggregates) 

I believe that Stinger should be thought of as something else: optimisations of an engine built for large-scale queries.

Is this view close to correct?

Can anyone elucidate the statements about Impala doing record materialisation whereas other engines do vectorization? Is vectorization query rewriting for parallelism?

How do the folk in the Drill project see the plethora of efforts? Does anyone have a view as to why there are so many engines appearing? 

Best

Simon

----            
Dr. Simon Thompson
Chief Researcher, Customer Experience. 
BT Research. 
BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath. 
IP5 3RE

Note : 

This email contains BT information, which may be privileged or confidential. It's meant only for the individual(s) or entity named above. If you're not the intended recipient, note that disclosing, copying, distributing or using this information is prohibited. If you've received this email in error, please let me know immediately on the email address above. Thank you.
We monitor our email system, and may record your emails. 
British Telecommunications plc
Registered office: 81 Newgate Street London EC1A 7AJ
Registered in England no: 1800000

Re: Presto -> SQL engines for HDFS ?

Posted by Michael Hausenblas <mh...@maprtech.com>.
> There's a comparison of a few of these in this document.  Perhaps Presto
> needs to be added.
> 
> http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.0011


Good point. Might create a GDocs around that …

Cheers,
		Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 7 Nov 2013, at 10:33, Tom Seddon <mr...@gmail.com> wrote:

> There's a comparison of a few of these in this document.  Perhaps Presto
> needs to be added.
> 
> http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.0011


Re: Presto -> SQL engines for HDFS ?

Posted by Tom Seddon <mr...@gmail.com>.
There's a comparison of a few of these in this document.  Perhaps Presto
needs to be added.

http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.0011

