Posted to common-user@hadoop.apache.org by Azuryy Yu <az...@gmail.com> on 2015/02/05 10:39:09 UTC

Re: Which [open-source] SQL engine atop Hadoop?

Please look at:
http://mail-archives.apache.org/mod_mbox/tajo-user/201502.mbox/browser



On Tue, Jan 27, 2015 at 5:13 PM, Daniel Haviv <da...@gmail.com> wrote:

> Can you elaborate on why you prefer Tajo?
>
> Daniel
>
> On 27 Jan 2015, at 10:35, Azuryy Yu <az...@gmail.com> wrote:
>
> You have listed almost all of the open-source MPP real-time SQL-on-Hadoop
> engines.
>
> I prefer Tajo, which recently released version 0.9.0 and is still a work in
> progress towards 1.0.
>
>
> On Mon, Jan 26, 2015 at 10:19 PM, Samuel Marks <sa...@gmail.com>
> wrote:
>
>> Since Hadoop <https://hive.apache.org> came out, there have been various
>> commercial and/or open-source attempts to expose some compatibility with
>> SQL <http://drill.apache.org>.
>>
>> I am seeking one which is good for low-latency querying, and supports the
>> most common CRUD <https://spark.apache.org>, including [the basics!]
>> along these lines: CREATE TABLE, INSERT INTO, SELECT * FROM, UPDATE
>> Table SET C1=2 WHERE, DELETE FROM, and DROP TABLE.
>>
>> I will be using them from Python; there does seem to be a Python
>> JDBC wrapper <https://spark.apache.org/sql>. Additionally, it needs to be
>> scalable for big and small data (starting on a single-node "cluster").
>>
>> Here is what I've found thus far:
>>
>>    - Apache Hive <https://hive.apache.org> (SQL-like, with interactive
>>    SQL thanks to the Stinger initiative)
>>    - Apache Drill <http://drill.apache.org> (ANSI SQL support)
>>    - Apache Spark <https://spark.apache.org> (Spark SQL
>>    <https://spark.apache.org/sql>, queries only, add data via Hive, RDD
>>    <https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD>
>>    or Parquet <http://parquet.io/>)
>>    - Apache Phoenix <http://phoenix.apache.org> (built atop Apache HBase
>>    <http://hbase.apache.org>, lacks full transaction
>>    <http://en.wikipedia.org/wiki/Database_transaction> support, relational
>>    operators <http://en.wikipedia.org/wiki/Relational_operators> and
>>    some built-in functions)
>>    - Presto <https://github.com/facebook/presto> from Facebook (can
>>    query Hive, Cassandra <http://cassandra.apache.org>, relational DBs
>>    etc. It doesn't seem to be designed for low-latency responses across small
>>    clusters, or support UPDATE operations. It is optimized for data
>>    warehousing or analytics¹
>>    <http://prestodb.io/docs/current/overview/use-cases.html>)
>>    - SQL-Hadoop <https://www.mapr.com/why-hadoop/sql-hadoop> via MapR
>>    community edition <https://www.mapr.com/products/hadoop-download>
>>    (seems to be a packaging of Hive, HP Vertica
>>    <http://www.vertica.com/hp-vertica-products/sqlonhadoop>, SparkSQL,
>>    Drill and a native ODBC wrapper
>>    <http://package.mapr.com/tools/MapR-ODBC/MapR_ODBC>)
>>    - Apache Kylin <http://www.kylin.io> from eBay (provides an SQL
>>    interface and multi-dimensional analysis [OLAP
>>    <http://en.wikipedia.org/wiki/OLAP>], "… offers ANSI SQL on Hadoop
>>    and supports most ANSI SQL query functions". It depends on HDFS, MapReduce,
>>    Hive and HBase; and seems targeted at very large data-sets though maintains
>>    low query latency)
>>    - Apache Tajo <http://tajo.apache.org> (ANSI/ISO SQL standard
>>    compliance with JDBC <http://en.wikipedia.org/wiki/JDBC> driver
>>    support [benchmarks against Hive and Impala
>>    <http://blogs.gartner.com/nick-heudecker/apache-tajo-enters-the-sql-on-hadoop-space>
>>    ])
>>    - Cascading <http://en.wikipedia.org/wiki/Cascading_%28software%29>'s
>>    Lingual <http://docs.cascading.org/lingual/1.0/>²
>>    <http://docs.cascading.org/lingual/1.0/#sql-support> ("Lingual
>>    provides JDBC Drivers, a SQL command shell, and a catalog manager for
>>    publishing files [or any resource] as schemas and tables.")
>>
>> Which—from this list or elsewhere—would you recommend, and why?
>> Thanks for all suggestions,
>>
>> Samuel Marks
>> http://linkedin.com/in/samuelmarks
>>
>
>
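
The CRUD statements Samuel lists in his original post (CREATE TABLE, INSERT
INTO, SELECT, UPDATE, DELETE FROM, DROP TABLE) can be turned into a small
smoke test driven from Python. The sketch below is only an illustration under
stated assumptions: it uses the standard-library sqlite3 module as a stand-in
for a real engine, and the table name "t", its columns, and the helper
run_crud_cycle are hypothetical. Drivers for the engines discussed in this
thread may use a different parameter placeholder than "?", and some engines
may reject the UPDATE or DELETE steps outright.

```python
import sqlite3


def run_crud_cycle(conn):
    """Run the basic CRUD statements through a DB-API 2.0 connection.

    Returns the table contents after the inserts and after the
    update/delete, so the caller can check each step took effect.
    The table "t" and its columns are hypothetical.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c1 INTEGER)")
    # "?" is sqlite3's placeholder style; other drivers may differ.
    cur.executemany("INSERT INTO t (id, c1) VALUES (?, ?)", [(1, 10), (2, 20)])
    cur.execute("SELECT * FROM t ORDER BY id")
    after_insert = cur.fetchall()

    cur.execute("UPDATE t SET c1 = 2 WHERE id = 1")
    cur.execute("DELETE FROM t WHERE id = 2")
    cur.execute("SELECT * FROM t ORDER BY id")
    after_changes = cur.fetchall()

    cur.execute("DROP TABLE t")
    conn.commit()
    return after_insert, after_changes


# sqlite3 in-memory database: a stand-in only, not a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
after_insert, after_changes = run_crud_cycle(conn)
conn.close()
print(after_insert)
print(after_changes)
```

Because the helper only touches the generic cursor interface, the same cycle
could in principle be pointed at a connection from another DB-API driver to
see which of the six statements a candidate engine accepts.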

Re: Which [open-source] SQL engine atop Hadoop?

Posted by Samuel Marks <sa...@gmail.com>.

Hey cool, just found this one: http://trafodion.apache.org/


Samuel Marks
http://linkedin.com/in/samuelmarks
