Posted to dev@impala.apache.org by Edward Capriolo <ed...@gmail.com> on 2017/04/05 23:41:16 UTC

Apache Hive metastore and Impala

Hello impala devs!

Let me say that I have used impala a lot and am very impressed with it.

I know impala is moving into the Apache incubator (I have an incubator
podling, Gossip, so I know this is challenging). There are a few things I
want to bring to your attention/discuss, so that they do not become an issue
or blocker in the future.

1) code
Your proposal https://wiki.apache.org/incubator/ImpalaProposal lists hive
as a dependency.

External Dependencies

Apache Hive (Apache Software License v2.0)

I notice that Cloudera Impala has CDH "hive" jars (which are rather old)
in its source tree:

https://github.com/cloudera/Impala/tree/8b621a301329d91fbe10a8aac5e39a2b14d6d25f/thirdparty/hive-1.1.0-cdh5.12.0-SNAPSHOT

A quick search did not find any evidence of that in incubator-impala (which
is good):
https://github.com/apache/incubator-impala/

We (Hive) want people using only official Apache Hive releases for
dependencies. We want to avoid:
1) Full or partial code forks of Apache Hive which still carry the Hive name
2) Artifacts published to central repositories named "*Hive*" which could
be confusing

I am not asserting that impala is affected by case #1 or #2 currently, but
it is something to be aware of. If you need guidance, feel free to discuss
further with the Hive PMC.

2) Next topic, the Hive name and statements that imply compatibility:

http://impala.apache.org/

For Apache Hive users, Impala utilizes the same metadata, ODBC driver, SQL
syntax, and user interface as Hive -- so you don't have to worry about
re-inventing the implementation wheel.

Apache Hive proposes and adds syntax all the time. For example, this
feature is in the works now (
https://issues.apache.org/jira/browse/HIVE-15986). Even if every effort were
made to keep the languages and features in sync, no one would be able to
make this claim. This is because Apache Hive does not have compatibility
tests for any of these things (we do not have anything like ANSI SQL 92).

This text needs to be replaced. It is probably fine to make statements such
as "Impala can run many of the same queries as Apache Hive", or "users of
Apache Hive will find many familiar features in Impala".

Again, welcome to the incubator. I am sure getting impala through is fun
with the C++-ness of it all!

Thanks,
Edward

Re: Apache Hive metastore and Impala

Posted by Jim Apple <jb...@cloudera.com>.
On Wed, Apr 5, 2017 at 4:41 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> Hello impala devs!
>
> Let me say that I have used impala a lot and am very impressed with it.
>

Thank you!


> I am not asserting that impala is affected by case #1 or #2 currently, but
> it is something to be aware of. If you need guidance, feel free to discuss
> further with the Hive PMC.
>

I don't think there is a danger right now of Apache Impala (incubating)
forking Hive. As far as artifact publication goes, Apache Impala (incubating)
does not publish binary artifacts at this time. Nonetheless, I will forward
your message to some people at Cloudera who might be interested, since
Cloudera does publish binaries.


> Apache Hive proposes and adds syntax all the time. For example, this
> feature is in the works now (
> https://issues.apache.org/jira/browse/HIVE-15986).


Funny you should mention that feature -- I actually implemented that for
Impala:
https://github.com/apache/incubator-impala/commit/1a3d7ffd4fd392b3ed831dfc7a3bfcfdb8cb8bbd#diff-a7c8505823aef79d508ede7e4d4e464a

As you say, though, our syntax is unlikely to be identical. I've sent a
patch for review that will change our webpage:
http://gerrit.cloudera.org:8080/6567
