Posted to commits@cassandra.apache.org by "Nicolas Lalevée (Updated JIRA)" <ji...@apache.org> on 2011/11/08 13:50:53 UTC

[jira] [Updated] (CASSANDRA-913) Add Hive support

     [ https://issues.apache.org/jira/browse/CASSANDRA-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Lalevée updated CASSANDRA-913:
--------------------------------------

    Attachment: CASSANDRA-913-r1199213.patch

I cannot reopen this issue, so I'll just comment.

As suggested by Jonathan in HIVE-1434, a Hive/Cassandra bridge may fit better here.

I have finally found the source of Brisk's implementation (https://github.com/riptano/hive). The patch I am submitting here (CASSANDRA-913-r1199213.patch) is based on their work, so I cannot grant any license on it.

What I changed in the original source:
* changed the package names (some classes needed package-level access)
* added ASL2 headers for the ASF
* formatted the code according to the Cassandra coding standard
* switched some loggers from log4j and commons-logging to slf4j
* fixed the handling of nulls in Hive tables, which did not work well (verified only with the few small tests I ran)

About the build: it needs the Hive jars in contrib/hive/lib. I don't know how to set this up better, since those jars are not available in the Maven repository.

About runtime: I had a lot of trouble because of a conflict between the Thrift library used by Hive and the one used by Cassandra. Hive 0.7 uses Thrift 0.5, Cassandra uses Thrift 0.6. A Cassandra external table could not be declared in Hive because of a NoSuchMethodException.
As far as I understand Hive, it needs Thrift at job runtime only to handle dynamic column serialization. My use case didn't need that, so I used a hack: I removed every org.apache.thrift class from hive-exec.jar. After that it works nicely (for my use case).
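
For illustration, the jar hack described above could be scripted roughly as follows. This is only a sketch, not the exact procedure that was used: the function name strip_thrift and the output file name are hypothetical, and it assumes that dropping every entry under org/apache/thrift/ is acceptable for your jobs (it is not if they rely on dynamic column serialization).

```python
import zipfile

# Path of the Thrift package inside the jar (a jar is a zip archive).
THRIFT_PREFIX = "org/apache/thrift/"


def strip_thrift(src_jar, dst_jar, prefix=THRIFT_PREFIX):
    """Copy src_jar to dst_jar, skipping every entry under `prefix`.

    Hypothetical helper for illustration. Returns the number of
    entries that were dropped.
    """
    dropped = 0
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                # Thrift class (or directory entry): leave it out.
                dropped += 1
                continue
            # Keep everything else, preserving the entry metadata.
            dst.writestr(info, src.read(info.filename))
    return dropped
```

A call such as strip_thrift("hive-exec.jar", "hive-exec-nothrift.jar") writes a new jar and leaves the original untouched, so the change is easy to revert.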

There were some tests in the GitHub repo. They are Hive-oriented, and I'm too lazy to try to make them work in Cassandra's source tree.

Hive 0.8 will use Thrift 0.7 (hopefully backward compatible with 0.6), and the Hive artifacts will be published to the Maven repository (HIVE-1095). So would it be best to wait for Hive 0.8 to get an easier integration in Cassandra?

                
> Add Hive support
> ----------------
>
>                 Key: CASSANDRA-913
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-913
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Contrib
>            Reporter: Jonathan Ellis
>              Labels: gsoc, gsoc2010
>         Attachments: CASSANDRA-913-r1199213.patch
>
>
> http://hadoop.apache.org/hive/ is a project that runs SQL queries against Hadoop map/reduce clusters.  (For analytics; it is too high-latency to run applications against Hive directly).  HIVE-705 added support for backends other than HDFS, with HBase as the first.  Cassandra support should be doable too now.
> The Hive storage backends are described in http://wiki.apache.org/hadoop/Hive/StorageHandlers and the HBase backend specifically in http://wiki.apache.org/hadoop/Hive/HBaseIntegration.
> I also note that John Sichi, author of the HBase backend, seems like a helpful guy and I imagine would be totally cool with answering questions about implementation details.
