Posted to user@carbondata.apache.org by Lewis Goldstein <le...@gm.com> on 2018/06/15 20:00:28 UTC

Carbon Data integration with HIVE

I happened upon Apache CarbonData while searching for information on other columnar data stores on HDFS. As I am looking for ways to accelerate consumption from Hadoop across large batch queries, interactive queries, and OLAP, this technology sounds quite promising. On an initial read, CarbonData appears to be just another columnar data store on HDFS, analogous to Parquet and ORC, but on further reading it sounds like data must pass through Spark to be loaded into this format. I would like to know if this is truly the case.
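To make sure I am reading the docs correctly, the Spark-coupled load path seems to look roughly like the following (just my sketch; the table, columns, and path are made up, and it assumes a Spark SQL session with the CarbonData jars on the classpath):

```sql
-- Run from a Spark SQL session with CarbonData integration
-- (a CarbonSession in the 1.x line), not from plain Hive.
CREATE TABLE sales_carbon (
  id INT,
  region STRING,
  amount DOUBLE
)
STORED BY 'carbondata';

-- The load itself also runs through Spark -- this is the step that
-- appears to require Spark as an intermediary rather than a direct
-- file drop onto HDFS.
LOAD DATA INPATH 'hdfs://namenode/data/sales.csv' INTO TABLE sales_carbon;
```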

I was hoping it would work similarly to Parquet and Hive, in that one would simply define the Hive table as external with CarbonData as the designated file format. Is this possible, or does one need Spark as an intermediary? Is CarbonData actually more like Druid than simply another columnar data store on HDFS?
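Concretely, what I had in mind is something like the following, by analogy with how Parquet plugs into Hive (whether a CarbonData SerDe/storage handler for Hive exists is exactly my question; the handler class name and paths below are guesses on my part):

```sql
-- Hoped-for Hive-only workflow, analogous to STORED AS PARQUET:
-- point an external table at CarbonData files already on HDFS,
-- with no Spark job in the loop.
CREATE EXTERNAL TABLE sales_carbon (
  id INT,
  region STRING,
  amount DOUBLE
)
STORED BY 'org.apache.carbondata.hive.CarbonStorageHandler'  -- class name is a guess
LOCATION 'hdfs://namenode/warehouse/sales_carbon';
```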


