Posted to dev@carbondata.apache.org by Sea <26...@qq.com> on 2017/03/27 15:36:52 UTC

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] addhiveintegration for carbon (#672)

Hi, Anubhav:
    Do you use MySQL to store the Hive metadata? Spark SQL and Hive must use the same metastore.
    PS: Before you query data using Hive, you should alter the table schema.


    This is the latest guide.
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md
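One way to check the "same metastore" condition is to compare the metastore connection URL that each side is configured with. Spark picks up metastore settings from a hive-site.xml in its conf directory; without one it creates a local Derby metastore_db in the current directory, and tables created there are not visible to Hive. The following is only a minimal sketch: the function names are made up for illustration, and the file locations in the usage comment are assumptions about a typical installation.

```shell
#!/bin/sh
# Sketch: check that Spark and Hive are configured with the same metastore
# connection URL. Function names and file locations are illustrative only.

# Print the <value> that follows the javax.jdo.option.ConnectionURL <name>
# in a hive-site.xml file (prints nothing if the property is absent).
metastore_url() {
  awk '
    /<name>javax\.jdo\.option\.ConnectionURL<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Succeed only if both files define the property with the same value.
same_metastore() {
  spark_url=$(metastore_url "$1")
  hive_url=$(metastore_url "$2")
  [ -n "$spark_url" ] && [ "$spark_url" = "$hive_url" ]
}

# Usage (assumed locations; adjust to your installation):
#   same_metastore "$SPARK_HOME/conf/hive-site.xml" "$HIVE_HOME/conf/hive-site.xml" \
#     && echo "same metastore" || echo "different or missing metastore config"
```

If the check fails because Spark has no hive-site.xml at all, copying Hive's file into Spark's conf directory is the usual fix.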




------------------ Original ------------------
From:  "Anubhav Tarar";<an...@knoldus.in>;
Date:  Mon, Mar 27, 2017 02:59 PM
To:  "dev"<de...@carbondata.incubator.apache.org>; 
Cc:  "chenliang613"<ch...@apache.org>; "Mention"<me...@noreply.github.com>; 
Subject:  Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] addhiveintegration for carbon (#672)



@sea Hi, I tried to use Hive with the steps you mentioned in your PR, but I
get a table-not-found exception from the Hive CLI. Here are the steps I used:

1.start the spark shell with the hive and carbon builds

./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar
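
Long comma-separated jar lists like the one above are easy to mistype or break when wrapped. As a sketch only, the list can be built from the directory that holds the jars; jars_csv and CARBON_LIB are hypothetical names, and the function picks up every .jar in the directory, so it assumes the directory contains only the jars you want on the classpath.

```shell
#!/bin/sh
# Sketch: build the comma-separated value for spark-shell --jars from a
# directory of jars, so no long path has to be typed (or wrapped) by hand.
jars_csv() {
  # List every .jar directly inside directory $1, sort for a stable order,
  # then join the lines with commas.
  find "$1" -maxdepth 1 -name '*.jar' | sort | paste -sd, -
}

# Usage (CARBON_LIB is a hypothetical variable pointing at the carbonlib directory):
#   ./spark-shell --jars "$(jars_csv "$CARBON_LIB")"
```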

2.create the carbonsession and create and load tables

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")

scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")

3.start hive cli and add the jars

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar]


4.query data using hive

hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'
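
A quick way to narrow down an error like this is to check whether the table is registered in the metastore that the Hive CLI is actually using. This is a minimal sketch, assuming the table was created in the default database and that hive -S -e prints one table name per line; has_table is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test whether a table name appears in a list of tables read from
# standard input (one name per line).
has_table() {
  # -x matches whole lines only, -F treats the name as a fixed string.
  grep -qxF -- "$1"
}

# Usage (runs the Hive CLI in silent mode and checks its table list):
#   hive -S -e 'SHOW TABLES;' | has_table hive_carbon \
#     && echo "table is registered in this metastore" \
#     || echo "table is missing: Hive is probably using a different metastore than Spark"
```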







On Fri, Mar 24, 2017 at 9:30 AM, Sea <26...@qq.com> wrote:

> I forgot something.
> Before querying data from Hive, we should set:
> set hive.mapred.supports.subdirectories=true;
> set mapreduce.input.fileinputformat.input.dir.recursive=true;
>
>
> ------------------ Original ------------------
> From:  "261810726";<26...@qq.com>;
> Date:  Thu, Mar 23, 2017 09:58 PM
> To:  "chenliang613"<ch...@apache.org>; "dev"<dev@carbondata.incubator.apache.org>;
> Cc:  "Mention"<me...@noreply.github.com>;
> Subject:  Re:  [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
>
>
>
> Hi, liang:
>     I created a new profile "integration/hive" and the CI is OK now. But I
> still have some problems altering the hive metastore schema.
>     My steps are as follows:
>
> 1.build carbondata
>
>
> mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1
>
>
>
> 2.copy jars
>
>
> mkdir ~/spark-2.1/carbon_lib
> cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
> cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/
>
>
>
> 3.create sample.csv and put it into hdfs
>
>
> id,name,scale,country,salary
> 1,yuhai,1.77,china,33000.0
> 2,runlin,1.70,china,32000.0
>
>
>
> 4.create table in spark
>
>
> spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"
>
>
> #execute these commands:
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val rootPath = "hdfs:////user/hadoop/carbon"
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore_db"
>
>
> val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)
>
>
> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")
>
>
>
> 5.alter table schema in hive
>
>
> cp ~/spark-2.1/carbon_lib/carbon-assembly-*.jar hive/auxlibs/
> cp spark-catalyst*.jar hive/auxlibs/
> export HIVE_AUX_JARS_PATH=hive/auxlibs/
>
>
> #start hive cli
> $HIVE_HOME/bin/hive
>
>
> #execute commands:
> alter table hive_carbon set FILEFORMAT
> INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
> OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
> SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";
>
>
> alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
> alter table hive_carbon change col id INT;
> alter table hive_carbon add columns(name string, scale decimal, country string, salary double);
>
>
>
>
>
> 6.check table schema
>
>
> execute "show create table hive_carbon"
>
>
>
>
>
> 7. execute "select * from hive_carbon" and "select * from hive_carbon order by id"
>
> 8.the table is still available in spark
>
> ------------------ Original ------------------
> From:  "Liang Chen";<no...@github.com>;
> Date:  Thu, Mar 23, 2017 00:09 AM
> To:  "apache/incubator-carbondata"<incubator-carbondata@noreply.github.com>;
> Cc:  "Sea"<26...@qq.com>; "Mention"<me...@noreply.github.com>;
> Subject:  Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
>
>
>
>
> @cenyuhai  Thank you for contributing this feature.
>  I suggest creating a new profile for the "integration/hive" module so that
> all hive-related code is decoupled from the current modules; let CI run normally
> first.
>
>



-- 
Thanks and Regards

*   Anubhav Tarar     *


* Software Consultant*
      *Knoldus Software LLP <http://www.knoldus.com/home.knol>       *
       LinkedIn <http://in.linkedin.com/in/rahulforallp>     Twitter
<https://twitter.com/RahulKu71223673>    fb <ra...@facebook.com>
          mob : 8588915184