Posted to dev@carbondata.apache.org by Sea <26...@qq.com> on 2017/03/23 13:58:16 UTC
Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
Hi Liang,
I created a new profile "integration/hive" and the CI is OK now, but I still have some problems altering the Hive metastore schema.
My steps are as follows:
1.build carbondata
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1
2.copy jars
mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/
3.create sample.csv and put it into hdfs
id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
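For reference, the two-row sample file above can be generated and sanity-checked with a short script before uploading it with `hdfs dfs -put sample.csv /user/hadoop/` (a minimal sketch; the local path is arbitrary):

```python
import csv

# Write the two-row sample file used in the steps above.
rows = [
    {"id": "1", "name": "yuhai", "scale": "1.77", "country": "china", "salary": "33000.0"},
    {"id": "2", "name": "runlin", "scale": "1.70", "country": "china", "salary": "32000.0"},
]
with open("sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "scale", "country", "salary"])
    writer.writeheader()
    writer.writerows(rows)

# Sanity-check the header and row count before uploading to HDFS.
with open("sample.csv") as f:
    lines = f.read().splitlines()
print(lines[0])        # id,name,scale,country,salary
assert len(lines) == 3
```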
4.create table in spark
spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"
#execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs:////user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"
val carbon = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.warehouse.dir", warehouse)
  .config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation)
  .getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")
5.alter table schema in hive
cp ~/spark-2.1/carbon_lib/carbon-assembly-*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/
#start hive cli
$HIVE_HOME/bin/hive
#execute commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";
alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);
6.check table schema
execute "show create table hive_carbon"
7. execute "select * from hive_carbon" and "select * from hive_carbon order by id"
8.the table is still available in spark
------------------ Original ------------------
From: "Liang Chen";<no...@github.com>;
Date: Thu, Mar 23, 2017 00:09 AM
To: "apache/incubator-carbondata"<in...@noreply.github.com>;
Cc: "Sea"<26...@qq.com>; "Mention"<me...@noreply.github.com>;
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
@cenyuhai Thank you for contributing this feature.
I suggest creating a new profile for the "integration/hive" module and decoupling all Hive-related code from the current modules, so that CI can run normally first.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
Posted by Liang Chen <ch...@gmail.com>.
Hi
Thanks for your great contributions.
Regards
Liang
Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
Posted by Sea <26...@qq.com>.
Hi Anubhav,
Do you use MySQL to store the Hive metadata? Spark SQL and Hive must use the same metastore.
PS: Before you query data from Hive, you should alter the table schema.
This is the latest guide:
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md
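If Spark SQL and Hive resolve tables from different metastores, a table created in one engine will not be visible in the other. As a minimal sketch (host, database name, and credentials below are placeholders), pointing both at one MySQL-backed metastore means putting the same hive-site.xml on both engines' classpath (e.g. $HIVE_HOME/conf and $SPARK_HOME/conf):

```xml
<configuration>
  <!-- JDBC URL of the shared metastore database (placeholder host/db) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Placeholder credentials -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```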
Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
Posted by Anubhav Tarar <an...@knoldus.in>.
@sea Hi, I tried to use Hive with the steps you mentioned in your PR, but I get a table-not-found exception from the Hive CLI. Here are the steps I used:
1.start the spark shell with the hive and carbon builds
./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar
2.create the CarbonSession, then create and load the table
scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession
scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")
scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")
3.start the hive cli and add the jars
hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar]
hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar]
hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar]
4.query data using hive
hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'
--
Thanks and Regards
* Anubhav Tarar *
* Software Consultant*
*Knoldus Software LLP <http://www.knoldus.com/home.knol> *
LinkedIn <http://in.linkedin.com/in/rahulforallp> Twitter
<https://twitter.com/RahulKu71223673> fb <ra...@facebook.com>
mob : 8588915184
Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)
Posted by Sea <26...@qq.com>.
I forgot something.
Before querying data from Hive, we should set:
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
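Instead of setting them per session, the two properties above can also be made persistent by adding them to hive-site.xml (a sketch using exactly the property names from the commands above):

```xml
<property>
  <name>hive.mapred.supports.subdirectories</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
  <value>true</value>
</property>
```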