Posted to dev@carbondata.apache.org by Sea <26...@qq.com> on 2017/03/23 13:58:16 UTC

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Hi Liang,
    I created a new profile "integration/hive" and the CI is OK now, but I still have some problems altering the Hive metastore schema.
    My steps are as follows:

1. Build CarbonData

mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -Phadoop-2.7.2 -Phive-1.2.1



2. Copy jars

mkdir ~/spark-2.1/carbon_lib
cp ~/cenyuhai/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar ~/spark-2.1/carbon_lib/
cp ~/cenyuhai/incubator-carbondata/integration/hive/target/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar ~/spark-2.1/carbon_lib/
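
Before moving on, it can help to confirm that both jars actually landed in carbon_lib. A minimal sketch, assuming a POSIX shell; the function name have_carbon_jars and the glob patterns are illustrative and not part of CarbonData:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both jars copied in step 2 are present.
have_carbon_jars() {
    dir=$1
    # The shaded assembly jar and the Hive integration jar, matched by prefix
    # so the check does not depend on the exact snapshot version.
    ls "$dir"/carbondata_*-shade-*.jar >/dev/null 2>&1 || return 1
    ls "$dir"/carbondata-hive-*.jar >/dev/null 2>&1 || return 1
}

if have_carbon_jars "$HOME/spark-2.1/carbon_lib"; then
    echo "carbon_lib looks complete"
else
    echo "carbon_lib is missing a jar; re-run the build and the cp commands" >&2
fi
```

The check only looks at file names, so it says nothing about whether the jars were built against the right Spark or Hadoop profile.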



3. Create sample.csv and put it into HDFS

id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
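
This step can be scripted. A minimal sketch, assuming the `hdfs` client is on the PATH and that /user/hadoop is the target directory (inferred from the LOAD DATA path used in step 4); the function name write_sample_csv is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: writes the three-line sample file from step 3.
write_sample_csv() {
    cat > "$1" <<'EOF'
id,name,scale,country,salary
1,yuhai,1.77,china,33000.0
2,runlin,1.70,china,32000.0
EOF
}

write_sample_csv sample.csv

# Upload only when an HDFS client is available. The target directory is an
# assumption based on the path that LOAD DATA reads in step 4.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -put -f sample.csv /user/hadoop/
fi
```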



4. Create the table in Spark

spark-shell --jars "/data/hadoop/spark-2.1/carbon_lib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/data/hadoop/spark-2.1/carbon_lib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar"


# Execute these commands:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs:///user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"


val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)


carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/hadoop/sample.csv' INTO TABLE hive_carbon")



5. Alter the table schema in Hive

cp ~/spark-2.1/carbon_lib/carbondata*.jar hive/auxlibs/
cp spark-catalyst*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/
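
A relative HIVE_AUX_JARS_PATH only resolves if the Hive CLI is started from the same directory. A minimal sketch of exporting the absolute form instead, assuming a POSIX shell; abs_dir is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: prints the absolute form of a directory path.
abs_dir() {
    (cd "$1" && pwd)
}

# mkdir -p keeps the step repeatable; hive/auxlibs is the directory from step 5.
mkdir -p hive/auxlibs
HIVE_AUX_JARS_PATH=$(abs_dir hive/auxlibs)
export HIVE_AUX_JARS_PATH
echo "HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH"
```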


# Start the Hive CLI
$HIVE_HOME/bin/hive


# Execute these commands:
alter table hive_carbon set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";


alter table hive_carbon set LOCATION 'hdfs://mycluster-tj/user/hadoop/carbon/store/default/hive_carbon';
alter table hive_carbon change col id INT;
alter table hive_carbon add columns(name string, scale decimal, country string, salary double);

6. Check the table schema

Execute "show create table hive_carbon"

7. Execute "select * from hive_carbon" and "select * from hive_carbon order by id"

8. The table is still available in Spark

------------------ Original ------------------
From:  "Liang Chen";<no...@github.com>;
Date:  Thu, Mar 23, 2017 00:09 AM
To:  "apache/incubator-carbondata"<in...@noreply.github.com>; 
Cc:  "Sea"<26...@qq.com>; "Mention"<me...@noreply.github.com>; 
Subject:  Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)




@cenyuhai  Thank you for contributing this feature.
 I suggest creating a new profile for the "integration/hive" module and decoupling all Hive-related code from the current modules, so that CI runs normally first.
 

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Posted by Liang Chen <ch...@gmail.com>.
Hi

Thanks for your great contributions.

Regards
Liang



Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Posted by Sea <26...@qq.com>.
Hi Anubhav,
    Do you use MySQL to store the Hive metadata? Spark SQL and Hive must use the same metastore.
    PS: Before you query data using Hive, you should alter the table schema.

    This is the latest guide:
https://github.com/cenyuhai/incubator-carbondata/blob/CARBONDATA-727/integration/hive/hive-guide.md
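
One way to check the "same metastore" condition is to compare the JDBC URL that each side is configured with. In general, when Spark finds no hive-site.xml it falls back to a local Derby metastore_db, which Hive would not see. A minimal sketch, assuming both installations keep a hive-site.xml under their conf directories and that the metastore is configured through javax.jdo.option.ConnectionURL; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: prints the first <value> found after the
# javax.jdo.option.ConnectionURL <name> line of a hive-site.xml.
# Assumes <value>...</value> sits on a single line.
metastore_url() {
    awk '
        /<name>javax\.jdo\.option\.ConnectionURL<\/name>/ { want = 1; next }
        want && match($0, /<value>.*<\/value>/) {
            # Strip the 7-character <value> and 8-character </value> tags.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1" 2>/dev/null
}

# Succeeds when both files name the same, non-empty metastore URL.
same_metastore() {
    a=$(metastore_url "$1")
    b=$(metastore_url "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# File locations are assumptions; adjust them to the actual installation.
if same_metastore "${SPARK_HOME:-}/conf/hive-site.xml" "${HIVE_HOME:-}/conf/hive-site.xml"; then
    echo "Spark and Hive point at the same metastore"
else
    echo "metastore settings differ or are missing" >&2
fi
```

A matching URL is necessary but not sufficient: a setup that uses a remote metastore service (hive.metastore.uris) would need that property compared as well.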




------------------ Original ------------------
From:  "Anubhav Tarar";<an...@knoldus.in>;
Date:  Mon, Mar 27, 2017 02:59 PM
To:  "dev"<de...@carbondata.incubator.apache.org>; 
Cc:  "chenliang613"<ch...@apache.org>; "Mention"<me...@noreply.github.com>; 
Subject:  Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)



@sea Hi, I tried to use Hive with the steps you mentioned in your PR, but I
get a "table not found" exception from the Hive CLI. Here are the steps I used:

1. Start the Spark shell with the Hive and Carbon builds

./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar

2. Create the CarbonSession, then create and load the table

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")

scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")

3. Start the Hive CLI and add the jars

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar]

4. Query the data using Hive

hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'


Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Posted by Anubhav Tarar <an...@knoldus.in>.
@sea Hi, I tried to use Hive with the steps you mentioned in your PR, but I
get a "table not found" exception from the Hive CLI. Here are the steps I used:

1. Start the Spark shell with the Hive and Carbon builds

./spark-shell --jars /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.incubating-SNAPSHOT-shade-hadoop2.7.2.jar,/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar

2. Create the CarbonSession, then create and load the table

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val carbon = SparkSession.builder().enableHiveSupport().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/carbonStore")

scala> carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/sample.csv' INTO TABLE hive_carbon")

3. Start the Hive CLI and add the jars

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar]

hive> add jar /home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar;
Added [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar] to class path
Added resources: [/home/hduser/spark-2.1.0-bin-hadoop2.7/jars/spark-catalyst_2.11-2.1.0.jar]

4. Query the data using Hive

hive> select * from hive_carbon;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'hive_carbon'

On Fri, Mar 24, 2017 at 9:30 AM, Sea <26...@qq.com> wrote:

> I forgot something.
> Before querying data from Hive, we should set:
> set hive.mapred.supports.subdirectories=true;
> set mapreduce.input.fileinputformat.input.dir.recursive=true;



-- 
Thanks and Regards

*   Anubhav Tarar     *


* Software Consultant*
      *Knoldus Software LLP <http://www.knoldus.com/home.knol>       *
       LinkedIn <http://in.linkedin.com/in/rahulforallp>     Twitter
<https://twitter.com/RahulKu71223673>    fb <ra...@facebook.com>
          mob : 8588915184

Re: [apache/incubator-carbondata] [CARBONDATA-727][WIP] add hiveintegration for carbon (#672)

Posted by Sea <26...@qq.com>.
I forgot something.
Before querying data from Hive, we should set:
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
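
To avoid retyping these in every session, the two settings can be kept in an initialization file. A minimal sketch, assuming the Hive CLI's -i option (initialization SQL file); the file name carbon-init.hql and the function name write_hive_init are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: writes the two session settings to an init file.
write_hive_init() {
    cat > "$1" <<'EOF'
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
EOF
}

write_hive_init carbon-init.hql

# With Hive installed, start the CLI so that the file runs before the prompt:
# "$HIVE_HOME/bin/hive" -i carbon-init.hql
```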

