Posted to dev@carbondata.apache.org by Liang Chen <ch...@gmail.com> on 2017/06/02 00:11:09 UTC

Re: Carbondata hive integration Plan

Hi cenyuhai

Thanks for starting this discussion about hive integration:

    1、Make carbon schema compatible with hive (CARBONDATA-1008) (create table
and alter table)

    Liang: As you mentioned, the first phase (1.2.0) supports reading
carbondata files in hive. So can I understand the flow like this: all steps of
preparing carbondata files are handled in Spark, so "create table and alter
table" would also be handled in Spark, and hive only does read (query). Can
you explain a little more about which part "schema compatible with hive"
covers? (One possible reading is sketched after point 3.)

    2、Filter pushdown (especially partition filter, FilterPushdownDev)
    Liang: LGTM for this point.

    3、A tool to update the existing tables' schema to be compatible with
hive.
    Liang: Same comment as for question 1. Can you give some examples of "the
existing tables' schema"?
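
To make my question concrete: one possible reading is that hive reads carbon
files through carbon's hive input/output formats and SerDe, along these lines
(a minimal sketch only; table name, columns, and location are hypothetical):

    -- hypothetical table and location; the classes come from the
    -- org.apache.carbondata.hive module
    CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
    ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe'
    STORED AS
      INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat'
      OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat'
    LOCATION 'hdfs://localhost:54310/opt/carbonStore/default/sales';

If that is the direction, point 2's filter pushdown would then be about
evaluating predicates such as WHERE id = 100 inside carbon's indexes rather
than after a full scan.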


For the hive integration feature in Apache CarbonData 1.2.0, I propose the
scope as below:
1. Only support read/query of carbondata files in hive. Writing carbondata in
hive (create carbon table, alter carbon table, load data, etc.) will be
supported in the future (a new mailing topic can discuss that plan).
2. Utilize CarbonData's good features (index, dictionary, ...) to get good
query performance. Hive+CarbonData performance should be better than
Hive+ORC.
3. Provide a solution/tool to migrate all hive tables & data to carbon
tables & data in Spark (one possible shape is sketched below).
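
As a hedged sketch of point 3 (table and column names are hypothetical): since
table creation and loading stay on the Spark side, the migration tool could
boil down to generating a carbon CREATE TABLE plus an INSERT ... SELECT per
hive table, run through a carbon-enabled Spark session:

    -- run in Spark with carbon support; names are hypothetical
    CREATE TABLE sales_carbon (id INT, amount DOUBLE)
    STORED BY 'carbondata';

    -- copy the rows out of the existing hive table
    INSERT INTO TABLE sales_carbon
    SELECT id, amount FROM sales_hive;

The tool would mainly need to walk the hive metastore, emit such statement
pairs, and map hive column types to carbon types.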

Regards
Liang

Re: Carbondata hive integration Plan

Posted by Sea <26...@qq.com>.
Hi Anubhav,
    You are right, this tool is unnecessary.





Re: Carbondata hive integration Plan

Posted by Anubhav Tarar <an...@knoldus.in>.
Hi cenyuhai, can you tell us why the tool will be required? You already have a
PR for making carbon schema compatible with hive (CARBONDATA-1008). What will
this tool do?

@Liang Hi, by "the existing tables' schema" cenyuhai means that when you are
reading a carbondata table from hive, you need to alter the schema of that
carbon table to use MapredCarbonInputFormat and MapredCarbonOutputFormat,
which are compatible with hive, using the following steps:

alter table CHARTYPES3 set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";

alter table CHARTYPES3 set LOCATION
'hdfs://localhost:54310/opt/carbonStore/default/CHARTYPES3';
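
After these two statements, the table should be readable from hive like any
other table; a quick sanity check:

    -- verify hive can now read the carbon files
    SELECT * FROM CHARTYPES3 LIMIT 10;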



-- 
Thanks and Regards

Anubhav Tarar
Software Consultant
Knoldus Software LLP <http://www.knoldus.com/home.knol>
LinkedIn <http://in.linkedin.com/in/rahulforallp>    Twitter <https://twitter.com/RahulKu71223673>    fb <ra...@facebook.com>
mob : 8588915184