Posted to dev@carbondata.apache.org by ilegend <51...@qq.com> on 2018/02/02 03:30:24 UTC

Help, carbondata issues on spark

Hi guys,
We're testing CarbonData for our project. Its performance is better than Parquet under certain conditions, but we have run into some problems. Do you have any solutions for the following issues?
Environment: HDFS 2.6, Spark 2.1, CarbonData 1.3
1. No multi-level partitions: we need three levels of partitioning, e.g. year, day, hour.
2. Spark needs to import the CarbonData jar, and we don't want to modify our existing SQL.
3. Low stability: inserts fail frequently.

Looking forward to your reply.

Sent from my iPhone







Re: Help, carbondata issues on spark

Posted by Liang Chen <ch...@gmail.com>.
Hi

1. No multi-level partitions: we need three levels of partitioning, e.g.
year, day, hour.

Reply: Do year, day and hour belong to one column (field) or to three
columns? Can you explain your exact scenarios? We can help you design
partition and sort columns to solve your specific query issues.
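
As a rough illustration of the "three columns" option, assuming the source rows carry a single event timestamp: the hypothetical helpers below derive separate year, day and hour values (with "day" assumed to mean day of year) and the Hive/Spark-style nested directory path a table partitioned on those three columns would use, plus a simple prefix-matching prune. This is a plain-Python sketch, not CarbonData code; the column names are placeholders, not the poster's schema.

```python
from datetime import datetime


def partition_values(ts: datetime) -> dict:
    """Split one event timestamp into three hypothetical partition columns."""
    return {
        "year": ts.year,
        "day": ts.timetuple().tm_yday,  # assumption: day of year, 1..366
        "hour": ts.hour,
    }


def partition_path(ts: datetime) -> str:
    """Build a nested key=value path: year=.../day=.../hour=..."""
    values = partition_values(ts)
    return "/".join(f"{key}={values[key]}" for key in ("year", "day", "hour"))


def prune(paths, **filters):
    """Keep only partition paths whose key=value segments match every filter."""
    kept = []
    for path in paths:
        parts = dict(segment.split("=", 1) for segment in path.split("/"))
        if all(parts.get(key) == str(value) for key, value in filters.items()):
            kept.append(path)
    return kept


if __name__ == "__main__":
    ts = datetime(2018, 2, 2, 3, 30, 24)
    print(partition_values(ts))  # year 2018, day 33, hour 3
    print(partition_path(ts))    # year=2018/day=33/hour=3
```

A filter on year and day then only needs to touch the hour directories under one day, which is the kind of saving a partition + sort column design aims for.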

2. Spark needs to import the CarbonData jar, and we don't want to modify our
existing SQL.

Reply: There is no need to modify any SQL; you can query CarbonData with
any SQL that Spark SQL supports.

3. Low stability: inserts fail frequently.
Reply: What exactly is the error?

Regards
Liang

ilegend wrote
> Hi guys,
> We're testing CarbonData for our project. Its performance is better than
> Parquet under certain conditions, but we have run into some problems. Do
> you have any solutions for the following issues?
> Environment: HDFS 2.6, Spark 2.1, CarbonData 1.3
> 1. No multi-level partitions: we need three levels of partitioning, e.g.
> year, day, hour.
> 2. Spark needs to import the CarbonData jar, and we don't want to modify our
> existing SQL.
> 3. Low stability: inserts fail frequently.
> 
> Looking forward to your reply.
> 
> Sent from my iPhone





--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Help, carbondata issues on spark

Posted by Jacky Li <ja...@qq.com>.

> On Feb 2, 2018, at 11:30 AM, ilegend <51...@qq.com> wrote:
> 
> Hi guys,
> We're testing CarbonData for our project. Its performance is better than Parquet under certain conditions, but we have run into some problems. Do you have any solutions for the following issues?
> Environment: HDFS 2.6, Spark 2.1, CarbonData 1.3
> 1. No multi-level partitions: we need three levels of partitioning, e.g. year, day, hour.

If you are looking for OLAP on time-series data, you can try the timeseries feature in 1.3. Refer to the timeseries section in https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#pre-aggregate-tables
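
To illustrate the idea behind a time-series rollup (this is a plain-Python sketch of the concept, not CarbonData's implementation or API): raw rows are truncated to a chosen granularity and aggregated once, so later hourly, daily or yearly queries can read the small rollup instead of scanning the raw table. The (event_time, quantity) row shape and the sum/count aggregates are hypothetical choices for the example.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts: datetime, granularity: str) -> datetime:
    """Round a timestamp down to the start of its hour, day or year bucket."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def rollup(rows, granularity):
    """rows: iterable of (event_time, quantity).

    Returns {bucket_start: (sum_of_quantity, row_count)}, ordered by bucket.
    """
    acc = defaultdict(lambda: [0, 0])
    for event_time, quantity in rows:
        bucket = truncate(event_time, granularity)
        acc[bucket][0] += quantity
        acc[bucket][1] += 1
    return {bucket: (total, count) for bucket, (total, count) in sorted(acc.items())}


if __name__ == "__main__":
    raw = [
        (datetime(2018, 2, 2, 3, 5), 2),
        (datetime(2018, 2, 2, 3, 40), 3),
        (datetime(2018, 2, 2, 4, 1), 1),
    ]
    print(rollup(raw, "hour"))  # two hourly buckets
    print(rollup(raw, "day"))   # one daily bucket
```

Keeping both the sum and the count per bucket lets coarser results (such as averages) be derived from the rollup without going back to the raw rows.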

> 2. Spark needs to import the CarbonData jar, and we don't want to modify our existing SQL.

I think if you are using CarbonSession, you get all of Carbon's built-in SQL optimization support. You do not need to modify your Spark jar.

> 3. Low stability: inserts fail frequently.

Is it a memory issue?

> 
> Looking forward to your reply.
> 
> Sent from my iPhone