
Posted to dev@carbondata.apache.org by chenliang613 <ch...@gmail.com> on 2016/08/09 04:02:16 UTC

Open Discussion:Apache CarbonData Roadmap

Hi,

I would like to start a discussion thread on the Apache CarbonData roadmap. Any input and comments would be much appreciated!

Apache CarbonData 0.1.0-incubating
1. Support integration with Apache Spark 1.5.2, 1.6.1 and 1.6.2
2. Support integration with Apache Hadoop 2.2 and later versions
3. Columnar data store
4. Full index: significantly accelerates query performance and reduces I/O scans and CPU usage when a query has filters. It can also skip-scan at a finer-grained unit (called a blocklet) during task-side scanning instead of scanning the whole file.
   - Global Multi Dimensional Keys (MDK) based B+Tree index for all non-measure columns
   - Min-Max index for all columns
   - Inverted index for all dimensions
5. Operable encoded data: through efficient compression and global encoding schemes, queries can run directly on compressed/encoded data, and the data is decoded just before the results are returned to users ("late materialization").
6. Column group: allows multiple columns to form a column group that is stored in row format. This reduces the row reconstruction cost at query time.
7. Support for various use cases with one single data format: interactive OLAP-style queries, sequential access (big scan) and random access (narrow scan).
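The blocklet-level skip scan with the Min-Max index described above can be illustrated roughly as follows. This is a minimal sketch, not CarbonData's implementation: it assumes each blocklet records the min and max of a single integer column, and the `Blocklet`, `prune_blocklets` and `scan` names are hypothetical. A blocklet whose [min, max] range cannot overlap the filter range is skipped without reading its rows.

```python
from dataclasses import dataclass


@dataclass
class Blocklet:
    """Hypothetical blocklet metadata for illustration: min/max of one column plus its rows."""
    blocklet_id: int
    col_min: int
    col_max: int
    rows: list


def prune_blocklets(blocklets, low, high):
    """Keep only blocklets whose [col_min, col_max] range can overlap [low, high]."""
    return [b for b in blocklets if b.col_max >= low and b.col_min <= high]


def scan(blocklets, low, high):
    """Read only the surviving blocklets and apply the filter row by row."""
    result = []
    for b in prune_blocklets(blocklets, low, high):
        result.extend(v for v in b.rows if low <= v <= high)
    return result


blocklets = [
    Blocklet(0, 1, 10, [1, 5, 10]),
    Blocklet(1, 11, 20, [11, 15, 20]),
    Blocklet(2, 21, 30, [21, 25, 30]),
]

# Filter "12 <= col <= 18": blocklets 0 and 2 are skipped using min/max alone.
print([b.blocklet_id for b in prune_blocklets(blocklets, 12, 18)])  # [1]
print(scan(blocklets, 12, 18))  # [15]
```

Only the metadata comparison decides which blocklets are read, so the saving grows with how well the data is clustered on the filtered column.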
Apache CarbonData 0.2.0-incubating
1. Support integration with Apache Spark 2.1
2. Support Map data type (CARBONDATA-45)
3. Support creating a CarbonData table by selecting from another datastore's table
4. Remove Kettle to support more flexible data loading
5. Support CarbonDataOutputFormat

Regards
Liang



--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Open-Discussion-Apache-CarbonData-Roadmap-tp49.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Re: Open Discussion:Apache CarbonData Roadmap

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

It sounds like a plan ;)

On 08/09/2016 08:37 AM, chenliang613 wrote:
> Hi JB
>
> Thanks for your comments.
> Removing Kettle prepares for integrating with Apache Beam/Apache Flink to
> support real-time data loading.
>
> I would like to propose integration with Apache Beam etc. in Apache
> CarbonData 0.3.0.
>
> Regards
> Liang

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Open Discussion:Apache CarbonData Roadmap

Posted by chenliang613 <ch...@gmail.com>.

Hi JB

Thanks for your comments.
Removing Kettle prepares for integrating with Apache Beam/Apache Flink to
support real-time data loading.

I would like to propose integration with Apache Beam etc. in Apache
CarbonData 0.3.0.

Regards
Liang

Re: Open Discussion:Apache CarbonData Roadmap

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Hi Liang,

It sounds good.

Any plans to support Apache Beam (instead of Spark directly)?

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Open Discussion:Apache CarbonData Roadmap

Posted by Jacky Li <ja...@qq.com>.

I think William's point is valid: we should focus mainly on usability improvements in 0.2.0.

Besides what Liang has pointed out, I have a brief list in mind that could be planned across several releases, if it makes sense for community users. The items are mainly about broader integration and further performance improvement.

1. Streaming ingest. This requires CarbonData to add new format support and to integrate with a streaming engine.
2. Code refactoring to put CarbonData in good shape for integrating processing frameworks other than Spark. It should be able to integrate with both batch engines and streaming engines, including Hive/Flink/Beam/Spark Streaming/Kafka, etc.
3. More dictionary support. For example, very high-cardinality columns could use a file-level local dictionary for encoding.
4. Further performance improvement for join operations, leveraging CarbonData's late materialization.
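To make item 3 above concrete, here is a minimal sketch of what a file-level local dictionary could look like, assuming each file builds its own value-to-key mapping independently. The function names are hypothetical and this is not CarbonData code. Because the dictionary is scoped to one file, it only has to hold the distinct values in that file rather than every value of the column; the trade-off is that keys from different files are not directly comparable.

```python
def build_local_dictionary(values):
    """Map each distinct value in one file to a small integer key (first-seen order)."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
    return dictionary


def encode_file(values):
    """Encode one file's column values against that file's own dictionary."""
    dictionary = build_local_dictionary(values)
    return dictionary, [dictionary[value] for value in values]


def decode_file(dictionary, encoded):
    """Invert the file's dictionary to recover the original values."""
    reverse = {key: value for value, key in dictionary.items()}
    return [reverse[key] for key in encoded]


# Two hypothetical files holding a high-cardinality column (e.g. user IDs).
file_a = ["u1", "u2", "u1", "u3"]
file_b = ["u9", "u1", "u9"]

dict_a, enc_a = encode_file(file_a)
dict_b, enc_b = encode_file(file_b)
print(dict_a, enc_a)  # {'u1': 0, 'u2': 1, 'u3': 2} [0, 1, 0, 2]
print(dict_b, enc_b)  # {'u9': 0, 'u1': 1} [0, 1, 0]
```

Note that "u1" gets key 0 in the first file and key 1 in the second, which is why any cross-file operation (such as a join) would need to decode or re-map keys first.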


Regards,
Jacky

> On 9 Aug 2016, at 10:07 PM, chenliang613 <ch...@gmail.com> wrote:
> 
> Hi William
> 
> Thanks for your input.
> Most of your points will be considered for 0.2.0: removing Kettle, adding
> CREATE TABLE properties to simplify data loading (especially for
> high-cardinality column settings), and supporting Spark 2.0.
> 
> Regards
> Liang

Re: Open Discussion:Apache CarbonData Roadmap

Posted by chenliang613 <ch...@gmail.com>.

Hi William

Thanks for your input.
Most of your points will be considered for 0.2.0: removing Kettle, adding
CREATE TABLE properties to simplify data loading (especially for
high-cardinality column settings), and supporting Spark 2.0.

Regards
Liang