Posted to dev@carbondata.apache.org by Ravindra Pesala <ra...@gmail.com> on 2019/07/16 14:30:24 UTC

[Discussion] Roadmap for Apache CarbonData 2

Hi Community,

Three years have passed since the launch of the Apache CarbonData
project, and CarbonData has become a popular data management solution for
various scenarios. As new workloads like AI and new runtime environments like
the cloud are emerging quickly, I think we have reached a point where we need
to discuss the future of CarbonData.

To bring CarbonData to a new level and satisfy those new requirements, Jacky
and I drafted a roadmap for CarbonData 2 on the cwiki website.
- English Version:
https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+2+Roadmap+Proposal
- Chinese Version:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120737492

Please feel free to discuss the roadmap in this thread. We welcome
all feedback that helps make CarbonData better.

Thanks and Regards,
Ravindra.

Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by manish gupta <to...@gmail.com>.
Hi Team

It is good to see how CarbonData has grown and become popular over time.
It was important to take a fresh look and come up with a roadmap for future needs.
The CarbonData 2.0 proposal looks good, as we are trying to align it with the cloud,
which will more or less be the prominent runtime environment in the near
future. A lot of code refactoring will be required as per the roadmap. I
would like to add a couple of points.

1. Complex type support: Although we do have complex type support, there is
scope for improvement. Use cases for nested columns are growing
rapidly. We should work on improving the storage of nested columns and
should also support creating compound/multi-column indexes for nested
columns.
2. Feature code segregation and pluggability: The current code is tightly
coupled. The ideal case would be to have a base and make all the features
pluggable into it, but that will be hard to achieve. We can try segregation
at the package level for major features, but for any newly developed feature
we should think in terms of pluggability.

[Clarification] Carbon UI: I did not understand the use case for the Carbon segment
management UI. For the cloud scenario we would have to expose REST endpoints,
which would make Carbon more like a microservice, and that does not fit
the CarbonData use case. A UI/tool makes more sense for internal testing, but
I am not sure how it would benefit end users. Maybe a tool showing the
data stored in each table would be more useful to them.

Regards
Manish Gupta

On Tue, Aug 13, 2019 at 4:51 PM Kumar Vishal <ku...@gmail.com>
wrote:


Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by Kumar Vishal <ku...@gmail.com>.
Hi Ravi,

We can add the requirements below to 2.0:

1. Data loading performance improvement (needs analysis).
2. Unified reading of the CarbonData file: currently data is read in two parts,
dimension and measure, which increases the number of I/O operations.
3. Carbon store size optimization (a PR is already raised and needs to be revisited);
we can also explore further optimizations (like RLE hybrid bit packing).
4. Presto enhancements (like write support, Presto SQL adaptation, and complex
type read support).
5. Spark Data Source V2 integration.
6. Spatial index support.
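
To make the "RLE hybrid bit packing" idea in item 3 concrete, here is a toy
Python sketch. It is not CarbonData code and not the encoding in the raised PR;
the function names, the group layout and the min_run threshold are all
assumptions made for illustration. Long runs of equal values are stored as
(value, count), and everything else is bit-packed at the smallest width that
fits.

```python
def bit_pack(values, width):
    """Pack non-negative ints into one Python int, `width` bits each."""
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed


def bit_unpack(packed, width, count):
    """Inverse of bit_pack: read `count` values of `width` bits each."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def encode(values, min_run=4):
    """Encode non-negative ints as ('rle', value, count) or
    ('bits', width, count, packed) groups. min_run is a hypothetical knob."""
    groups, literals = [], []

    def flush_literals():
        # Bit-pack the pending non-run values at the narrowest usable width.
        if literals:
            width = max(max(literals).bit_length(), 1)
            groups.append(("bits", width, len(literals), bit_pack(literals, width)))
            literals.clear()

    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        if j - i >= min_run:
            # Long run: cheaper to store as a single (value, count) pair.
            flush_literals()
            groups.append(("rle", values[i], j - i))
        else:
            literals.extend(values[i:j])
        i = j
    flush_literals()
    return groups


def decode(groups):
    """Rebuild the original list from the encoded groups."""
    out = []
    for group in groups:
        if group[0] == "rle":
            _, value, count = group
            out.extend([value] * count)
        else:
            _, width, count, packed = group
            out.extend(bit_unpack(packed, width, count))
    return out
```

For example, encode([7] * 6 + [1, 2, 3]) yields one run group for the six 7s
and one 2-bit packed group for 1, 2, 3, and decode() restores the input.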


-Regards
Kumar Vishal

On Thu, Jul 18, 2019 at 8:20 PM Ravindra Pesala <ra...@gmail.com>
wrote:


Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi Kevin,

Yes, we can improve it. The implementation is closely related to supporting
pre-aggregate datamaps on streaming tables, which we implemented
some time ago. The same will be reimplemented for the MV datamap
soon as well.
The implementation uses the pre-aggregate datamap for non-streaming
segments and the main table for streaming segments. We update the query plan to
do a union of both tables and query only the streaming segments of the main
table.
So in our case we can do the same: run a union query over the
MV table and the main table (only the segments not yet loaded into the datamap).
We can definitely consider this after we support streaming tables for the
MV datamap.
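
A rough illustration of the union idea, as a minimal Python sketch rather than
actual CarbonData code: the tables, segment ids and the function name
query_sum_by_city are hypothetical, and the query is assumed to be a simple
SUM ... GROUP BY. Pre-aggregated rows come from the MV table, raw rows come
only from main-table segments the MV has not loaded yet, and the two branches
are combined.

```python
from collections import defaultdict

# Hypothetical main table: segment id -> list of (city, sales) rows.
main_table = {
    "seg0": [("A", 10), ("B", 5)],
    "seg1": [("A", 7)],
    "seg2": [("B", 3), ("A", 1)],  # e.g. a streaming segment, not yet in the MV
}

# Hypothetical MV table: SUM(sales) GROUP BY city, built from seg0 and seg1 only.
mv_loaded_segments = {"seg0", "seg1"}
mv_table = {"A": 17, "B": 5}


def query_sum_by_city(main_table, mv_table, mv_loaded_segments):
    """SUM(sales) GROUP BY city as a union of the MV and the uncovered segments."""
    totals = defaultdict(int)
    # Branch 1 of the union: pre-aggregated rows from the MV table.
    for city, partial_sum in mv_table.items():
        totals[city] += partial_sum
    # Branch 2: raw rows from main-table segments the MV has not loaded yet.
    for segment_id, rows in main_table.items():
        if segment_id in mv_loaded_segments:
            continue
        for city, sales in rows:
            totals[city] += sales
    return dict(totals)


result = query_sum_by_city(main_table, mv_table, mv_loaded_segments)
```

The result matches a full scan of all three segments, while seg0 and seg1 are
never read from the main table.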

Regards,
Ravindra.

On Wed, 17 Jul 2019 at 07:55, kevinjmh <ke...@qq.com> wrote:




-- 
Thanks & Regards,
Ravi

Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by kevinjmh <ke...@qq.com>.
Currently, a datamap in Carbon applies to all segments.
The roadmap refers to commands like add/drop segment, and maybe also something
about incremental loading for MV. For these scenarios, it would be better to make
the datamap usable at the segment level, instead of disabling the datamap whenever
its data is not ready for some segment. This would also make the datamap
fail-safe and improve Carbon's stability.
Maybe we can consider this as well.
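
A minimal Python sketch of what segment-level datamap usage could look like,
assuming a simple min/max datamap; the segment ids, data and function names are
hypothetical and this is not how CarbonData implements pruning. Segments that
have datamap data are pruned with it, and segments without it are simply
scanned, so one missing segment does not disable the datamap for the table.

```python
# Hypothetical segments: segment id -> values of one integer column.
segments = {
    "seg0": [1, 4, 9],
    "seg1": [20, 25, 31],
    "seg2": [5, 40],  # datamap data for this segment is not ready yet
}

# Hypothetical min/max datamap, kept per segment; seg2 has no entry.
minmax_datamap = {
    "seg0": (1, 9),
    "seg1": (20, 31),
}


def segments_to_scan(segments, datamap, value):
    """Return ids of segments that must be scanned for `column == value`."""
    to_scan = []
    for segment_id in segments:
        bounds = datamap.get(segment_id)
        if bounds is None:
            # No datamap data for this segment: fall back to scanning it
            # instead of disabling the datamap for the whole table.
            to_scan.append(segment_id)
            continue
        low, high = bounds
        if low <= value <= high:
            to_scan.append(segment_id)
    return to_scan


def lookup(segments, datamap, value):
    """Count matching rows, scanning only the segments that were not pruned."""
    return sum(
        segments[s].count(value)
        for s in segments_to_scan(segments, datamap, value)
    )
```

With this data, a lookup for 25 prunes seg0 via its min/max range and scans
seg1 (range match) and seg2 (no datamap data yet).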




-----
Regards 
Manhua


 

Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by melin li <li...@gmail.com>.
Use ANTLR4 to parse SQL.

On Sat, Aug 17, 2019 at 11:31 AM, xuchuanyin <xu...@apache.org> wrote:


Re: [Discussion] Roadmap for Apache CarbonData 2

Posted by xuchuanyin <xu...@apache.org>.
Hi, I am glad to see CarbonData entering the 2.x stage. I have the following
suggestions for your consideration:

1. Evolution of the CarbonData file format.
I previously thought that one of the key highlights of CarbonData is the
CarbonData file format. Is there any evolution planned for it?
As CarbonData moves to a broader range of applications, will the current
file format still suit them well?


2. Performance commitment of CarbonData.
It seems that CarbonData cares more about expanding its scope of application
than about enhancing performance.
What is the performance commitment of CarbonData 2 for data loading and querying?
Many enterprises do have big data, but not big enough to need
cloud, data lake, etc.
For these scenarios, is CarbonData's performance clearly better than that of other
file format + execution engine combinations?
Do we have any plan for such enhancements?


3. Smarter CarbonData.
As we suggested earlier, is a CarbonData advisor on the roadmap?
CarbonData has many features, but I notice that some of them are never used
by users.
While CarbonData will serve AI workloads, can it become smarter itself as well?
The CarbonData advisor would act as a DBA for CarbonData: it would monitor the
workload, usage, and current performance, and give proper suggestions or even
perform the appropriate operations itself.



