Posted to dev@carbondata.apache.org by xm_zzc <44...@qq.com> on 2019/03/28 16:31:49 UTC

[DISCUSSION] Add new compaction type for compacting delta data file

Hi dev:
  Currently CarbonData supports using the compaction command to compact delta
data into carbondata files, but it needs two or more segments to be
compacted. If these segments are big, users may not want to compact them
(it takes a lot of time) and may just want to compact the delta data files
into carbondata files for each segment.
  After discussing with Jacky and David offline, there is a way to do this: add
a new compaction type for compacting the delta data files of each segment, for
example:
  alter table table_name compact 'iud_delta' where segment.id in (0.2)
This command will compact all delta data files of segment 0.2 into carbondata
files as new segment 0.3.
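
A minimal sketch of what such a per-segment compaction would do, assuming a simplified in-memory model rather than CarbonData's actual code. The Segment class, compact_iud_delta and next_segment_id are hypothetical names, and the id bump from 0.2 to 0.3 is inferred only from the example above:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy model of one segment; every name here is hypothetical."""
    segment_id: str
    base_rows: dict                                    # row_id -> value (carbondata files)
    deleted_ids: set = field(default_factory=set)      # row ids listed in delete delta files
    updated_rows: dict = field(default_factory=dict)   # row_id -> new value (update delta)


def next_segment_id(segment_id: str) -> str:
    """Bump the part after the dot, e.g. '0.2' -> '0.3' (assumed naming rule)."""
    major, minor = segment_id.split(".")
    return f"{major}.{int(minor) + 1}"


def compact_iud_delta(segment: Segment) -> Segment:
    """Fold the delta data of a single segment into a new segment without deltas."""
    merged = {
        rid: val
        for rid, val in segment.base_rows.items()
        if rid not in segment.deleted_ids
    }
    merged.update(segment.updated_rows)
    return Segment(next_segment_id(segment.segment_id), merged)


old = Segment("0.2", {1: "a", 2: "b", 3: "c"}, deleted_ids={2}, updated_rows={3: "C"})
new = compact_iud_delta(old)
print(new.segment_id, new.base_rows)
```

Only one segment is read and written here, which is the difference from the existing compaction types that need two or more segments.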

  Any suggestions on this? Thanks.



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by akashrn5 <ak...@gmail.com>.
hi,

Thanks for the reply. Once you create the JIRA and the design document is
ready, we can further assess the impact and anything else that needs handling.

Thank you

Regards,
Akash R




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by xm_zzc <44...@qq.com>.
Hi:
  Just as I said before, we can add a compaction type called
'iud_delta_compact' to the command 'alter table table_name compact' to support
this feature.
  Concurrency for this feature will be handled in the same way as for other
compaction types, and it's recommended to do this operation in off-peak hours.

  I will create a JIRA for this feature later.




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by xuchuanyin <xu...@apache.org>.
Hmm, eliminating delta files to enhance query performance is quite reasonable,
and compaction is a candidate for it. However, I have some questions about
this; maybe they will help in your design.

Q1:
A segment with delta files means there have been some UD (update/delete)
operations on this segment before, which suggests there will still be some UD
in the future. So, is it worth compacting this segment?

Also please keep in mind that UD operations will be blocked while the
compaction is in progress.

Q2:
I feel there may be too many kinds of compaction in CarbonData...

What if in the future I want another compaction that can merge smaller
carbondata files into larger ones? Will we add another kind of compaction? I
think it's time for us to consider extensibility for the future while
proposing this feature.

Q3:
Currently all kinds of compaction use the query procedure to rewrite
all the records of the related segments.

Suppose we have a segment with 100 carbondata files and we only delete one
record in this segment. The penalty of rewriting all the records of this
segment is heavy.
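
To put a rough number on this, here is a back-of-the-envelope sketch. The 100 files and the single deleted record come from the example above; the figure of one million rows per file and the function name are purely assumptions for illustration:

```python
def rows_rewritten(files: int, rows_per_file: int, deleted_rows: int) -> int:
    """Rows a full-segment rewrite has to write out again (toy model)."""
    return files * rows_per_file - deleted_rows


# rows_per_file is a made-up figure; only files=100 and deleted_rows=1
# are taken from the example in the text.
total = rows_rewritten(files=100, rows_per_file=1_000_000, deleted_rows=1)
print(f"{total} rows rewritten to drop 1 deleted row")
```

Under these assumed sizes, nearly one hundred million rows are rewritten to get rid of a single deleted row.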




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by xm_zzc <44...@qq.com>.
Hi:
  Just as I said before, we can add a new compaction type called
'iud_delta_compact' to the command 'alter table table_name compact' to support
this feature.
  Concurrency for this feature will be handled in the same way as for other
compaction types, and it's recommended to do this operation in off-peak
hours.

  I will create a JIRA task for this feature later.




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by akashrn5 <ak...@gmail.com>.
hi,

Thanks for clearing the doubt.

So according to my understanding, basically you want to merge all the delete
delta files with the base carbondata files and write a new segment. Basically
this helps to reduce IO, right?

So here I have some questions regarding that:

1. Are you planning a new DDL for this operation? If you are, what is the DDL
structure?
2. How will concurrency be handled with this, e.g. update, delete, or
compaction on the table while this compaction is in progress? If concurrent
operations are blocked, well and good; otherwise, how will the segment mapping
be maintained?

3. As Jacky said, and I agree with him, this will be a costly and time-consuming
operation as you will be writing the whole segment again. How will this be
handled so that users won't be blocked for queries or other operations? Or is it
recommended to do this operation in off-peak hours?

I suggest you add the design document and create a JIRA for this; it would be
helpful.


Thanks.

Regards,
Akash R





Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by xm_zzc <44...@qq.com>.
Hi Jacky:
  Yes, that is exactly my purpose.
  If there is one big segment with big delete and update delta files, users
who want to eliminate the delta files currently need to find another segment
to compact with the big one. That operation will be more time consuming than
the one I am proposing.




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by Jacky Li <ja...@qq.com>.
I guess your intention is to rewrite a single segment by merging base file and delta files to improve the query performance of that segment, right? I think this is doable and note that this operation may be time consuming since it is rewriting the whole segment.

Regards,
Jacky

> On March 30, 2019, at 10:36 AM, xm_zzc <44...@qq.com> wrote:
> 
> Hi, Akash R:
>  Thanks for your reply.
>  I am talking about the delta data files, including both update and delete
> delta files. Horizontal compaction just compacts all delta files of one
> segment into one, right? But if the segment is big and the update and
> delete delta files are big too, I think it will affect query performance,
> because queries need to filter both the carbondata files and the delta files, right?
> 
> 
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> 


Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by xm_zzc <44...@qq.com>.
Hi, Akash R:
  Thanks for your reply.
  I am talking about the delta data files, including both update and delete
delta files. Horizontal compaction just compacts all delta files of one
segment into one, right? But if the segment is big and the update and
delete delta files are big too, I think it will affect query performance,
because queries need to filter both the carbondata files and the delta files, right?
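
To illustrate the point, here is a minimal sketch under a simplified model where a delete delta file is just a set of deleted row ids. The function names are hypothetical and this is not CarbonData's actual read path:

```python
def horizontal_compaction(delete_deltas):
    """Merge many delete delta files (sets of deleted row ids) into one."""
    merged = set()
    for delta in delete_deltas:
        merged |= delta
    return merged


def scan_segment(base_rows, delete_delta):
    """Query-time read: every base row is still checked against the delta."""
    return [row for rid, row in base_rows.items() if rid not in delete_delta]


base = {1: "a", 2: "b", 3: "c", 4: "d"}
deltas = [{2}, {4}, {2}]

one_delta = horizontal_compaction(deltas)   # fewer delta files to open...
rows = scan_segment(base, one_delta)        # ...but the scan still filters
print(one_delta, rows)
```

Even after the delta files are merged into one, the deleted rows remain in the base files and are filtered on every scan; only rewriting the segment removes that work.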




Re: [DISCUSSION] Add new compaction type for compacting delta data file

Posted by akashrn5 <ak...@gmail.com>.
Hi, 


I have a question: are you talking about the delete delta files, or delta
data files in general? Is it specific to update and delete scenarios?

If the compaction is just within a segment, it's similar to horizontal
compaction in the case of update and delete. So is it required to create a new
segment?

Thanks

Regards,
Akash R



