Posted to dev@carbondata.apache.org by Jin Zhou <em...@163.com> on 2018/03/12 02:31:56 UTC

[Discussion] About syntax of compaction on specified segments

Hi, community:

I'm working on PR-1812
( https://github.com/apache/carbondata/pull/1812 ),
which aims to support user-specified segments in the compaction operation.

After the previous discussions, I think there are 3 possible ways to implement
this feature:
1) Extend the existing SQL syntax of Major and Minor compaction:
    ALTER TABLE tablename compact 'MAJOR' '1, 2, 3, 4';
    ALTER TABLE tablename compact 'MINOR' '1, 2, 3, 4';

2) Add support for the CARBON_INPUT_SEGMENTS property to Major and Minor
compaction:
    SET carbon.input.segments.dbname.tablename=1,3;
    ALTER TABLE tablename compact 'MAJOR';

3) Add a new compaction type and some associated configs, for example,
'CUSTOM':
    ALTER TABLE tablename compact 'CUSTOM' '1, 2, 3, 4'

I'm grateful for the advice from chenliang, ravipesala and gvramana. The detailed
discussion history can be seen on the PR page:
( https://github.com/apache/carbondata/pull/1812 )

Now I'm a bit confused and would really appreciate your suggestions :)






--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] About syntax of compaction on specified segments

Posted by "bill.zhou" <zg...@163.com>.
Hi community,
I prefer option 3, for the following reasons.

1. It is not a good idea to change the existing behavior of major and minor
compaction. Customers who already use CarbonData would be confused; we should
stay compatible with the existing feature.
2. Custom compaction differs from major and minor compaction in usability:
it is an advanced feature that requires the user to know much more about
CarbonData's internal logic.









Re: Re: [Discussion] About syntax of compaction on specified segments

Posted by 徐传印 <xu...@hust.edu.cn>.
I think 'major' and 'minor' are enough to describe compaction; there is no need to add another type. Besides, 'custom' is somewhat ambiguous.

As described in the README:
```
In Major compaction, multiple segments can be merged into one large segment. 
User will specify the compaction size until which segments can be merged.
```
The existing major compaction (the default, without a condition) is size based: CarbonData chooses the segments by size. With the new major compaction (with a condition), we specify the segments and let CarbonData merge them into one large segment.
The two do not differ in purpose (merging some segments into a larger one); they differ only in how segments are selected (by segment size or by condition). So we don't need another compaction type.

Re: [Discussion] About syntax of compaction on specified segments

Posted by xm_zzc <44...@qq.com>.
I prefer option 3. A new compaction type called 'CUSTOM', distinct from
minor and major, will not confuse users.




Re: [Discussion] About syntax of compaction on specified segments

Posted by Liang Chen <ch...@gmail.com>.
Hi,

Thanks to jinzhou for starting this discussion.

I also propose adopting the solution proposed by manish, which does not impact
the current Major and Minor compaction behavior.

Regards
Liang







Re: [Discussion] About syntax of compaction on specified segments

Posted by manish gupta <to...@gmail.com>.
Hi,

I agree with @gvramana <https://github.com/gvramana>

   1. We should *not use* the Major/Minor compaction types, as they have a
   specific meaning and both are controlled by the system, which decides
   whether a segment is valid for compaction or not.
   2. We should *not use* carbon.input.segments.default.seg_compact to set
   the segments to be compacted.
   3. We should introduce a new compaction type 'CUSTOM' in the DDL, as
   suggested by @gvramana <https://github.com/gvramana>, because it acts
   like a forced compaction of the given segments: it does not check
   the size and frequency of segments. We can work on using the syntax below
   for custom compaction.

*ALTER TABLE [db_name.]table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,5,8)*

Once a table is compacted using Custom compaction, minor compaction
no longer applies to the custom-compacted segment. A custom-compacted
segment should only participate in major compaction, and only if it
satisfies the major compaction size property.
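
The eligibility rules above could be sketched roughly as follows. This is only an illustrative Python model, not CarbonData's implementation: the `Segment` class, the `custom_compacted` flag and the `major_size_mb` threshold are hypothetical names, and reading "satisfies the major compaction size property" as "smaller than the configured size" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    segment_id: int
    size_mb: float
    # Hypothetical flag: True if this segment was produced by CUSTOM compaction.
    custom_compacted: bool = False


def custom_candidates(segments: List[Segment], ids: List[int]) -> List[Segment]:
    """CUSTOM: take exactly the requested segments; no size or frequency check."""
    wanted = set(ids)
    return [s for s in segments if s.segment_id in wanted]


def minor_candidates(segments: List[Segment]) -> List[Segment]:
    """MINOR: custom-compacted segments are excluded.
    (The usual frequency-based rules are not modeled here.)"""
    return [s for s in segments if not s.custom_compacted]


def major_candidates(segments: List[Segment], major_size_mb: float) -> List[Segment]:
    """MAJOR: any segment, custom-compacted or not, must satisfy the size property.
    Assumption: 'satisfies' means the segment is smaller than the threshold."""
    return [s for s in segments if s.size_mb < major_size_mb]
```

With this model, a segment produced by CUSTOM compaction is skipped by minor compaction but can still be picked up by major compaction when it is small enough.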

Regards
Manish Gupta

On Tue, Mar 13, 2018 at 2:55 PM, luffy <lu...@huawei.com> wrote:

> Having major and minor compaction is fine; there is no need for another
> type such as custom. I am more concerned about compaction performance.

Re: [Discussion] About syntax of compaction on specified segments

Posted by luffy <lu...@huawei.com>.
Having major and minor compaction is fine; there is no need for another
type such as custom. I am more concerned about compaction performance.




Re: [Discussion] About syntax of compaction on specified segments

Posted by xuchuanyin <xu...@hust.edu.cn>.
Hi, all:
Here I summarize my opinion and propose an option 4.

Option 4:
4) Extend the existing SQL syntax of Major and Minor compaction, based on
the syntax of delete segment:
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.ID IN (1,2,3,4)
    ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.ID IN (1,2,3,4)
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
    ALTER TABLE tablename COMPACT 'MINOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06'
  Note: The syntax is slightly different from that of Option 1.

The existing major compaction (the default, without a condition) is size based:
CarbonData chooses the segments by size. With the new major compaction
(with a condition), we specify the segments and let CarbonData merge them into
one large segment.
In effect, the existing compaction statement looks like this:
    ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT_SIZE > XXMB
The condition part 'WHERE SEGMENT_SIZE > XXMB' is implicit, whereas the
condition part in the new compaction statement is explicit.
The two do not differ in purpose (merging some segments into a larger one);
they differ only in how segments are selected (by segment size or by
condition). So we don't need another compaction type.




Re: Re: [Discussion] About syntax of compaction on specified segments

Posted by 徐传印 <xu...@hust.edu.cn>.
I think the syntax of segment compaction should be similar to that of the other segment management operations.
Currently in CarbonData, we delete segments using this syntax:
```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
```
And
```
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' 
```

So, we can imitate the above syntax and get the following:
```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.ID IN (0,5,8)
```
And
```
ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06' 
```
We can support compacting segments by specifying IDs and dates.
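
As a rough illustration of how the segment-ID form of this syntax could be recognized, here is a small regex-based Python sketch. It is not CarbonData's parser: `CompactStatement` and `parse_compact` are hypothetical names, only plain integer segment IDs as in the examples above are handled, and the STARTTIME form is left out.

```python
import re
from typing import List, NamedTuple, Optional


class CompactStatement(NamedTuple):
    database: Optional[str]
    table: str
    compaction_type: str
    segment_ids: List[int]


# ALTER TABLE [db_name.]table_name COMPACT '<type>' WHERE SEGMENT.ID IN (n, n, ...)
_PATTERN = re.compile(
    r"""^\s*ALTER\s+TABLE\s+
        (?:(?P<db>\w+)\.)?(?P<table>\w+)\s+
        COMPACT\s+'(?P<type>\w+)'\s+
        WHERE\s+SEGMENT\.ID\s+IN\s*\(\s*(?P<ids>\d+(?:\s*,\s*\d+)*)\s*\)
        \s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_compact(sql: str) -> CompactStatement:
    """Parse the segment-ID compaction statement, or raise ValueError."""
    match = _PATTERN.match(sql)
    if match is None:
        raise ValueError(f"not a segment-ID compaction statement: {sql!r}")
    # int() tolerates the surrounding whitespace left by split(",").
    ids = [int(part) for part in match.group("ids").split(",")]
    return CompactStatement(
        match.group("db"),
        match.group("table"),
        match.group("type").upper(),
        ids,
    )
```

The compaction type is captured as a free-form word, so the same sketch accepts 'MINOR', 'MAJOR' or a new type such as 'CUSTOM'; which types are actually allowed is exactly what this thread is discussing.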


-----Original Message-----
From: xuchuanyin <xu...@hust.edu.cn>
Sent: 2018-03-12 20:23:20 (Monday)
To: "Jin Zhou" <de...@carbondata.apache.org>
Cc: carbondata <de...@carbondata.apache.org>
Subject: Re: [Discussion] About syntax of compaction on specified segments


What does 1/2/3/4 mean in your example? If these are segment IDs, then compacting segments by specifying a date range is probably also needed.



