Posted to dev@doris.apache.org by ling miao <li...@apache.org> on 2020/11/20 10:11:31 UTC

Support for forced drop of user data

Hi folks,

Currently, when a user drops a table or partition, Doris first moves the data
into the trash directory and then waits for the trash directory to be emptied.

But during operations and maintenance, I found that cleaning up cluster data
to release disk space takes the following steps:
1. Drop the table or partition
2. Log in to each BE machine
3. Clean up the trash directory

If there are too many BE nodes in the cluster, the entire process of
releasing disk resources will be extremely slow and cumbersome.

So I propose adding a function that forcibly drops data. When it is used,
the data is deleted directly instead of being moved into the trash
directory.
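To make the difference concrete, here is a minimal Python sketch of the two behaviours on a single BE data directory. The layout (`data/<tablet_id>` and `trash/<move_time>.<tablet_id>`) and the function name `drop_tablet_dir` are hypothetical, chosen only for illustration; the real BE is written in C++ and its on-disk layout may differ.

```python
import os
import shutil
import tempfile
import time
from typing import Optional


def drop_tablet_dir(data_root: str, trash_root: str, tablet_id: int,
                    force: bool) -> Optional[str]:
    """Remove one tablet's files from a (hypothetical) BE data directory.

    force=False: move the directory into trash, where it stays until a
    sweeper deletes it. force=True: delete it immediately.
    Returns the new trash path, or None if the files were deleted outright.
    """
    src = os.path.join(data_root, str(tablet_id))
    if force:
        shutil.rmtree(src)  # disk space is released right away
        return None
    os.makedirs(trash_root, exist_ok=True)
    # Prefix the move time so a sweeper can later judge the entry's age.
    dst = os.path.join(trash_root, f"{int(time.time())}.{tablet_id}")
    shutil.move(src, dst)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        data = os.path.join(root, "data")
        trash = os.path.join(root, "trash")
        for tid in (1001, 1002):
            os.makedirs(os.path.join(data, str(tid)))
        print(drop_tablet_dir(data, trash, 1001, force=False))  # a path under trash/
        print(drop_tablet_dir(data, trash, 1002, force=True))   # None
```

The forced path trades recoverability for immediate disk release, which is exactly the trade-off the proposal asks operators to opt into.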

Looking forward to your reply (*^ω^*)


Ling Miao

Re: Re: Re: Support for forced drop of user data

Posted by ling miao <li...@apache.org>.
I checked the BE code.
The config 'trash_file_expire_time_sec' is a single global setting,
so every tablet has the same expiration time.
In other words, individual tablets do not have their own expiration time.

Is my understanding of the code correct?
If so, we cannot make a tablet expire immediately
with a force drop statement.

Ling Miao


On Mon, Nov 23, 2020 at 8:25 PM, ling miao <li...@apache.org> wrote:


Re: Re: Re: Support for forced drop of user data

Posted by ling miao <li...@apache.org>.
It is true that the metadata can be cleared immediately, but the data
cannot.
Or should we set the tablet's expiration time to a small value when FORCE is used?
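One way to read this suggestion: each trash entry could carry an optional per-tablet expiry that a FORCE drop sets to a small value, while all other entries keep the global `trash_file_expire_time_sec`. The sketch below is hypothetical; the thread notes that no such per-tablet field exists, and the `expire_sec` field, `effective_expire`, and `sweep_trash` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional

GLOBAL_EXPIRE_SEC = 259200  # default of trash_file_expire_time_sec


@dataclass
class TrashEntry:
    tablet_id: int
    moved_at: int
    expire_sec: Optional[int] = None  # hypothetical per-tablet override


def effective_expire(entry: TrashEntry) -> int:
    """Use the per-tablet value if a FORCE drop set one, else the global one."""
    return GLOBAL_EXPIRE_SEC if entry.expire_sec is None else entry.expire_sec


def sweep_trash(entries: List[TrashEntry], now: int) -> List[TrashEntry]:
    """Keep only entries younger than their own (or the global) expiry."""
    return [e for e in entries if now - e.moved_at < effective_expire(e)]


if __name__ == "__main__":
    trash = [
        TrashEntry(1001, moved_at=1000),                # normal drop
        TrashEntry(1002, moved_at=1000, expire_sec=0),  # FORCE drop
    ]
    print([e.tablet_id for e in sweep_trash(trash, now=1000)])  # [1001]
```

Unlike lowering the global config, the override only affects the tablets that were explicitly force-dropped.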

Ling Miao

On Mon, Nov 23, 2020 at 10:46 AM, 蔡聪辉 <ca...@163.com> wrote:


Re:Re: Re: Support for forced drop of user data

Posted by 蔡聪辉 <ca...@163.com>.
Doris now supports DROP PARTITION, DROP TABLE, and DROP DATABASE with FORCE,
so if we no longer need to recover the metadata, we can simply run the drop
with FORCE.
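A simplified model of the FE side: a plain drop parks the table in the catalog recycle bin, where it stays recoverable until `catalog_trash_expire_second` elapses, while a drop with FORCE skips the recycle bin, so RECOVER has nothing to restore. The `Catalog` class and its methods are hypothetical, not Doris's actual classes. In this model the BE still receives the tablets into its own trash either way, consistent with the point made in this thread that only the metadata is cleared immediately.

```python
from typing import Dict, List


class Catalog:
    """Toy FE catalog with a recycle bin (hypothetical, not Doris's classes)."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[int]] = {}       # table name -> tablet ids
        self.recycle_bin: Dict[str, List[int]] = {}
        self.be_trash: List[int] = []                # tablets handed to BE trash

    def drop_table(self, name: str, force: bool = False) -> None:
        tablets = self.tables.pop(name)
        if force:
            # Metadata is gone at once; the BE still moves the files to trash.
            self.be_trash.extend(tablets)
        else:
            self.recycle_bin[name] = tablets

    def recover_table(self, name: str) -> bool:
        """RECOVER only works for tables still in the recycle bin."""
        if name not in self.recycle_bin:
            return False
        self.tables[name] = self.recycle_bin.pop(name)
        return True

    def expire_recycle_bin(self) -> None:
        """Called once catalog_trash_expire_second has elapsed."""
        for tablets in self.recycle_bin.values():
            self.be_trash.extend(tablets)
        self.recycle_bin.clear()


if __name__ == "__main__":
    cat = Catalog()
    cat.tables = {"t1": [1, 2], "t2": [3]}
    cat.drop_table("t1")
    cat.drop_table("t2", force=True)
    print(cat.recover_table("t1"), cat.recover_table("t2"))  # True False
    print(cat.be_trash)  # [3]
```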


From Conghui Cai
At 2020-11-23 10:05:46, "ling miao" <li...@apache.org> wrote:

Re: Re: Support for forced drop of user data

Posted by ling miao <li...@apache.org>.
Or could we have a new command that clears metadata and data directly,
instead of adjusting parameters before issuing the drop command?
Adjusting the parameters may also affect data that you don't want to
delete immediately, and it would weaken our repair function as well.

Ling Miao

On Sun, Nov 22, 2020 at 3:33 PM, 陈明雨 <mo...@163.com> wrote:


Re:Re: Support for forced drop of user data

Posted by 陈明雨 <mo...@163.com>.
Yes, there is a problem: if the user recovers the partition with a `RECOVER` statement,
only the metadata is recovered, and the data is lost.


I think we can implement a command to view the metadata in the Catalog Recycle Bin and drop entries manually,
with a property that tells the BE to remove the data as well.
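The inconsistency and the proposed fix can be sketched together. In this toy model, "files on disk" is just the set of tablet ids whose data still exists, and an operator's manual cleanup is modelled as removing ids from that set behind the catalog's back. `RecycleBin`, `show`, `recover`, and `purge(..., remove_data=...)` are hypothetical names standing in for the proposed view and drop commands, not an existing Doris API.

```python
from typing import Dict, List, Set


class RecycleBin:
    """Toy catalog recycle bin plus the set of tablet files present on BEs."""

    def __init__(self, entries: Dict[str, List[int]],
                 files_on_disk: Set[int]) -> None:
        self.entries = entries              # dropped partition -> tablet ids
        self.files_on_disk = files_on_disk  # shared with the caller

    def show(self) -> List[str]:
        """Hypothetical 'view the recycle bin' command."""
        return sorted(self.entries)

    def recover(self, name: str) -> List[int]:
        """Restore metadata; return tablets whose files are already gone."""
        tablets = self.entries.pop(name)
        return [t for t in tablets if t not in self.files_on_disk]

    def purge(self, name: str, remove_data: bool) -> None:
        """Hypothetical manual drop; remove_data also deletes the files."""
        tablets = self.entries.pop(name)
        if remove_data:
            self.files_on_disk.difference_update(tablets)


if __name__ == "__main__":
    disk = {1, 2, 3, 4}
    rb = RecycleBin({"p1": [1, 2], "p2": [3, 4]}, disk)
    disk.discard(1)  # operator deletes files by hand
    disk.discard(2)
    print(rb.recover("p1"))  # [1, 2]: metadata is back, data is lost
    rb.purge("p2", remove_data=True)
    print(rb.show(), sorted(disk))  # [] []
```

Purging metadata and data in one step keeps the recycle bin and the disks consistent, which is what manual trash cleanup cannot guarantee.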




--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org





At 2020-11-22 10:05:13, "ling miao" <li...@apache.org> wrote:

Re: Support for forced drop of user data

Posted by ling miao <li...@apache.org>.
Currently we often just drop the partition and then directly clean up the files
under trash.
However, as you said, Doris still keeps the entry in the catalog. Won't that
cause problems?

Ling Miao

On Sat, Nov 21, 2020 at 11:12 PM, 陈明雨 <mo...@163.com> wrote:


Re:Support for forced drop of user data

Posted by 陈明雨 <mo...@163.com>.
First, after a table is dropped, it stays in the Catalog Recycle Bin for a while, configured by the FE config "catalog_trash_expire_second" (default 86400 seconds, i.e. 1 day).
After that, the table is removed from the Catalog and its data is moved to the BE's trash. The BE's trash is then cleaned later,
as configured by the BE config "trash_file_expire_time_sec" (default 259200 seconds, i.e. 3 days).
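A quick calculation with the defaults quoted above shows how long dropped data occupies disk in the normal path. This is a rough sketch: it treats the two stages as strictly sequential and ignores how often the FE and BE cleanup threads run, so actual deletion may happen somewhat later than the figure computed here. The function name is illustrative only.

```python
# Defaults quoted in this message.
CATALOG_TRASH_EXPIRE_SECOND = 86400   # FE: time in the Catalog Recycle Bin
TRASH_FILE_EXPIRE_TIME_SEC = 259200   # BE: time in the BE trash directory


def seconds_until_data_deleted() -> int:
    """Earliest point (seconds after the DROP) at which the files can be deleted.

    Cleanup threads run periodically, so actual deletion may happen later.
    """
    return CATALOG_TRASH_EXPIRE_SECOND + TRASH_FILE_EXPIRE_TIME_SEC


if __name__ == "__main__":
    secs = seconds_until_data_deleted()
    print(secs, secs / 86400)  # 345600 4.0
```

In other words, with the defaults a dropped table keeps using disk space for at least four days.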


If we want to clean up the data immediately, we can first set the FE config "catalog_trash_expire_second" to a smaller value.
We can also implement a new command to view and clean the BEs' trash.


For example, we can add a new column to the result of "SHOW BACKENDS" showing the trash data size of each BE,
and a new command such as "ADMIN CLEAN BACKEND TRASH 'xxx'" to clean the trash immediately.
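On a single BE, the two proposed pieces reduce to measuring a trash directory and emptying it. The following is a rough Python sketch of that local work only; `trash_size_bytes` and `clean_trash` are hypothetical names, the trash path is assumed to be passed in, and a real implementation would live in the C++ BE and be triggered by the FE for each backend.

```python
import os
import shutil
import tempfile


def trash_size_bytes(trash_dir: str) -> int:
    """Total size of regular files under trash_dir (e.g. for a SHOW BACKENDS column)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(trash_dir):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def clean_trash(trash_dir: str) -> int:
    """Delete everything under trash_dir immediately; return bytes freed."""
    freed = trash_size_bytes(trash_dir)
    for name in os.listdir(trash_dir):
        path = os.path.join(trash_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return freed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as trash:
        entry = os.path.join(trash, "1605866000.1001")  # hypothetical entry name
        os.makedirs(entry)
        with open(os.path.join(entry, "0.dat"), "wb") as f:
            f.write(b"x" * 1024)
        print(trash_size_bytes(trash))                 # 1024
        print(clean_trash(trash), os.listdir(trash))   # 1024 []
```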




--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org





At 2020-11-20 18:11:31, "ling miao" <li...@apache.org> wrote:

Re: Support for forced drop of user data

Posted by "Xu,Yang(AIDP)" <xu...@baidu.com>.
Good idea. I think it would be very useful for users.

On 2020/11/20 6:11 PM, "ling miao" <li...@apache.org> wrote:
