Posted to dev@doris.apache.org by 蔡聪辉 <ca...@163.com> on 2022/02/21 11:36:48 UTC

[Proposal] Support loading data into only some tablets of a partition, instead of all tablets, to improve data loading stability

We want to improve the performance and stability of data loading. One way to do this is to load data into only some of the tablets in a partition instead of all of them, because writing to every tablet can produce many small files and cause instability. To keep the system stable, I plan to make this change incrementally: first, give each tablet its own version property; then, let the BE submit only a subset of a partition's tablets when committing a transaction; and finally, publish those tablets.
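The three steps can be illustrated with a minimal Python sketch. It assumes an FE-side partition object that keeps one version per tablet; the classes and methods (`Tablet`, `Partition`, `commit`, `publish`) are hypothetical names for illustration and are not the actual Doris FE/BE code. Replicas and failure handling are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    tablet_id: int
    # Hypothetical per-tablet version property (step 1 of the proposal).
    version: int = 1


@dataclass
class Partition:
    # Tablets of one partition, keyed by tablet id.
    tablets: dict[int, Tablet] = field(default_factory=dict)
    # Committed but not yet published transactions: txn_id -> tablet ids written.
    committed: dict[int, set[int]] = field(default_factory=dict)

    def commit(self, txn_id: int, written_tablet_ids: set[int]) -> None:
        """Step 2: commit only the tablets this load actually wrote."""
        unknown = set(written_tablet_ids) - set(self.tablets)
        if unknown:
            raise ValueError(f"unknown tablets: {sorted(unknown)}")
        self.committed[txn_id] = set(written_tablet_ids)

    def publish(self, txn_id: int) -> None:
        """Step 3: advance the version of the written tablets only."""
        for tablet_id in self.committed.pop(txn_id):
            self.tablets[tablet_id].version += 1


partition = Partition({i: Tablet(i) for i in range(4)})
partition.commit(txn_id=10, written_tablet_ids={0, 2})
partition.publish(10)
print({t.tablet_id: t.version for t in partition.tablets.values()})
# {0: 2, 1: 1, 2: 2, 3: 1}
```

After publishing, tablets 0 and 2 are at version 2 while tablets 1 and 3 stay at version 1, so tablets in the same partition can legitimately hold different versions.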


The main advantage of this feature is that each load may involve only a small number of BE nodes, which can greatly improve the stability of data loading.
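To make the BE-count argument concrete, here is a small Python sketch. The layout (8 tablets, 3 replicas each, placed round-robin over 8 BEs), the helper `backends_touched`, and choosing a single tablet at random are all illustrative assumptions, not the proposal's actual placement or selection policy.

```python
import random


def backends_touched(replicas: dict[int, list[str]], tablet_ids) -> set[str]:
    """Return the set of BE nodes holding a replica of any of the given tablets."""
    touched = set()
    for tablet_id in tablet_ids:
        touched.update(replicas[tablet_id])
    return touched


# Hypothetical layout: 8 tablets, 3 replicas each, spread over 8 BE nodes.
backends = [f"be{i}" for i in range(8)]
replicas = {t: [backends[(t + k) % 8] for k in range(3)] for t in range(8)}

# Writing to every tablet involves every BE; writing to one tablet involves 3.
all_tablets = backends_touched(replicas, replicas.keys())
one_tablet = backends_touched(replicas, [random.choice(list(replicas))])
print(len(all_tablets), len(one_tablet))  # 8 3
```

With this layout, a load that writes every tablet opens writers on all 8 BEs, while a load that writes one tablet touches only the 3 BEs holding its replicas.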




Re:Re:Re: [Proposal] Support loading data into only some tablets of a partition, instead of all tablets, to improve data loading stability

Posted by 蔡聪辉 <ca...@163.com>.
Here is the design draft; I have created a DSIP on the Doris wiki:
https://cwiki.apache.org/confluence/display/DORIS/DSIP-005%3A+Support+Random+Sink
Feel free to discuss it.
For some of the related implementation work, see PRs #8041 and #8259.
At 2022-02-22 13:38:23, "陈明雨" <mo...@163.com> wrote:
>If you want to implement version control at the tablet level, you first need to design a way for the FE to clearly distinguish between “missing versions” and “unimported versions”.
>
>Say there are two tablets, A and B, in the same partition, with tablet A at version 3 and tablet B at version 2. How can we determine whether B is missing a version, or is simply a tablet that the latest load did not write to?
>
>Before implementing this part of the code, I recommend a more detailed design to ensure that existing features are either unaffected or affected only in a manageable way.

Re:Re: [Proposal] Support loading data into only some tablets of a partition, instead of all tablets, to improve data loading stability

Posted by 陈明雨 <mo...@163.com>.
If you want to implement version control at the tablet level, you first need to design a way for the FE to clearly distinguish between “missing versions” and “unimported versions”.

Say there are two tablets, A and B, in the same partition, with tablet A at version 3 and tablet B at version 2. How can we determine whether B is missing a version, or is simply a tablet that the latest load did not write to?

Before implementing this part of the code, I recommend a more detailed design to ensure that existing features are either unaffected or affected only in a manageable way.
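One possible way to separate the two cases, sketched in Python under stated assumptions: the FE records, for every tablet, the version it *should* be at, advancing it only for the tablets a published transaction actually wrote. A reported version is then compared against that tablet's own expected version rather than against its siblings. The names (`VersionTracker`, `on_publish`, `classify`) and the starting version of 1 are hypothetical; this is not the design in the DSIP.

```python
from enum import Enum


class TabletState(Enum):
    UP_TO_DATE = "up to date"
    MISSING_VERSION = "missing version"


class VersionTracker:
    """Hypothetical FE-side bookkeeping: the version each tablet should be at."""

    def __init__(self, tablet_ids):
        # Assumption: every tablet starts at version 1.
        self.expected = {tablet_id: 1 for tablet_id in tablet_ids}

    def on_publish(self, written_tablet_ids):
        # Only the tablets a load actually wrote get a new expected version.
        for tablet_id in written_tablet_ids:
            self.expected[tablet_id] += 1

    def classify(self, tablet_id, reported_version):
        # Compare against this tablet's own expected version, never its siblings'.
        if reported_version < self.expected[tablet_id]:
            return TabletState.MISSING_VERSION
        return TabletState.UP_TO_DATE


tracker = VersionTracker(["A", "B"])
tracker.on_publish(["A", "B"])  # load 1 wrote both tablets: A -> 2, B -> 2
tracker.on_publish(["A"])       # load 2 wrote only A: A -> 3
print(tracker.classify("B", 2))  # TabletState.UP_TO_DATE
print(tracker.classify("A", 2))  # TabletState.MISSING_VERSION
```

In the A/B example, B at version 2 is classified as not loaded rather than missing, because load 2 never targeted it. The trade-off is that the FE must persist which tablets each transaction wrote, which is exactly the extra design work the question above asks for.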




--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org





At 2022-02-22 13:28:00, "王博" <wa...@gmail.com> wrote:
>The problem and the proposed solution are fairly clear.
>I would like to discuss some problems this solution may face.
>First, it seems that you want to commit transactions at tablet
>granularity. I think multiple tablets in one partition should
>stay consistent within one stream load.
>In a single stream load, if some tablets commit successfully and others
>fail, how is the data of the failed tablets reloaded?
>And from the user's point of view, how should a stream load in which
>only some of the tablets fail be handled?
>
>Second, a question about the details: is the transaction lock
>granularity still the table, or is it the tablet?
>
>Finally, I think this is a very valuable project for Doris load. Can you
>provide a brief project plan, covering what parts the whole consists of
>and what each part does? This would help newcomers quickly understand
>the project and participate in the development.

Re: [Proposal] Support loading data into only some tablets of a partition, instead of all tablets, to improve data loading stability

Posted by 王博 <wa...@gmail.com>.
The problem and the proposed solution are fairly clear.
I would like to discuss some problems this solution may face.
First, it seems that you want to commit transactions at tablet
granularity. I think multiple tablets in one partition should
stay consistent within one stream load.
In a single stream load, if some tablets commit successfully and others
fail, how is the data of the failed tablets reloaded?
And from the user's point of view, how should a stream load in which
only some of the tablets fail be handled?

Second, a question about the details: is the transaction lock
granularity still the table, or is it the tablet?

Finally, I think this is a very valuable project for Doris load. Can you
provide a brief project plan, covering what parts the whole consists of
and what each part does? This would help newcomers quickly understand
the project and participate in the development.

