Posted to dev@doris.apache.org by 蔡聪辉 <ca...@163.com> on 2022/03/30 06:00:01 UTC

Re:Re:Re: [Proposal] Support load data only with some tablets instead of all tablets in the partition to improve data loading stability

Here is the design draft. I have created a DSIP in the Doris wiki:
https://cwiki.apache.org/confluence/display/DORIS/DSIP-005%3A+Support+Random+Sink
Feel free to discuss it.
For some related implementation work, see PRs #8041 and #8259.
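
To make the idea from the original proposal (quoted below) concrete, here is a rough sketch of loading into only some tablets of a partition. It is an illustration only, not code from the DSIP or from the PRs above: the function names `pick_tablets` and `route_rows`, the tablet IDs, and the round-robin routing are all hypothetical, and it assumes rows may be placed in any tablet of the partition.

```python
import random
from typing import Dict, List, Sequence


def pick_tablets(tablet_ids: Sequence[int], num_to_load: int,
                 rng: random.Random) -> List[int]:
    """Choose the subset of a partition's tablets that one load writes to.

    Hypothetical helper: the selection policy (uniform random sampling)
    is an assumption made for this sketch.
    """
    if not 1 <= num_to_load <= len(tablet_ids):
        raise ValueError("num_to_load must be between 1 and the tablet count")
    return rng.sample(list(tablet_ids), num_to_load)


def route_rows(rows: Sequence[str], chosen: Sequence[int]) -> Dict[int, List[str]]:
    """Spread rows round-robin over the chosen tablets only.

    Tablets that were not chosen receive nothing from this load, so the
    nodes hosting them are not involved.
    """
    routed: Dict[int, List[str]] = {tablet: [] for tablet in chosen}
    for i, row in enumerate(rows):
        routed[chosen[i % len(chosen)]].append(row)
    return routed


if __name__ == "__main__":
    partition_tablets = list(range(100, 116))  # 16 made-up tablet IDs
    chosen = pick_tablets(partition_tablets, 2, random.Random(0))
    print(route_rows(["r1", "r2", "r3", "r4", "r5"], chosen))
```

With 16 tablets in the partition and 2 chosen, this load touches 2 tablets instead of 16, which is the source of the "fewer small files, fewer BE nodes per load" argument. The sketch says nothing about how versions of untouched tablets are tracked, which is the open question raised in the replies below.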
At 2022-02-22 13:38:23, "陈明雨" <mo...@163.com> wrote:
>If you want to implement version control at the tablet level, then first you need to design a way for FE to clearly distinguish between “missing versions” and “unimported versions”.
>
>Say there are two tablets, A and B, under the same partition. Tablet A is at version 3 and tablet B is at version 2. How can we then determine whether B is a tablet with a missing version, or a tablet into which that version was never imported?
>
>Before implementing this part of the code, I recommend a more detailed design to ensure that existing features are not affected, or that the impact is manageable.
>
>
>
>
>--
>
>Best Regards
>陈明雨 Mingyu Chen
>
>Email:
>chenmingyu@apache.org
>
>
>
>
>
>At 2022-02-22 13:28:00, "王博" <wa...@gmail.com> wrote:
>>The problems and solutions are relatively clear.
>>I would like to discuss some problems this solution may face.
>>First, it seems that you want to commit transactions at tablet
>>granularity. I think multiple tablets in one partition should
>>stay consistent within one stream load.
>>If, in one stream load, some tablets commit successfully and others
>>fail, how is the data for the failed tablets re-loaded?
>>From the user's point of view, how should a stream load in which
>>some tablets failed to load be handled?
>>
>>Second, this may be a discussion of the details. Is the transaction lock
>>granularity still table, or tablet?
>>
>>Finally, I think this is a very valuable project for Doris Load. Can you
>>provide a brief project plan, including what the parts of the whole are
>>and what each part does? This can help newcomers quickly understand
>>the project and participate in the development.
>>
>>
>>蔡聪辉 <ca...@163.com> wrote on Mon, Feb 21, 2022 at 19:37:
>>
>>> We want to improve the performance and stability of data loading. One
>>> way is to load only some tablets in the partition instead of all tablets,
>>> since loading all of them may produce many small files and cause
>>> instability. For stability, I would introduce the change gradually: the
>>> first step is to give tablets a version property, then enable BE to
>>> submit only some tablets of a partition when committing a transaction,
>>> and finally publish them.
>>>
>>>
>>> The main advantage of this feature is that each load may involve only a
>>> small number of BE nodes, which can greatly enhance the stability of
>>> data loading.
>>>
>>>
>>>
>>>