Posted to dev@doris.apache.org by 陈明雨 <mo...@163.com> on 2022/04/27 07:46:01 UTC

Re:Re:Re: Refactor Doris's IO Stack

Added write privileges for plat1ko [1].

[1] https://cwiki.apache.org/confluence/display/DORIS/DSIP-006%3A+Refactor+IO+stack




--

Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org





On 2022-03-31 14:41:35, "陈明雨" <mo...@163.com> wrote:
>Hi Guolei,
>I have created DSIP-006 for this proposal
>https://cwiki.apache.org/confluence/display/DORIS/DSIP-006%3A+Refactor+IO+stack
>
>
>
>
>--
>
>Best Regards
>陈明雨 Mingyu Chen
>
>Email:
>chenmingyu@apache.org
>
>
>
>
>
>On 2022-03-30 12:35:44, "王博" <wa...@gmail.com> wrote:
>>+1
>>Looking forward to Teacher Guolei's DSIP.
>>
>>GuoLei Yi <yi...@gmail.com> wrote on Tue, Mar 29, 2022 at 14:17:
>>
>>> Currently, Doris has several different interfaces for file IO:
>>>
>>>    - The query layer has FileReader and FileWriter, with corresponding
>>>    implementations for HDFS, S3, Broker, and Local.
>>>    - The storage layer has a BlockManager that abstracts blocks, with
>>>    WriteableFileBlock and ReadableFileBlock.
>>>    - For directory management, there is an Env interface that includes
>>>    directory operations, with RemoteEnv and PosixEnv implementations.
>>>    BlockManager also has some operations for linking files and deleting
>>>    blocks. In addition, for S3 and HDFS, classes such as S3StorageBackend
>>>    contain file and directory operations such as mkdir, copy, and rm.
>>> Having so many ways to do the same thing causes the following problems:
>>>
>>>    - It is messy: it is sometimes unclear which interface to use, and many
>>>    functions are duplicated under different abstraction names.
>>>    - Adding a feature or fixing a bug requires changes in multiple places.
>>>    For example, if we want reads from S3 to go through a local cache, the
>>>    cache has to be added in all of these places.
>>>
>>> We need to unify the IO stack. In fact, access to IO can be roughly divided
>>> into the following three types:
>>>
>>>    - Directory operations: create files, delete files, list files, etc.
>>>    - File write operations
>>>    - File read operations
>>>
>>> We could then implement these APIs for different storage backends:
>>>
>>>    - Local file
>>>    - S3 file
>>>    - HDFS file
>>>    - Broker
>>>
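To make the three operation types and the backend list above more
concrete, here is a minimal C++ sketch. Every name in it
(UnifiedFileReader, UnifiedFileWriter, UnifiedFileSystem,
MemoryFileSystem, write_then_read) is hypothetical and is not taken from
the Doris code base or from DSIP-006. The in-memory backend only stands in
for a real Local, S3, HDFS, or Broker implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// All names below are hypothetical; none are existing Doris classes.

// File read operation: positional reads from an opened file.
class UnifiedFileReader {
public:
    virtual ~UnifiedFileReader() = default;
    // Returns up to `len` bytes starting at `offset`.
    virtual std::string read_at(std::size_t offset, std::size_t len) = 0;
    virtual std::size_t size() const = 0;
};

// File write operation: append-only writes, finished by close().
class UnifiedFileWriter {
public:
    virtual ~UnifiedFileWriter() = default;
    virtual void append(const std::string& data) = 0;
    virtual void close() = 0;
};

// Directory operations, plus factories for readers and writers.
class UnifiedFileSystem {
public:
    virtual ~UnifiedFileSystem() = default;
    virtual std::unique_ptr<UnifiedFileWriter> create_file(const std::string& path) = 0;
    virtual std::unique_ptr<UnifiedFileReader> open_file(const std::string& path) = 0;
    virtual void delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& prefix) = 0;
};

// Toy backend that keeps files in memory. A real backend would wrap
// POSIX calls, an S3 client, an HDFS client, or a Broker connection.
class MemoryFileSystem : public UnifiedFileSystem {
public:
    std::unique_ptr<UnifiedFileWriter> create_file(const std::string& path) override {
        files_[path].clear();  // create or truncate
        return std::make_unique<Writer>(files_[path]);
    }
    std::unique_ptr<UnifiedFileReader> open_file(const std::string& path) override {
        auto it = files_.find(path);
        if (it == files_.end()) throw std::runtime_error("file not found: " + path);
        return std::make_unique<Reader>(it->second);
    }
    void delete_file(const std::string& path) override { files_.erase(path); }
    std::vector<std::string> list(const std::string& prefix) override {
        std::vector<std::string> out;
        for (const auto& kv : files_) {
            if (kv.first.compare(0, prefix.size(), prefix) == 0) out.push_back(kv.first);
        }
        return out;
    }

private:
    class Writer : public UnifiedFileWriter {
    public:
        explicit Writer(std::string& buf) : buf_(buf) {}
        void append(const std::string& data) override { buf_ += data; }
        void close() override {}  // nothing to flush for the in-memory case
    private:
        std::string& buf_;
    };
    class Reader : public UnifiedFileReader {
    public:
        explicit Reader(const std::string& buf) : buf_(buf) {}
        std::string read_at(std::size_t offset, std::size_t len) override {
            if (offset >= buf_.size()) return "";
            return buf_.substr(offset, len);
        }
        std::size_t size() const override { return buf_.size(); }
    private:
        const std::string& buf_;
    };
    std::map<std::string, std::string> files_;
};

// Caller code depends only on the interface, so it works with any backend.
std::string write_then_read(UnifiedFileSystem& fs, const std::string& path,
                            const std::string& payload) {
    auto writer = fs.create_file(path);
    writer->append(payload);
    writer->close();
    auto reader = fs.open_file(path);
    return reader->read_at(0, reader->size());
}
```

With this shape, the storage layer and the query layer would both call
something like write_then_read(fs, path, data) and never need to know
which backend `fs` refers to. Error handling by exception is a
simplification in this sketch; a real design might return status objects
instead.
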
>>> Once implemented, these APIs can be used in the storage layer (hot and
>>> cold data separation, separation of storage and compute), the query layer
>>> (querying S3 or HDFS), backup and restore, etc., avoiding duplicated
>>> development and maintenance.
>>>
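As an illustration of the "avoid repeated development" point, the local
cache for S3 reads mentioned earlier could be written once as a wrapper
around a single read interface. The sketch below is only one possible
shape: RangeReader, FakeRemoteReader, and CachedReader are hypothetical
names, the "remote" side is a fake that counts its calls, and the cache is
a plain in-memory map without eviction (a real cache would probably live on
local disk). It assumes block_size is greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal read interface; not an existing Doris class.
class RangeReader {
public:
    virtual ~RangeReader() = default;
    // Returns up to `len` bytes starting at `offset`.
    virtual std::string read_at(std::size_t offset, std::size_t len) = 0;
};

// Stand-in for a slow remote backend such as S3; counts how often it is hit.
class FakeRemoteReader : public RangeReader {
public:
    explicit FakeRemoteReader(std::string content) : content_(std::move(content)) {}
    std::string read_at(std::size_t offset, std::size_t len) override {
        ++remote_reads;
        if (offset >= content_.size()) return "";
        return content_.substr(offset, len);
    }
    int remote_reads = 0;

private:
    std::string content_;
};

// Wrapper that caches fixed-size blocks of any RangeReader.
class CachedReader : public RangeReader {
public:
    CachedReader(RangeReader& inner, std::size_t block_size)
        : inner_(inner), block_size_(block_size) {}

    std::string read_at(std::size_t offset, std::size_t len) override {
        std::string out;
        std::size_t pos = offset;
        const std::size_t end = offset + len;
        while (pos < end) {
            const std::size_t block = pos / block_size_;
            const std::string& data = load_block(block);
            const std::size_t in_block = pos - block * block_size_;
            if (in_block >= data.size()) break;  // reached end of file
            const std::size_t n = std::min(end - pos, data.size() - in_block);
            out.append(data, in_block, n);
            pos += n;
        }
        return out;
    }

private:
    // Fetches a block from the wrapped reader only on the first access.
    const std::string& load_block(std::size_t block) {
        auto it = cache_.find(block);
        if (it == cache_.end()) {
            it = cache_.emplace(block, inner_.read_at(block * block_size_, block_size_)).first;
        }
        return it->second;
    }

    RangeReader& inner_;
    std::size_t block_size_;
    std::unordered_map<std::size_t, std::string> cache_;
};
```

Because CachedReader is itself a RangeReader, any caller that accepts the
interface gets caching without being modified, which is the kind of
single-place change the proposal is aiming for.
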
>>> --
>>> Guolei Yi
>>> Tel:134-3991-0228
>>> Email:yiguolei@gmail.com
>>>
>>
>>
>>-- 
>>王博  Wang Bo