Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/08/27 14:57:02 UTC

[GitHub] [incubator-doris] morningman opened a new issue #6521: [DevCon] meeting minutes [Updated 2021-08-27]

morningman opened a new issue #6521:
URL: https://github.com/apache/incubator-doris/issues/6521


   These are the minutes of the developer meetings. We will update this summary after each meeting.
   Developers can use it to track the progress of related issues.
   
   -----------------
   
   1. Novice task
   
       The novice task aims to bring more developers into the community. Developers contributing to open source for the first time can pick a task from the novice task list, which helps them get familiar with the submission process and experience how friendly the community is. We have looked at how mature projects such as Apache Pulsar and Apache DolphinScheduler handle this. The novice task list is being planned and will be released in the near future.
   
   2. SIG (Special Interest Group)
   
       We will set up special interest groups by module, classify PRs and issues by module, and route them to the corresponding group. This makes it easier to discuss related issues together, and people interested in a particular module can follow its development progress.
   
       Doris has already established SIGs in several areas, including Doris Manager and vectorization. We will gradually open more SIGs. Everyone is welcome to participate.
   
   3. Document construction
   
       The Doris documentation is currently incomplete. In some cases documentation was never written during development, and in others it was not updated when a feature changed. There are also problems with the document format. We hope to carry out an overall restructuring.
   
       An overall refactoring of the documentation helps in two ways. First, it improves Doris's syntax manual. Second, it is a relatively friendly task for novices: it helps them become familiar with Doris's features quickly and integrate into the community. We have created an issue on GitHub with an example document and built an empty skeleton for all documents, so contributors only need to fill in the content. We hope more people who want to help build the community will take part.
       
       #6336 
   
       This is also a kind of novice task.
   
   4. Regression testing
   
       Currently, for submitted PRs, the community provides only unit tests and a minimal test set of more than 100 cases to ensure code quality. Baidu has a complete regression test set for daily regression testing, but it is not yet visible to external developers, which makes it hard for community developers to run tests.
       
       Going forward, we will try to provide a regression testing framework that lets developers add and improve cases, to further ensure code quality.
   
   ## Function development related
   
   1. Vectorized execution engine
   
       By vectorizing operators, we aim to significantly improve Doris's query performance. This work involves refactoring all operators and the storage layer, and it is one of the community's key development directions this year.
       
       The first stage of this work is complete: vectorized execution on a single table now works. The first version is expected to be released in September. Follow-up work is underway and is expected to be available in Q4. A SIG for this work has been established. If you are interested in participating, please contact us.
   
       #6238
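
       As a rough illustration of the idea (not Doris code; the Doris implementation is not described in these minutes), the sketch below contrasts row-at-a-time evaluation with batch-at-a-time evaluation over columns. The function names, the `price`/`qty` columns, and the batch size are all hypothetical. Pure Python will not show the speedup; the point is only the change in data layout and control flow.

```python
from typing import Dict, List


def filter_sum_row_at_a_time(rows: List[Dict[str, int]], threshold: int) -> int:
    """Evaluate the filter and the aggregate once per row."""
    total = 0
    for row in rows:
        if row["price"] > threshold:
            total += row["qty"]
    return total


def filter_sum_batched(price: List[int], qty: List[int],
                       threshold: int, batch_size: int = 1024) -> int:
    """Evaluate each operator once per batch of column values."""
    total = 0
    for start in range(0, len(price), batch_size):
        price_batch = price[start:start + batch_size]
        qty_batch = qty[start:start + batch_size]
        # Filter operator: produce a selection vector of matching positions.
        selected = [i for i, value in enumerate(price_batch) if value > threshold]
        # Aggregate operator: consume the selection vector for the whole batch.
        total += sum(qty_batch[i] for i in selected)
    return total


# Same data in row layout and in column layout (hypothetical sample values).
rows = [{"price": 5, "qty": 1}, {"price": 20, "qty": 2},
        {"price": 30, "qty": 3}, {"price": 1, "qty": 4}]
price_column = [r["price"] for r in rows]
qty_column = [r["qty"] for r in rows]
```

       Both functions return the same result; in a compiled engine the batched form is the one that lets tight loops over contiguous column data be optimized.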
   
   2. Doris Manager visual operation and maintenance monitoring platform
   
       Doris Manager is mainly intended to provide cluster monitoring and to support operation and maintenance tasks, such as a cluster deployment wizard, node management, rolling upgrades, and online scaling out and in. It is currently under intensive development, and we will release a first version to the community in September. More community members are welcome to join, especially front-end developers. Feedback on problems encountered during operation and maintenance is also welcome. Once we have generalized such problems into product features, we can add them to the Doris Manager feature list.
   
       [doris-manager branch](https://github.com/apache/incubator-doris/tree/doris-manager)
   
   3. New Query Optimizer
   
       The query optimizer is one of the most important components of Doris. The current query optimizer framework has problems such as an unclear layered design and poor extensibility. We hope to launch a first usable version of the new query optimizer by the end of the year. So far, some framework design verification and the development of supporting features, such as statistics collection, have been carried out. Developers with experience building query optimizers are welcome to contact us.
   
       Finally, we hope everyone will join in. First, we would like to collect a new name for the New Optimizer, one that highlights a characteristic of it, such as the ability to locate the best plan quickly and accurately. We also sincerely invite people who have ideas about the New Optimizer to work with us. Strong enthusiasm for optimizer technology is what we are looking for. An understanding of Cascades theory, or experience developing the optimizers of other open source products such as Spark and Presto, is even better. We have also prepared some relatively simple tasks for novices and hope everyone can participate.
   
       #6483
   
   4. Resource isolation
   
       Resource isolation is also a feature that many users care about. For a database with an MPP architecture, resource isolation is a hard problem, because the MPP design deliberately uses as much of the cluster's resources as possible to process each query. When multiple tasks run at once, resource preemption is bound to occur.
   
       We are currently working on two things. The first is resource labeling: the storage and computing nodes in a Doris cluster are divided into resource label groups, so that cluster resources can be partitioned and isolated at the node level. The second is a per-query resource limit, which caps the CPU usage of a query on a single node through parameters. This suits scenarios where users run scheduled tasks and are not sensitive to latency. Both pieces of work have been developed and will soon be merged into the community code.
   
       #5902
       #6442
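
       A minimal sketch of node-level isolation by resource label, assuming each node carries one label and each user is granted a set of labels. The `Node` class, `nodes_for_user`, the host names, and the label names are all hypothetical and do not reflect Doris's actual data structures or syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Node:
    host: str
    tag: str  # hypothetical resource label assigned to this node


def nodes_for_user(nodes: List[Node], user_tags: Dict[str, Set[str]],
                   user: str) -> List[Node]:
    """Return only the nodes whose label the user is allowed to use."""
    allowed = user_tags.get(user, set())
    return [node for node in nodes if node.tag in allowed]


# Hypothetical cluster: two nodes for online queries, one for batch jobs.
cluster = [Node("be1", "online"), Node("be2", "online"), Node("be3", "batch")]
grants = {"report_user": {"batch"}, "app_user": {"online"}}

batch_nodes = nodes_for_user(cluster, grants, "report_user")
online_nodes = nodes_for_user(cluster, grants, "app_user")
```

       With this split, a heavy query from `report_user` can only be scheduled on `be3` and cannot preempt resources on the nodes serving `app_user`.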
   
   5. Z-Order Indexing
   
       Doris currently stores data sorted by the prefix columns, so a query whose conditions include the prefix columns can search the sorted data quickly. If the query condition is not on a prefix column, however, the sort order cannot be used for fast lookup. After some investigation, we found that Z-Order Indexing can solve this problem and filters well in Kanban-type multi-column query scenarios. The algorithm is basically complete, and testing and verification are in progress.
       
       Z-Order may also degrade write performance to some extent. Current test results show that the impact is not significant, but the test conclusions need further refinement.
   
       #6359 
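
       The general technique can be sketched as follows. This is a generic Morton-code (bit interleaving) example on a toy 4x4 table, not the implementation in the issue above; `interleave_bits`, `blocks_touched`, and the block size are hypothetical names and values chosen for illustration. It counts how many fixed-size blocks a min/max check would have to read for an equality filter on each column.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x takes the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y takes the odd bit positions
    return code


def blocks_touched(sorted_rows: List[Tuple[int, int]], column: int,
                   target: int, block_size: int = 4) -> int:
    """Count blocks whose min/max range on `column` could contain `target`."""
    touched = 0
    for start in range(0, len(sorted_rows), block_size):
        values = [row[column] for row in sorted_rows[start:start + block_size]]
        if min(values) <= target <= max(values):
            touched += 1
    return touched


# Toy table with two columns (x, y), each taking the values 0..3.
rows = [(x, y) for x in range(4) for y in range(4)]
prefix_sorted = sorted(rows)  # sorted by x, then y
z_sorted = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))

print(blocks_touched(prefix_sorted, 1, 2))  # filter on the non-prefix column y
print(blocks_touched(z_sorted, 1, 2))
```

       On this toy data the filter on `y` touches all 4 blocks under prefix sorting and 2 blocks under Z-order, while a filter on the prefix column `x` goes from 1 block to 2. Z-order spreads the pruning benefit across columns instead of concentrating it on the prefix.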
   
   6. PreparedStatement
   
       Doris currently does not support prepared statements on the server side. Prepared statements effectively prevent SQL injection and, in some scenarios, reduce the overhead of repeatedly parsing the same query.
       
       As for SQL injection, the MySQL drivers of most languages support prepared statements on the client side, which solves most SQL injection problems.
       
       We will continue to investigate prepared statements on the server side.
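
       To illustrate why binding parameters prevents injection, here is a minimal sketch. It uses Python's standard `sqlite3` module as a stand-in because it needs no server; it is not a Doris or MySQL client, and MySQL drivers use their own placeholder styles. The table, the sample data, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

# Input crafted to turn the WHERE clause into an always-true condition.
malicious = "nobody' OR '1'='1"


def lookup_concatenated(name: str):
    """Unsafe: the input is spliced into the SQL text and parsed as SQL."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_bound(name: str):
    """Safe: the input is passed as a bound value, never parsed as SQL."""
    return conn.execute("SELECT secret FROM users WHERE name = ?",
                        (name,)).fetchall()


leaked = lookup_concatenated(malicious)  # matches every row
safe = lookup_bound(malicious)           # matches no row
```

       The concatenated query returns both secrets, while the bound query treats the whole input as a literal name and returns nothing.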
   
   7. InternalErrorCode
   
       The ErrorCode values in Doris are currently inconsistent, which makes it hard for programs to handle errors and determine what went wrong. We will sort out the error codes and standardize how error messages are displayed.
   
       #6357 
   
   8. Import performance optimization
   
       Doris's import performance still has a lot of room for optimization, especially in building the in-memory structure and in the disk writing stage. This area needs further analysis.
   
       #6398 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


