Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/17 04:13:20 UTC

[GitHub] [incubator-doris] EmmyMiao87 opened a new issue #4370: Release Nodes 0.13.0

EmmyMiao87 opened a new issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370


   # New Feature
   ## Query
   + Support bitmap intersect. (#3571)
   + Support `INTO OUTFILE` to export query result. (#3584)
   + Support spill to disk in sort and window functions. (#3820)
   + Add framework of materialized view selector and support more functions in MV. (#4014)
   
   ## Delete
   + Support InPredicate in delete statement. (#4006)
   
   ## Load
   + Support load json-data into Doris by RoutineLoad or StreamLoad. (#3553)
   + Support spark load (#3418) etc.
   + Support modify routine load job. (#4158)
   
   ## Plugin
   + [Extension] Logstash Doris output plugin. (#3800)
   
   ## Config
   + [config] Support to modify configs when BE is running without restarting (#3264)
   + [New Feature] Support setting replica quota in db level (#3283)
    
   # Enhancement
   + [Optimize][Delete] Simplify the delete process to make it fast (#3191)
   + [Enhancement] documents rebuild with Vuepress
   + [Query Plan]Support simple transitivity on join predicate pushdown (#3453)
   + Non blocking OlapTableSink (#3143)
   + [TxnMgr] Support txn management in db level and use ArrayDeque to improve txn task performance (#3369)
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [Enhancement] Improve the performance of query with IN predicate  (#3694)
   + [optimize] Optimize spark load/broker load reading parquet format file (#3878)
   + [webserver] Make BE webserver more pretty (#4050)
   + [webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [CodeRefactor] Modify FE modules (#4146)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-674835876


   # Credits
   
   @ZhangYu0123           
   @wfjcmcb               
   @Fullstop000           
   @sduzh                 
   @Stalary               
   @worker24h             
   @chaoyli               
   @vagetablechicken      
   @jmk1011               
   @funyeah               
   @wutiangan             
   @gengjun-git           
   @xinghuayu007          
   @EmmyMiao87            
   @songenjie             
   @acelyc111             
   @yangzhg               
   @Seaven                
   @hexian55              
   @ChenXiaofei           
   @WingsGo               
   @kangpinghuang         
   @wangbo                
   @weizuo93              
   @sdgshawn              
   @skyduy                
   @wyb                   
   @gaodayue              
   @HappenLee             
   @kangkaisen            
   @wuyunfeng             
   @HangyuanLiu           
   @xy720                 
   @liutang123            
   @caiconghui            
   @liyuance              
   @spaces-X              
   @hffariel              
   @decster               
   @blackfox1983          
   @Astralidea            
   @morningman            
   @hf200012              
   @xbyang18              
   @Youngwb               
   @imay                  
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris can now spill intermediate query data to disk. When `enable_spilling` is true and the query reaches its memory limit, it spills to disk instead of failing because of the memory bottleneck. Version 0.13 supports spilling in sort and window functions.
   
   [#3820] [#4151] [#4152]
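   
   The idea of spilling a sort can be sketched in a few lines of Python. This is only an illustration of the general external merge sort technique, not the Doris implementation: the function names and the `max_rows_in_memory` parameter are hypothetical, and a row count stands in for the real memory limit.

```python
import heapq
import pickle
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for row in sorted_run:
        pickle.dump(row, f)
    f.seek(0)
    return f


def _read_run(f):
    """Stream rows back from a spilled run, one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_sort(rows, max_rows_in_memory):
    """Sort an iterable while buffering at most max_rows_in_memory rows."""
    runs, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:  # stand-in for "memory limit reached"
            runs.append(_spill(sorted(buffer)))
            buffer = []
    # Merge the spilled runs with whatever is still in memory.
    streams = [_read_run(f) for f in runs] + [iter(sorted(buffer))]
    yield from heapq.merge(*streams)
    for f in runs:
        f.close()
```

   For example, `list(external_sort([5, 3, 8, 1, 9, 2, 7], 3))` spills two runs of three rows each and merges them with the one row left in memory.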
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support a richer set of aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be pre-computed for deduplication analysis, such as PV and UV statistics for website traffic. Doris automatically matches a query to the optimal materialized view to speed it up.
   
   [#3651] [#3677] [#3705] [#3873] [#4014]
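   
   A rough Python sketch of what such a pre-aggregation buys. The table layout, the sample rows and the function names are made up for illustration, and a Python `set` stands in for a bitmap; the real `bitmap_union` and `hll_union` use compact data structures.

```python
from collections import defaultdict

# Hypothetical base table rows: (date, page, user_id)
base_rows = [
    ("2020-08-01", "home", 1),
    ("2020-08-01", "home", 2),
    ("2020-08-01", "home", 1),
    ("2020-08-01", "cart", 2),
    ("2020-08-02", "home", 3),
]


def build_view(rows):
    """Pre-aggregate per (date, page): a row count (PV) and the set of users."""
    view = defaultdict(lambda: {"pv": 0, "users": set()})
    for date, page, user_id in rows:
        entry = view[(date, page)]
        entry["pv"] += 1
        entry["users"].add(user_id)
    return view


def uv_by_date(view, date):
    """Count distinct users on a date by unioning the pre-built sets."""
    users = set()
    for (d, _page), entry in view.items():
        if d == date:
            users |= entry["users"]
    return len(users)
```

   A UV query per date then only touches the pre-aggregated entries instead of scanning every base row.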
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources, which improves import performance for large data volumes and saves computing resources on the Doris cluster. It is mainly intended for scenarios where a large amount of data is imported into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support loading JSON data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose raw data is already JSON: users no longer need to convert the data to CSV before loading it.
   
   [#3553]
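   
   Conceptually, loading JSON means picking one value per target column out of each record. The following Python sketch shows that mapping step only; the dotted-path notation and both function names are assumptions made for this example and are not the Doris load syntax.

```python
import json


def extract_row(record, paths):
    """Pull one value per column out of a parsed JSON object using dotted paths."""
    row = []
    for path in paths:
        value = record
        for key in path.split("."):
            # A missing key or a non-object value yields None for that column.
            value = value.get(key) if isinstance(value, dict) else None
        row.append(value)
    return row


def rows_from_json_lines(text, paths):
    """Turn newline-delimited JSON into a list of column-ordered rows."""
    return [
        extract_row(json.loads(line), paths)
        for line in text.splitlines()
        if line.strip()
    ]
```

   With the paths `["ts", "user.id"]`, a log line such as `{"ts": 1, "user": {"id": 7}}` becomes the row `[1, 7]`.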
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. The newly set properties are used to plan tasks the next time the job is scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key within an ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin writes data from Logstash to Doris. It talks to the Doris FE HTTP interface over HTTP and loads the data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to an external file system such as HDFS, S3 or BOS. The syntax follows the MySQL syntax, and the export format is CSV. The exported results can be provided to other users for download or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of ids produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement now supports IN and NOT IN predicates in its conditions, so a single statement can delete rows matching any of several values.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new version-merging strategy is a trade-off between write amplification, space amplification and read performance, and it tends to merge files of similar sizes. For the same number of versions, it needs fewer merges and leaves fewer files in total.
   
   [#4212]
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well, which reduces the amount of data read and speeds up the query.
   
   [#3453]
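   
   The rule can be sketched in Python as a single propagation step over equality join conditions. The tuple representation of predicates and the name `propagate_filters` are hypothetical; the actual planner works on expression trees.

```python
def propagate_filters(join_equalities, filters):
    """Copy each filter across an equality join to the column on the other side.

    join_equalities: list of (left_column, right_column) pairs, e.g. from `t1.id = t2.id`.
    filters: list of (column, operator, value) tuples from the WHERE clause.
    Returns the original filters plus the derived ones, without duplicates.
    """
    derived = list(filters)
    for left, right in join_equalities:
        for column, op, value in filters:
            if column == left:
                candidate = (right, op, value)
            elif column == right:
                candidate = (left, op, value)
            else:
                continue  # this filter does not touch the join columns
            if candidate not in derived:
                derived.append(candidate)
    return derived
```

   For a join on `t1.id = t2.id` with the filter `t1.id > 100`, the sketch also yields `t2.id > 100`, so both scans can discard rows early.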
   
   ## Non blocking OlapTableSink 
   
   With this optimization, `OlapTableSink` sends data and adds rows concurrently, which improves load performance. In a test with a 56G broker load, the original version ran for 4 hours and the new version halved that time.
   
   [#3143]
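   
   The overlap between adding rows and sending them can be illustrated with a producer thread and a sender thread in Python. This is a minimal sketch of the general pattern, not the `OlapTableSink` code: `run_sink`, `batch_size` and the `send` callback are names invented for the example, and error handling is left out.

```python
import queue
import threading


def run_sink(rows, batch_size, send):
    """Batch rows on the caller's thread while a worker thread sends the batches."""
    pending = queue.Queue(maxsize=4)  # bounded, so the producer cannot run far ahead
    done = object()                   # sentinel telling the sender to stop

    def sender():
        while True:
            batch = pending.get()
            if batch is done:
                return
            send(batch)

    worker = threading.Thread(target=sender)
    worker.start()

    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            pending.put(batch)  # hand off the full batch and keep adding rows
            batch = []
    if batch:
        pending.put(batch)      # flush the last, partially filled batch
    pending.put(done)
    worker.join()
```

   Because a single sender drains the queue in order, batches arrive in the order they were filled. A blocking design would instead wait for each `send` to return before adding the next row.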
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by db, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   A new config, `max_pushdown_conditions_per_column`, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the earlier configuration that controls how scan keys are split. Its default value is 1024. With the two configurations separated, Doris QPS improves and CPU usage decreases.
   
   [#3694]
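   
   One plausible reading of the limit, sketched in Python: an IN list is pushed down only if it has no more values than the limit, and is otherwise evaluated above the storage engine. The function name and the dict-based representation are assumptions for illustration; only the config name and the default of 1024 come from the notes above.

```python
MAX_PUSHDOWN_CONDITIONS_PER_COLUMN = 1024  # default value stated in the release notes


def split_in_predicates(in_predicates, limit=MAX_PUSHDOWN_CONDITIONS_PER_COLUMN):
    """Decide per column whether its IN list is small enough to push down.

    in_predicates: dict mapping a column name to its list of IN values.
    Returns (pushed_down, kept_above_storage) as two dicts.
    """
    pushed_down, kept = {}, {}
    for column, values in in_predicates.items():
        if len(values) <= limit:
            pushed_down[column] = values
        else:
            kept[column] = values  # too many conditions: filter after the scan
    return pushed_down, kept
```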
   
   ## Optimize load reading parquet format file
   
   The broker reader keeps a cache buffer array when reading a parquet file. Before the broker seeks to a position and fetches data from the remote parquet file, it first tries to read that position from the cache buffer array. If the expected data is already in the cache buffer array, no remote read is needed. In testing, this halves the load time of parquet files in broker load and spark load.
   
   [#3878]
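   
   The read path described above resembles a read-through block cache. A small Python sketch under that assumption follows; the class name, the fixed `block_size` and the `fetch(offset, length)` callback are hypothetical, and the cache here is unbounded for brevity.

```python
class CachedRemoteReader:
    """Serve reads from cached blocks and go to the remote file only on a miss."""

    def __init__(self, fetch, block_size=4):
        self._fetch = fetch            # fetch(offset, length) -> bytes, the remote read
        self._block_size = block_size
        self._blocks = {}              # block index -> cached bytes
        self.remote_reads = 0          # counter, to make cache hits visible

    def _block(self, index):
        if index not in self._blocks:
            self.remote_reads += 1
            self._blocks[index] = self._fetch(index * self._block_size, self._block_size)
        return self._blocks[index]

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self._block_size)
            chunk = self._block(index)[start:start + end - offset]
            if not chunk:
                break                  # read past the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

   Repeated or overlapping reads, which are common when a parquet reader jumps between the footer and column chunks, are then answered from memory.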
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support to modify configs when BE is running without restarting (#3264)
   + Support setting replica quota in db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route to the same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [CodeRefactor] Generate Java files using maven (#4133)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] Documents rebuild with Vuepress (#3414)
   + [Webserver] Make BE webserver more pretty (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan by default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   + Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris supports query spill to disk in sorting and window functions. When the `enable_spilling` is true and memory limit is reached, the query will spill to disk so as to avoid the problem of unable to query due to memory bottleneck. The 0.13 version supports spill in sort and window function.
   
   [#3820] [#4151] [#4152]
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be precomputed for deduplication scenarios, such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed it up.
   
   [#3651] [#3677] [#3705] [#3873] [#4014] [#3677]
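
   The benefit of precomputing a bitmap for UV analysis can be sketched as follows. This is an illustration under assumptions, not how Doris stores materialized views: Python sets stand in for bitmaps, and `build_uv_rollup` and `uv` are hypothetical helpers.

```python
from collections import defaultdict


def build_uv_rollup(base_rows):
    """Pre-aggregate (page, user_id) rows into one user set per page."""
    rollup = defaultdict(set)
    for page, user_id in base_rows:
        rollup[page].add(user_id)  # plays the role of bitmap_union
    return rollup


def uv(rollup, pages):
    """Distinct users across `pages`, answered from the rollup alone."""
    users = set()
    for page in pages:
        users |= rollup.get(page, set())
    return len(users)


base = [("home", 1), ("home", 2), ("home", 1), ("cart", 2), ("cart", 3)]
rollup = build_uv_rollup(base)
```

   A UV query then reads one precomputed set per page instead of scanning and deduplicating every base row.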
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources. This improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for importing a large amount of data into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support load json-data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is JSON, because users no longer need to convert the data to CSV before loading it.
   
   [#3553]
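
   Conceptually, the transform rules map fields of each JSON document to table columns. The Python sketch below illustrates that mapping only. The `$.a.b` path syntax and the helper names are assumptions made for the example, not the exact rule syntax of the load statement.

```python
import json


def extract(doc, path):
    """Resolve a simple `$.a.b` path; return None when a key is missing."""
    value = doc
    for key in path.lstrip("$").strip(".").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def json_to_rows(payload, paths):
    """Turn a JSON object or array of objects into rows, one column per path."""
    docs = json.loads(payload)
    if isinstance(docs, dict):
        docs = [docs]
    return [tuple(extract(doc, p) for p in paths) for doc in docs]
```

   Each path yields one column, and a document that lacks a field produces `None` in that column.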
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. The new properties are used to plan the job's tasks the next time they are scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key within an ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin sends data from Logstash to Doris. It talks to the Doris FE HTTP interface and loads data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3 or BOS. The syntax follows the MySQL grammar, and the export format is CSV. Exported results can be offered to other users for download or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of IDs produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement supports IN and NOT IN predicates in its conditions. Users can delete the rows matching several values with a single statement.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification and read performance, and tends to merge files of similar size. For the same number of versions, it performs fewer merges and leaves fewer files in total.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well. This reduces the amount of data to be joined and speeds up the query.
   
   [#3453]
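
   The propagation step can be sketched in a few lines of Python. This is an illustration of the rule described above, not the planner's code: `infer_predicates` and the tuple representation of predicates are hypothetical.

```python
def infer_predicates(join_equalities, filters):
    """Copy each single-column filter to the column it is joined with.

    join_equalities: list of (left_col, right_col) pairs from `ON a = b`.
    filters: list of (col, op, value) tuples from the WHERE clause.
    """
    inferred = list(filters)
    for col, op, value in filters:
        for left, right in join_equalities:
            other = right if col == left else left if col == right else None
            if other is not None and (other, op, value) not in inferred:
                inferred.append((other, op, value))
    return inferred
```

   For a join on `t1.id = t2.id` with the filter `t1.id > 100`, the sketch also produces `t2.id > 100`, so both tables can be filtered before the join.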
   
   ## Non blocking OlapTableSink 
   
   With this optimization, `OlapTableSink` sends data and adds rows concurrently, which improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours, and the concurrent version halved that time.
   
   [#3143]
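
   The overlap between adding rows and sending can be pictured as a producer and a consumer joined by a bounded queue. The Python sketch below shows only that pattern. `run_sink` and `send` are hypothetical names, error handling is omitted, and the real sink is not implemented this way line for line.

```python
import queue
import threading


def run_sink(rows, batch_size, send):
    """Add rows to batches on this thread while a second thread sends them."""
    pending = queue.Queue(maxsize=4)  # bounded, so the producer cannot run far ahead

    def sender():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more batches
                return
            send(batch)

    worker = threading.Thread(target=sender)
    worker.start()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            pending.put(batch)  # hand off and keep adding rows
            batch = []
    if batch:
        pending.put(batch)
    pending.put(None)
    worker.join()
```

   The thread that adds rows blocks only when the queue is full, rather than waiting for every batch to be sent.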
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by database, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   A new config, `max_pushdown_conditions_per_column`, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls scan key splitting, and its default value is 1024. With the two configurations separated, the QPS of Doris improves and CPU usage decreases.
   
   [#3694]
   
   ## Optimize reading of parquet files during load
   
   The broker keeps a cache buffer array while reading a parquet file. Before the broker seeks to a position and fetches data from the remote parquet file, it first tries to read that position from the cache buffer array. If the requested data is in the cache, no remote read is needed. In testing, this halved the load time of parquet files in broker load and spark load.
   
   [#3878]
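
   The read path described above can be sketched as a read-through buffer. This Python example is a simplification under assumptions: it keeps a single cached window rather than an array of buffers, and `BufferedRemoteReader` and `remote_read` are hypothetical names.

```python
class BufferedRemoteReader:
    """Serve reads from a cached window; go to the remote file only on a miss."""

    def __init__(self, remote_read, buffer_size):
        self.remote_read = remote_read  # callable(offset, length) -> bytes
        self.buffer_size = buffer_size
        self.buf_start = 0
        self.buf = b""
        self.remote_calls = 0

    def read_at(self, offset, length):
        end = offset + length
        hit = self.buf_start <= offset and end <= self.buf_start + len(self.buf)
        if not hit:
            # Miss: fetch a window starting at the requested offset.
            self.buf = self.remote_read(offset, max(length, self.buffer_size))
            self.buf_start = offset
            self.remote_calls += 1
        lo = offset - self.buf_start
        return self.buf[lo:lo + length]
```

   Two nearby reads, such as `read_at(10, 4)` followed by `read_at(20, 8)` with a 32-byte window, cost one remote call instead of two.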
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
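
   As an illustration of what an intersecting bitmap aggregate computes, the Python sketch below uses sets in place of bitmaps. It assumes, as the name suggests, that `bitmap_intersect` reduces a group of bitmaps to the ids present in all of them. The function here is a stand-in, not the built-in itself.

```python
from functools import reduce


def bitmap_intersect(bitmaps):
    """Return the ids present in every bitmap; empty input gives an empty set."""
    bitmaps = [set(b) for b in bitmaps]
    if not bitmaps:
        return set()
    return reduce(lambda acc, b: acc & b, bitmaps)


daily_users = {
    "2020-08-01": {1, 2, 3, 4},
    "2020-08-02": {2, 3, 5},
    "2020-08-03": {2, 3, 4},
}
retained = bitmap_intersect(daily_users.values())
```

   Here `retained` holds the users who appear on all three days, which is the typical retention-style use of such an aggregate.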
   
   
   # Other
   
   + Support modifying BE configs at runtime without restarting (#3264)
   + Support setting replica quota at db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route to the same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multiple fields have text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [CodeRefactor] Generate java files using maven (#4133)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] Rebuild documents with Vuepress (#3414)
   + [Webserver] Make BE webserver prettier (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan by default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   + [Load] Support import true or false as boolean value (#3898)
   + [Meta tool] Add segment v2 footer meta viewer (#3822)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   + Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris can now spill intermediate query data to disk. When `enable_spilling` is true and a query reaches its memory limit, the query spills to disk instead of failing for lack of memory. In version 0.13, spilling is supported for sort and window functions.
   
   [#3820] [#4151] [#4152]
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be precomputed for deduplication scenarios, such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed it up.
   
   [#3651] [#3677] [#3705] [#3873] [#4014] [#3677]
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources. This improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for importing a large amount of data into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support load json-data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is JSON, because users no longer need to convert the data to CSV before loading it.
   
   [#3553]
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. The new properties are used to plan the job's tasks the next time they are scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key within an ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin sends data from Logstash to Doris. It talks to the Doris FE HTTP interface and loads data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3 or BOS. The syntax follows the MySQL grammar, and the export format is CSV. Exported results can be offered to other users for download or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of IDs produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement supports IN and NOT IN predicates in its conditions. Users can delete the rows matching several values with a single statement.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification and read performance, and tends to merge files of similar size. For the same number of versions, it performs fewer merges and leaves fewer files in total.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well. This reduces the amount of data to be joined and speeds up the query.
   
   [#3453]
   
   ## Non blocking OlapTableSink 
   
   With this optimization, `OlapTableSink` sends data and adds rows concurrently, which improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours, and the concurrent version halved that time.
   
   [#3143]
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by database, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   A new config, `max_pushdown_conditions_per_column`, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls scan key splitting, and its default value is 1024. With the two configurations separated, the QPS of Doris improves and CPU usage decreases.
   
   [#3694]
   
   ## Optimized the speed of reading parquet files
   
   The broker keeps a cache buffer array while reading a parquet file. Before the broker seeks to a position and fetches data from the remote parquet file, it first tries to read that position from the cache buffer array. If the requested data is in the cache, no remote read is needed. In testing, this halved the load time of parquet files in broker load and spark load.
   
   [#3878]
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support modifying BE configs at runtime without restarting (#3264)
   + Support setting replica quota at db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route to the same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multiple fields have text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [CodeRefactor] Generate java files using maven (#4133)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] Rebuild documents with Vuepress (#3414)
   + [Webserver] Make BE webserver prettier (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan by default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   + [Load] Support import true or false as boolean value (#3898)
   + [Meta tool] Add segment v2 footer meta viewer (#3822)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   + Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris can now spill intermediate query data to disk. When `enable_spilling` is true and a query reaches its memory limit, the query spills to disk instead of failing for lack of memory. In version 0.13, spilling is supported for sort and window functions.
   
   [#3820] [#4151] [#4152]
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be precomputed for deduplication scenarios, such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed it up.
   
   [#3651] [#3677] [#3705] [#3873] [#4014] [#3677]
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources. This improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for importing a large amount of data into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support load json-data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is JSON, because users no longer need to convert the data to CSV before loading it.
   
   [#3553]
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. The new properties are used to plan the job's tasks the next time they are scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key within an ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin sends data from Logstash to Doris. It talks to the Doris FE HTTP interface and loads data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3 or BOS. The syntax follows the MySQL grammar, and the export format is CSV. Exported results can be offered to other users for download or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of IDs produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement supports IN and NOT IN predicates in its conditions. Users can delete the rows matching several values with a single statement.
   
   [#4006]
   
   # Enhancement
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well. This reduces the amount of data to be joined and speeds up the query.
   
   [#3453]
   
   ## Non blocking OlapTableSink 
   
   With this optimization, `OlapTableSink` sends data and adds rows concurrently, which improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours, and the concurrent version halved that time.
   
   [#3143]
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by database, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   A new config, `max_pushdown_conditions_per_column`, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls scan key splitting, and its default value is 1024. With the two configurations separated, the QPS of Doris improves and CPU usage decreases.
   
   [#3694]
   
   ## Optimize reading of parquet files during load
   
   The broker keeps a cache buffer array while reading a parquet file. Before the broker seeks to a position and fetches data from the remote parquet file, it first tries to read that position from the cache buffer array. If the requested data is in the cache, no remote read is needed. In testing, this halved the load time of parquet files in broker load and spark load.
   
   [#3878]
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support modifying BE configs at runtime without restarting (#3264)
   + Support setting replica quota at db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route to the same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multiple fields have text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] Rebuild documents with Vuepress (#3414)
   + [Webserver] Make BE webserver prettier (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan by default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris can now spill intermediate query data to disk. When `enable_spilling` is true and a query reaches its memory limit, the query spills to disk instead of failing for lack of memory. In version 0.13, spilling is supported for sort and window functions.
   
   [#3820] [#4151] [#4152]
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be precomputed for deduplication scenarios, such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed it up.
   
   [#3651] [#3677] [#3705] [#3873] [#4014] [#3677]
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources. This improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for importing a large amount of data into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support load json-data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is JSON, because users no longer need to convert the data to CSV before loading it.
   
   [#3553]
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. The new properties are used to plan the job's tasks the next time they are scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key within an ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin sends data from Logstash to Doris. It talks to the Doris FE HTTP interface and loads data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3 or BOS. The syntax follows the MySQL grammar, and the export format is CSV. Exported results can be offered to other users for download or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of IDs produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement supports IN and NOT IN predicates in its conditions. Users can delete the rows matching several values with a single statement.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification and read performance, and tends to merge files of similar size. For the same number of versions, it performs fewer merges and leaves fewer files in total.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well. This reduces the amount of data to be joined and speeds up the query.
   
   [#3453]
   
   ## Non blocking OlapTableSink 
   
   In this optimization, the sending process and the add-row process run concurrently in `OlapTableSink`, which improves load performance. In a test with a 56G broker load, the original version ran for 4 hours and the new version took about half that time.
   
   [#3143]
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by database, so transactions in different databases do not block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   Adds a new config, `max_pushdown_conditions_per_column`, that limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls how scan keys are split. Its default value is 1024. With the two configurations separated, the QPS of Doris improved and CPU usage decreased.
   
   [#3694]
   
   ## Optimize load reading parquet format file
   
   The broker keeps a cache buffer array while reading a parquet file. When the broker is about to seek to a position and fetch data from the remote parquet file, it first tries to read that position from the cache buffer array. If the expected data is in the cache buffer array, the remote read is skipped. In testing, this roughly halved the load time of parquet files in broker load and spark load.
   
   [#3878]
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support to modify configs when BE is running without restarting (#3264)
   + Support setting replica quota in db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] documents rebuild with Vuepress (#3414)
   + [Webserver] Make BE webserver more pretty (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   In version 0.13, Doris supports spilling queries to disk for sorting and window functions. When `enable_spilling` is true and the memory limit is reached, the query spills to disk instead of failing because of insufficient memory.
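   
   To make the idea of spilling a sort concrete, here is a minimal Python sketch of an external merge sort. It is not Doris's implementation: the function name `external_sort` and the `max_in_memory` parameter are hypothetical, and the "memory limit" is simply a row count.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory=4):
    """Sort integers while buffering at most `max_in_memory` of them.

    Hypothetical illustration: sorted runs are spilled to temporary files
    and then merged, which is the general idea behind a sort that spills.
    """
    run_paths = []
    buffer = []

    def spill():
        # Write the current in-memory run to disk in sorted order.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:  # stand-in for "memory limit reached"
            spill()
    if buffer:
        spill()

    def read_run(path):
        with open(path) as f:
            for line in f:
                yield int(line)

    try:
        # heapq.merge performs a streaming k-way merge of the sorted runs.
        return list(heapq.merge(*(read_run(p) for p in run_paths)))
    finally:
        for p in run_paths:
            os.remove(p)
```

   Calling `external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)` writes three sorted runs to temporary files and merges them back into one sorted list.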
   
   [#3820] [#4151] [#4152]
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views now support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL results can also be pre-computed for deduplication scenarios such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed up the query.
   
   [#3651] [#3677] [#3705] [#3873] [#4014]
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources, which improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for the initial migration, when a large amount of data is imported into Doris.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support loading JSON data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. Data in JSON format is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is JSON, because users no longer need to convert the data to CSV beforehand.
   
   [#3553]
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. After a job is modified, the new properties are used to plan tasks the next time the job is scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables on wildcard or alias indexes of ES
   
   Every native ES document has an `_id` field, which is the primary key of the ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an alias or on a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash plugin outputs data from Logstash to Doris. It talks to the Doris FE HTTP interface over HTTP and loads the data through Doris's Stream Load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3, or BOS. The syntax follows the MySQL reference manual, and the export format is CSV. The exported results can be offered to other users for download or passed to other systems for further processing. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of ids produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement now accepts IN and NOT IN predicates in its conditions, so users can delete rows by matching a column against a list of values.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification, and read performance, and tends to merge files of adjacent sizes. For the same number of versions, both the number of merges and the total number of files are reduced.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns in the join condition, the filter predicate can be propagated across the join and applied to the other table as well, which reduces the amount of data read and speeds up the query.
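   
   The propagation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (inner equi-joins, single-column filters); the helper names `infer_transitive_filters` and `render` are hypothetical and do not come from the Doris planner.

```python
def infer_transitive_filters(join_pairs, filters):
    """Propagate single-column filters across equi-join columns.

    join_pairs: list of (left_col, right_col) taken from `left_col = right_col`.
    filters: dict mapping a column name to a predicate template,
             e.g. {"t1.id": "{col} > 10"}.
    Returns a new dict that also contains the inferred filters.
    Assumes inner joins; outer joins need extra care and are not handled.
    """
    inferred = dict(filters)
    changed = True
    while changed:
        changed = False
        for a, b in join_pairs:
            # Equality is symmetric, so try both directions.
            for src, dst in ((a, b), (b, a)):
                if src in inferred and dst not in inferred:
                    inferred[dst] = inferred[src]
                    changed = True
    return inferred


def render(filters):
    """Render the filters as sorted predicate strings."""
    return sorted(t.format(col=col) for col, t in filters.items())


result = infer_transitive_filters(
    join_pairs=[("t1.id", "t2.id")],
    filters={"t1.id": "{col} > 10"},
)
print(render(result))
```

   For a query that joins `t1` and `t2` on `t1.id = t2.id` and filters on `t1.id > 10`, the sketch also derives `t2.id > 10`, so both scans can discard rows early.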
   
   [#3453]
   
   ## Non blocking OlapTableSink 
   
   In this optimization, the sending process and the add-row process run concurrently in `OlapTableSink`, which improves load performance. In a test with a 56G broker load, the original version ran for 4 hours and the new version took about half that time.
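   
   As a rough illustration of overlapping "add row" with "send", the following Python sketch hands batches to a sender thread through a bounded queue. It is not the `OlapTableSink` code: `run_sink`, `send_batch`, and `batch_size` are hypothetical names chosen for the example.

```python
import queue
import threading


def run_sink(rows, send_batch, batch_size=2):
    """Add rows on the caller thread while a worker thread sends batches."""
    pending = queue.Queue(maxsize=4)  # bounded, so a slow sender applies back-pressure
    sentinel = object()

    def sender():
        while True:
            batch = pending.get()
            if batch is sentinel:
                break
            send_batch(batch)  # e.g. a network call in a real sink

    worker = threading.Thread(target=sender)
    worker.start()

    batch = []
    for row in rows:
        batch.append(row)           # the "add row" step
        if len(batch) == batch_size:
            pending.put(batch)      # hand off; sending proceeds concurrently
            batch = []
    if batch:
        pending.put(batch)          # flush the final partial batch
    pending.put(sentinel)
    worker.join()


sent = []
run_sink(range(5), sent.append)
```

   Because a single worker drains a FIFO queue, batches are sent in the order they were filled. Error handling is left out: a real sink would need to surface failures from `send_batch` to the producer.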
   
   [#3143]
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided by database, so transactions in different databases do not block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   Adds a new config, `max_pushdown_conditions_per_column`, that limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls how scan keys are split. Its default value is 1024. With the two configurations separated, the QPS of Doris improved and CPU usage decreased.
   
   [#3694]
   
   ## Optimize load reading parquet format file
   
   The broker keeps a cache buffer array while reading a parquet file. When the broker is about to seek to a position and fetch data from the remote parquet file, it first tries to read that position from the cache buffer array. If the expected data is in the cache buffer array, the remote read is skipped. In testing, this roughly halved the load time of parquet files in broker load and spark load.
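   
   The check-the-buffer-first pattern can be sketched as follows. This simplified Python version keeps a single buffer rather than an array of buffers, and the class name `CachedRemoteReader` and the `remote_read` callable are assumptions made for the example, not the broker's API.

```python
class CachedRemoteReader:
    """Serve reads from a local buffer and go to the remote file only on a miss."""

    def __init__(self, remote_read, buffer_size=8):
        self._remote_read = remote_read  # callable(offset, length) -> bytes
        self._buffer_size = buffer_size
        self._buf_start = 0
        self._buf = b""
        self.remote_calls = 0            # counts remote reads, for illustration

    def read_at(self, offset, length):
        end = offset + length
        hit = self._buf_start <= offset and end <= self._buf_start + len(self._buf)
        if not hit:
            # Miss: fetch a larger block so that nearby reads hit the buffer.
            self.remote_calls += 1
            self._buf_start = offset
            self._buf = self._remote_read(offset, max(length, self._buffer_size))
        lo = offset - self._buf_start
        return self._buf[lo:lo + length]


data = b"abcdefghijklmnop"  # stands in for a remote parquet file
reader = CachedRemoteReader(lambda off, n: data[off:off + n], buffer_size=8)
first = reader.read_at(0, 4)   # miss: one remote read fills the buffer
second = reader.read_at(4, 4)  # hit: served from the buffer
```

   Here the second read is answered from the buffer, so `reader.remote_calls` stays at 1 after both reads.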
   
   [#3878]
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
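   
   As an informal picture of what an intersecting bitmap aggregate computes, the Python sketch below models each bitmap as a set of integer ids and intersects the bitmaps within each group. The data and the helper are hypothetical, and real bitmaps are compressed structures rather than Python sets.

```python
def bitmap_intersect(bitmaps):
    """Return the values present in every bitmap (empty set for no input in this sketch)."""
    bitmaps = [set(b) for b in bitmaps]
    return set.intersection(*bitmaps) if bitmaps else set()


# Hypothetical data: user ids per (tag, day).
rows = [
    ("A", "2020-05-18", {1, 2, 3}),
    ("A", "2020-05-19", {2, 3, 4}),
    ("B", "2020-05-18", {5, 6}),
]

# Group the bitmaps by tag, then intersect within each group.
by_tag = {}
for tag, _day, users in rows:
    by_tag.setdefault(tag, []).append(users)

result = {tag: bitmap_intersect(bitmaps) for tag, bitmaps in by_tag.items()}
```

   For tag `A`, only users 2 and 3 appear on both days, so they are the only ids left after the intersection.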
   
   
   # Other
   
   + Support to modify configs when BE is running without restarting (#3264)
   + Support setting replica quota in db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [CodeRefactor] Generate Java files using maven (#4133)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] documents rebuild with Vuepress (#3414)
   + [Webserver] Make BE webserver more pretty (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   + [Load] Support import true or false as boolean value (#3898)
   + [Meta tool] Add segment v2 footer meta viewer (#3822)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   + Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 commented on issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 commented on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-722935780


   Apache incubator Doris 0.13 has been released. Welcome to try it~



[GitHub] [incubator-doris] morningman closed issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

morningman closed issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370


   



----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] marising commented on issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

marising commented on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-675815496


   Please merge the feature:
   
   [Feature][Cache] Doris caches query results based on partition #2581
   
   LiHaibo 2020-8-19
   



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   In version 0.13, Doris supports spilling to disk for sort and window functions. When `enable_spilling` is true and the memory limit is reached, the query spills to disk instead of failing because of the memory bottleneck.
   
   [#3820] [#4151] [#4152]
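
The general idea of spilling during a sort can be shown with a textbook external merge sort. The Python sketch below is only that, a generic illustration and not the Doris BE implementation; `max_rows_in_memory` is a hypothetical stand-in for the memory limit, and the helper names are invented for the example.

```python
import heapq
import pickle
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file and return the file object."""
    f = tempfile.TemporaryFile()
    for row in sorted_run:
        pickle.dump(row, f)
    f.seek(0)
    return f


def _read_run(f):
    """Yield the rows of a spilled run in order."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def sort_with_spill(rows, max_rows_in_memory):
    """Sort `rows`, spilling a sorted run to disk whenever the in-memory buffer is full."""
    runs, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:   # stand-in for "memory limit reached"
            runs.append(_spill(sorted(buffer)))
            buffer = []
    # Merge the spilled runs with whatever is still in memory.
    streams = [_read_run(f) for f in runs] + [iter(sorted(buffer))]
    try:
        return list(heapq.merge(*streams)), len(runs)
    finally:
        for f in runs:
            f.close()


result, spilled_runs = sort_with_spill([5, 3, 9, 1, 7, 2, 8], max_rows_in_memory=3)
```

With a buffer of three rows, the seven input rows produce two spilled runs plus one in-memory remainder, and the merge step returns them fully sorted.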
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views support more aggregate functions: `bitmap_union`, `hll_union` and `count`. In an order-analysis scenario, users need to count orders along different dimensions. Bitmap and HLL functions can also be pre-computed for deduplication analysis, such as analyzing PV and UV data in website traffic. Doris automatically matches the user's query to an optimal materialized view to speed up the query.
   
   [#3651] [#3677] [#3705] [#3873] [#4014]
   
   ## Spark load
   
   Spark load preprocesses the data to be imported using external Spark resources, which improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly used when a large amount of data is imported into Doris during an initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support load json-data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: json. Data in json format is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data format is json: users no longer need to convert the data to csv format before loading.
   
   [#3553]
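
To make the idea of mapping json fields to table columns concrete, here is a small Python sketch. It is purely illustrative: the `column_paths` structure and the function name are hypothetical and do not reproduce the syntax of the Doris load statement.

```python
import json


def json_to_rows(payload, column_paths):
    """Flatten a JSON array of objects into rows, one value per (column, path) rule."""
    rows = []
    for record in json.loads(payload):
        row = []
        for path in column_paths.values():
            value = record
            for key in path:
                # A missing key or a non-object value yields None for this column.
                value = value.get(key) if isinstance(value, dict) else None
            row.append(value)
        rows.append(tuple(row))
    return rows


payload = '[{"id": 1, "user": {"city": "beijing"}}, {"id": 2, "user": {}}]'
rows = json_to_rows(payload, {"id": ["id"], "city": ["user", "city"]})
```

Each rule names a target column and the path to follow inside the json object; a field that is absent becomes a null value in the resulting row.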
   
   ## Modify routine load
   
   The properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs in the PAUSED state can be modified. After a routine load job is modified, the newly set properties are used to plan tasks the next time the job is scheduled.
   
   [#4158]
   
   ## Support fetch `_id` from ES and create table with wildcard or alias index of ES
   
   A native ES document has an `_id` field, which is the primary key of the ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an ES alias or a wildcard index such as `log_*`, so users can query all matching indexes through one table.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash plugin outputs data from Logstash to Doris. It uses the HTTP protocol to interact with the Doris FE HTTP interface and loads data through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system such as HDFS, S3, or BOS. The syntax follows the MySQL grammar manual. The export format is CSV. The exported results can be provided to other users for download or processed further by other systems. This is especially useful when the result set is too large to transfer over the MySQL protocol, such as a large number of ids produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support IN predicate in delete statement
   
   The delete statement supports IN and NOT IN predicates in its conditions. Users can delete rows matching any of several values in one statement.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction to a version merging strategy that balances write amplification, space amplification, and read performance (it tends to merge files of adjacent sizes). For the same number of versions, both the number of merges and the total number of files are reduced.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When a filter predicate in the query is on the same column as a join condition, the predicate can be transferred across the join and also filter the other table, which reduces the amount of data and speeds up the query.
   
   [#3453]
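
A single-hop version of this inference is easy to sketch in Python. The function and the data layout below are hypothetical and only illustrate the rule "if `a = b` is a join condition and there is a filter on `a`, the same filter applies to `b`"; they are not taken from the Doris planner.

```python
def infer_transitive_filters(join_equalities, filters):
    """
    join_equalities: list of (left_col, right_col) pairs from `left_col = right_col` join conditions.
    filters: dict mapping a column to its filter predicate, written as (operator, value).
    Returns the original filters plus those inferred for the other side of each equality.
    """
    inferred = dict(filters)
    for left_col, right_col in join_equalities:
        if left_col in filters and right_col not in inferred:
            inferred[right_col] = filters[left_col]
        elif right_col in filters and left_col not in inferred:
            inferred[left_col] = filters[right_col]
    return inferred


# Models: SELECT ... FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.id = 1
all_filters = infer_transitive_filters([("t1.id", "t2.id")], {"t1.id": ("=", 1)})
```

After inference both `t1.id` and `t2.id` carry the predicate `= 1`, so rows of `t2` can be filtered before the join instead of after it.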
   
   ## Non blocking OlapTableSink 
   
   In this optimization, sending batches and adding rows are executed concurrently in `OlapTableSink`, which improves load performance. In a test with a 56G broker load, the original version ran for 4 hours, and the concurrent version takes about half that time.
   
   [#3143]
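
The pattern of overlapping "add row" with "send" can be sketched as a producer and a background sender connected by a queue. The Python below is a simplified illustration, not the BE code: `NonBlockingSink`, `send_batch` and `batch_size` are hypothetical names.

```python
import queue
import threading


class NonBlockingSink:
    """Hypothetical sink: add_row() fills batches while a background thread sends them."""

    def __init__(self, send_batch, batch_size=2):
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.current = []
        self.pending = queue.Queue()
        self.sender = threading.Thread(target=self._send_loop)
        self.sender.start()

    def _send_loop(self):
        while True:
            batch = self.pending.get()
            if batch is None:               # sentinel: no more batches
                return
            self.send_batch(batch)          # runs while add_row() keeps accepting rows

    def add_row(self, row):
        self.current.append(row)
        if len(self.current) >= self.batch_size:
            self.pending.put(self.current)  # hand off without waiting for the send
            self.current = []

    def close(self):
        if self.current:
            self.pending.put(self.current)
            self.current = []
        self.pending.put(None)
        self.sender.join()


sent = []
sink = NonBlockingSink(sent.append, batch_size=2)
for row in range(5):
    sink.add_row(row)
sink.close()
```

Because a single sender thread drains the queue, batches arrive in the order they were filled, while the caller of `add_row` never waits for a send to complete.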
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided at the db level, so transactions in different databases no longer block each other, which improves the execution efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   Add a new config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls how scan keys are split, and its own default value is 1024. With the two configurations separated, the QPS of Doris improves and CPU usage decreases.
   
   [#3694]
   
   ## Optimize load reading parquet format file
   
   The broker reading process keeps a cache buffer array when reading a parquet file. When the broker is about to seek to a position and fetch data from the remote parquet file, it first tries to read that position from the cache buffer array. If the expected data hits the cache buffer array, there is no need to read from the remote parquet file. In testing, the load time of parquet files in broker load or spark load can be halved.
   
   [#3878]
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support to modify configs when BE is running without restarting (#3264)
   + Support setting replica quota in db level (#3283)
   + [Doris On ES][Bug-Fix] Solve the problem of time format processing. (#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. (#3751)
   + [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) (#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type (#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
   + [DOCS] documents rebuild with Vuepress (#3414)
   + [Webserver] Make BE webserver more pretty (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. (#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10



[GitHub] [incubator-doris] EmmyMiao87 closed issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 closed issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370


   



[GitHub] [incubator-doris] EmmyMiao87 commented on issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 commented on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-674835876


   # Credits
   
   @ZhangYu0123           
   @wfjcmcb               
   @Fullstop000           
   @sduzh                 
   @Stalary               
   @worker24h             
   @chaoyli               
   @vagetablechicken      
   @jmk1011               
   @funyeah               
   @wutiangan             
   @gengjun-git           
   @xinghuayu007          
   @EmmyMiao87            
   @songenjie             
   @acelyc111             
   @yangzhg               
   @Seaven                
   @hexian55              
   @ChenXiaofei           
   @WingsGo               
   @kangpinghuang         
   @wangbo                
   @weizuo93              
   @sdgshawn              
   @skyduy                
   @wyb                   
   @gaodayue              
   @HappenLee             
   @kangkaisen            
   @wuyunfeng             
   @HangyuanLiu           
   @xy720                 
   @liutang123            
   @caiconghui            
   @liyuance              
   @spaces-X              
   @hffariel              
   @decster               
   @blackfox1983          
   @Astralidea            
   @morningman            
   @hf200012              
   @xbyang18              
   @Youngwb               
   @imay                  
   @marising



[GitHub] [incubator-doris] morningman closed issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

morningman closed issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370


   



[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Notes 0.13.0

Posted by GitBox <gi...@apache.org>.

EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724



