You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/25 08:42:11 UTC

[GitHub] [doris] yiguolei commented on a diff in pull request #10420: [improvement]Do not lazily read dict encoded columns

yiguolei commented on code in PR #10420:
URL: https://github.com/apache/doris/pull/10420#discussion_r906655600


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -704,61 +705,48 @@ void SegmentIterator::_vec_init_lazy_materialization() {
     //   When output block to query layer, delete column can be skipped.
     //  _schema.column_ids() stands for storage layer block schema, so it contains delete columnid
     //  we just regard delete column as common pred column here.
-    if (_schema.column_ids().size() > pred_column_ids.size()) {
-        for (auto cid : _schema.column_ids()) {
-            if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
-                FieldType type = _schema.column(cid)->type();
-
-                // todo(wb) maybe we can make read char type faster
-                // todo(wb) support map/array type
-                // todo(wb) consider multiple integer columns cost, such as 1000 columns, maybe lazy materialization faster
-                if (!_lazy_materialization_read &&
-                    (_is_need_vec_eval ||
-                     _is_need_short_eval) && // only when pred exists, we need to consider lazy materialization
-                    (type == OLAP_FIELD_TYPE_HLL || type == OLAP_FIELD_TYPE_OBJECT ||
-                     type == OLAP_FIELD_TYPE_VARCHAR || type == OLAP_FIELD_TYPE_CHAR ||
-                     type == OLAP_FIELD_TYPE_STRING || type == OLAP_FIELD_TYPE_BOOL ||
-                     type == OLAP_FIELD_TYPE_DATE || type == OLAP_FIELD_TYPE_DATETIME ||
-                     type == OLAP_FIELD_TYPE_DECIMAL)) {
-                    _lazy_materialization_read = true;
+    for (size_t i = 0; i < _schema.num_column_ids(); ++i) {
+        auto cid = _schema.column_id(i);
+        FieldType type = _schema.column(cid)->type();
+        if (!_is_pred_column[cid]) {
+            _non_predicate_columns.emplace_back(cid);
+            switch (type) {
+            case OLAP_FIELD_TYPE_VARCHAR:
+            case OLAP_FIELD_TYPE_CHAR:
+            case OLAP_FIELD_TYPE_STRING: {
+                // if a string column is all dict encoding in one segment, it's almost same as
+                // an int32_t column, it can be read together with predicate columns.
+                if (config::enable_low_cardinality_optimize &&
+                    _column_iterators[cid]->is_all_dict_encoding()) {
+                    pred_column_ids.insert(cid);

Review Comment:
   A litter confusing, we should distinguish first read columns and predicate columns. Here, the string column (all dict encoding) is not a predicate column but you add it to pred_column_ids. The code is very confusing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org