You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/22 09:32:53 UTC

[GitHub] [doris] morningman commented on a diff in pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

morningman commented on code in PR #13535:
URL: https://github.com/apache/doris/pull/13535#discussion_r1002428145


##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -256,7 +256,20 @@ Status VFileScanner::_fill_columns_from_path(size_t rows) {
                 ss << "Unknown source slot descriptor, slot_id=" << slot_desc->id();
                 return Status::InternalError(ss.str());
             }
-            const std::string& column_from_path = range.columns_from_path[it->second];
+
+            // For external table query, find the index of column in path.
+            // Reason is _partition_slot_index_map may be incorrect for query
+            // Because query doesn't always search for all columns in a table and the select columns order is random.
+            vector<std::string>::const_iterator iit =

Review Comment:
   Better do this in prepare phase once. Otherwise, we have to find keys for each `_fill_columns_from_path()` call, which is inefficient.
   You can use the structure to same the mapping from `_partition_slot_descs` to the index of `columns_from_path_keys`



##########
be/src/vec/exec/format/parquet/vparquet_group_reader.h:
##########
@@ -35,12 +35,16 @@ class RowGroupReader {
 
     ParquetColumnReader::Statistics statistics();
 
+private:
+    Status read_empty_batch(size_t batch_size, size_t* read_rows, bool* _batch_eof);

Review Comment:
   ```suggestion
       Status _read_empty_batch(size_t batch_size, size_t* read_rows, bool* _batch_eof);
   ```



##########
be/src/vec/exec/format/parquet/vparquet_reader.cpp:
##########
@@ -156,12 +156,13 @@ Status ParquetReader::_init_read_columns() {
             _missing_cols.push_back(file_col_name);
         }
     }
+    // It is legal to get empty include_column_ids in query task.
+    _read_columns.clear();

Review Comment:
   It seems we don't need to call this `clear` for `_read_columns`, because `_read_columns` is only initialized once



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org