You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/14 12:26:03 UTC

[GitHub] [incubator-doris] Lchangliang opened a new pull request, #10136: [Feature] Lightweight schema change of add/drop column

Lchangliang opened a new pull request, #10136:
URL: https://github.com/apache/incubator-doris/pull/10136

   # Proposed changes
   
   Issue Number: close #10135
   
   ## Problem Summary:
   
   Optimize ideas for writing in associated issues
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912369533


##########
be/src/olap/rowset/rowset.cpp:
##########
@@ -35,6 +34,12 @@ Rowset::Rowset(const TabletSchema* schema, const FilePathDesc& rowset_path_desc,
         Version version = _rowset_meta->version();
         _is_cumulative = version.first != version.second;
     }
+    // build schema from RowsetMeta.tablet_schema or Tablet.tablet_schema
+    if (_rowset_meta->tablet_schema() != nullptr) {
+        _schema = _rowset_meta->tablet_schema();
+    } else {
+        _schema = schema;

Review Comment:
   _schema = _rowset_meta->tablet_schema() != nullptr ? _schema = _rowset_meta->tablet_schema() : schema;



##########
be/src/olap/delta_writer.cpp:
##########
@@ -197,7 +197,7 @@ Status DeltaWriter::write(const vectorized::Block* block, const std::vector<int>
     if (_is_cancelled) {
         return Status::OLAPInternalError(OLAP_ERR_ALREADY_CANCELLED);
     }
-
+    LOG(INFO) << "3 block columns: " << block->columns();

Review Comment:
   remove this log.



##########
be/src/olap/rowset/segment_v2/segment_writer.cpp:
##########
@@ -116,6 +116,7 @@ Status SegmentWriter::append_block(const vectorized::Block* block, size_t row_po
                                    size_t num_rows) {
     assert(block && num_rows > 0 && row_pos + num_rows <= block->rows() &&
            block->columns() == _column_writers.size());
+    LOG(INFO) << "block columns: " << block->columns();

Review Comment:
   remove the log please.



##########
be/src/olap/segment_loader.cpp:
##########
@@ -58,16 +59,17 @@ void SegmentLoader::_insert(const SegmentLoader::CacheKey& key, SegmentLoader::C
 }
 
 Status SegmentLoader::load_segments(const BetaRowsetSharedPtr& rowset,
-                                    SegmentCacheHandle* cache_handle, bool use_cache) {
-    SegmentLoader::CacheKey cache_key(rowset->rowset_id());
-    if (_lookup(cache_key, cache_handle)) {
+                                    SegmentCacheHandle* cache_handle,
+                                    const TabletSchema* read_tablet_schema, bool use_cache) {
+    SegmentLoader::CacheKey cache_key(rowset->rowset_id(), *read_tablet_schema);

Review Comment:
   Could we use schema version as part of the cache key instead of schema it self?



##########
be/src/exec/olap_scanner.cpp:
##########
@@ -86,6 +88,14 @@ Status OlapScanner::prepare(
             LOG(WARNING) << ss.str();
             return Status::InternalError(ss.str());
         }
+        _tablet_schema = _tablet->tablet_schema();
+        if (!_parent->_olap_scan_node.columns_desc.empty() &&
+            _parent->_olap_scan_node.columns_desc[0].col_unique_id >= 0) {
+            _tablet_schema.clear_columns();
+            for (const auto& column_desc : _parent->_olap_scan_node.columns_desc) {
+                _tablet_schema.append_column(TabletColumn(column_desc));
+            }
+        }

Review Comment:
   We'd better put a comment here. e.g. Originally scanner get TabletSchema from tablet object in be. To support lightweight schema change for adding / dropping columns, tabletschema is bounded to rowset and tablet's schema maybe outdated, so we have to use schema from a query plan, fe puts schema in query plans.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r914356254


##########
be/src/olap/segment_loader.cpp:
##########
@@ -58,16 +59,17 @@ void SegmentLoader::_insert(const SegmentLoader::CacheKey& key, SegmentLoader::C
 }
 
 Status SegmentLoader::load_segments(const BetaRowsetSharedPtr& rowset,
-                                    SegmentCacheHandle* cache_handle, bool use_cache) {
-    SegmentLoader::CacheKey cache_key(rowset->rowset_id());
-    if (_lookup(cache_key, cache_handle)) {
+                                    SegmentCacheHandle* cache_handle,
+                                    const TabletSchema* read_tablet_schema, bool use_cache) {
+    SegmentLoader::CacheKey cache_key(rowset->rowset_id(), *read_tablet_schema);

Review Comment:
   @yiguolei will rewrite this part soon. It is not necessary to modify it.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916502360


##########
be/src/olap/delta_writer.cpp:
##########
@@ -121,10 +121,11 @@ Status DeltaWriter::init() {
         RETURN_NOT_OK(_storage_engine->txn_manager()->prepare_txn(_req.partition_id, _tablet,
                                                                   _req.txn_id, _req.load_id));
     }
+    // build tablet schema in request level
+    _build_current_tablet_schema(_req.index_id, _req.ptable_schema_param, _tablet->tablet_schema());

Review Comment:
   I don't think that's a problem. In _build_current_tablet_schema, the code will check field 'columns_desc' has value or not. 1.1 send the old proto to 1.2, BE will use tablet_schema directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912365661


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4922,4 +4916,190 @@ private static void addTableComment(Table table, StringBuilder sb) {
             sb.append("\nCOMMENT '").append(table.getComment(true)).append("'");
         }
     }
+
+    // the invoker should keep table's write lock
+    public void modifyTableAddOrDropColumns(Database db, OlapTable olapTable,

Review Comment:
   Better move this method to `SchemaChangeHandler`, don't make Catalog.java too big



##########
fe/fe-core/src/main/java/org/apache/doris/persist/OperationType.java:
##########
@@ -222,6 +222,8 @@ public class OperationType {
     // policy 310-320
     public static final short OP_CREATE_POLICY = 310;
     public static final short OP_DROP_POLICY = 311;
+    //schema change for add and drop columns 320-329
+    public static final short OP_MODIFY_TABLE_ADD_OR_DROP_COLUMNS = 320;

Review Comment:
   better put it after `public static final short OP_MODIFY_TABLE_ENGINE = 127;`
   and use `128`



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4922,4 +4916,190 @@ private static void addTableComment(Table table, StringBuilder sb) {
             sb.append("\nCOMMENT '").append(table.getComment(true)).append("'");
         }
     }
+
+    // the invoker should keep table's write lock
+    public void modifyTableAddOrDropColumns(Database db, OlapTable olapTable,
+            Map<Long, LinkedList<Column>> indexSchemaMap,
+            List<Index> indexes, long jobId, boolean isReplay) throws DdlException {
+
+        LOG.debug("indexSchemaMap:{}, indexes:{}", indexSchemaMap, indexes);
+        if (olapTable.getState() == OlapTableState.ROLLUP) {
+            throw new DdlException("Table[" + olapTable.getName() + "]'s is doing ROLLUP job");
+        }
+
+        // for now table's state can only be NORMAL
+        Preconditions.checkState(olapTable.getState() == OlapTableState.NORMAL, olapTable.getState().name());
+
+        // for bitmapIndex
+        boolean hasIndexChange = false;
+        Set<Index> newSet = new HashSet<>(indexes);
+        Set<Index> oriSet = new HashSet<>(olapTable.getIndexes());
+        if (!newSet.equals(oriSet)) {
+            hasIndexChange = true;
+        }
+
+        // begin checking each table
+        // ATTN: DO NOT change any meta in this loop
+        Map<Long, List<Column>> changedIndexIdToSchema = Maps.newHashMap();
+        for (Long alterIndexId : indexSchemaMap.keySet()) {
+            // Must get all columns including invisible columns.
+            // Because in alter process, all columns must be considered.
+            List<Column> alterSchema = indexSchemaMap.get(alterIndexId);
+
+            LOG.debug("index[{}] is changed. start checking...", alterIndexId);
+            // 1. check order: a) has key; b) value after key
+            boolean meetValue = false;
+            boolean hasKey = false;
+            for (Column column : alterSchema) {
+                if (column.isKey() && meetValue) {
+                    throw new DdlException(
+                            "Invalid column order. value should be after key. index[" + olapTable.getIndexNameById(
+                                    alterIndexId) + "]");
+                }
+                if (!column.isKey()) {
+                    meetValue = true;
+                } else {
+                    hasKey = true;
+                }
+            }
+            if (!hasKey) {
+                throw new DdlException("No key column left. index[" + olapTable.getIndexNameById(alterIndexId) + "]");
+            }
+
+            // 2. check partition key
+            PartitionInfo partitionInfo = olapTable.getPartitionInfo();
+            if (partitionInfo.getType() == PartitionType.RANGE || partitionInfo.getType() == PartitionType.LIST) {
+                List<Column> partitionColumns = partitionInfo.getPartitionColumns();
+                for (Column partitionCol : partitionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(partitionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.1 partition column cannot be deleted.
+                        throw new DdlException(
+                                "Partition column[" + partitionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for partitionColumns
+            }
+
+            // 3. check distribution key:
+            DistributionInfo distributionInfo = olapTable.getDefaultDistributionInfo();
+            if (distributionInfo.getType() == DistributionInfoType.HASH) {
+                List<Column> distributionColumns = ((HashDistributionInfo) distributionInfo).getDistributionColumns();
+                for (Column distributionCol : distributionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(distributionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.2 distribution column cannot be deleted.
+                        throw new DdlException(
+                                "Distribution column[" + distributionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for distributionCols
+            }
+
+            // 5. store the changed columns for edit log
+            changedIndexIdToSchema.put(alterIndexId, alterSchema);
+
+            LOG.debug("schema change[{}-{}-{}] check pass.", db.getId(), olapTable.getId(), alterIndexId);
+        } // end for indices
+
+        if (changedIndexIdToSchema.isEmpty() && !hasIndexChange) {
+            throw new DdlException("Nothing is changed. please check your alter stmt.");
+        }
+
+        //update base index schema
+        long baseIndexId = olapTable.getBaseIndexId();
+        List<Long> indexIds = new ArrayList<Long>();
+        indexIds.add(baseIndexId);
+        indexIds.addAll(olapTable.getIndexIdListExceptBaseIndex());
+        for (int i = 0; i < indexIds.size(); i++) {
+            List<Column> indexSchema = indexSchemaMap.get(indexIds.get(i));
+            MaterializedIndexMeta currentIndexMeta = olapTable.getIndexMetaByIndexId(indexIds.get(i));
+            currentIndexMeta.setSchema(indexSchema);
+
+            int currentSchemaVersion = currentIndexMeta.getSchemaVersion();
+            int newSchemaVersion = currentSchemaVersion + 1;
+            currentIndexMeta.setSchemaVersion(newSchemaVersion);
+            // generate schema hash for new index has to generate a new schema hash not equal to current schema hash
+            int currentSchemaHash = currentIndexMeta.getSchemaHash();
+            int newSchemaHash = Util.generateSchemaHash();
+            while (currentSchemaHash == newSchemaHash) {
+                newSchemaHash = Util.generateSchemaHash();
+            }
+            currentIndexMeta.setSchemaHash(newSchemaHash);
+        }
+        olapTable.setIndexes(indexes);
+        olapTable.rebuildFullSchema();
+
+        //update max column unique id
+        int maxColUniqueId = olapTable.getMaxColUniqueId();
+        for (Column column : indexSchemaMap.get(olapTable.getBaseIndexId())) {
+            if (column.getUniqueId() > maxColUniqueId) {
+                maxColUniqueId = column.getUniqueId();
+            }
+        }
+        olapTable.setMaxColUniqueId(maxColUniqueId);
+
+        if (!isReplay) {
+            TableAddOrDropColumnsInfo info = new TableAddOrDropColumnsInfo(db.getId(), olapTable.getId(),
+                    indexSchemaMap, indexes, jobId);
+            LOG.debug("logModifyTableAddOrDropColumns info:{}", info);
+            editLog.logModifyTableAddOrDropColumns(info);
+        }
+
+        //for compatibility, we need create a finished state schema change job v2
+
+        SchemaChangeJobV2 schemaChangeJob = new SchemaChangeJobV2(jobId, db.getId(), olapTable.getId(),
+                olapTable.getName(), 1000);
+        schemaChangeJob.setJobState(AlterJobV2.JobState.FINISHED);
+        schemaChangeJob.setFinishedTimeMs(System.currentTimeMillis());

Review Comment:
   I think we should save the read `createTime` and `finishTime` of this schema change job in `TableAddOrDropColumnsInfo`.(Maybe we can just use one timestamp for both create and finish time).
   Otherwise, it will make user confused because after FE restart, the finish time of the schema change job will be changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912368929


##########
gensrc/proto/olap_file.proto:
##########
@@ -93,6 +93,8 @@ message RowsetMetaPB {
     optional int64 num_segments = 22;
     // rowset id definition, it will replace required rowset id 
     optional string rowset_id_v2 = 23;
+    // tablet meta pb, for compaction
+    optional TabletSchemaPB tablet_schema = 24;

Review Comment:
   There are two reasons that RowsetMeta need the new field. 
   1. When compaction, we need to use newset version schema not the schema which newest rowset has. The diff is that when a long load is going, the schema change, and then a short load begin. When the short load was over faster than the long load, the newest schema is the short load rowset has. That's because of that segment don't persist the  schema version too.
   2. When we need to use rowset schema,we can read it directly not need to read segment footer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r914359306


##########
be/src/olap/schema_change.cpp:
##########
@@ -1649,8 +1650,8 @@ Status VSchemaChangeWithSorting::_internal_sorting(
     MultiBlockMerger merger(new_tablet);
 
     std::unique_ptr<RowsetWriter> rowset_writer;
-    RETURN_IF_ERROR(
-            new_tablet->create_rowset_writer(version, VISIBLE, segments_overlap, &rowset_writer));
+    RETURN_IF_ERROR(new_tablet->create_rowset_writer(version, VISIBLE, segments_overlap,
+                                                     &new_tablet->tablet_schema(), &rowset_writer));

Review Comment:
   it is not only use for new_tablet. Such as PushHandler::_convert_v2, Compaction::construct_output_rowset_writer. They should set newer schema from outside.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912369245


##########
be/src/olap/tablet.cpp:
##########
@@ -1636,4 +1641,13 @@ std::shared_ptr<MemTracker>& Tablet::get_compaction_mem_tracker(CompactionType c
     }
 }
 
+const TabletSchema& Tablet::tablet_schema() const {
+    std::shared_lock wrlock(_meta_lock);
+    const RowsetSharedPtr last_rowset = rowset_with_max_version();

Review Comment:
   the continue code process it. 
   if (last_rowset == nullptr) {
           return _schema;
    }
   that will use the schema from BE saves.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916447021


##########
be/src/olap/schema_change.cpp:
##########
@@ -2150,9 +2169,9 @@ Status SchemaChangeHandler::_convert_historical_rowsets(const SchemaChangeParams
     bool sc_directly = false;
 

Review Comment:
   schema change 可能会有一个问题。
   比如 tablet1 我们做了add column C, 然后就一直没有导入数据; 然后我们做一个schema change 2, 按照C 列排序,此时我们往tablet 2写,从tablet 1 中读取的数据是没有C 列的。
   所以schema change 是不是应该,从FE 传递进来原始的和新的schema。



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r911827461


##########
gensrc/proto/olap_file.proto:
##########
@@ -93,6 +93,8 @@ message RowsetMetaPB {
     optional int64 num_segments = 22;
     // rowset id definition, it will replace required rowset id 
     optional string rowset_id_v2 = 23;
+    // tablet meta pb, for compaction
+    optional TabletSchemaPB tablet_schema = 24;

Review Comment:
   No need, there is already a tablet schema for every segment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916442404


##########
be/src/olap/convert_rowset.cpp:
##########
@@ -0,0 +1,141 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/convert_rowset.h"
+
+namespace doris {
+
+Status ConvertRowset::do_convert() {

Review Comment:
   这个文件没用了,直接删了吧



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10136:
URL: https://github.com/apache/doris/pull/10136#issuecomment-1181559743

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r913785236


##########
be/src/olap/compaction.h:
##########
@@ -67,7 +67,7 @@ class Compaction {
     Status modify_rowsets();
     void gc_output_rowset();
 
-    Status construct_output_rowset_writer();
+    Status construct_output_rowset_writer(const TabletSchema* schama);

Review Comment:
   typo error



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on PR #10136:
URL: https://github.com/apache/doris/pull/10136#issuecomment-1172179514

   There is already a schema info for every segment. I think you could use it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916455664


##########
be/src/runtime/tablets_channel.h:
##########
@@ -204,6 +204,7 @@ Status TabletsChannel::add_batch(const TabletWriterAddRequest& request,
         if constexpr (std::is_same_v<TabletWriterAddRequest, PTabletWriterAddBatchRequest>) {
             return RowBatch(*_row_desc, request.row_batch());
         } else {
+            LOG(INFO) << "2 block columns: " << vectorized::Block(request.block()).columns();

Review Comment:
   调试日志删掉



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10136:
URL: https://github.com/apache/doris/pull/10136#issuecomment-1181559811

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912368972


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -86,6 +88,14 @@ Status OlapScanner::prepare(
             LOG(WARNING) << ss.str();
             return Status::InternalError(ss.str());
         }
+        _tablet_schema = _tablet->tablet_schema();

Review Comment:
   VOlapScanner is modified too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] SWJTU-ZhangLei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
SWJTU-ZhangLei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912498082


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4922,4 +4916,190 @@ private static void addTableComment(Table table, StringBuilder sb) {
             sb.append("\nCOMMENT '").append(table.getComment(true)).append("'");
         }
     }
+
+    // the invoker should keep table's write lock
+    public void modifyTableAddOrDropColumns(Database db, OlapTable olapTable,
+            Map<Long, LinkedList<Column>> indexSchemaMap,
+            List<Index> indexes, long jobId, boolean isReplay) throws DdlException {
+
+        LOG.debug("indexSchemaMap:{}, indexes:{}", indexSchemaMap, indexes);
+        if (olapTable.getState() == OlapTableState.ROLLUP) {
+            throw new DdlException("Table[" + olapTable.getName() + "]'s is doing ROLLUP job");
+        }
+
+        // for now table's state can only be NORMAL
+        Preconditions.checkState(olapTable.getState() == OlapTableState.NORMAL, olapTable.getState().name());
+
+        // for bitmapIndex
+        boolean hasIndexChange = false;
+        Set<Index> newSet = new HashSet<>(indexes);
+        Set<Index> oriSet = new HashSet<>(olapTable.getIndexes());
+        if (!newSet.equals(oriSet)) {
+            hasIndexChange = true;
+        }
+
+        // begin checking each table
+        // ATTN: DO NOT change any meta in this loop
+        Map<Long, List<Column>> changedIndexIdToSchema = Maps.newHashMap();
+        for (Long alterIndexId : indexSchemaMap.keySet()) {
+            // Must get all columns including invisible columns.
+            // Because in alter process, all columns must be considered.
+            List<Column> alterSchema = indexSchemaMap.get(alterIndexId);
+
+            LOG.debug("index[{}] is changed. start checking...", alterIndexId);
+            // 1. check order: a) has key; b) value after key
+            boolean meetValue = false;
+            boolean hasKey = false;
+            for (Column column : alterSchema) {
+                if (column.isKey() && meetValue) {
+                    throw new DdlException(
+                            "Invalid column order. value should be after key. index[" + olapTable.getIndexNameById(
+                                    alterIndexId) + "]");
+                }
+                if (!column.isKey()) {
+                    meetValue = true;
+                } else {
+                    hasKey = true;
+                }
+            }
+            if (!hasKey) {
+                throw new DdlException("No key column left. index[" + olapTable.getIndexNameById(alterIndexId) + "]");
+            }
+
+            // 2. check partition key
+            PartitionInfo partitionInfo = olapTable.getPartitionInfo();
+            if (partitionInfo.getType() == PartitionType.RANGE || partitionInfo.getType() == PartitionType.LIST) {
+                List<Column> partitionColumns = partitionInfo.getPartitionColumns();
+                for (Column partitionCol : partitionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(partitionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.1 partition column cannot be deleted.
+                        throw new DdlException(
+                                "Partition column[" + partitionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for partitionColumns
+            }
+
+            // 3. check distribution key:
+            DistributionInfo distributionInfo = olapTable.getDefaultDistributionInfo();
+            if (distributionInfo.getType() == DistributionInfoType.HASH) {
+                List<Column> distributionColumns = ((HashDistributionInfo) distributionInfo).getDistributionColumns();
+                for (Column distributionCol : distributionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(distributionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.2 distribution column cannot be deleted.
+                        throw new DdlException(
+                                "Distribution column[" + distributionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for distributionCols
+            }
+
+            // 5. store the changed columns for edit log
+            changedIndexIdToSchema.put(alterIndexId, alterSchema);
+
+            LOG.debug("schema change[{}-{}-{}] check pass.", db.getId(), olapTable.getId(), alterIndexId);
+        } // end for indices
+
+        if (changedIndexIdToSchema.isEmpty() && !hasIndexChange) {
+            throw new DdlException("Nothing is changed. please check your alter stmt.");
+        }
+
+        //update base index schema
+        long baseIndexId = olapTable.getBaseIndexId();
+        List<Long> indexIds = new ArrayList<Long>();
+        indexIds.add(baseIndexId);
+        indexIds.addAll(olapTable.getIndexIdListExceptBaseIndex());
+        for (int i = 0; i < indexIds.size(); i++) {
+            List<Column> indexSchema = indexSchemaMap.get(indexIds.get(i));
+            MaterializedIndexMeta currentIndexMeta = olapTable.getIndexMetaByIndexId(indexIds.get(i));
+            currentIndexMeta.setSchema(indexSchema);
+
+            int currentSchemaVersion = currentIndexMeta.getSchemaVersion();
+            int newSchemaVersion = currentSchemaVersion + 1;
+            currentIndexMeta.setSchemaVersion(newSchemaVersion);
+            // generate schema hash for new index has to generate a new schema hash not equal to current schema hash
+            int currentSchemaHash = currentIndexMeta.getSchemaHash();
+            int newSchemaHash = Util.generateSchemaHash();
+            while (currentSchemaHash == newSchemaHash) {
+                newSchemaHash = Util.generateSchemaHash();
+            }
+            currentIndexMeta.setSchemaHash(newSchemaHash);
+        }
+        olapTable.setIndexes(indexes);
+        olapTable.rebuildFullSchema();
+
+        //update max column unique id
+        int maxColUniqueId = olapTable.getMaxColUniqueId();
+        for (Column column : indexSchemaMap.get(olapTable.getBaseIndexId())) {
+            if (column.getUniqueId() > maxColUniqueId) {
+                maxColUniqueId = column.getUniqueId();
+            }
+        }
+        olapTable.setMaxColUniqueId(maxColUniqueId);
+
+        if (!isReplay) {
+            TableAddOrDropColumnsInfo info = new TableAddOrDropColumnsInfo(db.getId(), olapTable.getId(),
+                    indexSchemaMap, indexes, jobId);
+            LOG.debug("logModifyTableAddOrDropColumns info:{}", info);
+            editLog.logModifyTableAddOrDropColumns(info);
+        }
+
+        //for compatibility, we need create a finished state schema change job v2
+
+        SchemaChangeJobV2 schemaChangeJob = new SchemaChangeJobV2(jobId, db.getId(), olapTable.getId(),
+                olapTable.getName(), 1000);
+        schemaChangeJob.setJobState(AlterJobV2.JobState.FINISHED);
+        schemaChangeJob.setFinishedTimeMs(System.currentTimeMillis());

Review Comment:
   > I think we should save the read `createTime` and `finishTime` of this schema change job in `TableAddOrDropColumnsInfo`.(Maybe we can just use one timestamp for both create and finish time). Otherwise, it will make user confused because after FE restart, the finish time of the schema change job will be changed.
   
   > AlterJobV2's constructor will calculate createTime.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912443030


##########
be/src/olap/compaction.cpp:
##########
@@ -138,8 +138,21 @@ Status Compaction::do_compaction_impl(int64_t permits) {
 
     LOG(INFO) << "start " << merge_type << compaction_name() << ". tablet=" << _tablet->full_name()
               << ", output_version=" << _output_version << ", permits: " << permits;
+    const TabletSchema* cur_tablet_schema = nullptr;
+    if (_input_rowsets.front()->rowset_meta()->tablet_schema() == nullptr) {
+        cur_tablet_schema = &(_tablet->tablet_schema());
+    } else {
+        // get cur schema if rowset schema exist, rowset schema must be newer than tablet schema
+        auto max_version_rowset =

Review Comment:
   rowset with max version may not have the latest tablet schema.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912443030


##########
be/src/olap/compaction.cpp:
##########
@@ -138,8 +138,21 @@ Status Compaction::do_compaction_impl(int64_t permits) {
 
     LOG(INFO) << "start " << merge_type << compaction_name() << ". tablet=" << _tablet->full_name()
               << ", output_version=" << _output_version << ", permits: " << permits;
+    const TabletSchema* cur_tablet_schema = nullptr;
+    if (_input_rowsets.front()->rowset_meta()->tablet_schema() == nullptr) {
+        cur_tablet_schema = &(_tablet->tablet_schema());
+    } else {
+        // get cur schema if rowset schema exist, rowset schema must be newer than tablet schema
+        auto max_version_rowset =

Review Comment:
   Is it equal to call tablet->tablet_schema()?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yixiutt commented on pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yixiutt commented on PR #10136:
URL: https://github.com/apache/incubator-doris/pull/10136#issuecomment-1156032038

   > What a fancy improvement! And after reading your issue, I'm curious will it increase the pressure on metadata storage because we persist all schema of all rowsets?
   
   schema is really small, and one schema per rowset is really controllable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r911842036


##########
be/src/exec/olap_scanner.cpp:
##########
@@ -86,6 +88,14 @@ Status OlapScanner::prepare(
             LOG(WARNING) << ss.str();
             return Status::InternalError(ss.str());
         }
+        _tablet_schema = _tablet->tablet_schema();

Review Comment:
   OlapScanner is not used in vectorized engine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] SWJTU-ZhangLei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
SWJTU-ZhangLei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912498009


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4922,4 +4916,190 @@ private static void addTableComment(Table table, StringBuilder sb) {
             sb.append("\nCOMMENT '").append(table.getComment(true)).append("'");
         }
     }
+
+    // the invoker should keep table's write lock
+    public void modifyTableAddOrDropColumns(Database db, OlapTable olapTable,
+            Map<Long, LinkedList<Column>> indexSchemaMap,
+            List<Index> indexes, long jobId, boolean isReplay) throws DdlException {
+
+        LOG.debug("indexSchemaMap:{}, indexes:{}", indexSchemaMap, indexes);
+        if (olapTable.getState() == OlapTableState.ROLLUP) {
+            throw new DdlException("Table[" + olapTable.getName() + "]'s is doing ROLLUP job");
+        }
+
+        // for now table's state can only be NORMAL
+        Preconditions.checkState(olapTable.getState() == OlapTableState.NORMAL, olapTable.getState().name());
+
+        // for bitmapIndex
+        boolean hasIndexChange = false;
+        Set<Index> newSet = new HashSet<>(indexes);
+        Set<Index> oriSet = new HashSet<>(olapTable.getIndexes());
+        if (!newSet.equals(oriSet)) {
+            hasIndexChange = true;
+        }
+
+        // begin checking each table
+        // ATTN: DO NOT change any meta in this loop
+        Map<Long, List<Column>> changedIndexIdToSchema = Maps.newHashMap();
+        for (Long alterIndexId : indexSchemaMap.keySet()) {
+            // Must get all columns including invisible columns.
+            // Because in alter process, all columns must be considered.
+            List<Column> alterSchema = indexSchemaMap.get(alterIndexId);
+
+            LOG.debug("index[{}] is changed. start checking...", alterIndexId);
+            // 1. check order: a) has key; b) value after key
+            boolean meetValue = false;
+            boolean hasKey = false;
+            for (Column column : alterSchema) {
+                if (column.isKey() && meetValue) {
+                    throw new DdlException(
+                            "Invalid column order. value should be after key. index[" + olapTable.getIndexNameById(
+                                    alterIndexId) + "]");
+                }
+                if (!column.isKey()) {
+                    meetValue = true;
+                } else {
+                    hasKey = true;
+                }
+            }
+            if (!hasKey) {
+                throw new DdlException("No key column left. index[" + olapTable.getIndexNameById(alterIndexId) + "]");
+            }
+
+            // 2. check partition key
+            PartitionInfo partitionInfo = olapTable.getPartitionInfo();
+            if (partitionInfo.getType() == PartitionType.RANGE || partitionInfo.getType() == PartitionType.LIST) {
+                List<Column> partitionColumns = partitionInfo.getPartitionColumns();
+                for (Column partitionCol : partitionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(partitionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.1 partition column cannot be deleted.
+                        throw new DdlException(
+                                "Partition column[" + partitionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for partitionColumns
+            }
+
+            // 3. check distribution key:
+            DistributionInfo distributionInfo = olapTable.getDefaultDistributionInfo();
+            if (distributionInfo.getType() == DistributionInfoType.HASH) {
+                List<Column> distributionColumns = ((HashDistributionInfo) distributionInfo).getDistributionColumns();
+                for (Column distributionCol : distributionColumns) {
+                    boolean found = false;
+                    for (Column alterColumn : alterSchema) {
+                        if (alterColumn.nameEquals(distributionCol.getName(), true)) {
+                            found = true;
+                            break;
+                        }
+                    } // end for alterColumns
+                    if (!found && alterIndexId == olapTable.getBaseIndexId()) {
+                        // 2.2 distribution column cannot be deleted.
+                        throw new DdlException(
+                                "Distribution column[" + distributionCol.getName() + "] cannot be dropped. index["
+                                        + olapTable.getIndexNameById(alterIndexId) + "]");
+                    }
+                } // end for distributionCols
+            }
+
+            // 5. store the changed columns for edit log
+            changedIndexIdToSchema.put(alterIndexId, alterSchema);
+
+            LOG.debug("schema change[{}-{}-{}] check pass.", db.getId(), olapTable.getId(), alterIndexId);
+        } // end for indices
+
+        if (changedIndexIdToSchema.isEmpty() && !hasIndexChange) {
+            throw new DdlException("Nothing is changed. please check your alter stmt.");
+        }
+
+        //update base index schema
+        long baseIndexId = olapTable.getBaseIndexId();
+        List<Long> indexIds = new ArrayList<Long>();
+        indexIds.add(baseIndexId);
+        indexIds.addAll(olapTable.getIndexIdListExceptBaseIndex());
+        for (int i = 0; i < indexIds.size(); i++) {
+            List<Column> indexSchema = indexSchemaMap.get(indexIds.get(i));
+            MaterializedIndexMeta currentIndexMeta = olapTable.getIndexMetaByIndexId(indexIds.get(i));
+            currentIndexMeta.setSchema(indexSchema);
+
+            int currentSchemaVersion = currentIndexMeta.getSchemaVersion();
+            int newSchemaVersion = currentSchemaVersion + 1;
+            currentIndexMeta.setSchemaVersion(newSchemaVersion);
+            // generate schema hash for new index has to generate a new schema hash not equal to current schema hash
+            int currentSchemaHash = currentIndexMeta.getSchemaHash();
+            int newSchemaHash = Util.generateSchemaHash();
+            while (currentSchemaHash == newSchemaHash) {
+                newSchemaHash = Util.generateSchemaHash();
+            }
+            currentIndexMeta.setSchemaHash(newSchemaHash);
+        }
+        olapTable.setIndexes(indexes);
+        olapTable.rebuildFullSchema();
+
+        //update max column unique id
+        int maxColUniqueId = olapTable.getMaxColUniqueId();
+        for (Column column : indexSchemaMap.get(olapTable.getBaseIndexId())) {
+            if (column.getUniqueId() > maxColUniqueId) {
+                maxColUniqueId = column.getUniqueId();
+            }
+        }
+        olapTable.setMaxColUniqueId(maxColUniqueId);
+
+        if (!isReplay) {
+            TableAddOrDropColumnsInfo info = new TableAddOrDropColumnsInfo(db.getId(), olapTable.getId(),
+                    indexSchemaMap, indexes, jobId);
+            LOG.debug("logModifyTableAddOrDropColumns info:{}", info);
+            editLog.logModifyTableAddOrDropColumns(info);
+        }
+
+        //for compatibility, we need create a finished state schema change job v2
+
+        SchemaChangeJobV2 schemaChangeJob = new SchemaChangeJobV2(jobId, db.getId(), olapTable.getId(),
+                olapTable.getName(), 1000);
+        schemaChangeJob.setJobState(AlterJobV2.JobState.FINISHED);
+        schemaChangeJob.setFinishedTimeMs(System.currentTimeMillis());

Review Comment:
   AlterJobV2's constructor will calculate createTime.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912351667


##########
be/src/olap/tablet.cpp:
##########
@@ -1636,4 +1641,13 @@ std::shared_ptr<MemTracker>& Tablet::get_compaction_mem_tracker(CompactionType c
     }
 }
 
+const TabletSchema& Tablet::tablet_schema() const {
+    std::shared_lock wrlock(_meta_lock);
+    const RowsetSharedPtr last_rowset = rowset_with_max_version();

Review Comment:
   Currently rowset does not have tablet schema. So tablet schema == null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r913785829


##########
be/src/olap/convert_rowset.cpp:
##########
@@ -48,8 +48,9 @@ Status ConvertRowset::do_convert() {
 
         std::unique_ptr<RowsetWriter> output_rs_writer;
         RETURN_NOT_OK(_tablet->create_rowset_writer(output_version, VISIBLE, NONOVERLAPPING,
-                                                    &output_rs_writer));
-        res = Merger::merge_rowsets(_tablet, ReaderType::READER_BASE_COMPACTION, {input_rs_reader},
+                                                    &_tablet->tablet_schema(), &output_rs_writer));

Review Comment:
   This file is useless now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring merged pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
dataroaring merged PR #10136:
URL: https://github.com/apache/doris/pull/10136


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916447721


##########
be/src/olap/tablet_schema.cpp:
##########
@@ -460,27 +503,87 @@ void TabletSchema::init_from_pb(const TabletSchemaPB& schema) {
     _sort_type = schema.sort_type();
     _sort_col_num = schema.sort_col_num();
     _compression_type = schema.compression_type();
+    _schema_version = schema.schema_version();
 }
 
-void TabletSchema::to_schema_pb(TabletSchemaPB* tablet_meta_pb) {
-    tablet_meta_pb->set_keys_type(_keys_type);
+void TabletSchema::build_current_tablet_schema(int64_t index_id,
+                                               const POlapTableSchemaParam& ptable_schema_param,
+                                               const TabletSchema& ori_tablet_schema) {
+    // copy from ori_tablet_schema
+    _keys_type = ori_tablet_schema.keys_type();

Review Comment:
   这里的逻辑我看也只有导入用到了,我们可不可以直接导入的时候优FE 生成PB 结构,这个逻辑感觉跟创建表的重合了



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916444410


##########
be/src/olap/delta_writer.cpp:
##########
@@ -121,10 +121,11 @@ Status DeltaWriter::init() {
         RETURN_NOT_OK(_storage_engine->txn_manager()->prepare_txn(_req.partition_id, _tablet,
                                                                   _req.txn_id, _req.load_id));
     }
+    // build tablet schema in request level
+    _build_current_tablet_schema(_req.index_id, _req.ptable_schema_param, _tablet->tablet_schema());

Review Comment:
   在升级过程中可能出现,当前机器是1.2 版本,老机器是1.1 版本,那么导入数据时,1.1 版本的机器转发过来的数据是没schema的,此时这里会有错误。



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r916444639


##########
be/src/olap/delta_writer.h:
##########
@@ -116,7 +122,11 @@ class DeltaWriter {
     // TODO: Recheck the lifetime of _mem_table, Look should use unique_ptr
     std::shared_ptr<MemTable> _mem_table;
     std::unique_ptr<Schema> _schema;
-    const TabletSchema* _tablet_schema;
+    //const TabletSchema* _tablet_schema;
+    // tablet schema owned by delta writer, all write will use this tablet schema
+    // it's build from tablet_schema(stored when create tablet) and OlapTableSchema
+    // every request will have it's own tablet schema so simple schema change can work
+    std::shared_ptr<TabletSchema> _tablet_schema;

Review Comment:
   why not unique ptr?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Lchangliang commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r914359505


##########
be/src/olap/convert_rowset.cpp:
##########
@@ -48,8 +48,9 @@ Status ConvertRowset::do_convert() {
 
         std::unique_ptr<RowsetWriter> output_rs_writer;
         RETURN_NOT_OK(_tablet->create_rowset_writer(output_version, VISIBLE, NONOVERLAPPING,
-                                                    &output_rs_writer));
-        res = Merger::merge_rowsets(_tablet, ReaderType::READER_BASE_COMPACTION, {input_rs_reader},
+                                                    &_tablet->tablet_schema(), &output_rs_writer));

Review Comment:
   But it need to be compile. Maybe someone is responsible for deleting related files together.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #10136:
URL: https://github.com/apache/incubator-doris/pull/10136#issuecomment-1155126378

   What a fancy improvement! And after reading your issue, I'm curious will it increase the pressure on metadata storage because  we persist all schema of all rowsets?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912441824


##########
be/src/olap/rowset/beta_rowset.cpp:
##########
@@ -54,11 +54,12 @@ Status BetaRowset::do_load(bool /*use_cache*/) {
     return Status::OK();
 }
 
-Status BetaRowset::load_segments(std::vector<segment_v2::SegmentSharedPtr>* segments) {
+Status BetaRowset::load_segments(std::vector<segment_v2::SegmentSharedPtr>* segments,
+                                 const TabletSchema* read_tablet_schema) {
     for (int seg_id = 0; seg_id < num_segments(); ++seg_id) {

Review Comment:
   Why segment should related with the input schema? Actually segment or rowset has its own schema.



##########
be/src/olap/rowset/rowset.cpp:
##########
@@ -35,6 +34,12 @@ Rowset::Rowset(const TabletSchema* schema, const FilePathDesc& rowset_path_desc,
         Version version = _rowset_meta->version();
         _is_cumulative = version.first != version.second;
     }
+    // build schema from RowsetMeta.tablet_schema or Tablet.tablet_schema
+    if (_rowset_meta->tablet_schema() != nullptr) {

Review Comment:
   I find there are some logic that check whether the rowset has tablet schema. Why not add tablet meta to all rowset during data_dir.load()?
   Then we do not need deal with these if else logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10136: [Feature] Lightweight schema change of add/drop column

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10136:
URL: https://github.com/apache/doris/pull/10136#discussion_r912442441


##########
be/src/olap/schema_change.cpp:
##########
@@ -1649,8 +1650,8 @@ Status VSchemaChangeWithSorting::_internal_sorting(
     MultiBlockMerger merger(new_tablet);
 
     std::unique_ptr<RowsetWriter> rowset_writer;
-    RETURN_IF_ERROR(
-            new_tablet->create_rowset_writer(version, VISIBLE, segments_overlap, &rowset_writer));
+    RETURN_IF_ERROR(new_tablet->create_rowset_writer(version, VISIBLE, segments_overlap,
+                                                     &new_tablet->tablet_schema(), &rowset_writer));

Review Comment:
   If create rowset writer always bind the tablet's tablet_schema, then its no need as an input parameter. We could get the tablet schema during create_rowset_writer inside.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org