You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/19 11:43:43 UTC

[GitHub] [doris] pengxiangyu opened a new pull request, #13487: [Feature](remote)Add schema change for remote storage file

pengxiangyu opened a new pull request, #13487:
URL: https://github.com/apache/doris/pull/13487

   # Proposed changes
   
   Issue Number: close #13486
   
   ## Problem summary
   
   Support schema change for rowset on remote storage.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ x] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ x] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [x ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ x] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [x ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1294402643

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.46 seconds
    load time: 571 seconds
    storage size: 17154644848 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221028112724_clickbench_pr_35071.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1006524942


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -67,8 +67,13 @@ BetaRowsetWriter::~BetaRowsetWriter() {
 Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context) {
     _context = rowset_writer_context;
     _rowset_meta.reset(new RowsetMeta);
-    if (_context.data_dir) {
+    if (_context.fs == nullptr && _context.data_dir) {
         _rowset_meta->set_fs(_context.data_dir->fs());
+    } else {
+        _rowset_meta->set_fs(_context.fs);
+    }
+    if (_context.resource_id.size() > 0) {
+        _rowset_meta->set_resource_id(_context.resource_id);

Review Comment:
   it is used when calling function is_local()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #13487:
URL: https://github.com/apache/doris/pull/13487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1294419375

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 39.43 seconds
    load time: 634 seconds
    storage size: 17154810725 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221028035700_clickbench_pr_35100.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1284848912

   > We already have light weight schema change(set a property during create table), not need link any more and S3(all object systems) not support link.
   
   you can't force user to create table with light_schema_change = true, when he want to use cold data in a old table, it doesn't work. This is used to compatible old table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1005295743


##########
be/src/io/fs/s3_file_system.cpp:
##########
@@ -226,7 +229,26 @@ Status S3FileSystem::delete_directory(const Path& path) {
 }
 
 Status S3FileSystem::link_file(const Path& src, const Path& dest) {
-    return Status::NotSupported("not support");
+    auto client = get_client();
+    CHECK_S3_CLIENT(client);
+
+    Aws::S3::Model::CopyObjectRequest request;
+    auto src_key = get_key(src);
+    auto dest_key = get_key(dest);
+    request.WithBucket(_s3_conf.bucket)
+            .WithCopySource(_s3_conf.bucket + "/" + src_key)

Review Comment:
   Pls do not use copy to implement link, it is very expensive and not assure atomic and also there will be garbage if it failed. 
   Should read and write a new rowset at application layer and then we could do gc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1284843735

   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1291586809

   > light_schema_change is only useful in alter table add value, but not alter table add key, which is still used in old case, so we must change the implement in BE.
   
   Do you mean add key could be resolved by using linked schema change? Could you please provide a case? 
   
   I think if an old schema change could use linked schema change, then it could also use light weight schema change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1286316920

   we will remove linked schema change after 1.2 is released and it is stable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1294394637

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1292063054

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 39.74 seconds
    load time: 557 seconds
    storage size: 17154655370 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221026133910_clickbench_pr_34414.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1293349591

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.5 seconds
    load time: 571 seconds
    storage size: 17154821261 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221027185618_clickbench_pr_34799.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1291589868

   > > light_schema_change is only useful in alter table add value, but not alter table add key, which is still used in old case, so we must change the implement in BE.
   > 
   > Do you mean add key could be resolved by using linked schema change? Could you please provide a case?
   > 
   > I think if an old schema change could use linked schema change, then it could also use light weight schema change.
   
   No only linked schema change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1293834157

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 40.14 seconds
    load time: 561 seconds
    storage size: 17154644711 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221027171554_clickbench_pr_34952.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1006462262


##########
be/src/io/fs/s3_file_system.h:
##########
@@ -75,7 +75,8 @@ class S3FileSystem final : public RemoteFileSystem {
     // Guarded by external lock.
     void set_sk(std::string sk) { _s3_conf.sk = std::move(sk); }
 
-private:
+    const S3Conf s3_conf() const { return _s3_conf; }

Review Comment:
   This method is not used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1291419483

   > > 
   > 
   > 1. I think you could add support to convert old table (not support light weight schema change) to new table (support light weight schema change).  Then we could use light weight schema change to do such things.
   > 2. we will set light_schema_change = true as default in one or two months. We will remove linked schema change after 1.2 is released and it is stable. Then most of link file related code will be removed.  I think you could help us add such feature then we could remove linked schema change more quickly.
   
   light_schema_change is only useful in alter table add value, but not alter table add key, which is still used in old case, so we must change the implement in BE.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zxealous commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
zxealous commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r999431563


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -156,7 +167,7 @@ template Status BetaRowsetWriter::_add_row(const ContiguousRow& row);
 
 Status BetaRowsetWriter::add_rowset(RowsetSharedPtr rowset) {
     assert(rowset->rowset_meta()->rowset_type() == BETA_ROWSET);
-    RETURN_NOT_OK(rowset->link_files_to(_context.tablet_path, _context.rowset_id));
+    RETURN_NOT_OK(rowset->link_files_to(_context.rowset_dir, _context.rowset_id));

Review Comment:
   Function link_files_to() will check whether dst_path is exist, but it use function FileUtils::check_exist(dst_path), may be it should change to fs->exists().



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1284478910

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 37.59 seconds
    load time: 603 seconds
    storage size: 17108337541 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221020033159_clickbench_pr_31534.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1284848432

   > We should not implement heavy schema change for cooldown, light schema change is done only in fe. We can enable light schema change when creating a table with property light_schema_change = true.
   
   light_schema_change is only useful in alter table add value, but not alter table add key, which is still used in old case, so we must change the implement in BE.
   
   you can't force user to create table with light_schema_change = true, when he want to use cold data in a old table, it doesn't work. This is used to compatible old table.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1006467366


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -67,8 +67,13 @@ BetaRowsetWriter::~BetaRowsetWriter() {
 Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context) {
     _context = rowset_writer_context;
     _rowset_meta.reset(new RowsetMeta);
-    if (_context.data_dir) {
+    if (_context.fs == nullptr && _context.data_dir) {
         _rowset_meta->set_fs(_context.data_dir->fs());
+    } else {
+        _rowset_meta->set_fs(_context.fs);
+    }
+    if (_context.resource_id.size() > 0) {
+        _rowset_meta->set_resource_id(_context.resource_id);

Review Comment:
   why rowset meta need save resource_id? I did not find use of it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1294394664

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1007548610


##########
be/src/olap/schema_change.cpp:
##########
@@ -2374,6 +2374,16 @@ Status SchemaChangeHandler::_parse_request(const SchemaChangeParams& sc_params,
         *sc_directly = true;
     }
 
+    if (!(*sc_directly) && !(*sc_sorting)) {

Review Comment:
   Please add a comment on line 2377, explain why we add this logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu closed pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu closed pull request #13487: [Feature](remote)Add schema change for remote storage file
URL: https://github.com/apache/doris/pull/13487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
dataroaring commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1283913174

   We should not implement heavy schema change for cooldown, light schema change is done only in fe.  We can enable light schema change when creating a table with property light_schema_change = true.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1286316028

   > 
   
   I think we could add support to convert old table (not support light weight schema change) to new table (support light weight schema change) because we will set light_schema_change = true as default in one or two months. Then we could use light weight schema change to do such things. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] pengxiangyu commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
pengxiangyu commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1007555270


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -67,8 +67,13 @@ BetaRowsetWriter::~BetaRowsetWriter() {
 Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context) {
     _context = rowset_writer_context;
     _rowset_meta.reset(new RowsetMeta);
-    if (_context.data_dir) {
+    if (_context.fs == nullptr && _context.data_dir) {

Review Comment:
   then _rowset_meta->set_fs(nullptr), this is the same with _rowset_meta->set_fs(_context.fs);



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1006470133


##########
be/src/olap/tablet.cpp:
##########
@@ -1661,6 +1661,17 @@ Status Tablet::create_rowset_writer(const Version& version, const RowsetStatePB&
                                     TabletSchemaSPtr tablet_schema, int64_t oldest_write_timestamp,
                                     int64_t newest_write_timestamp,
                                     std::unique_ptr<RowsetWriter>* rowset_writer) {
+    return create_rowset_writer(version, rowset_state, overlap, tablet_schema,
+                                oldest_write_timestamp, newest_write_timestamp, nullptr, "",
+                                rowset_writer);
+}
+
+Status Tablet::create_rowset_writer(const Version& version, const RowsetStatePB& rowset_state,
+                                    const SegmentsOverlapPB& overlap,
+                                    TabletSchemaSPtr tablet_schema, int64_t oldest_write_timestamp,
+                                    int64_t newest_write_timestamp, io::FileSystemPtr fs,
+                                    const io::ResourceId& resource_id,

Review Comment:
   could add an API resource_id() to filesystem iterface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1006471418


##########
be/test/olap/rowset/segment_v2/bitmap_index_test.cpp:
##########
@@ -35,6 +35,9 @@
 #include "util/file_utils.h"
 
 namespace doris {
+
+using FileSystemPtr = std::shared_ptr<io::FileSystem>;

Review Comment:
   if it is a shared ptr, then name it to  FileSystemSPtr
   if it is a unique ptr, then name it to FileSystemUPtr
   then we could distinguish them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13487:
URL: https://github.com/apache/doris/pull/13487#issuecomment-1294389234

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.66 seconds
    load time: 569 seconds
    storage size: 17154644880 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221028110551_clickbench_pr_35053.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1007546864


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -67,8 +67,13 @@ BetaRowsetWriter::~BetaRowsetWriter() {
 Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context) {
     _context = rowset_writer_context;
     _rowset_meta.reset(new RowsetMeta);
-    if (_context.data_dir) {
+    if (_context.fs == nullptr && _context.data_dir) {

Review Comment:
   if context.fs == nullptr and context.data_dir is also nullptr, then what happen?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #13487: [Feature](remote)Add schema change for remote storage file

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #13487:
URL: https://github.com/apache/doris/pull/13487#discussion_r1007544886


##########
be/src/olap/tablet.cpp:
##########
@@ -1704,7 +1715,12 @@ void Tablet::_init_context_common_fields(RowsetWriterContext& context) {
     if (context.rowset_type == ALPHA_ROWSET) {
         context.rowset_type = StorageEngine::instance()->default_rowset_type();
     }
-    context.tablet_path = tablet_path();
+    context.rowset_dir = tablet_path();
+    if (context.fs != nullptr && context.fs->type() != io::FileSystemType::LOCAL) {
+        context.rowset_dir = BetaRowset::remote_tablet_path(tablet_id());
+    } else {
+        context.rowset_dir = tablet_path();

Review Comment:
   This line is useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org