You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/22 16:30:38 UTC

[GitHub] [incubator-doris] dataroaring opened a new pull request, #9727: [opt] avoid memcpy when building page

dataroaring opened a new pull request, #9727:
URL: https://github.com/apache/incubator-doris/pull/9727

   PageBuilder finishes a page when the page is full, when the page is
   full, its size is greater than page size, so the last row would
   trigger an memcpy due to there is not enough space in faststring.
   
   The patch fix the problem by 2 changes:
   1. PageBuilder reserves extra bytes
      If the last row is less than extra bytes, then no memory copy is
      triggered.
   2. faststring uses malloc/realloc/free rather than new/delete
      realloc can avoid memcopy some times.
   
   The patch is not tested, so please do not merge.
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r879004925


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;
     if (data_ != initial_data_) {
-        delete[] data_;
+        newdata = (uint8_t*)realloc((void*)data_, newcapacity);
     } else {
+        newdata = (uint8_t*)malloc(newcapacity);
+        if (len_ > 0) {
+            memcpy(newdata, data_, len_);
+        }
+    }
+    capacity_ = newcapacity;
+    data_ = newdata;
+    if (data_ == initial_data_) {
         ASAN_POISON_MEMORY_REGION(initial_data_, arraysize(initial_data_));

Review Comment:
   data_ never equal to initial_data_?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r878995037


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;

Review Comment:
   Not use NULL, use nullptr please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r878999924


##########
be/src/util/faststring.cc:
##########
@@ -57,14 +59,14 @@ void faststring::ShrinkToFitInternal() {
     if (len_ <= kInitialCapacity) {
         ASAN_UNPOISON_MEMORY_REGION(initial_data_, len_);
         memcpy(initial_data_, &data_[0], len_);
-        delete[] data_;

Review Comment:
   I think we'd better use c++ function delete[] not C function free()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] dataroaring commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r884074140


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;
     if (data_ != initial_data_) {
-        delete[] data_;
+        newdata = (uint8_t*)realloc((void*)data_, newcapacity);
     } else {
+        newdata = (uint8_t*)malloc(newcapacity);
+        if (len_ > 0) {
+            memcpy(newdata, data_, len_);
+        }
+    }
+    capacity_ = newcapacity;
+    data_ = newdata;
+    if (data_ == initial_data_) {
         ASAN_POISON_MEMORY_REGION(initial_data_, arraysize(initial_data_));

Review Comment:
   yep, there is a bug visible to the naked eye.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r879936473


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;
     if (data_ != initial_data_) {
-        delete[] data_;
+        newdata = (uint8_t*)realloc((void*)data_, newcapacity);
     } else {
+        newdata = (uint8_t*)malloc(newcapacity);
+        if (len_ > 0) {
+            memcpy(newdata, data_, len_);
+        }
+    }
+    capacity_ = newcapacity;
+    data_ = newdata;
+    if (data_ == initial_data_) {
         ASAN_POISON_MEMORY_REGION(initial_data_, arraysize(initial_data_));

Review Comment:
   initial_data_ is a reserved memory, which is no longer equal to after the first GrowArray.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r878998363


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;
     if (data_ != initial_data_) {
-        delete[] data_;
+        newdata = (uint8_t*)realloc((void*)data_, newcapacity);
     } else {
+        newdata = (uint8_t*)malloc(newcapacity);
+        if (len_ > 0) {
+            memcpy(newdata, data_, len_);
+        }
+    }
+    capacity_ = newcapacity;
+    data_ = newdata;
+    if (data_ == initial_data_) {

Review Comment:
   This branch never happens



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r879000096


##########
be/src/util/faststring.cc:
##########
@@ -57,14 +59,14 @@ void faststring::ShrinkToFitInternal() {
     if (len_ <= kInitialCapacity) {
         ASAN_UNPOISON_MEMORY_REGION(initial_data_, len_);
         memcpy(initial_data_, &data_[0], len_);
-        delete[] data_;
+        free(data_);
         data_ = initial_data_;
         capacity_ = kInitialCapacity;
     } else {
-        std::unique_ptr<uint8_t[]> newdata(new uint8_t[len_]);
-        memcpy(&newdata[0], &data_[0], len_);
-        delete[] data_;
-        data_ = newdata.release();
+        uint8_t* newdata = (uint8_t*)malloc(len_);
+        memcpy(newdata, data_, len_);
+        free(data_);

Review Comment:
   And here, use delete[] please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r878995037


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;

Review Comment:
   Not use NULL, use nullptr please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9727:
URL: https://github.com/apache/doris/pull/9727#issuecomment-1365514823

   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and feel free a maintainer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] closed pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #9727: [opt] avoid memcpy when building page
URL: https://github.com/apache/doris/pull/9727


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r878998363


##########
be/src/util/faststring.cc:
##########
@@ -37,18 +37,20 @@ void faststring::GrowToAtLeast(size_t newcapacity) {
 
 void faststring::GrowArray(size_t newcapacity) {
     DCHECK_GE(newcapacity, capacity_);
-    std::unique_ptr<uint8_t[]> newdata(new uint8_t[newcapacity]);
-    if (len_ > 0) {
-        memcpy(&newdata[0], &data_[0], len_);
-    }
-    capacity_ = newcapacity;
+    uint8_t* newdata = NULL;
     if (data_ != initial_data_) {
-        delete[] data_;
+        newdata = (uint8_t*)realloc((void*)data_, newcapacity);
     } else {
+        newdata = (uint8_t*)malloc(newcapacity);
+        if (len_ > 0) {
+            memcpy(newdata, data_, len_);
+        }
+    }
+    capacity_ = newcapacity;
+    data_ = newdata;
+    if (data_ == initial_data_) {

Review Comment:
   This branch never happens



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9727: [opt] avoid memcpy when building page

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9727:
URL: https://github.com/apache/incubator-doris/pull/9727#discussion_r879005535


##########
be/src/util/faststring.h:
##########
@@ -82,7 +82,7 @@ class faststring {
     OwnedSlice build() {
         uint8_t* ret = data_;
         if (ret == initial_data_) {
-            ret = new uint8_t[len_];
+            ret = (uint8_t*)malloc(len_);
             memcpy(ret, data_, len_);

Review Comment:
   ownedslice deconstructor using delete[]  to release data_, if we use malloc here, we need to modify it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org