You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/12 14:56:04 UTC

[GitHub] [doris] dataroaring opened a new pull request, #13336: [fix](jsonreader) teach jsonreader to release memory

dataroaring opened a new pull request, #13336:
URL: https://github.com/apache/doris/pull/13336

   Allocator of rapidjson does not release memory, this fix use allocator with local buffer and call Clear to release memory allocated beyond local buffer.
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994588717


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];
+
+    typedef rapidjson::GenericDocument<rapidjson::UTF8<>, rapidjson::MemoryPoolAllocator<>,
+                                       rapidjson::MemoryPoolAllocator<>>
+            Document;
+    rapidjson::MemoryPoolAllocator<> _value_allocator;
+    rapidjson::MemoryPoolAllocator<> _parse_allocator;

Review Comment:
   crlAllocator uses malloc and free,it is less efficient.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz merged pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
xinyiZzz merged PR #13336:
URL: https://github.com/apache/doris/pull/13336


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] gavinchou commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
gavinchou commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r993590746


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,14 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];

Review Comment:
   Does it seem that this class `JsonReader` may/should/must not be declared as a stack variable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13336:
URL: https://github.com/apache/doris/pull/13336#issuecomment-1277305132

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhannngchen commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
zhannngchen commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994388402


##########
be/src/exec/json_scanner.cpp:
##########
@@ -343,6 +346,8 @@ Status JsonReader::_parse_json_doc(size_t* size, bool* eof) {
         return Status::OK();
     }
 
+    // clear memory here.
+    _origin_json_doc.GetAllocator().Clear();

Review Comment:
   Why not just create a new `Document` each time we need to parse? It's not reasonable to make _origin_json_doc as a class member variable?



##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];
+
+    typedef rapidjson::GenericDocument<rapidjson::UTF8<>, rapidjson::MemoryPoolAllocator<>,
+                                       rapidjson::MemoryPoolAllocator<>>
+            Document;
+    rapidjson::MemoryPoolAllocator<> _value_allocator;
+    rapidjson::MemoryPoolAllocator<> _parse_allocator;

Review Comment:
   Dose `crtAllocator` a better choice?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994382135


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];

Review Comment:
   The constant is only used here and once, so it can be directly used?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13336:
URL: https://github.com/apache/doris/pull/13336#issuecomment-1277305189

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] levy5307 commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
levy5307 commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994225208


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];

Review Comment:
   It's better to define a constant for 512*1024



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r995310018


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];

Review Comment:
   So, if we define a varible named kParserBufferSize, can you get some meaning? I think it can be got from _parse_buffer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994419637


##########
be/src/exec/json_scanner.cpp:
##########
@@ -343,6 +346,8 @@ Status JsonReader::_parse_json_doc(size_t* size, bool* eof) {
         return Status::OK();
     }
 
+    // clear memory here.
+    _origin_json_doc.GetAllocator().Clear();

Review Comment:
   It should work either. Just a pointer variable as a member.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] levy5307 commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
levy5307 commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994392469


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];

Review Comment:
   > The constant is only used here and once, so it can be directly used?
   
   I don't think so. Because we can hardly get the meaning  `512*1024`, but we can get it from the name of a constant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] levy5307 commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
levy5307 commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994392469


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];
+    char _parse_buffer[512 * 1024];

Review Comment:
   > The constant is only used here and once, so it can be directly used?
   
   I don't think so. Because we can hardly get the meaning  `512*1024`, and we can get it from the name of a constant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] gavinchou commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
gavinchou commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r993590746


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,14 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];

Review Comment:
   Does it seem that this class may/should/must not be declared as a stack variable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] gavinchou commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
gavinchou commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r993590746


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,14 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];

Review Comment:
   Does it mean that this class `JsonReader` may/should/must not be declared as a stack variable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] eldenmoon commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994234658


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,15 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];

Review Comment:
   what if json data size exceeds the buffer size?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on a diff in pull request #13336: [fix](jsonreader) teach jsonreader to release memory

Posted by GitBox <gi...@apache.org>.
dataroaring commented on code in PR #13336:
URL: https://github.com/apache/doris/pull/13336#discussion_r994194032


##########
be/src/exec/json_scanner.h:
##########
@@ -178,7 +178,14 @@ class JsonReader {
     std::vector<std::vector<JsonPath>> _parsed_jsonpaths;
     std::vector<JsonPath> _parsed_json_root;
 
-    rapidjson::Document _origin_json_doc; // origin json document object from parsed json string
+    char _value_buffer[4 * 1024 * 1024];

Review Comment:
   It is not used on stack except ut.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org