You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/21 06:41:50 UTC

[GitHub] [incubator-doris] morningman opened a new pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

morningman opened a new pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125


   ## Proposed changes
   
   Stream load should read all the data completely before parsing the json.
   And also add a new BE config `streaming_load_max_batch_read_mb`
   to limit the data size when loading json data.
   
   Fix: #4124
   
   ## Types of changes
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   
   ## Checklist
   
   - [x ] I have create an issue on #4124, and have described the bug/feature there in detail
   - [ x] Commit messages in my PR start with the related issues ID, like "#4071 Add pull request template to doris project"
   - [ x] Compiling and unit tests pass locally with my changes
   - [x ] If this change need a document change, I have updated the document
   - [ x] Any dependent changes have been merged


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] worker24h commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457912482



##########
File path: be/src/exec/json_scanner.cpp
##########
@@ -243,6 +243,7 @@ Status JsonReader::_parse_json_doc(bool* eof) {
     uint8_t* json_str = nullptr;
     size_t length = 0;
     RETURN_IF_ERROR(_file_reader->read_one_message(&json_str, &length));
+    LOG(INFO) << "cmy get json length: " << length;

Review comment:
       Please remove it,otherwise Log file will burst




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wutiangan commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457892464



##########
File path: be/src/exec/json_scanner.cpp
##########
@@ -243,6 +243,7 @@ Status JsonReader::_parse_json_doc(bool* eof) {
     uint8_t* json_str = nullptr;
     size_t length = 0;
     RETURN_IF_ERROR(_file_reader->read_one_message(&json_str, &length));
+    LOG(INFO) << "cmy get json length: " << length;

Review comment:
       ```suggestion
       LOG(INFO) << "get json length: " << length;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
morningman closed pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wutiangan commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457892231



##########
File path: be/src/http/action/stream_load.cpp
##########
@@ -312,7 +312,7 @@ Status StreamLoadAction::_process_put(HttpRequest* http_req, StreamLoadContext*
     request.formatType = ctx->format;
     request.__set_loadId(ctx->id.to_thrift());
     if (ctx->use_streaming) {
-        auto pipe = std::make_shared<StreamLoadPipe>();
+        auto pipe = std::make_shared<StreamLoadPipe>(1024 * 1024, 64* 1024, ctx->body_bytes);

Review comment:
       ```suggestion
           auto pipe = std::make_shared<StreamLoadPipe>(1024 * 1024, 64 * 1024, ctx->body_bytes);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wutiangan commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457892910



##########
File path: be/src/runtime/stream_load/stream_load_pipe.h
##########
@@ -85,30 +87,14 @@ class StreamLoadPipe : public MessageBodySink, public FileReader {
     }
 
     Status read_one_message(uint8_t** data, size_t* length) override {
-        std::unique_lock<std::mutex> l(_lock);
-        while (!_cancelled && !_finished && _buf_queue.empty()) {
-            _get_cond.wait(l);
+        if (_total_length == -1) {
+            return Status::InternalError("invalid call");
         }
-        // cancelled
-        if (_cancelled) {
-            return Status::InternalError("cancelled");
-        }
-        // finished
-        if (_buf_queue.empty()) {
-            DCHECK(_finished);
-            *data = nullptr;
-            *length = 0;
-            return Status::OK();
-        }
-        auto buf = _buf_queue.front();
-        *length = buf->remaining();
-        *data = new uint8_t[*length];
-        buf->get_bytes((char*)(*data) , *length);
-
-        _buf_queue.pop_front();
-        _buffered_bytes -= buf->limit;
-        _put_cond.notify_one();
-        return Status::OK();
+        *data = new uint8_t[_total_length];

Review comment:
       what about if _total_lenght == 0?
   if total_legth = 0, there may be a memory leak.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] worker24h commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457912482



##########
File path: be/src/exec/json_scanner.cpp
##########
@@ -243,6 +243,7 @@ Status JsonReader::_parse_json_doc(bool* eof) {
     uint8_t* json_str = nullptr;
     size_t length = 0;
     RETURN_IF_ERROR(_file_reader->read_one_message(&json_str, &length));
+    LOG(INFO) << "cmy get json length: " << length;

Review comment:
       Please remove it,otherwise Log file will burst




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wutiangan commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r457892910



##########
File path: be/src/runtime/stream_load/stream_load_pipe.h
##########
@@ -85,30 +87,14 @@ class StreamLoadPipe : public MessageBodySink, public FileReader {
     }
 
     Status read_one_message(uint8_t** data, size_t* length) override {
-        std::unique_lock<std::mutex> l(_lock);
-        while (!_cancelled && !_finished && _buf_queue.empty()) {
-            _get_cond.wait(l);
+        if (_total_length == -1) {
+            return Status::InternalError("invalid call");
         }
-        // cancelled
-        if (_cancelled) {
-            return Status::InternalError("cancelled");
-        }
-        // finished
-        if (_buf_queue.empty()) {
-            DCHECK(_finished);
-            *data = nullptr;
-            *length = 0;
-            return Status::OK();
-        }
-        auto buf = _buf_queue.front();
-        *length = buf->remaining();
-        *data = new uint8_t[*length];
-        buf->get_bytes((char*)(*data) , *length);
-
-        _buf_queue.pop_front();
-        _buffered_bytes -= buf->limit;
-        _put_cond.notify_one();
-        return Status::OK();
+        *data = new uint8_t[_total_length];

Review comment:
       if _total_lenght == 0?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #4125: [Bug][Load][Json] #4124 Load json format with stream load failed

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4125:
URL: https://github.com/apache/incubator-doris/pull/4125#discussion_r458207883



##########
File path: be/src/exec/json_scanner.cpp
##########
@@ -243,6 +243,7 @@ Status JsonReader::_parse_json_doc(bool* eof) {
     uint8_t* json_str = nullptr;
     size_t length = 0;
     RETURN_IF_ERROR(_file_reader->read_one_message(&json_str, &length));
+    LOG(INFO) << "cmy get json length: " << length;

Review comment:
       Removed

##########
File path: be/src/runtime/stream_load/stream_load_pipe.h
##########
@@ -85,30 +87,14 @@ class StreamLoadPipe : public MessageBodySink, public FileReader {
     }
 
     Status read_one_message(uint8_t** data, size_t* length) override {
-        std::unique_lock<std::mutex> l(_lock);
-        while (!_cancelled && !_finished && _buf_queue.empty()) {
-            _get_cond.wait(l);
+        if (_total_length == -1) {
+            return Status::InternalError("invalid call");
         }
-        // cancelled
-        if (_cancelled) {
-            return Status::InternalError("cancelled");
-        }
-        // finished
-        if (_buf_queue.empty()) {
-            DCHECK(_finished);
-            *data = nullptr;
-            *length = 0;
-            return Status::OK();
-        }
-        auto buf = _buf_queue.front();
-        *length = buf->remaining();
-        *data = new uint8_t[*length];
-        buf->get_bytes((char*)(*data) , *length);
-
-        _buf_queue.pop_front();
-        _buffered_bytes -= buf->limit;
-        _put_cond.notify_one();
-        return Status::OK();
+        *data = new uint8_t[_total_length];

Review comment:
       I will add a check here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org