Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/31 03:50:51 UTC

[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #4217: [Bug][Load][Json] #4124 Load json format with stream load failed

wuyunfeng commented on a change in pull request #4217:
URL: https://github.com/apache/incubator-doris/pull/4217#discussion_r463389954



##########
File path: be/src/common/config.h
##########
@@ -303,11 +303,14 @@ namespace config {
     CONF_Int64(load_data_reserve_hours, "4");
     // log error log will be removed after this time
     CONF_mInt64(load_error_log_reserve_hours, "48");
-    // Deprecated, use streaming_load_max_mb instead
-    // CONF_Int64(mini_load_max_mb, "2048");
     CONF_Int32(number_tablet_writer_threads, "16");
 
+    // The maximum amount of data that can be processed by a stream load

Review comment:
       Can a stream load really process 10 GB by default?

##########
File path: be/src/http/action/stream_load.cpp
##########
@@ -234,11 +234,19 @@ Status StreamLoadAction::_on_header(HttpRequest* http_req, StreamLoadContext* ct
     } else {
         ctx->format = parse_format(http_req->header(HTTP_FORMAT_KEY));
         if (ctx->format == TFileFormatType::FORMAT_UNKNOWN) {
-            LOG(WARNING) << "unknown data format." << ctx->brief();
             std::stringstream ss;
             ss << "unknown data format, format=" << http_req->header(HTTP_FORMAT_KEY);
             return Status::InternalError(ss.str());
         }
+
+        size_t max_body_bytes = config::streaming_load_max_batch_size_mb * 1024 * 1024;
+        if (ctx->format == TFileFormatType::FORMAT_JSON) {
+            if (ctx->body_bytes > max_body_bytes) {
+                std::stringstream ss;
+                ss << "body exceed max size of json format: " << ctx->body_bytes << ", limit: " << max_body_bytes;

Review comment:
       Suggested wording: the size of this batch exceeds the max size of json type data.
   ```suggestion
                   ss << "the size of this batch exceeds the max size [" << max_body_bytes << "] of json type data, data [" << ctx->body_bytes << "]";
   ```
   I also suggest truncating the logged `body_bytes`, for example showing only the first 1024 bytes.
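
To make the two suggestions concrete, here is a minimal standalone sketch, not the PR's code. It builds the message in roughly the wording proposed above (grammar adjusted) and caps any logged payload at 1024 bytes. The helper names `mb_to_bytes`, `check_json_body_size` and `truncate_for_log` are hypothetical, as is passing the limit as a plain integer instead of reading it from `config`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not in the PR): convert a limit given in MB to bytes.
// 64-bit arithmetic keeps multi-GB limits from overflowing.
uint64_t mb_to_bytes(int64_t limit_mb) {
    return static_cast<uint64_t>(limit_mb) * 1024 * 1024;
}

// Hypothetical helper: returns an empty string when the body fits within
// the limit, otherwise an error message in roughly the suggested wording.
std::string check_json_body_size(uint64_t body_bytes, int64_t limit_mb) {
    const uint64_t max_body_bytes = mb_to_bytes(limit_mb);
    if (body_bytes <= max_body_bytes) {
        return "";
    }
    std::stringstream ss;
    ss << "the size of this batch exceeds the max size [" << max_body_bytes
       << "] of json type data, data size [" << body_bytes << "]";
    return ss.str();
}

// Hypothetical helper: keep at most max_len bytes of a payload before it is
// written to a log line, so an oversized body cannot flood the log.
std::string truncate_for_log(const std::string& payload, size_t max_len = 1024) {
    if (payload.size() <= max_len) {
        return payload;
    }
    return payload.substr(0, max_len);
}
```

With a 1 MB limit, `check_json_body_size(1048577, 1)` returns a message naming both the limit and the actual size, while a body of exactly 1048576 bytes passes. `truncate_for_log` only matters if the body content itself is logged, since `ctx->body_bytes` in the diff is already a byte count.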



