Posted to commits@doris.apache.org by "amorynan (via GitHub)" <gi...@apache.org> on 2023/06/07 10:56:00 UTC

[GitHub] [doris] amorynan opened a new pull request, #20556: [Improve](array-type) update stream_load for array nested array

amorynan opened a new pull request, #20556:
URL: https://github.com/apache/doris/pull/20556

   ## Proposed changes
   Currently we cannot load nested arrays (an array inside an array) into Doris through stream load, because from_string() uses hard-coded parsing.
   This PR adds a from_json() for complex types such as array, map, and struct, and uses it inside from_string(). Later this lets stream load support complex types nested inside other complex types.
   Issue Number: close #xxx
   
   <!--Describe your changes.-->
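To illustrate the limitation described above, here is a minimal sketch in plain C++ (standard library only). It is not the Doris implementation: `split_flat` and `split_nested` are hypothetical helpers, and the input is assumed to be a bracketed list of integers with no quoted strings. Splitting on every comma breaks a nested array apart, while tracking bracket depth keeps each inner array in one piece so it can be parsed recursively.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the text between the outer brackets on every ','.
// This mirrors the idea of a flat element scanner, not Doris's actual code.
std::vector<std::string> split_flat(const std::string& text) {
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 1; i + 1 < text.size(); ++i) {
        if (text[i] == ',') {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(text[i]);
        }
    }
    parts.push_back(current);
    return parts;
}

// Hypothetical helper: split only on commas at nesting depth 0, so each
// nested array stays intact and can be handed to a recursive parser.
// Assumption: no quoted strings, so brackets always mean nesting.
std::vector<std::string> split_nested(const std::string& text) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;
    for (std::size_t i = 1; i + 1 < text.size(); ++i) {
        const char c = text[i];
        if (c == '[') ++depth;
        if (c == ']') --depth;
        if (c == ',' && depth == 0) {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);
    return parts;
}
```

For the input `[[1,2],[3]]`, the flat split yields the fragments `[1`, `2]`, and `[3]`, while the depth-aware split yields `[1,2]` and `[3]`. A real JSON parser additionally handles quoting, escapes, and empty arrays, which this sketch ignores.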
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1586580782

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] amorynan closed pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan closed pull request #20556: [Improve](array-type) update stream_load for array nested array
URL: https://github.com/apache/doris/pull/20556




[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588528372

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] amorynan commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1226021190


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +188,49 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
-    }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    CHECK(json_value.type() == simdjson::ondemand::json_type::array);
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
 
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {
+            if (is_complex_type(remove_nullable(nested))) {
+                simdjson::ondemand::value val;
+                (*it).get(val);
+                st = nested->from_json(val, &nested_null_col);
+            } else {
+                std::string_view sv = (*it).raw_json_token().value();
+                ReadBuffer nested_rb(const_cast<char*>(sv.data()), sv.size());

Review Comment:
   For complex types, function_cast always calls the data type's from_string(), but that is not suitable for deserializing nested complex types. So when from_string() is called on a complex type, it simply calls from_json(), which is fine. We should also reduce our use of from_string(), which will not exist in the future.





[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588787794

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] amorynan commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1226021428


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -267,56 +239,16 @@ Status DataTypeArray::from_string(ReadBuffer& rb, IColumn* column) const {
         return Status::InvalidArgument("Array does not end with ']' character, found '{}'",
                                        *(rb.end() - 1));
     }
-    // empty array []
-    if (rb.count() == 2) {
-        offsets.push_back(offsets.back());
-        return Status::OK();
-    }
-    ++rb.position();
-
-    size_t element_num = 0;
-    // parse array element until end of array
-    while (!rb.eof()) {
-        StringRef element(rb.position(), rb.count());
-        bool has_quota = false;
-        if (!next_element_from_string(rb, element, has_quota)) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return Status::InvalidArgument("Cannot read array element from text '{}'",
-                                           element.to_string());
-        }
-
-        // handle empty element
-        if (element.size == 0) {
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(0);
-            ++element_num;
-            continue;
-        }
-
-        // handle null element, need to distinguish null and "null"
-        if (!has_quota && element.size == 4 && strncmp(element.data, "null", 4) == 0) {
-            // insert null
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(1);
-            ++element_num;
-            continue;
-        }
-
-        // handle normal element
-        ReadBuffer read_buffer(const_cast<char*>(element.data), element.size);
-        auto st = nested->from_string(read_buffer, &nested_column);
-        if (!st.ok()) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return st;
-        }
-        ++element_num;
-    }
-    offsets.push_back(offsets.back() + element_num);
-    return Status::OK();
+    // json parser
+    std::unique_ptr<simdjson::ondemand::parser> _ondemand_json_parser =
+            std::make_unique<simdjson::ondemand::parser>();
+    size_t _padded_size = rb.count() + simdjson::SIMDJSON_PADDING;
+    std::string _simdjson_ondemand_padding_buffer;
+    _simdjson_ondemand_padding_buffer.resize(_padded_size);
+    memcpy(&_simdjson_ondemand_padding_buffer.front(), rb.position(), rb.count());
+    simdjson::ondemand::document array_doc = _ondemand_json_parser->iterate(
+            std::string_view(_simdjson_ondemand_padding_buffer.data(), rb.count()), _padded_size);
+    auto value = array_doc.get_value();

Review Comment:
   I think it should be at the beginning of from_json, to avoid other complex types calling from_json.
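The hunk above copies the input into a larger buffer before handing it to the JSON parser, so that the parser can safely read a few bytes past the end of the payload. A minimal standard-library sketch of that copy step follows. The constant `kPadding` and the helper `make_padded_copy` are hypothetical names chosen for illustration, and the value 64 is an assumption; a real parser library defines its own padding constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Assumed padding size, for illustration only.
constexpr std::size_t kPadding = 64;

// Copies `input` into a buffer that has kPadding zero bytes after the payload.
std::string make_padded_copy(std::string_view input) {
    std::string buffer(input.size() + kPadding, '\0');
    if (!input.empty()) {
        std::memcpy(buffer.data(), input.data(), input.size());
    }
    return buffer;
}
```

The caller then passes the payload length and the total capacity to the parser separately, as the hunk does with `rb.count()` and `_padded_size`.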





[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1226158732


##########
be/src/vec/data_types/data_type_nullable.cpp:
##########
@@ -80,10 +81,27 @@ void DataTypeNullable::to_string(const IColumn& column, size_t row_num,
     }
 }
 
+Status DataTypeNullable::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    DCHECK(is_complex_type(nested_data_type));

Review Comment:
   from_nested_json





[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588789124

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1580549851

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1222981644


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +188,49 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
-    }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    CHECK(json_value.type() == simdjson::ondemand::json_type::array);
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
 
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {
+            if (is_complex_type(remove_nullable(nested))) {
+                simdjson::ondemand::value val;
+                (*it).get(val);
+                st = nested->from_json(val, &nested_null_col);
+            } else {
+                std::string_view sv = (*it).raw_json_token().value();
+                ReadBuffer nested_rb(const_cast<char*>(sv.data()), sv.size());

Review Comment:
   It seems a little confusing to me that some types use `from_json` while others use `from_string`.





[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1592440752

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] amorynan commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1586644159

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588525459

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] amorynan commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588525749

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588535393

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1226142589


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +189,64 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    if (json_value.type() != simdjson::ondemand::json_type::array) {
+        return Status::InvalidArgument("Parse json data failed, not array type '{}'",
+                                       json_value.type().take_value());
     }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
-
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
+    bool is_string_nested = is_string(remove_nullable(nested));
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {
+            if (is_complex_type(remove_nullable(nested))) {
+                simdjson::ondemand::value val;
+                auto error_code = (*it).get(val);
+                if (simdjson::SUCCESS != (*it).get(val)) {
+                    st = Status::InvalidArgument(
+                            "Parse json data failed, error code: {}, error "
+                            "info: {}",
+                            error_code, simdjson::error_message(error_code));
+                } else {
+                    st = nested->from_json(val, &nested_null_col);
+                }
+            } else {
+                std::string_view sv = simdjson::trim((*it).raw_json_token().value());
+                if (is_string_nested) {
+                    StringRef sr(sv.data(), sv.size());
+                    StringRef del("\"");
+                    sv = simd::VStringFunctions::trim(sr, del);
+                }
+                ReadBuffer nested_rb(const_cast<char*>(sv.data()), sv.size());
+                st = nested->from_string(nested_rb, &nested_column);

Review Comment:
   I don't think it's reasonable to call from_string here; the semantics of `from_json` are different from those of `from_string`.
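One way to read this concern is that every type, scalar or complex, could expose the same JSON entry point, so the array code never has to branch on the nested type. The sketch below shows that shape in plain C++. It is an assumption-laden illustration, not Doris code: `JsonValue`, `DataType`, `IntType`, `ArrayType`, `make_scalar`, and `make_array` are hypothetical, and the output is flattened into a single `std::vector<int>` for brevity.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed value: either a scalar token or a list of child values.
struct JsonValue {
    bool is_array = false;
    std::string scalar;               // used when is_array is false
    std::vector<JsonValue> children;  // used when is_array is true
};

JsonValue make_scalar(const std::string& token) {
    JsonValue v;
    v.scalar = token;
    return v;
}

JsonValue make_array(std::vector<JsonValue> items) {
    JsonValue v;
    v.is_array = true;
    v.children = std::move(items);
    return v;
}

// Hypothetical type interface with a single JSON entry point.
struct DataType {
    virtual ~DataType() = default;
    // Appends decoded integers to `out`; returns false on a mismatch.
    virtual bool from_json(const JsonValue& value, std::vector<int>& out) const = 0;
};

struct IntType : DataType {
    bool from_json(const JsonValue& value, std::vector<int>& out) const override {
        if (value.is_array) return false;
        try {
            out.push_back(std::stoi(value.scalar));
        } catch (const std::exception&) {
            return false;
        }
        return true;
    }
};

struct ArrayType : DataType {
    explicit ArrayType(std::shared_ptr<DataType> nested_type) : nested(std::move(nested_type)) {}

    bool from_json(const JsonValue& value, std::vector<int>& out) const override {
        if (!value.is_array) return false;
        for (const JsonValue& child : value.children) {
            // No branch on the nested type: scalars and arrays share one call.
            if (!nested->from_json(child, out)) return false;
        }
        return true;
    }

    std::shared_ptr<DataType> nested;
};
```

With this shape, an `ArrayType` wrapping another `ArrayType` of `IntType` decodes `[[1,2],[3]]` through the same call at every level. The trade-off is that each scalar type needs its own JSON decoding path instead of reusing its string parser.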



##########
be/src/vec/data_types/data_type_nullable.cpp:
##########
@@ -80,10 +81,27 @@ void DataTypeNullable::to_string(const IColumn& column, size_t row_num,
     }
 }
 
+Status DataTypeNullable::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    DCHECK(is_complex_type(nested_data_type));

Review Comment:
   What if it's not a complex type? I think we should not DCHECK for a complex type here.



##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +189,64 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    if (json_value.type() != simdjson::ondemand::json_type::array) {
+        return Status::InvalidArgument("Parse json data failed, not array type '{}'",
+                                       json_value.type().take_value());
     }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
-
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
+    bool is_string_nested = is_string(remove_nullable(nested));
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {

Review Comment:
   Why not move the `try`/`catch` outside the for loop? Could iterating the array throw a simdjson exception?
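A sketch of what the reviewer seems to be suggesting, written against the standard library only: one `try` block around the whole loop, so that an exception raised either while advancing the iteration or while parsing an element is converted into a status in one place. `ParseStatus`, `parse_element`, and `append_all` are hypothetical stand-ins, not Doris or simdjson APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal stand-in for a status object.
struct ParseStatus {
    bool ok;
    std::string message;
};

// Hypothetical element parser that throws on bad input, standing in for a
// JSON iterator that may raise while advancing or while reading a value.
int parse_element(const std::string& token) {
    std::size_t used = 0;
    const int value = std::stoi(token, &used);  // throws on non-numeric input
    if (used != token.size()) {
        throw std::invalid_argument("trailing characters in '" + token + "'");
    }
    return value;
}

// One try block around the whole loop. On any exception, the elements
// appended during this call are rolled back before the error is returned.
ParseStatus append_all(const std::vector<std::string>& tokens, std::vector<int>& column) {
    const std::size_t original_size = column.size();
    try {
        for (const std::string& token : tokens) {
            column.push_back(parse_element(token));
        }
    } catch (const std::exception& e) {
        column.resize(original_size);  // revert partially appended elements
        return {false, e.what()};
    }
    return {true, ""};
}
```

Keeping the rollback next to the single `catch` also means the column is restored no matter which element failed, which is the same revert behavior the removed `from_string` loop performed with `pop_back(element_num)`.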



##########
be/src/vec/data_types/data_type_nullable.cpp:
##########
@@ -80,10 +81,27 @@ void DataTypeNullable::to_string(const IColumn& column, size_t row_num,
     }
 }
 
+Status DataTypeNullable::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    DCHECK(is_complex_type(nested_data_type));

Review Comment:
   Why DCHECK for a complex type in DataTypeNullable?



##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +189,64 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    if (json_value.type() != simdjson::ondemand::json_type::array) {
+        return Status::InvalidArgument("Parse json data failed, not array type '{}'",
+                                       json_value.type().take_value());
     }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
-
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
+    bool is_string_nested = is_string(remove_nullable(nested));
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {
+            if (is_complex_type(remove_nullable(nested))) {
+                simdjson::ondemand::value val;
+                auto error_code = (*it).get(val);
+                if (simdjson::SUCCESS != (*it).get(val)) {
+                    st = Status::InvalidArgument(
+                            "Parse json data failed, error code: {}, error "
+                            "info: {}",
+                            error_code, simdjson::error_message(error_code));
+                } else {
+                    st = nested->from_json(val, &nested_null_col);
+                }
+            } else {
+                std::string_view sv = simdjson::trim((*it).raw_json_token().value());
+                if (is_string_nested) {
+                    StringRef sr(sv.data(), sv.size());
+                    StringRef del("\"");
+                    sv = simd::VStringFunctions::trim(sr, del);
+                }
+                ReadBuffer nested_rb(const_cast<char*>(sv.data()), sv.size());
+                st = nested->from_string(nested_rb, &nested_column);

Review Comment:
   Mixing them could be misleading.



##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +189,64 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    if (json_value.type() != simdjson::ondemand::json_type::array) {
+        return Status::InvalidArgument("Parse json data failed, not array type '{}'",
+                                       json_value.type().take_value());
     }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
-
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());

Review Comment:
   What if the nested column is not nullable?
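
   A sketch of one way to handle the non-nullable case, assuming a runtime check is acceptable on this path. `DCHECK` is compiled out in release builds, so a `reinterpret_cast` to a nullable column would then go unchecked. The column classes and `append_null` below are hypothetical, simplified stand-ins and not the Doris `IColumn` hierarchy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified column hierarchy for illustration only.
struct Column {
    virtual ~Column() = default;
    virtual bool is_nullable() const { return false; }
};

struct IntColumn : Column {
    std::vector<int> data;
};

struct NullableColumn : Column {
    bool is_nullable() const override { return true; }
    IntColumn nested;
    std::vector<unsigned char> null_map;  // 1 = NULL, 0 = value present
};

// Appends a NULL element. Returns an empty string on success, or an error
// message if the column is not nullable, instead of casting blindly.
std::string append_null(Column* column) {
    assert(column != nullptr);
    auto* nullable = dynamic_cast<NullableColumn*>(column);
    if (nullable == nullptr) {
        return "nested column is not nullable; cannot insert NULL";
    }
    nullable->nested.data.push_back(0);  // placeholder default value
    nullable->null_map.push_back(1);
    return "";
}
```

   The trade-off is a checked cast per call. Whether that cost is acceptable in the load path is a judgment for the PR author.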





[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588535501

   PR approved by anyone and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588787438

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1222361210


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -267,56 +239,16 @@ Status DataTypeArray::from_string(ReadBuffer& rb, IColumn* column) const {
         return Status::InvalidArgument("Array does not end with ']' character, found '{}'",
                                        *(rb.end() - 1));
     }
-    // empty array []
-    if (rb.count() == 2) {
-        offsets.push_back(offsets.back());
-        return Status::OK();
-    }
-    ++rb.position();
-
-    size_t element_num = 0;
-    // parse array element until end of array
-    while (!rb.eof()) {
-        StringRef element(rb.position(), rb.count());
-        bool has_quota = false;
-        if (!next_element_from_string(rb, element, has_quota)) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return Status::InvalidArgument("Cannot read array element from text '{}'",
-                                           element.to_string());
-        }
-
-        // handle empty element
-        if (element.size == 0) {
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(0);
-            ++element_num;
-            continue;
-        }
-
-        // handle null element, need to distinguish null and "null"
-        if (!has_quota && element.size == 4 && strncmp(element.data, "null", 4) == 0) {
-            // insert null
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(1);
-            ++element_num;
-            continue;
-        }
-
-        // handle normal element
-        ReadBuffer read_buffer(const_cast<char*>(element.data), element.size);
-        auto st = nested->from_string(read_buffer, &nested_column);
-        if (!st.ok()) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return st;
-        }
-        ++element_num;
-    }
-    offsets.push_back(offsets.back() + element_num);
-    return Status::OK();
+    // json parser
+    std::unique_ptr<simdjson::ondemand::parser> _ondemand_json_parser =
+            std::make_unique<simdjson::ondemand::parser>();
+    size_t _padded_size = rb.count() + simdjson::SIMDJSON_PADDING;
+    std::string _simdjson_ondemand_padding_buffer;
+    _simdjson_ondemand_padding_buffer.resize(_padded_size);

Review Comment:
   This is not a member variable; you should use `simdjson_ondemand_padding_buffer` (no leading underscore) instead.
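
   For context, a sketch of the padded-copy step written with plain local names (no leading underscore). simdjson requires extra allocated bytes past the end of the JSON text. `kPadding` below is an assumed stand-in for `simdjson::SIMDJSON_PADDING`, and `make_padded_copy` is a hypothetical helper, not a simdjson or Doris function.

```cpp
#include <cstddef>
#include <string>

// Assumption: stand-in for simdjson::SIMDJSON_PADDING in this sketch.
constexpr std::size_t kPadding = 64;

// Returns a copy of `json` followed by kPadding zero bytes, so a parser may
// safely read a little past the logical end of the document.
std::string make_padded_copy(const std::string& json) {
    std::string padded_buffer;
    padded_buffer.reserve(json.size() + kPadding);
    padded_buffer.assign(json);
    padded_buffer.resize(json.size() + kPadding, '\0');
    return padded_buffer;
}
```

   The caller would pass the original length as the document length and the buffer size as the allocated capacity, as the diff above does with `rb.count()` and the padded size.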





[GitHub] [doris] amorynan commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1222360523


##########
regression-test/suites/load_p0/stream_load/test_array_nested_array_load.groovy:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   OK, I will add more malformed-input cases.





[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1222872396


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -267,56 +239,16 @@ Status DataTypeArray::from_string(ReadBuffer& rb, IColumn* column) const {
         return Status::InvalidArgument("Array does not end with ']' character, found '{}'",
                                        *(rb.end() - 1));
     }
-    // empty array []
-    if (rb.count() == 2) {
-        offsets.push_back(offsets.back());
-        return Status::OK();
-    }
-    ++rb.position();
-
-    size_t element_num = 0;
-    // parse array element until end of array
-    while (!rb.eof()) {
-        StringRef element(rb.position(), rb.count());
-        bool has_quota = false;
-        if (!next_element_from_string(rb, element, has_quota)) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return Status::InvalidArgument("Cannot read array element from text '{}'",
-                                           element.to_string());
-        }
-
-        // handle empty element
-        if (element.size == 0) {
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(0);
-            ++element_num;
-            continue;
-        }
-
-        // handle null element, need to distinguish null and "null"
-        if (!has_quota && element.size == 4 && strncmp(element.data, "null", 4) == 0) {
-            // insert null
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(1);
-            ++element_num;
-            continue;
-        }
-
-        // handle normal element
-        ReadBuffer read_buffer(const_cast<char*>(element.data), element.size);
-        auto st = nested->from_string(read_buffer, &nested_column);
-        if (!st.ok()) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return st;
-        }
-        ++element_num;
-    }
-    offsets.push_back(offsets.back() + element_num);
-    return Status::OK();
+    // json parser
+    std::unique_ptr<simdjson::ondemand::parser> _ondemand_json_parser =
+            std::make_unique<simdjson::ondemand::parser>();
+    size_t _padded_size = rb.count() + simdjson::SIMDJSON_PADDING;
+    std::string _simdjson_ondemand_padding_buffer;
+    _simdjson_ondemand_padding_buffer.resize(_padded_size);
+    memcpy(&_simdjson_ondemand_padding_buffer.front(), rb.position(), rb.count());
+    simdjson::ondemand::document array_doc = _ondemand_json_parser->iterate(
+            std::string_view(_simdjson_ondemand_padding_buffer.data(), rb.count()), _padded_size);
+    auto value = array_doc.get_value();

Review Comment:
   We should check whether the value is an array.





[GitHub] [doris] amorynan commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1227511488


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -188,77 +189,64 @@ std::string DataTypeArray::to_string(const IColumn& column, size_t row_num) cons
     return str;
 }
 
-bool next_element_from_string(ReadBuffer& rb, StringRef& output, bool& has_quota) {
-    StringRef element(rb.position(), 0);
-    has_quota = false;
-    if (rb.eof()) {
-        return false;
+Status DataTypeArray::from_json(simdjson::ondemand::value& json_value, IColumn* column) const {
+    if (json_value.type() != simdjson::ondemand::json_type::array) {
+        return Status::InvalidArgument("Parse json data failed, not array type '{}'",
+                                       json_value.type().take_value());
     }
-
-    // ltrim
-    while (!rb.eof() && isspace(*rb.position())) {
-        ++rb.position();
-        element.data = rb.position();
-    }
-
-    // parse string
-    if (*rb.position() == '"' || *rb.position() == '\'') {
-        const char str_sep = *rb.position();
-        size_t str_len = 1;
-        // search until next '"' or '\''
-        while (str_len < rb.count() && *(rb.position() + str_len) != str_sep) {
-            ++str_len;
-        }
-        // invalid string
-        if (str_len >= rb.count()) {
-            rb.position() = rb.end();
-            return false;
-        }
-        has_quota = true;
-        rb.position() += str_len + 1;
-        element.size += str_len + 1;
-    }
-
-    // parse array element until array separator ',' or end ']'
-    while (!rb.eof() && (*rb.position() != ',') && (rb.count() != 1 || *rb.position() != ']')) {
-        // invalid elements such as ["123" 456,"789" 777]
-        // correct elements such as ["123"    ,"789"    ]
-        if (has_quota && !isspace(*rb.position())) {
-            return false;
+    simdjson::ondemand::array outer_array = json_value.get_array();
+    auto* array_column = assert_cast<ColumnArray*>(column);
+    auto& offsets = array_column->get_offsets();
+    IColumn& nested_column = array_column->get_data();
+    DCHECK(nested_column.is_nullable());
+    auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
+    bool is_string_nested = is_string(remove_nullable(nested));
+    size_t element_num = 0;
+    for (auto it = outer_array.begin(); it != outer_array.end(); ++it) {
+        Status st;
+        try {

Review Comment:
   Done!





[GitHub] [doris] amorynan commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588519289

   run buildall




[GitHub] [doris] amorynan commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1588781259

   run buildall




[GitHub] [doris] amorynan commented on pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "amorynan (via GitHub)" <gi...@apache.org>.
amorynan commented on PR #20556:
URL: https://github.com/apache/doris/pull/20556#issuecomment-1590431255

   run p0




[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1222361210


##########
be/src/vec/data_types/data_type_array.cpp:
##########
@@ -267,56 +239,16 @@ Status DataTypeArray::from_string(ReadBuffer& rb, IColumn* column) const {
         return Status::InvalidArgument("Array does not end with ']' character, found '{}'",
                                        *(rb.end() - 1));
     }
-    // empty array []
-    if (rb.count() == 2) {
-        offsets.push_back(offsets.back());
-        return Status::OK();
-    }
-    ++rb.position();
-
-    size_t element_num = 0;
-    // parse array element until end of array
-    while (!rb.eof()) {
-        StringRef element(rb.position(), rb.count());
-        bool has_quota = false;
-        if (!next_element_from_string(rb, element, has_quota)) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return Status::InvalidArgument("Cannot read array element from text '{}'",
-                                           element.to_string());
-        }
-
-        // handle empty element
-        if (element.size == 0) {
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(0);
-            ++element_num;
-            continue;
-        }
-
-        // handle null element, need to distinguish null and "null"
-        if (!has_quota && element.size == 4 && strncmp(element.data, "null", 4) == 0) {
-            // insert null
-            auto& nested_null_col = reinterpret_cast<ColumnNullable&>(nested_column);
-            nested_null_col.get_nested_column().insert_default();
-            nested_null_col.get_null_map_data().push_back(1);
-            ++element_num;
-            continue;
-        }
-
-        // handle normal element
-        ReadBuffer read_buffer(const_cast<char*>(element.data), element.size);
-        auto st = nested->from_string(read_buffer, &nested_column);
-        if (!st.ok()) {
-            // we should do array element column revert if error
-            nested_column.pop_back(element_num);
-            return st;
-        }
-        ++element_num;
-    }
-    offsets.push_back(offsets.back() + element_num);
-    return Status::OK();
+    // json parser
+    std::unique_ptr<simdjson::ondemand::parser> _ondemand_json_parser =
+            std::make_unique<simdjson::ondemand::parser>();
+    size_t _padded_size = rb.count() + simdjson::SIMDJSON_PADDING;
+    std::string _simdjson_ondemand_padding_buffer;
+    _simdjson_ondemand_padding_buffer.resize(_padded_size);

Review Comment:
   This is not a member variable; you should use `simdjson_ondemand_padding_buffer` (no leading underscore) instead.





[GitHub] [doris] eldenmoon commented on a diff in pull request #20556: [Improve](array-type) update stream_load for array nested array

Posted by "eldenmoon (via GitHub)" <gi...@apache.org>.
eldenmoon commented on code in PR #20556:
URL: https://github.com/apache/doris/pull/20556#discussion_r1221505595


##########
regression-test/suites/load_p0/stream_load/test_array_nested_array_load.groovy:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   We should add malformed JSON cases, in case simdjson throws an exception.


