Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/06 02:56:36 UTC

[GitHub] [arrow] n3world commented on a change in pull request #10202: ARROW-12001: [C++] Add parser handler for incorrect column counts

n3world commented on a change in pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#discussion_r627043424



##########
File path: cpp/src/arrow/csv/parser.cc
##########
@@ -76,9 +76,45 @@ class PresizedDataWriter {
     parsed_[parsed_size_++] = static_cast<uint8_t>(c);
   }
 
+  // Push the value of a fully complete field. This should only be used to fill in missing
+  // values. This method can reallocate the buffer if there isn't enough extra space for
+  // the field.
+  Status PushField(const std::string& field) {
+    if (field.length() > extra_allocated_) {
+      // just in case this happens more allocate enough for 10x this amount
+      auto to_allocate = static_cast<uint32_t>(
+          std::max(field.length() * 10, static_cast<std::string::size_type>(128)));

Review comment:
   I have a bit of a confession. My original intent was to add a way to handle rows with an incorrect number of columns without adding nulls or truncating the rows, and instead to record them in a custom handler. I piggybacked on this issue because it was a subset of what I wanted to implement.
   With that said, I would strongly prefer to keep the custom handlers in the API.
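
For illustration, the custom-handler idea described above could look roughly like the following. This is a minimal, standalone sketch and not the API proposed in this pull request: `BadRow`, `BadRowHandler`, `SplitFields`, and `ParseRows` are hypothetical names, the splitter ignores quoting and escaping, and it assumes that mismatched rows are reported to the handler and then left out of the output rather than padded with nulls or truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these names come from Arrow.
struct BadRow {
  std::size_t line_number;     // 1-based line index in the input
  std::size_t actual_columns;  // number of fields found on that line
  std::string text;            // raw text of the offending line
};

// Callback invoked once for every row whose column count is wrong.
using BadRowHandler = std::function<void(const BadRow&)>;

// Naive splitter: no quoting or escaping, just a single-character delimiter.
std::vector<std::string> SplitFields(const std::string& line, char delim = ',') {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const auto pos = line.find(delim, start);
    if (pos == std::string::npos) {
      fields.push_back(line.substr(start));
      break;
    }
    fields.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
  return fields;
}

// Returns only the rows with exactly `expected_columns` fields. Every other
// row is passed to `handler` unchanged and omitted from the result.
std::vector<std::vector<std::string>> ParseRows(const std::string& input,
                                                std::size_t expected_columns,
                                                const BadRowHandler& handler) {
  assert(expected_columns > 0);
  std::vector<std::vector<std::string>> rows;
  std::istringstream stream(input);
  std::string line;
  std::size_t line_number = 0;
  while (std::getline(stream, line)) {
    ++line_number;
    auto fields = SplitFields(line);
    if (fields.size() != expected_columns) {
      handler(BadRow{line_number, fields.size(), line});
      continue;
    }
    rows.push_back(std::move(fields));
  }
  return rows;
}
```

A caller would pass a lambda that appends each `BadRow` to its own vector (or logs it), which keeps the decision about what to do with malformed rows outside the parser.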




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org