Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/01/26 10:50:35 UTC

[GitHub] [nifi-minifi-cpp] fgerlits commented on a change in pull request #1248: MINIFICPP-1702: DefragmentText multiinput improvement

fgerlits commented on a change in pull request #1248:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1248#discussion_r792455005



##########
File path: docker/test/integration/features/defragtextflowfiles.feature
##########
@@ -2,7 +2,34 @@ Feature: DefragmentText can defragment fragmented data from TailFile
   Background:
     Given the content of "/tmp/output" is monitored
 
-  Scenario Outline: DefragmentText merges split messages from TailFile
+  Scenario Outline: DefragmentText correctly merges split messages from multiple TailFile
+    Given a TailFile processor with the name "TailOne" and the "File to Tail" property set to "/tmp/input/test_file_one.log"
+    And the "Initial Start Position" property of the TailOne processor is set to "Beginning of File"
+    And the "Input Delimiter" property of the TailOne processor is set to "%"
+    And a TailFile processor with the name "TailTwo" and the "File to Tail" property set to "/tmp/input/test_file_two.log"
+    And the "Initial Start Position" property of the TailTwo processor is set to "Beginning of File"
+    And the "Input Delimiter" property of the TailTwo processor is set to "%"
+    And "TailTwo" processor is a start node

Review comment:
       Instead of (or in addition to) this test, a more typical use case would be a single TailFile processor with `tail-mode` = `Multiple file`, `tail-base-directory` = `/tmp/input` and `File to Tail` = `test_file_.*\.log`.
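
       As a quick sanity check of the suggested pattern, here is a minimal C++ sketch. It assumes the `File to Tail` expression is matched against the whole file name; `std::regex_match` is used here only for illustration and may differ from how TailFile actually evaluates the pattern. The helper names `matchesFileToTail` and `checkFileToTailPattern` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true if the whole file name matches the
// pattern suggested in the review comment. Full-string matching is an
// assumption, not a statement about TailFile's implementation.
bool matchesFileToTail(const std::string& file_name) {
  static const std::regex pattern{R"(test_file_.*\.log)"};
  return std::regex_match(file_name, pattern);
}

// Usage: both input files from the scenario are picked up by one pattern.
void checkFileToTailPattern() {
  assert(matchesFileToTail("test_file_one.log"));
  assert(matchesFileToTail("test_file_two.log"));
  assert(!matchesFileToTail("unrelated.log"));
}
```

       With full-string matching, both `test_file_one.log` and `test_file_two.log` are selected, so a single processor covers the two inputs of the scenario.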

##########
File path: extensions/standard-processors/processors/DefragmentText.cpp
##########
@@ -297,29 +297,34 @@ void DefragmentText::Buffer::store(core::ProcessSession* session, const std::sha
   }
 }
 
-bool DefragmentText::Buffer::isCompatible(const core::FlowFile& fragment) const {
+std::optional<size_t> DefragmentText::Buffer::getNextFragmentOffset() const {
   if (empty())
-    return true;
-  if (buffered_flow_file_->getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE)
-      != fragment.getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE)) {
-    return false;
-  }
-  if (buffered_flow_file_->getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE)
-      != fragment.getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE)) {
-    return false;
-  }
-  std::string current_offset_str, append_offset_str;
-  if (buffered_flow_file_->getAttribute(textfragmentutils::OFFSET_ATTRIBUTE, current_offset_str)
-      != fragment.getAttribute(textfragmentutils::OFFSET_ATTRIBUTE, append_offset_str)) {
-    return false;
-  }
-  if (!current_offset_str.empty() && !append_offset_str.empty()) {
-    size_t current_offset = std::stoi(current_offset_str);
-    size_t append_offset = std::stoi(append_offset_str);
-    if (current_offset + buffered_flow_file_->getSize() != append_offset)
-      return false;
-  }
-  return true;
+    return std::nullopt;
+  if (auto offset_attribute = buffered_flow_file_->getAttribute(textfragmentutils::OFFSET_ATTRIBUTE))
+    return std::stoi(*offset_attribute) + buffered_flow_file_->getSize();
+  return std::nullopt;
+}
+
+DefragmentText::FragmentSource::Id::Id(const core::FlowFile& flow_file) {
+  if (auto base_name_attribute = flow_file.getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE))
+    base_name_attribute_ = *base_name_attribute;
+  if (auto post_name_attribute = flow_file.getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE))
+    post_name_attribute_ = *post_name_attribute;

Review comment:
       This won't work in the Kubernetes use case, where all log files are called `0.log` (or `1.log` etc. after restarts) and the pod name and other identifying details are contained only in the path, so the base name and extension are identical for every pod.  For example:
   ```
   FlowFile Attributes Map Content
   key:TextFragmentAttribute.base_name value:0
   key:TextFragmentAttribute.offset value:3431278
   key:TextFragmentAttribute.post_name value:log
   key:absolute.path value:/var/log/pods/default_counter_dd5befc8-5573-40c3-a136-8daf6eb77b01/count/0.log
   key:filename value:0.3431278-3431357.log
   key:flow.id value:cbd22e73-f01b-43ee-aa73-a28963dc1d56
   key:path value:/var/log/pods/default_counter_dd5befc8-5573-40c3-a136-8daf6eb77b01/count
   ```
   I think `absolute.path` would be a good choice for identifying the fragment source instead.
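
   To illustrate the difference, here is a minimal, self-contained sketch. It does not use the MiNiFi C++ `core::FlowFile` API: `Attributes`, `NameBasedId`, `PathBasedId` and the helper functions are hypothetical stand-ins, and the second pod path in the usage function is made up for the example.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a flow file's attribute map.
using Attributes = std::map<std::string, std::string>;

std::optional<std::string> getAttribute(const Attributes& attributes, const std::string& key) {
  const auto it = attributes.find(key);
  if (it == attributes.end())
    return std::nullopt;
  return it->second;
}

// Identity built from base name and extension, as in the reviewed change.
struct NameBasedId {
  std::optional<std::string> base_name;
  std::optional<std::string> post_name;
  bool operator==(const NameBasedId& other) const {
    return base_name == other.base_name && post_name == other.post_name;
  }
};

// Identity built from the absolute.path attribute, as suggested in the review.
struct PathBasedId {
  std::optional<std::string> absolute_path;
  bool operator==(const PathBasedId& other) const {
    return absolute_path == other.absolute_path;
  }
};

NameBasedId makeNameBasedId(const Attributes& attributes) {
  return {getAttribute(attributes, "TextFragmentAttribute.base_name"),
          getAttribute(attributes, "TextFragmentAttribute.post_name")};
}

PathBasedId makePathBasedId(const Attributes& attributes) {
  return {getAttribute(attributes, "absolute.path")};
}

// Usage: two pods that both log to a file named 0.log.
void checkKubernetesExample() {
  const Attributes first{
      {"TextFragmentAttribute.base_name", "0"},
      {"TextFragmentAttribute.post_name", "log"},
      {"absolute.path", "/var/log/pods/default_counter_dd5befc8-5573-40c3-a136-8daf6eb77b01/count/0.log"}};
  const Attributes second{
      {"TextFragmentAttribute.base_name", "0"},
      {"TextFragmentAttribute.post_name", "log"},
      {"absolute.path", "/var/log/pods/example_other_pod/app/0.log"}};  // made-up path

  assert(makeNameBasedId(first) == makeNameBasedId(second));     // the two sources collide
  assert(!(makePathBasedId(first) == makePathBasedId(second)));  // the two sources stay distinct
}
```

   Under these assumptions, the name-based id treats both pods as one source, while the path-based id keeps them apart.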



