Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/01/26 10:50:35 UTC
[GitHub] [nifi-minifi-cpp] fgerlits commented on a change in pull request #1248: MINIFICPP-1702: DefragmentText multiinput improvement
fgerlits commented on a change in pull request #1248:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1248#discussion_r792455005
##########
File path: docker/test/integration/features/defragtextflowfiles.feature
##########
@@ -2,7 +2,34 @@ Feature: DefragmentText can defragment fragmented data from TailFile
Background:
Given the content of "/tmp/output" is monitored
- Scenario Outline: DefragmentText merges split messages from TailFile
+ Scenario Outline: DefragmentText correctly merges split messages from multiple TailFile
+ Given a TailFile processor with the name "TailOne" and the "File to Tail" property set to "/tmp/input/test_file_one.log"
+ And the "Initial Start Position" property of the TailOne processor is set to "Beginning of File"
+ And the "Input Delimiter" property of the TailOne processor is set to "%"
+ And a TailFile processor with the name "TailTwo" and the "File to Tail" property set to "/tmp/input/test_file_two.log"
+ And the "Initial Start Position" property of the TailTwo processor is set to "Beginning of File"
+ And the "Input Delimiter" property of the TailTwo processor is set to "%"
+ And "TailTwo" processor is a start node
Review comment:
Instead of (or in addition to) this test, a more typical use case would be a single TailFile processor with `tail-mode` = `Multiple file`, `tail-base-directory` = `/tmp/input` and `File to Tail` = `test_file_.*\.log`.
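As a quick sanity check of the suggested `File to Tail` pattern, here is a minimal sketch. It assumes the pattern is applied as a full-match regular expression against the file name only; that is an assumption about TailFile, not something verified here, and `matchesFileToTail` is a hypothetical helper, not part of MiNiFi.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not MiNiFi code.
// Assumption: the "File to Tail" pattern has to match the whole file name.
inline bool matchesFileToTail(const std::string& file_name) {
  static const std::regex pattern(R"(test_file_.*\.log)");
  return std::regex_match(file_name, pattern);
}
```

Under that assumption, both `test_file_one.log` and `test_file_two.log` from the scenario above would be picked up by the single processor, while a file such as `test_file_one.log.bak` would not.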
##########
File path: extensions/standard-processors/processors/DefragmentText.cpp
##########
@@ -297,29 +297,34 @@ void DefragmentText::Buffer::store(core::ProcessSession* session, const std::sha
}
}
-bool DefragmentText::Buffer::isCompatible(const core::FlowFile& fragment) const {
+std::optional<size_t> DefragmentText::Buffer::getNextFragmentOffset() const {
if (empty())
- return true;
- if (buffered_flow_file_->getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE)
- != fragment.getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE)) {
- return false;
- }
- if (buffered_flow_file_->getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE)
- != fragment.getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE)) {
- return false;
- }
- std::string current_offset_str, append_offset_str;
- if (buffered_flow_file_->getAttribute(textfragmentutils::OFFSET_ATTRIBUTE, current_offset_str)
- != fragment.getAttribute(textfragmentutils::OFFSET_ATTRIBUTE, append_offset_str)) {
- return false;
- }
- if (!current_offset_str.empty() && !append_offset_str.empty()) {
- size_t current_offset = std::stoi(current_offset_str);
- size_t append_offset = std::stoi(append_offset_str);
- if (current_offset + buffered_flow_file_->getSize() != append_offset)
- return false;
- }
- return true;
+ return std::nullopt;
+ if (auto offset_attribute = buffered_flow_file_->getAttribute(textfragmentutils::OFFSET_ATTRIBUTE))
+ return std::stoi(*offset_attribute) + buffered_flow_file_->getSize();
+ return std::nullopt;
+}
+
+DefragmentText::FragmentSource::Id::Id(const core::FlowFile& flow_file) {
+ if (auto base_name_attribute = flow_file.getAttribute(textfragmentutils::BASE_NAME_ATTRIBUTE))
+ base_name_attribute_ = *base_name_attribute;
+ if (auto post_name_attribute = flow_file.getAttribute(textfragmentutils::POST_NAME_ATTRIBUTE))
+ post_name_attribute_ = *post_name_attribute;
Review comment:
This won't work in the Kubernetes use case, where all log files are called `0.log` (or `1.log` etc. after restarts) and the pod name etc. appear only in the path, so fragments from different pods end up with the same base name and post name. For example:
```
FlowFile Attributes Map Content
key:TextFragmentAttribute.base_name value:0
key:TextFragmentAttribute.offset value:3431278
key:TextFragmentAttribute.post_name value:log
key:absolute.path value:/var/log/pods/default_counter_dd5befc8-5573-40c3-a136-8daf6eb77b01/count/0.log
key:filename value:0.3431278-3431357.log
key:flow.id value:cbd22e73-f01b-43ee-aa73-a28963dc1d56
key:path value:/var/log/pods/default_counter_dd5befc8-5573-40c3-a136-8daf6eb77b01/count
```
I think `absolute.path` would be a good choice for identifying the fragment source.
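A rough sketch of the idea, not a proposed patch: `FakeFlowFile` and `SourceId` below are hypothetical stand-ins (the real code would use `core::FlowFile` and `DefragmentText::FragmentSource::Id`), and the attribute keys are copied from the dump above.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for core::FlowFile; only attribute lookup is modelled.
struct FakeFlowFile {
  std::map<std::string, std::string> attributes;

  std::optional<std::string> getAttribute(const std::string& key) const {
    auto it = attributes.find(key);
    if (it == attributes.end())
      return std::nullopt;
    return it->second;
  }
};

// Hypothetical source id keyed on absolute.path instead of base name + post name.
struct SourceId {
  std::optional<std::string> absolute_path;

  explicit SourceId(const FakeFlowFile& flow_file)
      : absolute_path(flow_file.getAttribute("absolute.path")) {}

  bool operator==(const SourceId& other) const {
    return absolute_path == other.absolute_path;
  }
};

// Builds a fragment that looks like a Kubernetes log fragment: base name and
// post name are always "0" and "log"; only the path differs between pods.
inline FakeFlowFile makeFragment(std::string absolute_path) {
  return FakeFlowFile{{
      {"TextFragmentAttribute.base_name", "0"},
      {"TextFragmentAttribute.post_name", "log"},
      {"absolute.path", std::move(absolute_path)}}};
}
```

With this, two fragments whose base name and post name are both `0` and `log` still compare as different sources as long as their `absolute.path` values differ.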