Posted to users@nifi.apache.org by Malthe <mb...@gmail.com> on 2019/09/13 09:15:38 UTC

Issue with empty files in content repository

While trying to figure out why a `MergeContent` processor was
producing a linearly growing amount of content that was never reaped
(the retention policies were not upheld and free disk space fell to
zero), we realized that some flow files in the queue pointed to
content which didn't exist on disk: the corresponding file in the
content repository was zero bytes.
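
For anyone wanting to check their own installation, here is a minimal
sketch of how such zero-byte files could be listed. It only walks a
directory tree and knows nothing about NiFi itself. The function name
`find_empty_files` and the path in the usage comment are illustrative
assumptions; the actual repository location is whatever is configured
in `nifi.properties`.

```python
import os


def find_empty_files(repo_root):
    """Return the sorted paths of zero-byte regular files under repo_root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


# Usage (the path is an assumption, adjust it to your setup):
#   for path in find_empty_files("./content_repository"):
#       print(path)
```

A zero-byte file found this way is only a candidate; it would still
have to be matched against the flow files that are failing in the
queue.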

How might this have happened, and if it does happen, shouldn't
processors somehow be able to recover from it?

What seems to happen is that the flow file goes right back into the
queue, where it will of course fail again. Further, a simple grep seems
to show that references to the empty content file's id appear in many
other files in the content repository. This seems to suggest that none
of this content can be reaped because it is still being referenced
somehow and thus isn't eligible for archival and/or deletion.
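
The grep mentioned above can be approximated with the following
sketch, which lists every file under a directory whose bytes contain a
given string. The name `files_containing` and the id in the usage
comment are hypothetical placeholders. A textual match only shows that
the string occurs in a file; it does not by itself prove how NiFi
tracks references to content.

```python
import os


def files_containing(root, needle, chunk_size=1 << 20):
    """Return the sorted paths under root whose bytes contain needle."""
    if isinstance(needle, str):
        needle = needle.encode("utf-8")
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    tail = b""
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        # Prepend the end of the previous chunk so a match
                        # spanning a chunk boundary is not missed.
                        window = tail + chunk
                        if needle in window:
                            matches.append(path)
                            break
                        tail = window[-overlap:] if overlap > 0 else b""
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(matches)


# Usage (both arguments are placeholders):
#   for path in files_containing("./content_repository", "<content file id>"):
#       print(path)
```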

Thanks for any ideas.