Posted to dev@nifi.apache.org by "Jens M. Kofoed" <jm...@gmail.com> on 2021/10/15 05:38:49 UTC

CryptographicHashContent calculates 2 different sha256 hashes for the same content

Dear developers

A couple of days ago I wrote an email about file corruption with
PutSFTP/FetchSFTP. As Joe Witt wrote, it should not be SFTP or network
issues but something else. We just don't know yet what causes these issues,
which we see very, very rarely.
Therefore I set up a test flow on my 3-node cluster running 9x 500MB files
in a loop, and this morning there was another file where the sha256 did not
match.

To check whether the file was actually corrupted, I ran sha256sum in Linux
on the original file and on the supposedly corrupted file. The two hashes
were the same, and they matched the sha256 that was calculated the very
first time. Therefore I added another CryptographicHashContent after my
"PutSFTP Corrupt" (red box in the picture) to calculate a new sha256, and to
my surprise it calculated a different sha256 from the one that had just
failed the comparison.
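For illustration, the out-of-band check described above (hashing the file on disk with sha256sum) can be sketched in Python. This is a minimal sketch, not NiFi code: it assumes the file is reachable as a local path, and the helper names sha256_file and check_stable_hash are hypothetical. It streams the file in chunks, producing the same hex digest sha256sum prints, and hashes the same unchanged file several times. If the bytes on disk are stable, every run must give the same digest, so a differing result inside the flow would point at how the content was read or hashed rather than at the stored file.

```python
import hashlib
from pathlib import Path


# Hypothetical helper: stream a file through SHA-256 in fixed-size chunks so a
# 500MB file is never loaded into memory at once. The hex digest is the same
# value `sha256sum` prints for that file.
def sha256_file(path, chunk_size=1024 * 1024):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical check: hash the same file `runs` times. Returns whether all runs
# agreed, whether every run matched `expected` (if one was given), and the
# individual digests for logging.
def check_stable_hash(path, expected=None, runs=2):
    digests = [sha256_file(Path(path)) for _ in range(runs)]
    stable = len(set(digests)) == 1
    matches = expected is None or all(d == expected.lower() for d in digests)
    return stable, matches, digests
```

For example, check_stable_hash("/path/to/file", expected="<sha256 from the first calculation>", runs=3) would report whether the file hashes consistently and still matches the very first hash.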

In our production flow there was also a file where the hash did not match,
and it was routed to our corruption error flow. Here I also ran the flowfile
through another CryptographicHashContent, and this time it also calculated
a new hash that was the same as the original, so the file was not corrupted
after all. YES!!!!

The big problem now is that twice during this night and morning, 2 different
files ran through a CryptographicHashContent and got a different hash than
the original. The next time they ran through another
CryptographicHashContent, it calculated the "original" hash again.
Is there a very, very rare bug here, something that could cause NiFi to
calculate a wrong hash?

Kind regards
Jens

[image: image.png]

[image: image.png]