Posted to users@nifi.apache.org by "Jens M. Kofoed" <jm...@gmail.com> on 2021/10/19 05:22:27 UTC

Fwd: CryptographicHashContent calculates 2 different sha256 hashes on the same content

Dear NIFI Users

I have posted this mail on the developers mailing list and just want to
inform you all about a very odd behavior we are facing.
The background:
We have data going between 2 different NIFI systems which have no direct
network access to each other. Therefore we calculate a SHA256 hash value of
the content at system 1, before the flowfile and data are combined and
saved as a "flowfile-stream-v3" pkg file. The file is then transported to
system 2, where the pkg file is unpacked and the flow can continue. To be
sure about file integrity we calculate a new sha256 at system 2. But
sometimes we see that the sha256 gets another value, which might suggest
the file was corrupted. But recalculating the sha256 then gives yet
another hash value.
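
For reference, the integrity check on both systems is simply a SHA256 over
the content stream. A minimal Groovy sketch of the same calculation
(illustrative only, not the actual processor code):

import java.security.MessageDigest

// Stream the content through SHA-256 and return the hex digest, i.e. the
// value we compare between system 1 and system 2.
String sha256Hex(InputStream inStream) {
    def digest = MessageDigest.getInstance("SHA-256")
    byte[] buffer = new byte[8192]
    int len
    while ((len = inStream.read(buffer)) > 0) {
        digest.update(buffer, 0, len)
    }
    return digest.digest().encodeHex().toString()
}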

----

Tonight I had yet another file which didn't match the expected sha256 hash
value. The content is a 1.7GB file and the Event Duration to calculate the
hash was "00:00:17.539".
I have created a retry loop, where the file goes to a Wait processor that
delays it 1 minute before sending it back to the CryptographicHashContent
for a new calculation. After 3 retries the file goes to the
retries_exceeded queue and on to a disabled processor, just to sit in a
queue so I can look at it manually. This morning I rerouted the file from
my retries_exceeded queue back to the CryptographicHashContent for a new
calculation, and this time it calculated the correct hash value.

THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is
happening.
[image: image.png]

We are running NiFi 1.13.2 in a 3-node cluster on Ubuntu 20.04.02 with
openjdk version "1.8.0_292", OpenJDK Runtime Environment (build
1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build
25.292-b10, mixed mode). Each server is a VM with 4 CPUs and 8GB RAM on
VMware ESXi 7.0.2. Each NiFi node runs on a different physical host.
I have inspected different logs to see if I can find any correlation with
what happened at the same time as the file was going through my loop, but
there are no events/tasks at that exact time.

System 1:
At 10/19/2021 00:15:11.247 CEST my file is going through
a CryptographicHashContent: SHA256 value:
dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
The file is exported as a "FlowFile Stream, v3" to System 2

SYSTEM 2:
At 10/19/2021 00:18:10.528 CEST the file is going through
a CryptographicHashContent: SHA256 value:
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
[image: image.png]
At 10/19/2021 00:19:08.996 CEST the file is going through the same
CryptographicHashContent at system 2: SHA256 value:
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
At 10/19/2021 00:20:04.376 CEST the file is going through the
same CryptographicHashContent at system 2: SHA256 value:
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
At 10/19/2021 00:21:01.711 CEST the file is going through the
same CryptographicHashContent at system 2: SHA256 value:
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

At 10/19/2021 06:07:43.376 CEST the file is going through the
same CryptographicHashContent at system 2: SHA256 value:
dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
[image: image.png]

How on earth can this happen???

Kind Regards
Jens M. Kofoed

Re: CryptographicHashContent calculates 2 different sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark

I'm back at the office and looking into your suggestions.

1) I'm looking at the bootstrap.conf file and I can't find any information
about the garbage collector, except that #java.arg.13=-XX:+UseG1GC is
commented out. I'm using Java 8, which is why I didn't enable it.
These are all the active lines from the bootstrap.conf file:
java=java
run.as=nifi
lib.dir=./lib
conf.dir=./conf
graceful.shutdown.seconds=20
java.arg.1=-Dorg.apache.jasper.compiler.disablejsr199=true
java.arg.2=-Xms6g
java.arg.3=-Xmx6g
java.arg.4=-Djava.net.preferIPv4Stack=true
java.arg.5=-Dsun.net.http.allowRestrictedHeaders=true
java.arg.6=-Djava.protocol.handler.pkgs=sun.net.www.protocol
java.arg.14=-Djava.awt.headless=true
nifi.bootstrap.sensitive.key=
java.arg.15=-Djava.security.egd=file:/dev/urandom
java.arg.16=-Djavax.security.auth.useSubjectCredsOnly=true
java.arg.17=-Dzookeeper.admin.enableServer=false
notification.services.file=./conf/bootstrap-notification-services.xml
notification.max.attempts=5
java.arg.curator.supress.excessive.logs=-Dcurator-log-only-first-connection-issue-as-error-level=true

2) I've cleaned up my test cluster and made a backup just as Mark described.

3) About repository.always.sync: I've set it to true for all 3
repositories.
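For reference, the three entries in nifi.properties now look like this:

nifi.flowfile.repository.always.sync=true
nifi.content.repository.always.sync=true
nifi.provenance.repository.always.sync=true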

I'm also investigating what happens when VMware creates a snapshot and
backs up a VM. I've found other posts on the internet saying that VMware
Tools manipulates the disks to optimize the VM before a backup. So I'm
trying to force it to fail.
The issue is that days can pass between occurrences of a file being
"corrupted" (or rather, the hashing of the content going wrong). But I will
let it run.

Many thanks for all your help and I will let you know as soon as I have news

Kind Regards
Jens

On Wed, Nov 3, 2021 at 15.58, Mark Payne <ma...@hotmail.com> wrote:

> So what I found interesting about the histogram output was that in each
> case, the input file was 1 GB. Between the ‘good’ and ‘bad’ runs,
> something like 500-700 bytes had different values, and those values
> ranged significantly. There was no
> indication that the type of thing we’ve seen with NFS mounts was happening,
> where data was nulled out until received and then updated. If that had been
> the case we’d have seen the NUL byte (or some other value) have a very
> significant change in the histogram, but we didn’t see that.
>
> So a couple more ideas that I think can be useful.
>
> 1) Which garbage collector are you using? It’s configured in the
> bootstrap.conf file
>
> 2) We can try to definitively prove out whether the content on the disk is
> changing or if there’s an issue reading the content. To do this:
>
> 1. Stop all processors.
> 2. Shutdown nifi
> 3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this
> will delete all FlowFiles & content, so only do this on a dev/test system
> where you’re comfortable deleting it!)
> 4. Start nifi
> 5. Let exactly 1 FlowFile into your flow.
> 6. While it is looping through, create a copy of your entire Content
> Repository: cp -r content_repository content_backup1; zip -r
> content_backup1.zip content_backup1
> 7. Wait for the hashes to differ
> 8. Create another copy of the Content Repository: cp -r content_repository
> content_backup2
> 9. Find the files within the content_backup1 and content_backup2 and
> compare them to see if they are identical. Would recommend comparing them
> using each of the 3 methods: sha256, sha512, diff
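>
> For example, a sketch (the claim file path below is made up; use one that
> actually exists under both backups):
>
> claim=1/1635700000000-1   # hypothetical content claim file name
> sha256sum content_backup1/$claim content_backup2/$claim
> sha512sum content_backup1/$claim content_backup2/$claim
> diff content_backup1/$claim content_backup2/$claim && echo identical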
>
> This should make it pretty clear that either:
> (1) the issue resides in the software: either NiFi or the JVM
> (2) the issue resides outside of the software: the disk, the disk driver,
> the operating system, the VM hypervisor, etc.
>
> Thanks
> -Mark
>
> > On Nov 3, 2021, at 10:44 AM, Joe Witt <jo...@gmail.com> wrote:
> >
> > Jens,
> >
> > 184 hours (7.6 days) in and zero issues.
> >
> > Will need to turn this off soon but wanted to give a final update.
> > Looks great.  Given the information on your system there appears to be
> > something we don't understand related to the virtual file system
> > involved or something.
> >
> > Thanks
> >
> > On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <jm...@gmail.com>
> wrote:
> >>
> >> Hi Mark
> >>
> >> Of course, sorry :-)  Looking at the error messages, I can see that
> only the histogram values with differences are listed. And all 3
> have their first issue at histogram.9. I don't know what that means
> >>
> >> /Jens
> >>
> >> Here are the error log:
> >> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
> differences in the histogram
> >> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
> >> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
> >> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
> >> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
> >> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
> >> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
> >> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
> >> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
> >> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
> >> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
> >> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
> >> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
> >> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
> >> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
> >> Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
> >> Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
> >> Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
> >> Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
> >> Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
> >> Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
> >> Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
> >> Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
> >> Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
> >> Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
> >> Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
> >> Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
> >> Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
> >> Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
> >> Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
> >> Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
> >> Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
> >> Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
> >> Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
> >> Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
> >> Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
> >> Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
> >> Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
> >> Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
> >> Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
> >> Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
> >> Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
> >> Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
> >> Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
> >> Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
> >> Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
> >> Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
> >> Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
> >> Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
> >> Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
> >> Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
> >> Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
> >> Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
> >> Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
> >> Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
> >> Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
> >> Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
> >> Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
> >> Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
> >> Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
> >> Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
> >> Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
> >> Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
> >> Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
> >> Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
> >> Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
> >> Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
> >> Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
> >> Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
> >> Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
> >> Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
> >> Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
> >> Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
> >> Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
> >> Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
> >> Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
> >> Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
> >> Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
> >> Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
> >> Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
> >> Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
> >> Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
> >> Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
> >> Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
> >> Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
> >> Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
> >> Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059
> >>
> >> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
> differences in the histogram
> >> Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
> >> Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
> >> Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
> >> Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
> >> Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
> >> Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
> >> Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
> >> Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
> >> Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
> >> Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
> >> Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
> >> Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
> >> Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
> >> Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
> >> Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
> >> Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
> >> Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
> >> Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
> >> Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
> >> Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
> >> Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
> >> Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
> >> Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
> >> Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
> >> Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
> >> Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
> >> Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
> >> Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
> >> Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
> >> Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
> >> Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
> >> Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
> >> Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
> >> Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
> >> Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
> >> Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
> >> Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
> >> Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
> >> Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
> >> Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
> >> Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
> >> Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
> >> Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
> >> Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
> >> Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
> >> Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
> >> Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
> >> Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
> >> Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
> >> Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
> >> Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
> >> Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
> >> Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
> >> Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
> >> Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
> >> Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
> >> Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
> >> Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
> >> Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
> >> Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
> >> Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
> >> Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
> >> Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
> >> Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
> >> Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
> >> Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
> >> Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
> >> Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
> >> Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
> >> Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
> >> Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
> >> Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
> >> Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
> >> Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
> >> Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
> >> Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
> >> Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
> >> Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
> >> Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
> >> Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
> >> Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
> >> Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
> >> Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
> >> Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
> >> Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542
> >>
> >> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are
> differences in the histogram
> >> Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
> >> Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
> >> Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
> >> Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
> >> Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
> >> Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
> >> Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
> >> Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
> >> Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
> >> Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
> >> Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
> >> Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
> >> Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
> >> Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
> >> Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
> >> Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
> >> Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
> >> Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
> >> Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
> >> Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
> >> Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
> >> Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
> >> Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
> >> Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
> >> Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
> >> Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
> >> Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
> >> Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
> >> Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
> >> Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
> >> Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
> >> Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
> >> Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
> >> Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
> >> Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
> >> Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
> >> Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
> >> Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
> >> Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
> >> Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
> >> Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
> >> Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
> >> Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
> >> Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
> >> Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
> >> Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
> >> Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
> >> Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
> >> Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
> >> Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
> >> Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
> >> Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
> >> Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
> >> Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
> >> Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
> >> Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
> >> Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
> >> Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
> >> Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
> >> Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
> >> Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
> >> Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
> >> Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
> >> Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
> >> Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
> >> Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
> >> Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
> >> Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
> >> Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
> >> Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
> >> Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
> >> Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
> >> Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
> >> Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
> >> Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
> >> Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
> >> Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
> >> Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
> >> Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
> >> Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
> >> Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
> >> Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
> >> Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
> >> Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
> >> Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
> >> Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282
> >>
> >> On Tue, Nov 2, 2021 at 15.41, Mark Payne <markap14@hotmail.com> wrote:
> >>>
> >>> Jens,
> >>>
> >>> The histograms, in and of themselves, are not very interesting. The
> interesting thing would be the difference in the histogram before & after
> the hash. Can you provide the ERROR level logs generated by the
> ExecuteScript? That’s what is of interest.
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>>
> >>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com>
> wrote:
> >>>
> >>> Hi Mark and Joe
> >>>
> >>> Yesterday morning I implemented Mark's script in my 2 test flows: one
> test flow using SFTP, the other MergeContent/UnpackContent. Both test flows
> are running on a test cluster with 3 nodes and NiFi 1.14.0
> >>> The 1st flow with SFTP has had 1 file going into the failure queue
> after about 16 hours.
> >>> The 2nd flow has had 2 files going into the failure queue after
> about 15 and 17 hours.
> >>>
> >>> There is definitely something going wrong in my setup, but I can't
> figure out what.
> >>>
> >>> Information from file 1:
> >>> histogram.0;0
> >>> histogram.1;0
> >>> histogram.10;11926720
> >>> histogram.100;11927504
> >>> histogram.101;11925396
> >>> histogram.102;11929923
> >>> histogram.103;11931596
> >>> histogram.104;11929071
> >>> histogram.105;11931365
> >>> histogram.106;11928661
> >>> histogram.107;11929864
> >>> histogram.108;11931611
> >>> histogram.109;11932758
> >>> histogram.11;0
> >>> histogram.110;11927893
> >>> histogram.111;11933519
> >>> histogram.112;11931392
> >>> histogram.113;11928534
> >>> histogram.114;11936879
> >>> histogram.115;11932818
> >>> histogram.116;11934767
> >>> histogram.117;11929143
> >>> histogram.118;11931854
> >>> histogram.119;11926333
> >>> histogram.12;0
> >>> histogram.120;11928731
> >>> histogram.121;11931149
> >>> histogram.122;11926725
> >>> histogram.123;0
> >>> histogram.124;0
> >>> histogram.125;0
> >>> histogram.126;0
> >>> histogram.127;0
> >>> histogram.128;0
> >>> histogram.129;0
> >>> histogram.13;0
> >>> histogram.130;0
> >>> histogram.131;0
> >>> histogram.132;0
> >>> histogram.133;0
> >>> histogram.134;0
> >>> histogram.135;0
> >>> histogram.136;0
> >>> histogram.137;0
> >>> histogram.138;0
> >>> histogram.139;0
> >>> histogram.14;0
> >>> histogram.140;0
> >>> histogram.141;0
> >>> histogram.142;0
> >>> histogram.143;0
> >>> histogram.144;0
> >>> histogram.145;0
> >>> histogram.146;0
> >>> histogram.147;0
> >>> histogram.148;0
> >>> histogram.149;0
> >>> histogram.15;0
> >>> histogram.150;0
> >>> histogram.151;0
> >>> histogram.152;0
> >>> histogram.153;0
> >>> histogram.154;0
> >>> histogram.155;0
> >>> histogram.156;0
> >>> histogram.157;0
> >>> histogram.158;0
> >>> histogram.159;0
> >>> histogram.16;0
> >>> histogram.160;0
> >>> histogram.161;0
> >>> histogram.162;0
> >>> histogram.163;0
> >>> histogram.164;0
> >>> histogram.165;0
> >>> histogram.166;0
> >>> histogram.167;0
> >>> histogram.168;0
> >>> histogram.169;0
> >>> histogram.17;0
> >>> histogram.170;0
> >>> histogram.171;0
> >>> histogram.172;0
> >>> histogram.173;0
> >>> histogram.174;0
> >>> histogram.175;0
> >>> histogram.176;0
> >>> histogram.177;0
> >>> histogram.178;0
> >>> histogram.179;0
> >>> histogram.18;0
> >>> histogram.180;0
> >>> histogram.181;0
> >>> histogram.182;0
> >>> histogram.183;0
> >>> histogram.184;0
> >>> histogram.185;0
> >>> histogram.186;0
> >>> histogram.187;0
> >>> histogram.188;0
> >>> histogram.189;0
> >>> histogram.19;0
> >>> histogram.190;0
> >>> histogram.191;0
> >>> histogram.192;0
> >>> histogram.193;0
> >>> histogram.194;0
> >>> histogram.195;0
> >>> histogram.196;0
> >>> histogram.197;0
> >>> histogram.198;0
> >>> histogram.199;0
> >>> histogram.2;0
> >>> histogram.20;0
> >>> histogram.200;0
> >>> histogram.201;0
> >>> histogram.202;0
> >>> histogram.203;0
> >>> histogram.204;0
> >>> histogram.205;0
> >>> histogram.206;0
> >>> histogram.207;0
> >>> histogram.208;0
> >>> histogram.209;0
> >>> histogram.21;0
> >>> histogram.210;0
> >>> histogram.211;0
> >>> histogram.212;0
> >>> histogram.213;0
> >>> histogram.214;0
> >>> histogram.215;0
> >>> histogram.216;0
> >>> histogram.217;0
> >>> histogram.218;0
> >>> histogram.219;0
> >>> histogram.22;0
> >>> histogram.220;0
> >>> histogram.221;0
> >>> histogram.222;0
> >>> histogram.223;0
> >>> histogram.224;0
> >>> histogram.225;0
> >>> histogram.226;0
> >>> histogram.227;0
> >>> histogram.228;0
> >>> histogram.229;0
> >>> histogram.23;0
> >>> histogram.230;0
> >>> histogram.231;0
> >>> histogram.232;0
> >>> histogram.233;0
> >>> histogram.234;0
> >>> histogram.235;0
> >>> histogram.236;0
> >>> histogram.237;0
> >>> histogram.238;0
> >>> histogram.239;0
> >>> histogram.24;0
> >>> histogram.240;0
> >>> histogram.241;0
> >>> histogram.242;0
> >>> histogram.243;0
> >>> histogram.244;0
> >>> histogram.245;0
> >>> histogram.246;0
> >>> histogram.247;0
> >>> histogram.248;0
> >>> histogram.249;0
> >>> histogram.25;0
> >>> histogram.250;0
> >>> histogram.251;0
> >>> histogram.252;0
> >>> histogram.253;0
> >>> histogram.254;0
> >>> histogram.255;0
> >>> histogram.26;0
> >>> histogram.27;0
> >>> histogram.28;0
> >>> histogram.29;0
> >>> histogram.3;0
> >>> histogram.30;0
> >>> histogram.31;0
> >>> histogram.32;11930422
> >>> histogram.33;11934311
> >>> histogram.34;11930459
> >>> histogram.35;11924776
> >>> histogram.36;11924186
> >>> histogram.37;11928616
> >>> histogram.38;11929474
> >>> histogram.39;11929607
> >>> histogram.4;0
> >>> histogram.40;11928053
> >>> histogram.41;11930402
> >>> histogram.42;11926830
> >>> histogram.43;11938138
> >>> histogram.44;11932536
> >>> histogram.45;11931053
> >>> histogram.46;11930008
> >>> histogram.47;11927747
> >>> histogram.48;11936055
> >>> histogram.49;11931471
> >>> histogram.5;0
> >>> histogram.50;11931921
> >>> histogram.51;11929643
> >>> histogram.52;11923847
> >>> histogram.53;11927311
> >>> histogram.54;11933754
> >>> histogram.55;11925964
> >>> histogram.56;11928872
> >>> histogram.57;11931124
> >>> histogram.58;11928474
> >>> histogram.59;11925814
> >>> histogram.6;0
> >>> histogram.60;11933978
> >>> histogram.61;11934136
> >>> histogram.62;11932016
> >>> histogram.63;23864588
> >>> histogram.64;11924792
> >>> histogram.65;11934789
> >>> histogram.66;11933047
> >>> histogram.67;11931899
> >>> histogram.68;11935615
> >>> histogram.69;11927249
> >>> histogram.7;0
> >>> histogram.70;11933276
> >>> histogram.71;11927953
> >>> histogram.72;11929275
> >>> histogram.73;11930292
> >>> histogram.74;11935428
> >>> histogram.75;11930317
> >>> histogram.76;11935737
> >>> histogram.77;11932127
> >>> histogram.78;11932344
> >>> histogram.79;11932094
> >>> histogram.8;0
> >>> histogram.80;11930688
> >>> histogram.81;11928415
> >>> histogram.82;11931559
> >>> histogram.83;11934192
> >>> histogram.84;11927224
> >>> histogram.85;11929491
> >>> histogram.86;11930624
> >>> histogram.87;11932201
> >>> histogram.88;11930694
> >>> histogram.89;11936439
> >>> histogram.9;11933187
> >>> histogram.90;11926445
> >>> histogram.91;0
> >>> histogram.92;0
> >>> histogram.93;0
> >>> histogram.94;11931596
> >>> histogram.95;11929379
> >>> histogram.96;0
> >>> histogram.97;11928864
> >>> histogram.98;11924738
> >>> histogram.99;11930062
> >>> histogram.totalBytes;1073741824
> >>>
> >>> File 2:
> >>> histogram.0;0
> >>> histogram.1;0
> >>> histogram.10;11932402
> >>> histogram.100;11927531
> >>> histogram.101;11928454
> >>> histogram.102;11934432
> >>> histogram.103;11924623
> >>> histogram.104;11934492
> >>> histogram.105;11934585
> >>> histogram.106;11928955
> >>> histogram.107;11928651
> >>> histogram.108;11930139
> >>> histogram.109;11929325
> >>> histogram.11;0
> >>> histogram.110;11930486
> >>> histogram.111;11933517
> >>> histogram.112;11928334
> >>> histogram.113;11927798
> >>> histogram.114;11929222
> >>> histogram.115;11932057
> >>> histogram.116;11931182
> >>> histogram.117;11933407
> >>> histogram.118;11932709
> >>> histogram.119;11931338
> >>> histogram.12;0
> >>> histogram.120;11933700
> >>> histogram.121;11929803
> >>> histogram.122;11930218
> >>> histogram.123;0
> >>> histogram.124;0
> >>> histogram.125;0
> >>> histogram.126;0
> >>> histogram.127;0
> >>> histogram.128;0
> >>> histogram.129;0
> >>> histogram.13;0
> >>> histogram.130;0
> >>> histogram.131;0
> >>> histogram.132;0
> >>> histogram.133;0
> >>> histogram.134;0
> >>> histogram.135;0
> >>> histogram.136;0
> >>> histogram.137;0
> >>> histogram.138;0
> >>> histogram.139;0
> >>> histogram.14;0
> >>> histogram.140;0
> >>> histogram.141;0
> >>> histogram.142;0
> >>> histogram.143;0
> >>> histogram.144;0
> >>> histogram.145;0
> >>> histogram.146;0
> >>> histogram.147;0
> >>> histogram.148;0
> >>> histogram.149;0
> >>> histogram.15;0
> >>> histogram.150;0
> >>> histogram.151;0
> >>> histogram.152;0
> >>> histogram.153;0
> >>> histogram.154;0
> >>> histogram.155;0
> >>> histogram.156;0
> >>> histogram.157;0
> >>> histogram.158;0
> >>> histogram.159;0
> >>> histogram.16;0
> >>> histogram.160;0
> >>> histogram.161;0
> >>> histogram.162;0
> >>> histogram.163;0
> >>> histogram.164;0
> >>> histogram.165;0
> >>> histogram.166;0
> >>> histogram.167;0
> >>> histogram.168;0
> >>> histogram.169;0
> >>> histogram.17;0
> >>> histogram.170;0
> >>> histogram.171;0
> >>> histogram.172;0
> >>> histogram.173;0
> >>> histogram.174;0
> >>> histogram.175;0
> >>> histogram.176;0
> >>> histogram.177;0
> >>> histogram.178;0
> >>> histogram.179;0
> >>> histogram.18;0
> >>> histogram.180;0
> >>> histogram.181;0
> >>> histogram.182;0
> >>> histogram.183;0
> >>> histogram.184;0
> >>> histogram.185;0
> >>> histogram.186;0
> >>> histogram.187;0
> >>> histogram.188;0
> >>> histogram.189;0
> >>> histogram.19;0
> >>> histogram.190;0
> >>> histogram.191;0
> >>> histogram.192;0
> >>> histogram.193;0
> >>> histogram.194;0
> >>> histogram.195;0
> >>> histogram.196;0
> >>> histogram.197;0
> >>> histogram.198;0
> >>> histogram.199;0
> >>> histogram.2;0
> >>> histogram.20;0
> >>> histogram.200;0
> >>> histogram.201;0
> >>> histogram.202;0
> >>> histogram.203;0
> >>> histogram.204;0
> >>> histogram.205;0
> >>> histogram.206;0
> >>> histogram.207;0
> >>> histogram.208;0
> >>> histogram.209;0
> >>> histogram.21;0
> >>> histogram.210;0
> >>> histogram.211;0
> >>> histogram.212;0
> >>> histogram.213;0
> >>> histogram.214;0
> >>> histogram.215;0
> >>> histogram.216;0
> >>> histogram.217;0
> >>> histogram.218;0
> >>> histogram.219;0
> >>> histogram.22;0
> >>> histogram.220;0
> >>> histogram.221;0
> >>> histogram.222;0
> >>> histogram.223;0
> >>> histogram.224;0
> >>> histogram.225;0
> >>> histogram.226;0
> >>> histogram.227;0
> >>> histogram.228;0
> >>> histogram.229;0
> >>> histogram.23;0
> >>> histogram.230;0
> >>> histogram.231;0
> >>> histogram.232;0
> >>> histogram.233;0
> >>> histogram.234;0
> >>> histogram.235;0
> >>> histogram.236;0
> >>> histogram.237;0
> >>> histogram.238;0
> >>> histogram.239;0
> >>> histogram.24;0
> >>> histogram.240;0
> >>> histogram.241;0
> >>> histogram.242;0
> >>> histogram.243;0
> >>> histogram.244;0
> >>> histogram.245;0
> >>> histogram.246;0
> >>> histogram.247;0
> >>> histogram.248;0
> >>> histogram.249;0
> >>> histogram.25;0
> >>> histogram.250;0
> >>> histogram.251;0
> >>> histogram.252;0
> >>> histogram.253;0
> >>> histogram.254;0
> >>> histogram.255;0
> >>> histogram.26;0
> >>> histogram.27;0
> >>> histogram.28;0
> >>> histogram.29;0
> >>> histogram.3;0
> >>> histogram.30;0
> >>> histogram.31;0
> >>> histogram.32;11924458
> >>> histogram.33;11934243
> >>> histogram.34;11930696
> >>> histogram.35;11925574
> >>> histogram.36;11929198
> >>> histogram.37;11928146
> >>> histogram.38;11932505
> >>> histogram.39;11929406
> >>> histogram.4;0
> >>> histogram.40;11930100
> >>> histogram.41;11930867
> >>> histogram.42;11930796
> >>> histogram.43;11930796
> >>> histogram.44;11921866
> >>> histogram.45;11935682
> >>> histogram.46;11930075
> >>> histogram.47;11928169
> >>> histogram.48;11933490
> >>> histogram.49;11932174
> >>> histogram.5;0
> >>> histogram.50;11933255
> >>> histogram.51;11934009
> >>> histogram.52;11928361
> >>> histogram.53;11927626
> >>> histogram.54;11931611
> >>> histogram.55;11930755
> >>> histogram.56;11933823
> >>> histogram.57;11922508
> >>> histogram.58;11930384
> >>> histogram.59;11929805
> >>> histogram.6;0
> >>> histogram.60;11930064
> >>> histogram.61;11926761
> >>> histogram.62;11927605
> >>> histogram.63;23858926
> >>> histogram.64;11929516
> >>> histogram.65;11930217
> >>> histogram.66;11930478
> >>> histogram.67;11939855
> >>> histogram.68;11927850
> >>> histogram.69;11931154
> >>> histogram.7;0
> >>> histogram.70;11935374
> >>> histogram.71;11930754
> >>> histogram.72;11928304
> >>> histogram.73;11931772
> >>> histogram.74;11939417
> >>> histogram.75;11930712
> >>> histogram.76;11933331
> >>> histogram.77;11931279
> >>> histogram.78;11928276
> >>> histogram.79;11930071
> >>> histogram.8;0
> >>> histogram.80;11927830
> >>> histogram.81;11931213
> >>> histogram.82;11930964
> >>> histogram.83;11928973
> >>> histogram.84;11934325
> >>> histogram.85;11929658
> >>> histogram.86;11924667
> >>> histogram.87;11931100
> >>> histogram.88;11930252
> >>> histogram.89;11927281
> >>> histogram.9;11932848
> >>> histogram.90;11930398
> >>> histogram.91;0
> >>> histogram.92;0
> >>> histogram.93;0
> >>> histogram.94;11928720
> >>> histogram.95;11928988
> >>> histogram.96;0
> >>> histogram.97;11931423
> >>> histogram.98;11928181
> >>> histogram.99;11935549
> >>> histogram.totalBytes;1073741824
> >>>
> >>> File 3:
> >>> histogram.0;0
> >>> histogram.1;0
> >>> histogram.10;11930417
> >>> histogram.100;11926739
> >>> histogram.101;11930580
> >>> histogram.102;11928210
> >>> histogram.103;11935300
> >>> histogram.104;11925804
> >>> histogram.105;11931023
> >>> histogram.106;11932342
> >>> histogram.107;11929778
> >>> histogram.108;11930098
> >>> histogram.109;11930759
> >>> histogram.11;0
> >>> histogram.110;11934343
> >>> histogram.111;11935775
> >>> histogram.112;11933877
> >>> histogram.113;11926675
> >>> histogram.114;11929332
> >>> histogram.115;11928876
> >>> histogram.116;11927819
> >>> histogram.117;11932657
> >>> histogram.118;11933508
> >>> histogram.119;11928808
> >>> histogram.12;0
> >>> histogram.120;11937532
> >>> histogram.121;11926907
> >>> histogram.122;11933942
> >>> histogram.123;0
> >>> histogram.124;0
> >>> histogram.125;0
> >>> histogram.126;0
> >>> histogram.127;0
> >>> histogram.128;0
> >>> histogram.129;0
> >>> histogram.13;0
> >>> histogram.130;0
> >>> histogram.131;0
> >>> histogram.132;0
> >>> histogram.133;0
> >>> histogram.134;0
> >>> histogram.135;0
> >>> histogram.136;0
> >>> histogram.137;0
> >>> histogram.138;0
> >>> histogram.139;0
> >>> histogram.14;0
> >>> histogram.140;0
> >>> histogram.141;0
> >>> histogram.142;0
> >>> histogram.143;0
> >>> histogram.144;0
> >>> histogram.145;0
> >>> histogram.146;0
> >>> histogram.147;0
> >>> histogram.148;0
> >>> histogram.149;0
> >>> histogram.15;0
> >>> histogram.150;0
> >>> histogram.151;0
> >>> histogram.152;0
> >>> histogram.153;0
> >>> histogram.154;0
> >>> histogram.155;0
> >>> histogram.156;0
> >>> histogram.157;0
> >>> histogram.158;0
> >>> histogram.159;0
> >>> histogram.16;0
> >>> histogram.160;0
> >>> histogram.161;0
> >>> histogram.162;0
> >>> histogram.163;0
> >>> histogram.164;0
> >>> histogram.165;0
> >>> histogram.166;0
> >>> histogram.167;0
> >>> histogram.168;0
> >>> histogram.169;0
> >>> histogram.17;0
> >>> histogram.170;0
> >>> histogram.171;0
> >>> histogram.172;0
> >>> histogram.173;0
> >>> histogram.174;0
> >>> histogram.175;0
> >>> histogram.176;0
> >>> histogram.177;0
> >>> histogram.178;0
> >>> histogram.179;0
> >>> histogram.18;0
> >>> histogram.180;0
> >>> histogram.181;0
> >>> histogram.182;0
> >>> histogram.183;0
> >>> histogram.184;0
> >>> histogram.185;0
> >>> histogram.186;0
> >>> histogram.187;0
> >>> histogram.188;0
> >>> histogram.189;0
> >>> histogram.19;0
> >>> histogram.190;0
> >>> histogram.191;0
> >>> histogram.192;0
> >>> histogram.193;0
> >>> histogram.194;0
> >>> histogram.195;0
> >>> histogram.196;0
> >>> histogram.197;0
> >>> histogram.198;0
> >>> histogram.199;0
> >>> histogram.2;0
> >>> histogram.20;0
> >>> histogram.200;0
> >>> histogram.201;0
> >>> histogram.202;0
> >>> histogram.203;0
> >>> histogram.204;0
> >>> histogram.205;0
> >>> histogram.206;0
> >>> histogram.207;0
> >>> histogram.208;0
> >>> histogram.209;0
> >>> histogram.21;0
> >>> histogram.210;0
> >>> histogram.211;0
> >>> histogram.212;0
> >>> histogram.213;0
> >>> histogram.214;0
> >>> histogram.215;0
> >>> histogram.216;0
> >>> histogram.217;0
> >>> histogram.218;0
> >>> histogram.219;0
> >>> histogram.22;0
> >>> histogram.220;0
> >>> histogram.221;0
> >>> histogram.222;0
> >>> histogram.223;0
> >>> histogram.224;0
> >>> histogram.225;0
> >>> histogram.226;0
> >>> histogram.227;0
> >>> histogram.228;0
> >>> histogram.229;0
> >>> histogram.23;0
> >>> histogram.230;0
> >>> histogram.231;0
> >>> histogram.232;0
> >>> histogram.233;0
> >>> histogram.234;0
> >>> histogram.235;0
> >>> histogram.236;0
> >>> histogram.237;0
> >>> histogram.238;0
> >>> histogram.239;0
> >>> histogram.24;0
> >>> histogram.240;0
> >>> histogram.241;0
> >>> histogram.242;0
> >>> histogram.243;0
> >>> histogram.244;0
> >>> histogram.245;0
> >>> histogram.246;0
> >>> histogram.247;0
> >>> histogram.248;0
> >>> histogram.249;0
> >>> histogram.25;0
> >>> histogram.250;0
> >>> histogram.251;0
> >>> histogram.252;0
> >>> histogram.253;0
> >>> histogram.254;0
> >>> histogram.255;0
> >>> histogram.26;0
> >>> histogram.27;0
> >>> histogram.28;0
> >>> histogram.29;0
> >>> histogram.3;0
> >>> histogram.30;0
> >>> histogram.31;0
> >>> histogram.32;11929486
> >>> histogram.33;11930737
> >>> histogram.34;11931092
> >>> histogram.35;11934488
> >>> histogram.36;11927605
> >>> histogram.37;11930735
> >>> histogram.38;11932174
> >>> histogram.39;11936180
> >>> histogram.4;0
> >>> histogram.40;11931666
> >>> histogram.41;11927043
> >>> histogram.42;11929044
> >>> histogram.43;11934104
> >>> histogram.44;11936337
> >>> histogram.45;11935580
> >>> histogram.46;11929598
> >>> histogram.47;11934083
> >>> histogram.48;11928858
> >>> histogram.49;11931098
> >>> histogram.5;0
> >>> histogram.50;11930618
> >>> histogram.51;11925429
> >>> histogram.52;11929741
> >>> histogram.53;11934160
> >>> histogram.54;11931999
> >>> histogram.55;11930465
> >>> histogram.56;11926194
> >>> histogram.57;11926386
> >>> histogram.58;11924871
> >>> histogram.59;11929331
> >>> histogram.6;0
> >>> histogram.60;11926951
> >>> histogram.61;11928631
> >>> histogram.62;11927549
> >>> histogram.63;23856730
> >>> histogram.64;11930288
> >>> histogram.65;11931523
> >>> histogram.66;11932821
> >>> histogram.67;11932509
> >>> histogram.68;11929613
> >>> histogram.69;11928651
> >>> histogram.7;0
> >>> histogram.70;11929253
> >>> histogram.71;11931521
> >>> histogram.72;11925805
> >>> histogram.73;11934833
> >>> histogram.74;11928314
> >>> histogram.75;11923854
> >>> histogram.76;11930892
> >>> histogram.77;11927528
> >>> histogram.78;11932850
> >>> histogram.79;11934471
> >>> histogram.8;0
> >>> histogram.80;11925707
> >>> histogram.81;11929213
> >>> histogram.82;11931334
> >>> histogram.83;11936739
> >>> histogram.84;11927855
> >>> histogram.85;11931668
> >>> histogram.86;11928609
> >>> histogram.87;11931930
> >>> histogram.88;11934341
> >>> histogram.89;11927519
> >>> histogram.9;11928004
> >>> histogram.90;11933502
> >>> histogram.91;0
> >>> histogram.92;0
> >>> histogram.93;0
> >>> histogram.94;11932024
> >>> histogram.95;11932693
> >>> histogram.96;0
> >>> histogram.97;11928428
> >>> histogram.98;11933195
> >>> histogram.99;11924273
> >>> histogram.totalBytes;1073741824
> >>>
> >>> Kind regards
> >>> Jens
> >>>
> >>>> On Sun, Oct 31, 2021 at 21.40, Joe Witt <jo...@gmail.com> wrote:
> >>>>
> >>>> Jens
> >>>>
> >>>> 118 hours in - still good.
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
> >>>>>
> >>>>> Jens
> >>>>>
> >>>>> Update from hour 67.  Still lookin' good.
> >>>>>
> >>>>> Will advise.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>
> >>>>>> Many many thanks 🙏 Joe for looking into this. My test flow was
> running for 6 days before the first error occurred
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>> On Oct 28, 2021, at 16.57, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Jens,
> >>>>>>>
> >>>>>>> Am 40+ hours in running both your flow and mine to reproduce.
> >>>>>>> So far neither has shown any sign of trouble.  Will keep running
> >>>>>>> for another week or so if I can.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> The physical hosts with VMware are using VMFS, but the VMs
> running on the hosts can’t see that.
> >>>>>>>> But you asked about the underlying file system 😀 and since my
> first answer with the copy from the fstab file wasn’t enough I just wanted
> to give all the details 😁.
> >>>>>>>>
> >>>>>>>> If you create a VM for Windows you would probably use NTFS (on
> top of VMFS). For Linux: EXT3, EXT4, BTRFS, XFS and so on.
> >>>>>>>>
> >>>>>>>> All the partitions at my NiFi nodes are local devices (sda, sdb,
> sdc and sdd) for each Linux machine. I don’t use NFS.
> >>>>>>>>
> >>>>>>>> Kind regards
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 27, 2021, at 17.47, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>> I don't quite follow the EXT4 usage on top of VMFS but the point
> here is you'll ultimately need to truly understand your underlying storage
> system and what sorts of guarantees it is giving you.  If linux/the
> jvm/nifi think it has a typical EXT4 type block storage system to work
> with it can only be safe/operate within those constraints.  I have no
> idea about what VMFS brings to the table or the settings for it.
> >>>>>>>>
> >>>>>>>> The sync properties I shared previously might help force the
> issue by ensuring a formal sync/flush cycle all the way through to the
> disk. We'd normally not do or need to do that, but in some cases it
> offers a stronger guarantee in exchange for performance.
> >>>>>>>>
> >>>>>>>> In any case...Mark's path for you here will help identify what
> we're dealing with and we can go from there.
> >>>>>>>>
> >>>>>>>> I am aware of significant usage of NiFi on VMware configurations
> without issue at high rates for many years, so whatever it is here is
> likely solvable.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks for the clarification. I will implement the script when I
> return to the office on Monday next week (November 1st).
> >>>>>>>>
> >>>>>>>> I don’t use NFS, but ext4. But I will implement the script so we
> can check whether that is the case here. I think the issue might be after
> the processors write new content to the repository.
> >>>>>>>>
> >>>>>>>> I have a test flow running for more than 2 weeks without any
> errors. But this flow only calculates hashes and compares them.
> >>>>>>>>
> >>>>>>>> Two other flows both produce errors. One flow uses
> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses
> MergeContent->UnpackContent->CryptographicHashContent->compares. The last
> flow is entirely inside NiFi, excluding other network/server issues.
> >>>>>>>>
> >>>>>>>> In both cases the CryptographicHashContent is right after a
> processor which writes new content to the repository. But in one case a
> file in our production flow calculated a wrong hash 4 times with a 1
> minute delay between each calculation. A few hours later I looped the file
> back and this time it was OK.
> >>>>>>>>
> >>>>>>>> Just like the case in steps 5 and 12 in the pdf file
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I will let you all know more later next week
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Kind regards
> >>>>>>>>
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 27, 2021, at 15.43, Mark Payne <markap14@hotmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> And the actual script:
> >>>>>>>>
> >>>>>>>> import org.apache.nifi.flowfile.FlowFile
> >>>>>>>>
> >>>>>>>> import java.util.stream.Collectors
> >>>>>>>>
> >>>>>>>> // Collect the histogram.* attributes left behind by a previous pass, if any.
> >>>>>>>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> >>>>>>>>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
> >>>>>>>>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
> >>>>>>>>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
> >>>>>>>>     return histogram;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> // Count how often each byte value occurs in the content stream.
> >>>>>>>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
> >>>>>>>>     final Map<String, String> histogram = new HashMap<>();
> >>>>>>>>     final int[] distribution = new int[256];
> >>>>>>>>     Arrays.fill(distribution, 0);
> >>>>>>>>
> >>>>>>>>     long total = 0L;
> >>>>>>>>     final byte[] buffer = new byte[8192];
> >>>>>>>>     int len;
> >>>>>>>>     while ((len = inStream.read(buffer)) > 0) {
> >>>>>>>>         for (int i = 0; i < len; i++) {
> >>>>>>>>             // Bytes >= 0x80 come out negative here; Groovy's negative array
> >>>>>>>>             // indexing wraps them to bins 128-255, so every byte lands in a bin.
> >>>>>>>>             final int val = buffer[i];
> >>>>>>>>             distribution[val]++;
> >>>>>>>>             total++;
> >>>>>>>>         }
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>>     for (int i = 0; i < 256; i++) {
> >>>>>>>>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
> >>>>>>>>     }
> >>>>>>>>     histogram.put("histogram.totalBytes", String.valueOf(total));
> >>>>>>>>
> >>>>>>>>     return histogram;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> // Log, at ERROR level, every byte value whose count changed between passes.
> >>>>>>>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
> >>>>>>>>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
> >>>>>>>>     final Map<String, String> sorted = new TreeMap<>(previous)
> >>>>>>>>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> >>>>>>>>         final String key = entry.getKey();
> >>>>>>>>         final String previousValue = entry.getValue();
> >>>>>>>>         final String updatedValue = updated.get(entry.getKey())
> >>>>>>>>
> >>>>>>>>         if (!Objects.equals(previousValue, updatedValue)) {
> >>>>>>>>             sb.append("Byte Value: ").append(key).
> >>>>>>>>                 append(", Previous Count: ").append(previousValue).
> >>>>>>>>                 append(", New Count: ").append(updatedValue).append("\n");
> >>>>>>>>         }
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>>     log.error(sb.toString());
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> def flowFile = session.get()
> >>>>>>>> if (flowFile == null) {
> >>>>>>>>     return
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> >>>>>>>> Map<String, String> histogram = null;
> >>>>>>>>
> >>>>>>>> final InputStream inStream = session.read(flowFile);
> >>>>>>>> try {
> >>>>>>>>     histogram = createHistogram(flowFile, inStream);
> >>>>>>>> } finally {
> >>>>>>>>     inStream.close()
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> // On the first pass there is nothing to compare against; on later passes,
> >>>>>>>> // log and route to failure if the content's byte distribution changed.
> >>>>>>>> if (!previousHistogram.isEmpty()) {
> >>>>>>>>     if (previousHistogram.equals(histogram)) {
> >>>>>>>>         log.info("Histograms match")
> >>>>>>>>     } else {
> >>>>>>>>         logHistogramDifferences(previousHistogram, histogram)
> >>>>>>>>         session.transfer(flowFile, REL_FAILURE)
> >>>>>>>>         return;
> >>>>>>>>     }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> flowFile = session.putAllAttributes(flowFile, histogram)
> >>>>>>>> session.transfer(flowFile, REL_SUCCESS)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> For a bit of background here, the reason that Joe and I have
> expressed interest in NFS file systems is that the way the protocol works,
> it is allowed to receive packets/chunks of the file out-of-order. So, what
> happens is let’s say a 1 MB file is being written. The first 500 KB are
> received. Then instead of the 501st KB it receives the 503rd KB. What
> happens is that the size of the file on the file system becomes 503 KB. But
> what about 501 & 502? Well when you read the data, the file system just
> returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server
> receives those bytes, it then goes back and fills in the proper bytes. So
> if you’re running on NFS, it is possible for the contents of the file on
> the underlying file system to change out from under you. It’s not clear to
> me what other types of file system might do something similar.
> >>>>>>>>
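> >>>>>>>> As a minimal illustration of that "hole" behavior (this is just
> >>>>>>>> ordinary sparse-file semantics on a local disk, standing in for what
> >>>>>>>> the NFS client does), here is a small Groovy sketch:
> >>>>>>>>
> >>>>>>>> import java.io.RandomAccessFile
> >>>>>>>>
> >>>>>>>> def f = File.createTempFile("sparse", ".bin")
> >>>>>>>> new RandomAccessFile(f, "rw").withCloseable { raf ->
> >>>>>>>>     raf.write("AAAA".bytes)   // bytes 0-3 arrive first
> >>>>>>>>     raf.seek(8)               // bytes 4-7 have not arrived yet
> >>>>>>>>     raf.write("BBBB".bytes)   // bytes 8-11 arrive out of order
> >>>>>>>> }
> >>>>>>>> // The file is already 12 bytes long; the unwritten gap reads back
> >>>>>>>> // as NUL bytes until the missing data is filled in:
> >>>>>>>> println f.bytes.collect { String.format("%02x", it) }.join(" ")
> >>>>>>>> // 41 41 41 41 00 00 00 00 42 42 42 42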
> >>>>>>>>
> >>>>>>>> So, one thing that we can do is to find out whether or not the
> contents of the underlying file have changed in some way, or if there’s
> something else happening that could perhaps result in the hashes being
> wrong. I’ve put together a script that should help diagnose this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Can you insert an ExecuteScript processor either just before or
> just after your CryptographicHashContent processor? Doesn’t really matter
> whether it’s run just before or just after. I’ll attach the script here.
> It’s a Groovy Script so you should be able to use ExecuteScript with Script
> Engine = Groovy and the following script as the Script Body. No other
> changes needed.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The way the script works, it reads in the contents of the
> FlowFile, and then it builds up a histogram of all byte values (0-255) that
> it sees in the contents, and then adds that as attributes. So it adds
> attributes such as:
> >>>>>>>>
> >>>>>>>> histogram.0 = 280273
> >>>>>>>>
> >>>>>>>> histogram.1 = 2820
> >>>>>>>>
> >>>>>>>> histogram.2 = 48202
> >>>>>>>>
> >>>>>>>> histogram.3 = 3820
> >>>>>>>>
> >>>>>>>> …
> >>>>>>>>
> >>>>>>>> histogram.totalBytes = 1780928732
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> It then checks if those attributes have already been added. If
> so, after calculating that histogram, it checks against the previous values
> (in the attributes). If they are the same, the FlowFile goes to ’success’.
> If they are different, it logs an error indicating the before/after value
> for any byte whose distribution was different, and it routes to failure.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> So, if for example, the first time through it sees 280,273 bytes
> with a value of ‘0’, and the second time it only sees 12,001, then we know
> there were a bunch of 0’s previously that were updated to be some other
> value. And it includes the total number of bytes in case somehow we find
> that we’re reading too many bytes or not enough bytes or something like
> that. This should help narrow down what’s happening.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> -Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Attached is the flow I was using (now running yours and this
> one).  Curious if that one reproduces the issue for you as well.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have your flow running and will keep it running for several
> days/week to see if I can reproduce.  Also of note please use your same
> test flow but use HashContent instead of crypto hash.  Curious if that
> matters for any reason...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Still want to know more about your underlying storage system.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You could also try updating nifi.properties and changing the
> following lines:
> >>>>>>>>
> >>>>>>>> nifi.flowfile.repository.always.sync=true
> >>>>>>>>
> >>>>>>>> nifi.content.repository.always.sync=true
> >>>>>>>>
> >>>>>>>> nifi.provenance.repository.always.sync=true
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> It will hurt performance but can be useful/necessary on certain
> storage subsystems.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ignore "For the scenario where you can replicate this please
> share the flow.xml.gz for which it is reproducible."  I see the uploaded
> JSON
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> We asked about the underlying storage system.  You replied with
> some info but not the specifics.  Do you know precisely what the underlying
> storage is and how it is presented to the operating system?  For instance
> is it NFS or something similar?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I've setup a very similar flow at extremely high rates running
> for the past several days with no issue.  In my case though I know
> precisely what the config is and the disk setup is.  Didn't do anything
> special to be clear but still it is important to know.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> For the scenario where you can replicate this please share the
> flow.xml.gz for which it is reproducible.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> Joe
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Dear Joe and Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have created a test flow without the sftp processors, which
> don't create any errors. Therefore I created a new test flow where I use a
> MergeContent and UnpackContent instead of the sftp processors. This keeps
> all data internal in NIFI, but force NIFI to write and read new files
> totally local.
> >>>>>>>>
> >>>>>>>> My flow has been running for 7 days and this morning there were
> 2 files where the sha256 was given another hash value than the original. I
> have set this flow up in another nifi cluster only for testing, and the
> cluster is not doing anything else. It is using Nifi 1.14.0
> >>>>>>>>
> >>>>>>>> So I can reproduce issues at different nifi clusters and versions
> (1.13.2 and 1.14.0) where the calculation of a hash on content can give
> different outputs. It doesn't make any sense, but it happens. In all my
> cases the issue happens where the calculation of the hashcontent happens
> right after NIFI writes the content to the content repository. I don't know
> if there could be some kind of delay writing the content 100% before the next
> processor begins reading the content???
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please see attach test flow, and the previous mail with a pdf
> showing the lineage of a production file which also had issues. In the pdf
> check step 5 and 12.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Kind regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed <
> jmkofoed.ube@gmail.com>:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Joe,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> To start from the last mail :-)
> >>>>>>>>
> >>>>>>>> All the repositories has it's own disk, and I'm using ext4
> >>>>>>>>
> >>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> >>>>>>>>
> >>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> >>>>>>>>
> >>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> My test flow WITH sftp looks like this:
> >>>>>>>>
> >>>>>>>> <image.png>
> >>>>>>>>
> >>>>>>>> And this flow has produced 1 error within 3 days. After many many
> loops the file failed and went out via the "unmatched" output to the
> disabled UpdateAttribute, which is doing nothing, just keeping the
> failed flowfile in a queue. I enabled the UpdateAttribute and looped the
> file back to the CryptographicHashContent and now it calculated the hash
> correctly again. But in this flow I have a FetchSFTP Process right before the
> Hashing.
> >>>>>>>>
> >>>>>>>> Right now my flow is running without the 2 sftp processors, and
> the last 24hours there has been no errors.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> About the Lineage:
> >>>>>>>>
> >>>>>>>> Is there a way to export all the lineage data? The export only
> generates an svg file.
> >>>>>>>>
> >>>>>>>> This is only for the receiving nifi which internally calculates
> 2 different hashes on the same content with ca. 1 minute's delay. Attached
> is a pdf-document with the lineage, the flow and all the relevant
> Provenance information for each step in the lineage.
> >>>>>>>>
> >>>>>>>> The interesting steps are step 5 and 12.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Can the issues be that data is not written 100% to disk between
> step 4 and 5 in the flow?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Kind regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <
> joe.witt@gmail.com>:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Also what type of file system/storage system are you running NiFi
> on
> >>>>>>>>
> >>>>>>>> in this case?  We'll need to know this for the NiFi
> >>>>>>>>
> >>>>>>>> content/flowfile/provenance repositories? Is it NFS?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> And to further narrow this down
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> "I have a test flow, where a GenerateFlowfile has created 6x 1GB
> files
> >>>>>>>>
> >>>>>>>> (2 files per node) and next process was a hashcontent before it
> run
> >>>>>>>>
> >>>>>>>> into a test loop. Where files are uploaded via PutSFTP to a test
> >>>>>>>>
> >>>>>>>> server, and downloaded again and recalculated the hash. I have
> had one
> >>>>>>>>
> >>>>>>>> issue after 3 days of running."
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> So to be clear with GenerateFlowFile making these files and then
> you
> >>>>>>>>
> >>>>>>>> looping the content is wholly and fully exclusively within the
> control
> >>>>>>>>
> >>>>>>>> of NiFi.  No Get/Fetch/Put-SFTP of any kind at all. So by looping
> the
> >>>>>>>>
> >>>>>>>> same files over and over in nifi itself, can you make this happen
> or
> >>>>>>>>
> >>>>>>>> not?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> "After fetching a FlowFile-stream file and unpacked it back into
> NiFi
> >>>>>>>>
> >>>>>>>> I calculate a sha256. 1 minutes later I recalculate the sha256 on
> the
> >>>>>>>>
> >>>>>>>> exact same file. And got a new hash. That is what worry’s me.
> >>>>>>>>
> >>>>>>>> The fact that the same file can be recalculated and produce two
> >>>>>>>>
> >>>>>>>> different hashes, is very strange, but it happens. "
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok so to confirm you are saying that in each case this happens
> you see
> >>>>>>>>
> >>>>>>>> it first compute the wrong hash, but then if you retry the same
> >>>>>>>>
> >>>>>>>> flowfile it then provides the correct hash?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Can you please also show/share the lineage history for such a flow
> >>>>>>>>
> >>>>>>>> file then?  It should have events for the initial hash, second
> hash,
> >>>>>>>>
> >>>>>>>> the unpacking, trace to the original stream, etc...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Dear Mark and Joe
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I know my setup isn’t normal for many people. But if we only
> look at my receive side, which the last mails are about: everything is
> happening at the same NIFI instance. It is the same 3 node NIFI cluster.
> >>>>>>>>
> >>>>>>>> After fetching a FlowFile-stream file and unpacking it back into
> NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the
> exact same file. And get a new hash. That is what worries me.
> >>>>>>>>
> >>>>>>>> The fact that the same file can be recalculated and produce two
> different hashes, is very strange, but it happens. Over the last 5 months
> it has only happened 35-40 times.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I can understand if the file is not completely loaded and saved
> into the content repository before the hashing starts. But I believe that
> the unpack process doesn't forward the flow file to the next process before
> it is 100% finished unpacking and saving the new content to the repository.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have a test flow, where a GenerateFlowfile has created 6x 1GB
> files (2 files per node) and the next process was a hashcontent before it ran
> into a test loop, where files are uploaded via PutSFTP to a test server,
> downloaded again, and the hash recalculated. I have had one issue after
> 3 days of running.
> >>>>>>>>
> >>>>>>>> Now the test flow is running without the Put/Fetch sftp
> processors.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Another problem is that I can’t find any correlation to other
> events. Not within NIFI, nor the server itself or VMWare. If I just could
> find any other event which happens at the same time, I might be able to
> force some kind of event to trigger the issue.
> >>>>>>>>
> >>>>>>>> I have tried to force VMware to migrate a NiFi node to another
> host, forcing it to do a snapshot and deleting snapshots, but nothing can
> trigger an error.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I know it will be very very difficult to reproduce. But I will
> setup multiple NiFi instances running different test flows to see if I can
> find any reason why it behaves as it does.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Kind Regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den 20. okt. 2021 kl. 16.39 skrev Mark Payne <
> markap14@hotmail.com>:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks for sharing the images.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I tried to setup a test to reproduce the issue. I’ve had it
> running for quite some time. Running through millions of iterations.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to
> the tune of hundreds of MB). I’ve been unable to reproduce an issue after
> millions of iterations.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> So far I cannot replicate. And since you’re pulling the data via
> SFTP and then unpacking, which preserves all original attributes from a
> different system, this can easily become confusing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Recommend trying to reproduce with SFTP-related processors out of
> the picture, as Joe is mentioning. Either using GetFile/FetchFile or
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate
> an ‘initial hash’, copy that value to another attribute, and then loop,
> generating the hash and comparing against the original one. I’ll attach a
> flow that does this, but not sure if the email server will strip out the
> attachment or not.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This way we remove any possibility of actual corruption between
> the two nifi instances. If we can still see corruption / different hashes
> within a single nifi instance, then it certainly warrants further
> investigation but i can’t see any issues so far.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> -Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Actually is this current loop test contained within a single nifi
> and there you see corruption happen?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Joe
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You have a very involved setup including other systems (non
> NiFi).  Have you removed those systems from the equation so you have more
> evidence to support your expectation that NiFi is doing something other
> than you expect?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Joe
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Today I have another file which has been running through the
> retry loop one time. To test the processors and the algorithm I added the
> HashContent processor and also added hashing by SHA-1.
> >>>>>>>>
> >>>>>>>> The file has been going through the system, and the SHA-1 and
> SHA-256 are both different than expected. With a 1 minute delay the file
> is going back into the hashing content flow and this time it calculates
> both hashes fine.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't believe that the hashing is buggy, but something is very
> very strange. What can influence the processors/algorithm to calculate a
> different hash???
> >>>>>>>>
> >>>>>>>> All the input/output claim information is exactly the same. It is
> the same flow/content file going in a loop. It happens on all 3 nodes.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Any suggestions for where to dig ?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den ons. 20. okt. 2021 kl. 06.34 skrev Jens M. Kofoed <
> jmkofoed.ube@gmail.com>:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks for replaying and the suggestion to look at the content
> Claim.
> >>>>>>>>
> >>>>>>>> These 3 pictures is from the first attempt:
> >>>>>>>>
> >>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Yesterday I realized that the content was still in the archive,
> so I could Replay the file.
> >>>>>>>>
> >>>>>>>> <image.png>
> >>>>>>>>
> >>>>>>>> So here are the same pictures but for the replay and as you can
> see the Identifier, offset and Size are all the same.
> >>>>>>>>
> >>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> In my flow if the hash does not match my original first
> calculated hash, it goes into a retry loop. Here are the pictures for the
> 4th time the file went through:
> >>>>>>>>
> >>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>
> >>>>>>>> Here the content Claim is all the same.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> It is very rare that we see these issues (fewer than 1 in 1,000,000
> files) and only with large files. Only once have I seen the error with a
> 110MB file; the other times the file sizes are above 800MB.
> >>>>>>>>
> >>>>>>>> This time it was a Nifi-Flowstream v3 file, which has been
> exported from one system and imported in another. But once the file has
> been imported it is the same file inside NIFI and it stays at the same
> node, going through the same loop of processors multiple times, and in the
> end the CryptographicHashContent calculates a different SHA256 than it did
> earlier. This should not be possible!!! And that is what concerns me the
> most.
> >>>>>>>>
> >>>>>>>> What can influence the same processor to calculate 2 different
> sha256 on the exact same content???
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den tir. 19. okt. 2021 kl. 16.51 skrev Mark Payne <
> markap14@hotmail.com>:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> In the two provenance events - one showing a hash of dd4cc… and
> the other showing f6f0….
> >>>>>>>>
> >>>>>>>> If you go to the Content tab, do they both show the same Content
> Claim? I.e., do the Input Claim / Output Claim show the same values for
> Container, Section, Identifier, Offset, and Size?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> -Mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Dear NIFI Users
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have posted this mail in the developers mailing list and just
> want to inform all of our about a very odd behavior we are facing.
> >>>>>>>>
> >>>>>>>> The background:
> >>>>>>>>
> >>>>>>>> We have data going between 2 different NIFI systems which has no
> direct network access to each other. Therefore we calculate a SHA256 hash
> value of the content at system 1, before the flowfile and data are combined
> and saved as a "flowfile-stream-v3" pkg file. The file is then transported
> to system 2, where the pkg file is unpacked and the flow can continue. To
> be sure about file integrity we calculate a new sha256 at system 2. But
> sometimes we see that the sha256 gets another value, which might suggest
> the file was corrupted. But recalculating the sha256 again gives a new hash
> value.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ----
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Tonight I had yet another file which didn't match the expected
> sha256 hash value. The content is a 1.7GB file and the Event Duration was
> "00:00:17.539" to calculate the hash.
> >>>>>>>>
> >>>>>>>> I have created a Retry loop, where the file will go to a Wait
> process for delaying the file 1 minute and going back to the
> CryptographicHashContent for a new calculation. After 3 retries the file
> goes to the retries_exceeded and goes to a disabled process just to be in a
> queue so I manually can look at it. This morning I rerouted the file from
> my retries_exceeded queue back to the CryptographicHashContent for a new
> calculation and this time it calculated the correct hash value.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very
> strange is happening.
> >>>>>>>>
> >>>>>>>> <image.png>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02
> with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build
> 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build
> 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware
> ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
> >>>>>>>>
> >>>>>>>> I have inspected different logs to see if I can find any
> correlation what happened at the same time as the file is going through my
> loop, but there are no event/task at that exact time.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> System 1:
> >>>>>>>>
> >>>>>>>> At 10/19/2021 00:15:11.247 CEST my file is going through a
> CryptographicHashContent: SHA256 value:
> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> >>>>>>>>
> >>>>>>>> The file is exported as a "FlowFile Stream, v3" to System 2
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> SYSTEM 2:
> >>>>>>>>
> >>>>>>>> At 10/19/2021 00:18:10.528 CEST the file is going through a
> CryptographicHashContent: SHA256 value:
> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>
> >>>>>>>> <image.png>
> >>>>>>>>
> >>>>>>>> At 10/19/2021 00:19:08.996 CEST the file is going through the
> same CryptographicHashContent at system 2: SHA256 value:
> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>
> >>>>>>>> At 10/19/2021 00:20:04.376 CEST the file is going through the
> same a CryptographicHashContent at system 2: SHA256 value:
> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>
> >>>>>>>> At 10/19/2021 00:21:01.711 CEST the file is going through the
> same a CryptographicHashContent at system 2: SHA256 value:
> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> At 10/19/2021 06:07:43.376 CEST the file is going through the
> same a CryptographicHashContent at system 2: SHA256 value:
> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> >>>>>>>>
> >>>>>>>> <image.png>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> How on earth can this happen???
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Kind Regards
> >>>>>>>>
> >>>>>>>> Jens M. Kofoed
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> <Repro.json>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> <Try_to_recreate_Jens_Challenge.json>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>
> >>>
>
>

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Dear Joe and Mark

I'm very glad you're trying to help me here. I'm sorry if you feel I do not
want to try the suggestions you make. That is not how it is meant. I also
think it's totally weird, the things that happen.
However, I'm glad to hear that you cannot reproduce it. Unfortunately,
this indicates an internal problem in our server setup.

What I wrote to Mark was not that I would not try the suggestions he comes
up with. However, when a file fails in the flow, it may well pass through the
flow fine after some time. Therefore, I do not think that it is the file itself
that has been changed, but more a problem with reading the right data.
Otherwise the file would have to change 2 times.
But I would very much like to set this up as well, to confirm if data is
being changed back and forth. The challenge is that I cannot monitor the
system 24/7. So when it fails, it may take many hours before I can make a
new backup and compare it with the first one.

When I return to the office and the system tomorrow, I will start
implementing the suggestions that you have made which I have not already
implemented.
I will also look up which garbage collector I'm using. I have also searched
the internet for other issues with linux filesystems and vmWare, and there
is a "driver" suggestion which I would also like to test.

As I wrote, I'm very very pleased with all that you both are doing.

Kind regards
Jens M. Kofoed

Den ons. 3. nov. 2021 kl. 16.25 skrev Joe Witt <jo...@gmail.com>:

> Jens,
>
> I think we're at a loss how to help you specifically then for your
> specific installation.  We have attempted to recreate the scenario
> with no luck.  We've offered suggestions on experiments which would
> help us narrow in but you don't think that will help.
>
> At this point we'll probably have to leave this thread here.  If you
> used the forced sync properties we mentioned and it is still happening
> then you can pretty much ensure the issue is with the JVM or the
> virtual file system mechanism.
>
> Thanks
> Joe
>
> On Wed, Nov 3, 2021 at 8:09 AM Jens M. Kofoed <jm...@gmail.com>
> wrote:
> >
> > Hi Mark
> >
> > All the files in my testflow are 1GB files. But it happens in my
> production flow with different file sizes.
> >
> > When these issues have happened, I have the flowfile routed to an
> updateAttribute process which is disabled, just to keep the file in a
> queue. When I enable the process and send the file back for a new hash
> calculation, the file is OK. So I don’t think the test with backup and
> compare makes any sense to do.
> >
> > Regards
> > Jens
> >
> > > Den 3. nov. 2021 kl. 15.57 skrev Mark Payne <ma...@hotmail.com>:
> > >
> > > So what I found interesting about the histogram output was that in
> each case, the input file was 1 GB. The number of bytes whose values
> differed between the ‘good’ and ‘bad’ passes was something like 500-700.
> But the values ranged significantly. There was no
> indication that the type of thing we’ve seen with NFS mounts was happening,
> where data was nulled out until received and then updated. If that had been
> the case we’d have seen the NUL byte (or some other value) have a very
> significant change in the histogram, but we didn’t see that.
> > >
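> > > To make that arithmetic concrete: with totalBytes unchanged, every
> > > changed byte decrements one bucket and increments another, so summing the
> > > positive deltas gives a lower bound on the number of changed byte
> > > positions. A Groovy sketch, using two bucket values taken from the posted
> > > log and hypothetical maps prev/next:
> > >
> > > def prev = ['histogram.10': 11926720L, 'histogram.100': 11927504L]
> > > def next = ['histogram.10': 11926721L, 'histogram.100': 11927503L]
> > > long changed = next.keySet().sum { k -> Math.max(0L, next[k] - (prev[k] ?: 0L)) }
> > > println changed   // 1: at least one byte left bucket 100 and entered bucket 10
> > >
> > > Summed over all buckets in the posted logs, this is the kind of
> > > computation that puts the changed-byte count in the 500-700 range.
> > >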
> > > So a couple more ideas that I think can be useful.
> > >
> > > 1) Which garbage collector are you using? It’s configured in the
> bootstrap.conf file
> > >
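> > > For reference, the collector is chosen by a java.arg line in
> > > conf/bootstrap.conf. A stock NiFi 1.x install carries a G1 line such as
> > > the following (it may be commented out, and the arg number can differ
> > > per install):
> > >
> > > java.arg.13=-XX:+UseG1GC
> > >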
> > > 2) We can try to definitively prove out whether the content on the
> disk is changing or if there’s an issue reading the content. To do this:
> > >
> > > 1. Stop all processors.
> > > 2. Shutdown nifi
> > > 3. rm -rf content_repository; rm -rf flowfile_repository   (warning,
> this will delete all FlowFiles & content, so only do this on a dev/test
> system where you’re comfortable deleting it!)
> > > 4. Start nifi
> > > 5. Let exactly 1 FlowFile into your flow.
> > > 6. While it is looping through, create a copy of your entire Content
> Repository: cp -r content_repository content_backup1; zip
> content_backup1.zip content_backup1
> > > 7. Wait for the hashes to differ
> > > 8. Create another copy of the Content Repository: cp -r
> content_repository content_backup2
> > > 9. Find the files within the content_backup1 and content_backup2 and
> compare them to see if they are identical. Would recommend comparing them
> using each of the 3 methods: sha256, sha512, diff (see the sketch below)
> > >
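> > > For step 9, a small Groovy sketch (assuming the backup directory names
> > > from steps 6 and 8) that prints a sha256 and sha512 per file, so the two
> > > trees can be compared with diff:
> > >
> > > import groovy.io.FileType
> > > import java.security.MessageDigest
> > >
> > > def digest(File f, String algo) {
> > >     def md = MessageDigest.getInstance(algo)
> > >     // Stream the file through the digest in 8 KB chunks
> > >     f.eachByte(8192) { byte[] buf, int len -> md.update(buf, 0, len) }
> > >     return md.digest().encodeHex().toString()
> > > }
> > >
> > > ['content_backup1', 'content_backup2'].each { dir ->
> > >     new File(dir).eachFileRecurse(FileType.FILES) { f ->
> > >         println "${digest(f, 'SHA-256')}  ${digest(f, 'SHA-512')}  ${f.path}"
> > >     }
> > > }
> > >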
> > > This should make it pretty clear that either:
> > > (1) the issue resides in the software: either NiFi or the JVM
> > > (2) the issue resides outside of the software: the disk, the disk
> driver, the operating system, the VM hypervisor, etc.
> > >
> > > Thanks
> > > -Mark
> > >
> > >> On Nov 3, 2021, at 10:44 AM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >> Jens,
> > >>
> > >> 184 hours (7.6 days) in and zero issues.
> > >>
> > >> Will need to turn this off soon but wanted to give a final update.
> > >> Looks great.  Given the information on your system there appears to be
> > >> something we don't understand related to the virtual file system
> > >> involved or something.
> > >>
> > >> Thanks
> > >>
> > >>> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>
> > >>> Hi Mark
> > >>>
> > >>> Of course, sorry :-)  By looking at the error messages, I can see
> that only the histogram buckets which have differences are listed. And
> all 3 have the first issue at histogram.9. Don't know what that means.
> > >>>
> > >>> /Jens
> > >>>
> > >>> Here are the error log:
> > >>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
> differences in the histogram
> > >>> Byte Value: histogram.10, Previous Count: 11926720, New Count:
> 11926721
> > >>> Byte Value: histogram.100, Previous Count: 11927504, New Count:
> 11927503
> > >>> Byte Value: histogram.101, Previous Count: 11925396, New Count:
> 11925407
> > >>> Byte Value: histogram.102, Previous Count: 11929923, New Count:
> 11929941
> > >>> Byte Value: histogram.103, Previous Count: 11931596, New Count:
> 11931591
> > >>> Byte Value: histogram.104, Previous Count: 11929071, New Count:
> 11929064
> > >>> Byte Value: histogram.105, Previous Count: 11931365, New Count:
> 11931348
> > >>> Byte Value: histogram.106, Previous Count: 11928661, New Count:
> 11928645
> > >>> Byte Value: histogram.107, Previous Count: 11929864, New Count:
> 11929866
> > >>> Byte Value: histogram.108, Previous Count: 11931611, New Count:
> 11931642
> > >>> Byte Value: histogram.109, Previous Count: 11932758, New Count:
> 11932763
> > >>> Byte Value: histogram.110, Previous Count: 11927893, New Count:
> 11927895
> > >>> Byte Value: histogram.111, Previous Count: 11933519, New Count:
> 11933522
> > >>> Byte Value: histogram.112, Previous Count: 11931392, New Count:
> 11931397
> > >>> Byte Value: histogram.113, Previous Count: 11928534, New Count:
> 11928548
> > >>> Byte Value: histogram.114, Previous Count: 11936879, New Count:
> 11936874
> > >>> Byte Value: histogram.115, Previous Count: 11932818, New Count:
> 11932804
> > >>> Byte Value: histogram.117, Previous Count: 11929143, New Count:
> 11929151
> > >>> Byte Value: histogram.118, Previous Count: 11931854, New Count:
> 11931829
> > >>> Byte Value: histogram.119, Previous Count: 11926333, New Count:
> 11926327
> > >>> Byte Value: histogram.120, Previous Count: 11928731, New Count:
> 11928740
> > >>> Byte Value: histogram.121, Previous Count: 11931149, New Count:
> 11931162
> > >>> Byte Value: histogram.122, Previous Count: 11926725, New Count:
> 11926733
> > >>> Byte Value: histogram.32, Previous Count: 11930422, New Count:
> 11930425
> > >>> Byte Value: histogram.33, Previous Count: 11934311, New Count:
> 11934313
> > >>> Byte Value: histogram.34, Previous Count: 11930459, New Count:
> 11930446
> > >>> Byte Value: histogram.35, Previous Count: 11924776, New Count:
> 11924758
> > >>> Byte Value: histogram.36, Previous Count: 11924186, New Count:
> 11924183
> > >>> Byte Value: histogram.37, Previous Count: 11928616, New Count:
> 11928627
> > >>> Byte Value: histogram.38, Previous Count: 11929474, New Count:
> 11929490
> > >>> Byte Value: histogram.39, Previous Count: 11929607, New Count:
> 11929600
> > >>> Byte Value: histogram.40, Previous Count: 11928053, New Count:
> 11928048
> > >>> Byte Value: histogram.41, Previous Count: 11930402, New Count:
> 11930399
> > >>> Byte Value: histogram.42, Previous Count: 11926830, New Count:
> 11926846
> > >>> Byte Value: histogram.44, Previous Count: 11932536, New Count:
> 11932538
> > >>> Byte Value: histogram.45, Previous Count: 11931053, New Count:
> 11931044
> > >>> Byte Value: histogram.46, Previous Count: 11930008, New Count:
> 11930011
> > >>> Byte Value: histogram.47, Previous Count: 11927747, New Count:
> 11927734
> > >>> Byte Value: histogram.48, Previous Count: 11936055, New Count:
> 11936057
> > >>> Byte Value: histogram.49, Previous Count: 11931471, New Count:
> 11931474
> > >>> Byte Value: histogram.50, Previous Count: 11931921, New Count:
> 11931908
> > >>> Byte Value: histogram.51, Previous Count: 11929643, New Count:
> 11929637
> > >>> Byte Value: histogram.52, Previous Count: 11923847, New Count:
> 11923854
> > >>> Byte Value: histogram.53, Previous Count: 11927311, New Count:
> 11927303
> > >>> Byte Value: histogram.54, Previous Count: 11933754, New Count:
> 11933766
> > >>> Byte Value: histogram.55, Previous Count: 11925964, New Count:
> 11925970
> > >>> Byte Value: histogram.56, Previous Count: 11928872, New Count:
> 11928873
> > >>> Byte Value: histogram.57, Previous Count: 11931124, New Count:
> 11931127
> > >>> Byte Value: histogram.58, Previous Count: 11928474, New Count:
> 11928477
> > >>> Byte Value: histogram.59, Previous Count: 11925814, New Count:
> 11925812
> > >>> Byte Value: histogram.60, Previous Count: 11933978, New Count:
> 11933991
> > >>> Byte Value: histogram.61, Previous Count: 11934136, New Count:
> 11934123
> > >>> Byte Value: histogram.62, Previous Count: 11932016, New Count:
> 11932011
> > >>> Byte Value: histogram.63, Previous Count: 23864588, New Count:
> 23864584
> > >>> Byte Value: histogram.64, Previous Count: 11924792, New Count:
> 11924789
> > >>> Byte Value: histogram.65, Previous Count: 11934789, New Count:
> 11934797
> > >>> Byte Value: histogram.66, Previous Count: 11933047, New Count:
> 11933044
> > >>> Byte Value: histogram.67, Previous Count: 11931899, New Count:
> 11931909
> > >>> Byte Value: histogram.68, Previous Count: 11935615, New Count:
> 11935609
> > >>> Byte Value: histogram.69, Previous Count: 11927249, New Count:
> 11927239
> > >>> Byte Value: histogram.70, Previous Count: 11933276, New Count:
> 11933274
> > >>> Byte Value: histogram.71, Previous Count: 11927953, New Count:
> 11927969
> > >>> Byte Value: histogram.72, Previous Count: 11929275, New Count:
> 11929266
> > >>> Byte Value: histogram.73, Previous Count: 11930292, New Count:
> 11930306
> > >>> Byte Value: histogram.74, Previous Count: 11935428, New Count:
> 11935427
> > >>> Byte Value: histogram.75, Previous Count: 11930317, New Count:
> 11930307
> > >>> Byte Value: histogram.76, Previous Count: 11935737, New Count:
> 11935726
> > >>> Byte Value: histogram.77, Previous Count: 11932127, New Count:
> 11932125
> > >>> Byte Value: histogram.78, Previous Count: 11932344, New Count:
> 11932349
> > >>> Byte Value: histogram.79, Previous Count: 11932094, New Count:
> 11932100
> > >>> Byte Value: histogram.80, Previous Count: 11930688, New Count:
> 11930687
> > >>> Byte Value: histogram.81, Previous Count: 11928415, New Count:
> 11928416
> > >>> Byte Value: histogram.82, Previous Count: 11931559, New Count:
> 11931542
> > >>> Byte Value: histogram.83, Previous Count: 11934192, New Count:
> 11934176
> > >>> Byte Value: histogram.84, Previous Count: 11927224, New Count:
> 11927231
> > >>> Byte Value: histogram.85, Previous Count: 11929491, New Count:
> 11929484
> > >>> Byte Value: histogram.87, Previous Count: 11932201, New Count:
> 11932190
> > >>> Byte Value: histogram.88, Previous Count: 11930694, New Count:
> 11930680
> > >>> Byte Value: histogram.89, Previous Count: 11936439, New Count:
> 11936448
> > >>> Byte Value: histogram.9, Previous Count: 11933187, New Count:
> 11933193
> > >>> Byte Value: histogram.90, Previous Count: 11926445, New Count:
> 11926455
> > >>> Byte Value: histogram.94, Previous Count: 11931596, New Count:
> 11931609
> > >>> Byte Value: histogram.95, Previous Count: 11929379, New Count:
> 11929384
> > >>> Byte Value: histogram.97, Previous Count: 11928864, New Count:
> 11928874
> > >>> Byte Value: histogram.98, Previous Count: 11924738, New Count:
> 11924729
> > >>> Byte Value: histogram.99, Previous Count: 11930062, New Count:
> 11930059
> > >>>
> > >>> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
> differences in the histogram
> > >>> Byte Value: histogram.10, Previous Count: 11932402, New Count:
> 11932407
> > >>> Byte Value: histogram.100, Previous Count: 11927531, New Count:
> 11927541
> > >>> Byte Value: histogram.101, Previous Count: 11928454, New Count:
> 11928430
> > >>> Byte Value: histogram.102, Previous Count: 11934432, New Count:
> 11934439
> > >>> Byte Value: histogram.103, Previous Count: 11924623, New Count:
> 11924633
> > >>> Byte Value: histogram.104, Previous Count: 11934492, New Count:
> 11934474
> > >>> Byte Value: histogram.105, Previous Count: 11934585, New Count:
> 11934591
> > >>> Byte Value: histogram.106, Previous Count: 11928955, New Count:
> 11928948
> > >>> Byte Value: histogram.108, Previous Count: 11930139, New Count:
> 11930140
> > >>> Byte Value: histogram.109, Previous Count: 11929325, New Count:
> 11929321
> > >>> Byte Value: histogram.110, Previous Count: 11930486, New Count:
> 11930478
> > >>> Byte Value: histogram.111, Previous Count: 11933517, New Count:
> 11933508
> > >>> Byte Value: histogram.112, Previous Count: 11928334, New Count:
> 11928339
> > >>> Byte Value: histogram.114, Previous Count: 11929222, New Count:
> 11929213
> > >>> Byte Value: histogram.116, Previous Count: 11931182, New Count:
> 11931188
> > >>> Byte Value: histogram.117, Previous Count: 11933407, New Count:
> 11933402
> > >>> Byte Value: histogram.118, Previous Count: 11932709, New Count:
> 11932705
> > >>> Byte Value: histogram.120, Previous Count: 11933700, New Count:
> 11933708
> > >>> Byte Value: histogram.121, Previous Count: 11929803, New Count:
> 11929801
> > >>> Byte Value: histogram.122, Previous Count: 11930218, New Count:
> 11930220
> > >>> Byte Value: histogram.32, Previous Count: 11924458, New Count:
> 11924469
> > >>> Byte Value: histogram.33, Previous Count: 11934243, New Count:
> 11934248
> > >>> Byte Value: histogram.34, Previous Count: 11930696, New Count:
> 11930700
> > >>> Byte Value: histogram.35, Previous Count: 11925574, New Count:
> 11925577
> > >>> Byte Value: histogram.36, Previous Count: 11929198, New Count:
> 11929187
> > >>> Byte Value: histogram.37, Previous Count: 11928146, New Count:
> 11928143
> > >>> Byte Value: histogram.38, Previous Count: 11932505, New Count:
> 11932510
> > >>> Byte Value: histogram.39, Previous Count: 11929406, New Count:
> 11929412
> > >>> Byte Value: histogram.40, Previous Count: 11930100, New Count:
> 11930098
> > >>> Byte Value: histogram.41, Previous Count: 11930867, New Count:
> 11930872
> > >>> Byte Value: histogram.42, Previous Count: 11930796, New Count:
> 11930793
> > >>> Byte Value: histogram.43, Previous Count: 11930796, New Count:
> 11930789
> > >>> Byte Value: histogram.44, Previous Count: 11921866, New Count:
> 11921865
> > >>> Byte Value: histogram.45, Previous Count: 11935682, New Count:
> 11935699
> > >>> Byte Value: histogram.46, Previous Count: 11930075, New Count:
> 11930073
> > >>> Byte Value: histogram.47, Previous Count: 11928169, New Count:
> 11928165
> > >>> Byte Value: histogram.48, Previous Count: 11933490, New Count:
> 11933478
> > >>> Byte Value: histogram.49, Previous Count: 11932174, New Count:
> 11932180
> > >>> Byte Value: histogram.50, Previous Count: 11933255, New Count:
> 11933239
> > >>> Byte Value: histogram.51, Previous Count: 11934009, New Count:
> 11934013
> > >>> Byte Value: histogram.52, Previous Count: 11928361, New Count:
> 11928367
> > >>> Byte Value: histogram.53, Previous Count: 11927626, New Count:
> 11927627
> > >>> Byte Value: histogram.54, Previous Count: 11931611, New Count:
> 11931617
> > >>> Byte Value: histogram.55, Previous Count: 11930755, New Count:
> 11930746
> > >>> Byte Value: histogram.56, Previous Count: 11933823, New Count:
> 11933824
> > >>> Byte Value: histogram.57, Previous Count: 11922508, New Count:
> 11922510
> > >>> Byte Value: histogram.58, Previous Count: 11930384, New Count:
> 11930362
> > >>> Byte Value: histogram.59, Previous Count: 11929805, New Count:
> 11929820
> > >>> Byte Value: histogram.60, Previous Count: 11930064, New Count:
> 11930055
> > >>> Byte Value: histogram.61, Previous Count: 11926761, New Count:
> 11926762
> > >>> Byte Value: histogram.62, Previous Count: 11927605, New Count:
> 11927604
> > >>> Byte Value: histogram.63, Previous Count: 23858926, New Count:
> 23858913
> > >>> Byte Value: histogram.64, Previous Count: 11929516, New Count:
> 11929512
> > >>> Byte Value: histogram.65, Previous Count: 11930217, New Count:
> 11930223
> > >>> Byte Value: histogram.66, Previous Count: 11930478, New Count:
> 11930481
> > >>> Byte Value: histogram.67, Previous Count: 11939855, New Count:
> 11939858
> > >>> Byte Value: histogram.68, Previous Count: 11927850, New Count:
> 11927852
> > >>> Byte Value: histogram.69, Previous Count: 11931154, New Count:
> 11931175
> > >>> Byte Value: histogram.70, Previous Count: 11935374, New Count:
> 11935369
> > >>> Byte Value: histogram.71, Previous Count: 11930754, New Count:
> 11930751
> > >>> Byte Value: histogram.72, Previous Count: 11928304, New Count:
> 11928318
> > >>> Byte Value: histogram.73, Previous Count: 11931772, New Count:
> 11931766
> > >>> Byte Value: histogram.74, Previous Count: 11939417, New Count:
> 11939426
> > >>> Byte Value: histogram.75, Previous Count: 11930712, New Count:
> 11930718
> > >>> Byte Value: histogram.76, Previous Count: 11933331, New Count:
> 11933346
> > >>> Byte Value: histogram.77, Previous Count: 11931279, New Count:
> 11931272
> > >>> Byte Value: histogram.78, Previous Count: 11928276, New Count:
> 11928290
> > >>> Byte Value: histogram.79, Previous Count: 11930071, New Count:
> 11930067
> > >>> Byte Value: histogram.80, Previous Count: 11927830, New Count:
> 11927825
> > >>> Byte Value: histogram.81, Previous Count: 11931213, New Count:
> 11931206
> > >>> Byte Value: histogram.82, Previous Count: 11930964, New Count:
> 11930958
> > >>> Byte Value: histogram.83, Previous Count: 11928973, New Count:
> 11928966
> > >>> Byte Value: histogram.84, Previous Count: 11934325, New Count:
> 11934331
> > >>> Byte Value: histogram.85, Previous Count: 11929658, New Count:
> 11929654
> > >>> Byte Value: histogram.86, Previous Count: 11924667, New Count:
> 11924666
> > >>> Byte Value: histogram.87, Previous Count: 11931100, New Count:
> 11931106
> > >>> Byte Value: histogram.88, Previous Count: 11930252, New Count:
> 11930248
> > >>> Byte Value: histogram.89, Previous Count: 11927281, New Count:
> 11927299
> > >>> Byte Value: histogram.9, Previous Count: 11932848, New Count:
> 11932851
> > >>> Byte Value: histogram.90, Previous Count: 11930398, New Count:
> 11930399
> > >>> Byte Value: histogram.94, Previous Count: 11928720, New Count:
> 11928715
> > >>> Byte Value: histogram.95, Previous Count: 11928988, New Count:
> 11928977
> > >>> Byte Value: histogram.97, Previous Count: 11931423, New Count:
> 11931426
> > >>> Byte Value: histogram.98, Previous Count: 11928181, New Count:
> 11928184
> > >>> Byte Value: histogram.99, Previous Count: 11935549, New Count:
> 11935542
> > >>>
> > >>> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10]
> org.apache.nifi.processors.script.ExecuteScript
> ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are
> differences in the histogram
> > >>> Byte Value: histogram.10, Previous Count: 11930417, New Count:
> 11930411
> > >>> Byte Value: histogram.100, Previous Count: 11926739, New Count:
> 11926755
> > >>> Byte Value: histogram.101, Previous Count: 11930580, New Count:
> 11930574
> > >>> Byte Value: histogram.102, Previous Count: 11928210, New Count:
> 11928202
> > >>> Byte Value: histogram.103, Previous Count: 11935300, New Count:
> 11935297
> > >>> Byte Value: histogram.104, Previous Count: 11925804, New Count:
> 11925820
> > >>> Byte Value: histogram.105, Previous Count: 11931023, New Count:
> 11931012
> > >>> Byte Value: histogram.106, Previous Count: 11932342, New Count:
> 11932344
> > >>> Byte Value: histogram.108, Previous Count: 11930098, New Count:
> 11930106
> > >>> Byte Value: histogram.109, Previous Count: 11930759, New Count:
> 11930750
> > >>> Byte Value: histogram.110, Previous Count: 11934343, New Count:
> 11934352
> > >>> Byte Value: histogram.111, Previous Count: 11935775, New Count:
> 11935782
> > >>> Byte Value: histogram.112, Previous Count: 11933877, New Count:
> 11933884
> > >>> Byte Value: histogram.113, Previous Count: 11926675, New Count:
> 11926674
> > >>> Byte Value: histogram.114, Previous Count: 11929332, New Count:
> 11929336
> > >>> Byte Value: histogram.115, Previous Count: 11928876, New Count:
> 11928878
> > >>> Byte Value: histogram.116, Previous Count: 11927819, New Count:
> 11927833
> > >>> Byte Value: histogram.117, Previous Count: 11932657, New Count:
> 11932638
> > >>> Byte Value: histogram.118, Previous Count: 11933508, New Count:
> 11933507
> > >>> Byte Value: histogram.119, Previous Count: 11928808, New Count:
> 11928821
> > >>> Byte Value: histogram.120, Previous Count: 11937532, New Count:
> 11937528
> > >>> Byte Value: histogram.121, Previous Count: 11926907, New Count:
> 11926921
> > >>> Byte Value: histogram.32, Previous Count: 11929486, New Count:
> 11929489
> > >>> Byte Value: histogram.33, Previous Count: 11930737, New Count:
> 11930741
> > >>> Byte Value: histogram.34, Previous Count: 11931092, New Count:
> 11931086
> > >>> Byte Value: histogram.36, Previous Count: 11927605, New Count:
> 11927615
> > >>> Byte Value: histogram.37, Previous Count: 11930735, New Count:
> 11930745
> > >>> Byte Value: histogram.38, Previous Count: 11932174, New Count:
> 11932178
> > >>> Byte Value: histogram.39, Previous Count: 11936180, New Count:
> 11936182
> > >>> Byte Value: histogram.40, Previous Count: 11931666, New Count:
> 11931676
> > >>> Byte Value: histogram.41, Previous Count: 11927043, New Count:
> 11927034
> > >>> Byte Value: histogram.42, Previous Count: 11929044, New Count:
> 11929042
> > >>> Byte Value: histogram.43, Previous Count: 11934104, New Count:
> 11934098
> > >>> Byte Value: histogram.44, Previous Count: 11936337, New Count:
> 11936346
> > >>> Byte Value: histogram.45, Previous Count: 11935580, New Count:
> 11935582
> > >>> Byte Value: histogram.46, Previous Count: 11929598, New Count:
> 11929599
> > >>> Byte Value: histogram.47, Previous Count: 11934083, New Count:
> 11934085
> > >>> Byte Value: histogram.48, Previous Count: 11928858, New Count:
> 11928860
> > >>> Byte Value: histogram.49, Previous Count: 11931098, New Count:
> 11931113
> > >>> Byte Value: histogram.50, Previous Count: 11930618, New Count:
> 11930614
> > >>> Byte Value: histogram.51, Previous Count: 11925429, New Count:
> 11925435
> > >>> Byte Value: histogram.52, Previous Count: 11929741, New Count:
> 11929733
> > >>> Byte Value: histogram.53, Previous Count: 11934160, New Count:
> 11934155
> > >>> Byte Value: histogram.54, Previous Count: 11931999, New Count:
> 11931980
> > >>> Byte Value: histogram.55, Previous Count: 11930465, New Count:
> 11930477
> > >>> Byte Value: histogram.56, Previous Count: 11926194, New Count:
> 11926190
> > >>> Byte Value: histogram.57, Previous Count: 11926386, New Count:
> 11926381
> > >>> Byte Value: histogram.58, Previous Count: 11924871, New Count:
> 11924865
> > >>> Byte Value: histogram.59, Previous Count: 11929331, New Count:
> 11929326
> > >>> Byte Value: histogram.60, Previous Count: 11926951, New Count:
> 11926943
> > >>> Byte Value: histogram.61, Previous Count: 11928631, New Count:
> 11928619
> > >>> Byte Value: histogram.62, Previous Count: 11927549, New Count:
> 11927553
> > >>> Byte Value: histogram.63, Previous Count: 23856730, New Count:
> 23856718
> > >>> Byte Value: histogram.64, Previous Count: 11930288, New Count:
> 11930293
> > >>> Byte Value: histogram.65, Previous Count: 11931523, New Count:
> 11931527
> > >>> Byte Value: histogram.66, Previous Count: 11932821, New Count:
> 11932818
> > >>> Byte Value: histogram.67, Previous Count: 11932509, New Count:
> 11932510
> > >>> Byte Value: histogram.68, Previous Count: 11929613, New Count:
> 11929614
> > >>> Byte Value: histogram.69, Previous Count: 11928651, New Count:
> 11928654
> > >>> Byte Value: histogram.70, Previous Count: 11929253, New Count:
> 11929247
> > >>> Byte Value: histogram.71, Previous Count: 11931521, New Count:
> 11931512
> > >>> Byte Value: histogram.72, Previous Count: 11925805, New Count:
> 11925808
> > >>> Byte Value: histogram.73, Previous Count: 11934833, New Count:
> 11934826
> > >>> Byte Value: histogram.74, Previous Count: 11928314, New Count:
> 11928312
> > >>> Byte Value: histogram.75, Previous Count: 11923854, New Count:
> 11923863
> > >>> Byte Value: histogram.76, Previous Count: 11930892, New Count:
> 11930898
> > >>> Byte Value: histogram.77, Previous Count: 11927528, New Count:
> 11927525
> > >>> Byte Value: histogram.78, Previous Count: 11932850, New Count:
> 11932857
> > >>> Byte Value: histogram.79, Previous Count: 11934471, New Count:
> 11934461
> > >>> Byte Value: histogram.80, Previous Count: 11925707, New Count:
> 11925714
> > >>> Byte Value: histogram.81, Previous Count: 11929213, New Count:
> 11929206
> > >>> Byte Value: histogram.82, Previous Count: 11931334, New Count:
> 11931323
> > >>> Byte Value: histogram.83, Previous Count: 11936739, New Count:
> 11936732
> > >>> Byte Value: histogram.84, Previous Count: 11927855, New Count:
> 11927832
> > >>> Byte Value: histogram.85, Previous Count: 11931668, New Count:
> 11931665
> > >>> Byte Value: histogram.86, Previous Count: 11928609, New Count:
> 11928604
> > >>> Byte Value: histogram.87, Previous Count: 11931930, New Count:
> 11931933
> > >>> Byte Value: histogram.88, Previous Count: 11934341, New Count:
> 11934345
> > >>> Byte Value: histogram.89, Previous Count: 11927519, New Count:
> 11927518
> > >>> Byte Value: histogram.9, Previous Count: 11928004, New Count:
> 11928001
> > >>> Byte Value: histogram.90, Previous Count: 11933502, New Count:
> 11933517
> > >>> Byte Value: histogram.94, Previous Count: 11932024, New Count:
> 11932035
> > >>> Byte Value: histogram.95, Previous Count: 11932693, New Count:
> 11932679
> > >>> Byte Value: histogram.97, Previous Count: 11928428, New Count:
> 11928424
> > >>> Byte Value: histogram.98, Previous Count: 11933195, New Count:
> 11933196
> > >>> Byte Value: histogram.99, Previous Count: 11924273, New Count:
> 11924282
> > >>>
> > >>>> Den tir. 2. nov. 2021 kl. 15.41 skrev Mark Payne <
> markap14@hotmail.com>:
> > >>>>
> > >>>> Jens,
> > >>>>
> > >>>> The histograms, in and of themselves, are not very interesting. The
> interesting thing would be the difference in the histogram before & after
> the hash. Can you provide the ERROR level logs generated by the
> ExecuteScript? That’s what is of interest.
> > >>>>
> > >>>> Thanks
> > >>>> -Mark
> > >>>>
> > >>>>
> > >>>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com>
> wrote:
> > >>>>
> > >>>> Hi Mark and Joe
> > >>>>
> > >>>> Yesterday morning I implemented Mark's script in my 2 testflows.
> One testflow uses sftp, the other MergeContent/UnpackContent. Both testflows
> are running at a test cluster with 3 nodes and NIFI 1.14.0
> > >>>> The 1st flow with sftp has had 1 file going into the failure queue
> after about 16 hours.
> > >>>> The 2nd flow has had 2 files going into the failure queue after
> about 15 and 17 hours.
> > >>>>
> > >>>> There is definitely something going wrong in my setup, but I
> can't figure out what.
> > >>>>
> > >>>> Information from file 1:
> > >>>> histogram.0;0
> > >>>> histogram.1;0
> > >>>> histogram.10;11926720
> > >>>> histogram.100;11927504
> > >>>> histogram.101;11925396
> > >>>> histogram.102;11929923
> > >>>> histogram.103;11931596
> > >>>> histogram.104;11929071
> > >>>> histogram.105;11931365
> > >>>> histogram.106;11928661
> > >>>> histogram.107;11929864
> > >>>> histogram.108;11931611
> > >>>> histogram.109;11932758
> > >>>> histogram.11;0
> > >>>> histogram.110;11927893
> > >>>> histogram.111;11933519
> > >>>> histogram.112;11931392
> > >>>> histogram.113;11928534
> > >>>> histogram.114;11936879
> > >>>> histogram.115;11932818
> > >>>> histogram.116;11934767
> > >>>> histogram.117;11929143
> > >>>> histogram.118;11931854
> > >>>> histogram.119;11926333
> > >>>> histogram.12;0
> > >>>> histogram.120;11928731
> > >>>> histogram.121;11931149
> > >>>> histogram.122;11926725
> > >>>> histogram.123;0
> > >>>> histogram.124;0
> > >>>> histogram.125;0
> > >>>> histogram.126;0
> > >>>> histogram.127;0
> > >>>> histogram.128;0
> > >>>> histogram.129;0
> > >>>> histogram.13;0
> > >>>> histogram.130;0
> > >>>> histogram.131;0
> > >>>> histogram.132;0
> > >>>> histogram.133;0
> > >>>> histogram.134;0
> > >>>> histogram.135;0
> > >>>> histogram.136;0
> > >>>> histogram.137;0
> > >>>> histogram.138;0
> > >>>> histogram.139;0
> > >>>> histogram.14;0
> > >>>> histogram.140;0
> > >>>> histogram.141;0
> > >>>> histogram.142;0
> > >>>> histogram.143;0
> > >>>> histogram.144;0
> > >>>> histogram.145;0
> > >>>> histogram.146;0
> > >>>> histogram.147;0
> > >>>> histogram.148;0
> > >>>> histogram.149;0
> > >>>> histogram.15;0
> > >>>> histogram.150;0
> > >>>> histogram.151;0
> > >>>> histogram.152;0
> > >>>> histogram.153;0
> > >>>> histogram.154;0
> > >>>> histogram.155;0
> > >>>> histogram.156;0
> > >>>> histogram.157;0
> > >>>> histogram.158;0
> > >>>> histogram.159;0
> > >>>> histogram.16;0
> > >>>> histogram.160;0
> > >>>> histogram.161;0
> > >>>> histogram.162;0
> > >>>> histogram.163;0
> > >>>> histogram.164;0
> > >>>> histogram.165;0
> > >>>> histogram.166;0
> > >>>> histogram.167;0
> > >>>> histogram.168;0
> > >>>> histogram.169;0
> > >>>> histogram.17;0
> > >>>> histogram.170;0
> > >>>> histogram.171;0
> > >>>> histogram.172;0
> > >>>> histogram.173;0
> > >>>> histogram.174;0
> > >>>> histogram.175;0
> > >>>> histogram.176;0
> > >>>> histogram.177;0
> > >>>> histogram.178;0
> > >>>> histogram.179;0
> > >>>> histogram.18;0
> > >>>> histogram.180;0
> > >>>> histogram.181;0
> > >>>> histogram.182;0
> > >>>> histogram.183;0
> > >>>> histogram.184;0
> > >>>> histogram.185;0
> > >>>> histogram.186;0
> > >>>> histogram.187;0
> > >>>> histogram.188;0
> > >>>> histogram.189;0
> > >>>> histogram.19;0
> > >>>> histogram.190;0
> > >>>> histogram.191;0
> > >>>> histogram.192;0
> > >>>> histogram.193;0
> > >>>> histogram.194;0
> > >>>> histogram.195;0
> > >>>> histogram.196;0
> > >>>> histogram.197;0
> > >>>> histogram.198;0
> > >>>> histogram.199;0
> > >>>> histogram.2;0
> > >>>> histogram.20;0
> > >>>> histogram.200;0
> > >>>> histogram.201;0
> > >>>> histogram.202;0
> > >>>> histogram.203;0
> > >>>> histogram.204;0
> > >>>> histogram.205;0
> > >>>> histogram.206;0
> > >>>> histogram.207;0
> > >>>> histogram.208;0
> > >>>> histogram.209;0
> > >>>> histogram.21;0
> > >>>> histogram.210;0
> > >>>> histogram.211;0
> > >>>> histogram.212;0
> > >>>> histogram.213;0
> > >>>> histogram.214;0
> > >>>> histogram.215;0
> > >>>> histogram.216;0
> > >>>> histogram.217;0
> > >>>> histogram.218;0
> > >>>> histogram.219;0
> > >>>> histogram.22;0
> > >>>> histogram.220;0
> > >>>> histogram.221;0
> > >>>> histogram.222;0
> > >>>> histogram.223;0
> > >>>> histogram.224;0
> > >>>> histogram.225;0
> > >>>> histogram.226;0
> > >>>> histogram.227;0
> > >>>> histogram.228;0
> > >>>> histogram.229;0
> > >>>> histogram.23;0
> > >>>> histogram.230;0
> > >>>> histogram.231;0
> > >>>> histogram.232;0
> > >>>> histogram.233;0
> > >>>> histogram.234;0
> > >>>> histogram.235;0
> > >>>> histogram.236;0
> > >>>> histogram.237;0
> > >>>> histogram.238;0
> > >>>> histogram.239;0
> > >>>> histogram.24;0
> > >>>> histogram.240;0
> > >>>> histogram.241;0
> > >>>> histogram.242;0
> > >>>> histogram.243;0
> > >>>> histogram.244;0
> > >>>> histogram.245;0
> > >>>> histogram.246;0
> > >>>> histogram.247;0
> > >>>> histogram.248;0
> > >>>> histogram.249;0
> > >>>> histogram.25;0
> > >>>> histogram.250;0
> > >>>> histogram.251;0
> > >>>> histogram.252;0
> > >>>> histogram.253;0
> > >>>> histogram.254;0
> > >>>> histogram.255;0
> > >>>> histogram.26;0
> > >>>> histogram.27;0
> > >>>> histogram.28;0
> > >>>> histogram.29;0
> > >>>> histogram.3;0
> > >>>> histogram.30;0
> > >>>> histogram.31;0
> > >>>> histogram.32;11930422
> > >>>> histogram.33;11934311
> > >>>> histogram.34;11930459
> > >>>> histogram.35;11924776
> > >>>> histogram.36;11924186
> > >>>> histogram.37;11928616
> > >>>> histogram.38;11929474
> > >>>> histogram.39;11929607
> > >>>> histogram.4;0
> > >>>> histogram.40;11928053
> > >>>> histogram.41;11930402
> > >>>> histogram.42;11926830
> > >>>> histogram.43;11938138
> > >>>> histogram.44;11932536
> > >>>> histogram.45;11931053
> > >>>> histogram.46;11930008
> > >>>> histogram.47;11927747
> > >>>> histogram.48;11936055
> > >>>> histogram.49;11931471
> > >>>> histogram.5;0
> > >>>> histogram.50;11931921
> > >>>> histogram.51;11929643
> > >>>> histogram.52;11923847
> > >>>> histogram.53;11927311
> > >>>> histogram.54;11933754
> > >>>> histogram.55;11925964
> > >>>> histogram.56;11928872
> > >>>> histogram.57;11931124
> > >>>> histogram.58;11928474
> > >>>> histogram.59;11925814
> > >>>> histogram.6;0
> > >>>> histogram.60;11933978
> > >>>> histogram.61;11934136
> > >>>> histogram.62;11932016
> > >>>> histogram.63;23864588
> > >>>> histogram.64;11924792
> > >>>> histogram.65;11934789
> > >>>> histogram.66;11933047
> > >>>> histogram.67;11931899
> > >>>> histogram.68;11935615
> > >>>> histogram.69;11927249
> > >>>> histogram.7;0
> > >>>> histogram.70;11933276
> > >>>> histogram.71;11927953
> > >>>> histogram.72;11929275
> > >>>> histogram.73;11930292
> > >>>> histogram.74;11935428
> > >>>> histogram.75;11930317
> > >>>> histogram.76;11935737
> > >>>> histogram.77;11932127
> > >>>> histogram.78;11932344
> > >>>> histogram.79;11932094
> > >>>> histogram.8;0
> > >>>> histogram.80;11930688
> > >>>> histogram.81;11928415
> > >>>> histogram.82;11931559
> > >>>> histogram.83;11934192
> > >>>> histogram.84;11927224
> > >>>> histogram.85;11929491
> > >>>> histogram.86;11930624
> > >>>> histogram.87;11932201
> > >>>> histogram.88;11930694
> > >>>> histogram.89;11936439
> > >>>> histogram.9;11933187
> > >>>> histogram.90;11926445
> > >>>> histogram.91;0
> > >>>> histogram.92;0
> > >>>> histogram.93;0
> > >>>> histogram.94;11931596
> > >>>> histogram.95;11929379
> > >>>> histogram.96;0
> > >>>> histogram.97;11928864
> > >>>> histogram.98;11924738
> > >>>> histogram.99;11930062
> > >>>> histogram.totalBytes;1073741824
> > >>>>
> > >>>> File 2:
> > >>>> histogram.0;0
> > >>>> histogram.1;0
> > >>>> histogram.10;11932402
> > >>>> histogram.100;11927531
> > >>>> histogram.101;11928454
> > >>>> histogram.102;11934432
> > >>>> histogram.103;11924623
> > >>>> histogram.104;11934492
> > >>>> histogram.105;11934585
> > >>>> histogram.106;11928955
> > >>>> histogram.107;11928651
> > >>>> histogram.108;11930139
> > >>>> histogram.109;11929325
> > >>>> histogram.11;0
> > >>>> histogram.110;11930486
> > >>>> histogram.111;11933517
> > >>>> histogram.112;11928334
> > >>>> histogram.113;11927798
> > >>>> histogram.114;11929222
> > >>>> histogram.115;11932057
> > >>>> histogram.116;11931182
> > >>>> histogram.117;11933407
> > >>>> histogram.118;11932709
> > >>>> histogram.119;11931338
> > >>>> histogram.12;0
> > >>>> histogram.120;11933700
> > >>>> histogram.121;11929803
> > >>>> histogram.122;11930218
> > >>>> histogram.123;0
> > >>>> histogram.124;0
> > >>>> histogram.125;0
> > >>>> histogram.126;0
> > >>>> histogram.127;0
> > >>>> histogram.128;0
> > >>>> histogram.129;0
> > >>>> histogram.13;0
> > >>>> histogram.130;0
> > >>>> histogram.131;0
> > >>>> histogram.132;0
> > >>>> histogram.133;0
> > >>>> histogram.134;0
> > >>>> histogram.135;0
> > >>>> histogram.136;0
> > >>>> histogram.137;0
> > >>>> histogram.138;0
> > >>>> histogram.139;0
> > >>>> histogram.14;0
> > >>>> histogram.140;0
> > >>>> histogram.141;0
> > >>>> histogram.142;0
> > >>>> histogram.143;0
> > >>>> histogram.144;0
> > >>>> histogram.145;0
> > >>>> histogram.146;0
> > >>>> histogram.147;0
> > >>>> histogram.148;0
> > >>>> histogram.149;0
> > >>>> histogram.15;0
> > >>>> histogram.150;0
> > >>>> histogram.151;0
> > >>>> histogram.152;0
> > >>>> histogram.153;0
> > >>>> histogram.154;0
> > >>>> histogram.155;0
> > >>>> histogram.156;0
> > >>>> histogram.157;0
> > >>>> histogram.158;0
> > >>>> histogram.159;0
> > >>>> histogram.16;0
> > >>>> histogram.160;0
> > >>>> histogram.161;0
> > >>>> histogram.162;0
> > >>>> histogram.163;0
> > >>>> histogram.164;0
> > >>>> histogram.165;0
> > >>>> histogram.166;0
> > >>>> histogram.167;0
> > >>>> histogram.168;0
> > >>>> histogram.169;0
> > >>>> histogram.17;0
> > >>>> histogram.170;0
> > >>>> histogram.171;0
> > >>>> histogram.172;0
> > >>>> histogram.173;0
> > >>>> histogram.174;0
> > >>>> histogram.175;0
> > >>>> histogram.176;0
> > >>>> histogram.177;0
> > >>>> histogram.178;0
> > >>>> histogram.179;0
> > >>>> histogram.18;0
> > >>>> histogram.180;0
> > >>>> histogram.181;0
> > >>>> histogram.182;0
> > >>>> histogram.183;0
> > >>>> histogram.184;0
> > >>>> histogram.185;0
> > >>>> histogram.186;0
> > >>>> histogram.187;0
> > >>>> histogram.188;0
> > >>>> histogram.189;0
> > >>>> histogram.19;0
> > >>>> histogram.190;0
> > >>>> histogram.191;0
> > >>>> histogram.192;0
> > >>>> histogram.193;0
> > >>>> histogram.194;0
> > >>>> histogram.195;0
> > >>>> histogram.196;0
> > >>>> histogram.197;0
> > >>>> histogram.198;0
> > >>>> histogram.199;0
> > >>>> histogram.2;0
> > >>>> histogram.20;0
> > >>>> histogram.200;0
> > >>>> histogram.201;0
> > >>>> histogram.202;0
> > >>>> histogram.203;0
> > >>>> histogram.204;0
> > >>>> histogram.205;0
> > >>>> histogram.206;0
> > >>>> histogram.207;0
> > >>>> histogram.208;0
> > >>>> histogram.209;0
> > >>>> histogram.21;0
> > >>>> histogram.210;0
> > >>>> histogram.211;0
> > >>>> histogram.212;0
> > >>>> histogram.213;0
> > >>>> histogram.214;0
> > >>>> histogram.215;0
> > >>>> histogram.216;0
> > >>>> histogram.217;0
> > >>>> histogram.218;0
> > >>>> histogram.219;0
> > >>>> histogram.22;0
> > >>>> histogram.220;0
> > >>>> histogram.221;0
> > >>>> histogram.222;0
> > >>>> histogram.223;0
> > >>>> histogram.224;0
> > >>>> histogram.225;0
> > >>>> histogram.226;0
> > >>>> histogram.227;0
> > >>>> histogram.228;0
> > >>>> histogram.229;0
> > >>>> histogram.23;0
> > >>>> histogram.230;0
> > >>>> histogram.231;0
> > >>>> histogram.232;0
> > >>>> histogram.233;0
> > >>>> histogram.234;0
> > >>>> histogram.235;0
> > >>>> histogram.236;0
> > >>>> histogram.237;0
> > >>>> histogram.238;0
> > >>>> histogram.239;0
> > >>>> histogram.24;0
> > >>>> histogram.240;0
> > >>>> histogram.241;0
> > >>>> histogram.242;0
> > >>>> histogram.243;0
> > >>>> histogram.244;0
> > >>>> histogram.245;0
> > >>>> histogram.246;0
> > >>>> histogram.247;0
> > >>>> histogram.248;0
> > >>>> histogram.249;0
> > >>>> histogram.25;0
> > >>>> histogram.250;0
> > >>>> histogram.251;0
> > >>>> histogram.252;0
> > >>>> histogram.253;0
> > >>>> histogram.254;0
> > >>>> histogram.255;0
> > >>>> histogram.26;0
> > >>>> histogram.27;0
> > >>>> histogram.28;0
> > >>>> histogram.29;0
> > >>>> histogram.3;0
> > >>>> histogram.30;0
> > >>>> histogram.31;0
> > >>>> histogram.32;11924458
> > >>>> histogram.33;11934243
> > >>>> histogram.34;11930696
> > >>>> histogram.35;11925574
> > >>>> histogram.36;11929198
> > >>>> histogram.37;11928146
> > >>>> histogram.38;11932505
> > >>>> histogram.39;11929406
> > >>>> histogram.4;0
> > >>>> histogram.40;11930100
> > >>>> histogram.41;11930867
> > >>>> histogram.42;11930796
> > >>>> histogram.43;11930796
> > >>>> histogram.44;11921866
> > >>>> histogram.45;11935682
> > >>>> histogram.46;11930075
> > >>>> histogram.47;11928169
> > >>>> histogram.48;11933490
> > >>>> histogram.49;11932174
> > >>>> histogram.5;0
> > >>>> histogram.50;11933255
> > >>>> histogram.51;11934009
> > >>>> histogram.52;11928361
> > >>>> histogram.53;11927626
> > >>>> histogram.54;11931611
> > >>>> histogram.55;11930755
> > >>>> histogram.56;11933823
> > >>>> histogram.57;11922508
> > >>>> histogram.58;11930384
> > >>>> histogram.59;11929805
> > >>>> histogram.6;0
> > >>>> histogram.60;11930064
> > >>>> histogram.61;11926761
> > >>>> histogram.62;11927605
> > >>>> histogram.63;23858926
> > >>>> histogram.64;11929516
> > >>>> histogram.65;11930217
> > >>>> histogram.66;11930478
> > >>>> histogram.67;11939855
> > >>>> histogram.68;11927850
> > >>>> histogram.69;11931154
> > >>>> histogram.7;0
> > >>>> histogram.70;11935374
> > >>>> histogram.71;11930754
> > >>>> histogram.72;11928304
> > >>>> histogram.73;11931772
> > >>>> histogram.74;11939417
> > >>>> histogram.75;11930712
> > >>>> histogram.76;11933331
> > >>>> histogram.77;11931279
> > >>>> histogram.78;11928276
> > >>>> histogram.79;11930071
> > >>>> histogram.8;0
> > >>>> histogram.80;11927830
> > >>>> histogram.81;11931213
> > >>>> histogram.82;11930964
> > >>>> histogram.83;11928973
> > >>>> histogram.84;11934325
> > >>>> histogram.85;11929658
> > >>>> histogram.86;11924667
> > >>>> histogram.87;11931100
> > >>>> histogram.88;11930252
> > >>>> histogram.89;11927281
> > >>>> histogram.9;11932848
> > >>>> histogram.90;11930398
> > >>>> histogram.91;0
> > >>>> histogram.92;0
> > >>>> histogram.93;0
> > >>>> histogram.94;11928720
> > >>>> histogram.95;11928988
> > >>>> histogram.96;0
> > >>>> histogram.97;11931423
> > >>>> histogram.98;11928181
> > >>>> histogram.99;11935549
> > >>>> histogram.totalBytes;1073741824
> > >>>>
> > >>>> File 3:
> > >>>> histogram.0;0
> > >>>> histogram.1;0
> > >>>> histogram.10;11930417
> > >>>> histogram.100;11926739
> > >>>> histogram.101;11930580
> > >>>> histogram.102;11928210
> > >>>> histogram.103;11935300
> > >>>> histogram.104;11925804
> > >>>> histogram.105;11931023
> > >>>> histogram.106;11932342
> > >>>> histogram.107;11929778
> > >>>> histogram.108;11930098
> > >>>> histogram.109;11930759
> > >>>> histogram.11;0
> > >>>> histogram.110;11934343
> > >>>> histogram.111;11935775
> > >>>> histogram.112;11933877
> > >>>> histogram.113;11926675
> > >>>> histogram.114;11929332
> > >>>> histogram.115;11928876
> > >>>> histogram.116;11927819
> > >>>> histogram.117;11932657
> > >>>> histogram.118;11933508
> > >>>> histogram.119;11928808
> > >>>> histogram.12;0
> > >>>> histogram.120;11937532
> > >>>> histogram.121;11926907
> > >>>> histogram.122;11933942
> > >>>> histogram.123;0
> > >>>> histogram.124;0
> > >>>> histogram.125;0
> > >>>> histogram.126;0
> > >>>> histogram.127;0
> > >>>> histogram.128;0
> > >>>> histogram.129;0
> > >>>> histogram.13;0
> > >>>> histogram.130;0
> > >>>> histogram.131;0
> > >>>> histogram.132;0
> > >>>> histogram.133;0
> > >>>> histogram.134;0
> > >>>> histogram.135;0
> > >>>> histogram.136;0
> > >>>> histogram.137;0
> > >>>> histogram.138;0
> > >>>> histogram.139;0
> > >>>> histogram.14;0
> > >>>> histogram.140;0
> > >>>> histogram.141;0
> > >>>> histogram.142;0
> > >>>> histogram.143;0
> > >>>> histogram.144;0
> > >>>> histogram.145;0
> > >>>> histogram.146;0
> > >>>> histogram.147;0
> > >>>> histogram.148;0
> > >>>> histogram.149;0
> > >>>> histogram.15;0
> > >>>> histogram.150;0
> > >>>> histogram.151;0
> > >>>> histogram.152;0
> > >>>> histogram.153;0
> > >>>> histogram.154;0
> > >>>> histogram.155;0
> > >>>> histogram.156;0
> > >>>> histogram.157;0
> > >>>> histogram.158;0
> > >>>> histogram.159;0
> > >>>> histogram.16;0
> > >>>> histogram.160;0
> > >>>> histogram.161;0
> > >>>> histogram.162;0
> > >>>> histogram.163;0
> > >>>> histogram.164;0
> > >>>> histogram.165;0
> > >>>> histogram.166;0
> > >>>> histogram.167;0
> > >>>> histogram.168;0
> > >>>> histogram.169;0
> > >>>> histogram.17;0
> > >>>> histogram.170;0
> > >>>> histogram.171;0
> > >>>> histogram.172;0
> > >>>> histogram.173;0
> > >>>> histogram.174;0
> > >>>> histogram.175;0
> > >>>> histogram.176;0
> > >>>> histogram.177;0
> > >>>> histogram.178;0
> > >>>> histogram.179;0
> > >>>> histogram.18;0
> > >>>> histogram.180;0
> > >>>> histogram.181;0
> > >>>> histogram.182;0
> > >>>> histogram.183;0
> > >>>> histogram.184;0
> > >>>> histogram.185;0
> > >>>> histogram.186;0
> > >>>> histogram.187;0
> > >>>> histogram.188;0
> > >>>> histogram.189;0
> > >>>> histogram.19;0
> > >>>> histogram.190;0
> > >>>> histogram.191;0
> > >>>> histogram.192;0
> > >>>> histogram.193;0
> > >>>> histogram.194;0
> > >>>> histogram.195;0
> > >>>> histogram.196;0
> > >>>> histogram.197;0
> > >>>> histogram.198;0
> > >>>> histogram.199;0
> > >>>> histogram.2;0
> > >>>> histogram.20;0
> > >>>> histogram.200;0
> > >>>> histogram.201;0
> > >>>> histogram.202;0
> > >>>> histogram.203;0
> > >>>> histogram.204;0
> > >>>> histogram.205;0
> > >>>> histogram.206;0
> > >>>> histogram.207;0
> > >>>> histogram.208;0
> > >>>> histogram.209;0
> > >>>> histogram.21;0
> > >>>> histogram.210;0
> > >>>> histogram.211;0
> > >>>> histogram.212;0
> > >>>> histogram.213;0
> > >>>> histogram.214;0
> > >>>> histogram.215;0
> > >>>> histogram.216;0
> > >>>> histogram.217;0
> > >>>> histogram.218;0
> > >>>> histogram.219;0
> > >>>> histogram.22;0
> > >>>> histogram.220;0
> > >>>> histogram.221;0
> > >>>> histogram.222;0
> > >>>> histogram.223;0
> > >>>> histogram.224;0
> > >>>> histogram.225;0
> > >>>> histogram.226;0
> > >>>> histogram.227;0
> > >>>> histogram.228;0
> > >>>> histogram.229;0
> > >>>> histogram.23;0
> > >>>> histogram.230;0
> > >>>> histogram.231;0
> > >>>> histogram.232;0
> > >>>> histogram.233;0
> > >>>> histogram.234;0
> > >>>> histogram.235;0
> > >>>> histogram.236;0
> > >>>> histogram.237;0
> > >>>> histogram.238;0
> > >>>> histogram.239;0
> > >>>> histogram.24;0
> > >>>> histogram.240;0
> > >>>> histogram.241;0
> > >>>> histogram.242;0
> > >>>> histogram.243;0
> > >>>> histogram.244;0
> > >>>> histogram.245;0
> > >>>> histogram.246;0
> > >>>> histogram.247;0
> > >>>> histogram.248;0
> > >>>> histogram.249;0
> > >>>> histogram.25;0
> > >>>> histogram.250;0
> > >>>> histogram.251;0
> > >>>> histogram.252;0
> > >>>> histogram.253;0
> > >>>> histogram.254;0
> > >>>> histogram.255;0
> > >>>> histogram.26;0
> > >>>> histogram.27;0
> > >>>> histogram.28;0
> > >>>> histogram.29;0
> > >>>> histogram.3;0
> > >>>> histogram.30;0
> > >>>> histogram.31;0
> > >>>> histogram.32;11929486
> > >>>> histogram.33;11930737
> > >>>> histogram.34;11931092
> > >>>> histogram.35;11934488
> > >>>> histogram.36;11927605
> > >>>> histogram.37;11930735
> > >>>> histogram.38;11932174
> > >>>> histogram.39;11936180
> > >>>> histogram.4;0
> > >>>> histogram.40;11931666
> > >>>> histogram.41;11927043
> > >>>> histogram.42;11929044
> > >>>> histogram.43;11934104
> > >>>> histogram.44;11936337
> > >>>> histogram.45;11935580
> > >>>> histogram.46;11929598
> > >>>> histogram.47;11934083
> > >>>> histogram.48;11928858
> > >>>> histogram.49;11931098
> > >>>> histogram.5;0
> > >>>> histogram.50;11930618
> > >>>> histogram.51;11925429
> > >>>> histogram.52;11929741
> > >>>> histogram.53;11934160
> > >>>> histogram.54;11931999
> > >>>> histogram.55;11930465
> > >>>> histogram.56;11926194
> > >>>> histogram.57;11926386
> > >>>> histogram.58;11924871
> > >>>> histogram.59;11929331
> > >>>> histogram.6;0
> > >>>> histogram.60;11926951
> > >>>> histogram.61;11928631
> > >>>> histogram.62;11927549
> > >>>> histogram.63;23856730
> > >>>> histogram.64;11930288
> > >>>> histogram.65;11931523
> > >>>> histogram.66;11932821
> > >>>> histogram.67;11932509
> > >>>> histogram.68;11929613
> > >>>> histogram.69;11928651
> > >>>> histogram.7;0
> > >>>> histogram.70;11929253
> > >>>> histogram.71;11931521
> > >>>> histogram.72;11925805
> > >>>> histogram.73;11934833
> > >>>> histogram.74;11928314
> > >>>> histogram.75;11923854
> > >>>> histogram.76;11930892
> > >>>> histogram.77;11927528
> > >>>> histogram.78;11932850
> > >>>> histogram.79;11934471
> > >>>> histogram.8;0
> > >>>> histogram.80;11925707
> > >>>> histogram.81;11929213
> > >>>> histogram.82;11931334
> > >>>> histogram.83;11936739
> > >>>> histogram.84;11927855
> > >>>> histogram.85;11931668
> > >>>> histogram.86;11928609
> > >>>> histogram.87;11931930
> > >>>> histogram.88;11934341
> > >>>> histogram.89;11927519
> > >>>> histogram.9;11928004
> > >>>> histogram.90;11933502
> > >>>> histogram.91;0
> > >>>> histogram.92;0
> > >>>> histogram.93;0
> > >>>> histogram.94;11932024
> > >>>> histogram.95;11932693
> > >>>> histogram.96;0
> > >>>> histogram.97;11928428
> > >>>> histogram.98;11933195
> > >>>> histogram.99;11924273
> > >>>> histogram.totalBytes;1073741824
> > >>>>
> > >>>> Kind regards
> > >>>> Jens
> > >>>>
> > >>>>> On Sun, Oct 31, 2021 at 9:40 PM, Joe Witt <joe.witt@gmail.com> wrote:
> > >>>>>
> > >>>>> Jens,
> > >>>>>
> > >>>>> 118 hours in - still good.
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>
> > >>>>>> Jens
> > >>>>>>
> > >>>>>> Update from hour 67.  Still lookin' good.
> > >>>>>>
> > >>>>>> Will advise.
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Many, many thanks 🙏 Joe for looking into this. My test flow was
> running for 6 days before the first error occurred.
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>>
> > >>>>>>>> On Oct 28, 2021, at 4:57 PM, Joe Witt <joe.witt@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Jens,
> > >>>>>>>>
> > >>>>>>>> Am 40+ hours in running both your flow and mine to reproduce. So
> far neither has shown any sign of trouble.  Will keep running for another
> week or so if I can.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> The physical hosts with VMware are using VMFS, but the VMs
> running on those hosts can't see that.
> > >>>>>>>>> But you asked about the underlying file system 😀 and since my
> first answer with the copy from the fstab file wasn't enough, I just wanted
> to give all the details 😁.
> > >>>>>>>>>
> > >>>>>>>>> If you create a VM for Windows you would probably use NTFS (on
> top of VMFS); for Linux, ext3, ext4, Btrfs, XFS and so on.
> > >>>>>>>>>
> > >>>>>>>>> All the partitions at my NiFi nodes are local devices (sda,
> sdb, sdc and sdd) for each Linux machine. I don't use NFS.
> > >>>>>>>>>
> > >>>>>>>>> Kind regards
> > >>>>>>>>> Jens
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 27, 2021, at 5:47 PM, Joe Witt <joe.witt@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>> I don't quite follow the EXT4 usage on top of VMFS but the point
> > >>>>>>>>> here is you'll ultimately need to truly understand your
> > >>>>>>>>> underlying storage system and what sorts of guarantees it is
> > >>>>>>>>> giving you.  If Linux/the JVM/NiFi think they have a typical
> > >>>>>>>>> EXT4-type block storage system to work with, they can only be
> > >>>>>>>>> safe/operate within those constraints.  I have no idea about what
> > >>>>>>>>> VMFS brings to the table or the settings for it.
> > >>>>>>>>>
> > >>>>>>>>> The sync properties I shared previously might help force the
> > >>>>>>>>> issue of ensuring a formal sync/flush cycle all the way through
> > >>>>>>>>> to the disk has occurred, which we'd normally not do or need to
> > >>>>>>>>> do, but again in some cases it offers a stronger guarantee in
> > >>>>>>>>> exchange for performance.
> > >>>>>>>>>
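> > >>>>>>>>> (For reference: conceptually, "always sync" means each write to
> > >>>>>>>>> the repository is followed by a force/fsync down to the storage
> > >>>>>>>>> device. A rough Groovy sketch of that pattern - an illustration
> > >>>>>>>>> only, not NiFi's actual repository code, and the file name is
> > >>>>>>>>> arbitrary:
> > >>>>>>>>>
> > >>>>>>>>> import java.nio.ByteBuffer
> > >>>>>>>>> import java.nio.channels.FileChannel
> > >>>>>>>>> import java.nio.file.Paths
> > >>>>>>>>> import static java.nio.file.StandardOpenOption.*
> > >>>>>>>>>
> > >>>>>>>>> def ch = FileChannel.open(Paths.get("claim.bin"), CREATE, WRITE)
> > >>>>>>>>> ch.write(ByteBuffer.wrap("content".bytes))
> > >>>>>>>>> ch.force(true)   // block until data and metadata reach the device
> > >>>>>>>>> ch.close()
> > >>>>>>>>>
> > >>>>>>>>> Without force(), the bytes may sit in the OS page cache - or in a
> > >>>>>>>>> hypervisor layer - for some time after the write returns.)
> > >>>>>>>>>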
> > >>>>>>>>> In any case... Mark's path for you here will help identify what
> > >>>>>>>>> we're dealing with and we can go from there.
> > >>>>>>>>>
> > >>>>>>>>> I am aware of significant usage of NiFi on VMware configurations
> > >>>>>>>>> without issue at high rates for many years, so whatever it is
> > >>>>>>>>> here is likely solvable.
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Hi Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks for the clarification. I will implement the script when
> I return to the office on Monday next week (November 1st).
> > >>>>>>>>>
> > >>>>>>>>> I don't use NFS, but ext4. I will implement the script so
> we can check whether that is the case here. But I think the issue might be
> after the processors write content to the repository.
> > >>>>>>>>>
> > >>>>>>>>> I have a test flow running for more than 2 weeks without any
> errors. But this flow only calculates hashes and compares them.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Two other flows both create errors. One flow uses
> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses
> MergeContent->UnpackContent->CryptographicHashContent->compares. The last
> flow is totally inside NiFi, excluding other network/server issues.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In both cases the CryptographicHashContent is right after a
> processor which writes new content to the repository. But in one case a
> file in our production flow calculated a wrong hash 4 times with a 1 minute
> delay between each calculation. A few hours later I looped the file back
> and this time it was OK.
> > >>>>>>>>>
> > >>>>>>>>> Just like the case in steps 5 and 12 in the PDF file.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I will let you all know more later next week
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Kind regards
> > >>>>>>>>>
> > >>>>>>>>> Jens
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 27, 2021, at 3:43 PM, Mark Payne <markap14@hotmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> And the actual script:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> import org.apache.nifi.flowfile.FlowFile
> > >>>>>>>>>
> > >>>>>>>>> import java.util.stream.Collectors
> > >>>>>>>>>
> > >>>>>>>>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> > >>>>>>>>>     // Gather any histogram.* attributes left by a previous pass through this processor
> > >>>>>>>>>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
> > >>>>>>>>>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
> > >>>>>>>>>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
> > >>>>>>>>>     return histogram;
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
> > >>>>>>>>>     final Map<String, String> histogram = new HashMap<>();
> > >>>>>>>>>     final int[] distribution = new int[256];
> > >>>>>>>>>     Arrays.fill(distribution, 0);
> > >>>>>>>>>
> > >>>>>>>>>     long total = 0L;
> > >>>>>>>>>     final byte[] buffer = new byte[8192];
> > >>>>>>>>>     int len;
> > >>>>>>>>>     while ((len = inStream.read(buffer)) > 0) {
> > >>>>>>>>>         for (int i = 0; i < len; i++) {
> > >>>>>>>>>             // Mask to 0-255: bytes are signed in Java/Groovy, and a negative
> > >>>>>>>>>             // value would otherwise index the array from the end in Groovy
> > >>>>>>>>>             final int val = buffer[i] & 0xFF;
> > >>>>>>>>>             distribution[val]++;
> > >>>>>>>>>             total++;
> > >>>>>>>>>         }
> > >>>>>>>>>     }
> > >>>>>>>>>
> > >>>>>>>>>     for (int i = 0; i < 256; i++) {
> > >>>>>>>>>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
> > >>>>>>>>>     }
> > >>>>>>>>>     histogram.put("histogram.totalBytes", String.valueOf(total));
> > >>>>>>>>>
> > >>>>>>>>>     return histogram;
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
> > >>>>>>>>>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
> > >>>>>>>>>     final Map<String, String> sorted = new TreeMap<>(previous)
> > >>>>>>>>>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> > >>>>>>>>>         final String key = entry.getKey();
> > >>>>>>>>>         final String previousValue = entry.getValue();
> > >>>>>>>>>         final String updatedValue = updated.get(entry.getKey())
> > >>>>>>>>>
> > >>>>>>>>>         if (!Objects.equals(previousValue, updatedValue)) {
> > >>>>>>>>>             sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
> > >>>>>>>>>         }
> > >>>>>>>>>     }
> > >>>>>>>>>
> > >>>>>>>>>     log.error(sb.toString());
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> def flowFile = session.get()
> > >>>>>>>>> if (flowFile == null) {
> > >>>>>>>>>     return
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> > >>>>>>>>> Map<String, String> histogram = null;
> > >>>>>>>>>
> > >>>>>>>>> // Read the content and build a fresh histogram of it
> > >>>>>>>>> final InputStream inStream = session.read(flowFile);
> > >>>>>>>>> try {
> > >>>>>>>>>     histogram = createHistogram(flowFile, inStream);
> > >>>>>>>>> } finally {
> > >>>>>>>>>     inStream.close()
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> // On a second pass, compare against the histogram recorded in the attributes
> > >>>>>>>>> if (!previousHistogram.isEmpty()) {
> > >>>>>>>>>     if (previousHistogram.equals(histogram)) {
> > >>>>>>>>>         log.info("Histograms match")
> > >>>>>>>>>     } else {
> > >>>>>>>>>         logHistogramDifferences(previousHistogram, histogram)
> > >>>>>>>>>         session.transfer(flowFile, REL_FAILURE)
> > >>>>>>>>>         return;
> > >>>>>>>>>     }
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> flowFile = session.putAllAttributes(flowFile, histogram)
> > >>>>>>>>> session.transfer(flowFile, REL_SUCCESS)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> For a bit of background here, the reason that Joe and I have
> expressed interest in NFS file systems is that the way the protocol works,
> it is allowed to receive packets/chunks of the file out-of-order. So, what
> happens is: let's say a 1 MB file is being written. The first 500 KB are
> received. Then instead of the 501st KB it receives the 503rd KB. What
> happens is that the size of the file on the file system becomes 503 KB. But
> what about 501 & 502? Well, when you read the data, the file system just
> returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server
> receives those bytes, it then goes back and fills in the proper bytes. So
> if you're running on NFS, it is possible for the contents of the file on
> the underlying file system to change out from under you. It's not clear to
> me what other types of file system might do something similar.
> > >>>>>>>>>
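> > >>>>>>>>> To make that concrete, here is a tiny standalone Groovy sketch of
> > >>>>>>>>> the sparse-gap behavior (an illustration only - not NiFi code and
> > >>>>>>>>> not part of the diagnostic script; the temp file is arbitrary):
> > >>>>>>>>>
> > >>>>>>>>> def f = File.createTempFile("sparse", ".bin")
> > >>>>>>>>> new RandomAccessFile(f, "rw").withCloseable { raf ->
> > >>>>>>>>>     raf.write("AAAA".bytes)   // bytes 0-3 arrive first
> > >>>>>>>>>     raf.seek(6)               // bytes 4-5 skipped, like an out-of-order chunk
> > >>>>>>>>>     raf.write("BBBB".bytes)   // bytes 6-9 arrive next
> > >>>>>>>>> }
> > >>>>>>>>> assert f.length() == 10
> > >>>>>>>>> byte[] content = f.bytes
> > >>>>>>>>> assert content[4] == 0 && content[5] == 0  // the gap reads as NUL until filled
> > >>>>>>>>> f.delete()
> > >>>>>>>>>
> > >>>>>>>>> A hash computed while such a gap is still unfilled will differ
> > >>>>>>>>> from one computed after the missing bytes are backfilled.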
> > >>>>>>>>>
> > >>>>>>>>> So, one thing that we can do is to find out whether or not the
> contents of the underlying file have changed in some way, or if there’s
> something else happening that could perhaps result in the hashes being
> wrong. I’ve put together a script that should help diagnose this.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Can you insert an ExecuteScript processor either just before
> or just after your CryptographicHashContent processor? Doesn’t really
> matter whether it’s run just before or just after. I’ll attach the script
> here. It’s a Groovy Script so you should be able to use ExecuteScript with
> Script Engine = Groovy and the following script as the Script Body. No
> other changes needed.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> The way the script works, it reads in the contents of the
> FlowFile, and then it builds up a histogram of all byte values (0-255) that
> it sees in the contents, and then adds that as attributes. So it adds
> attributes such as:
> > >>>>>>>>>
> > >>>>>>>>> histogram.0 = 280273
> > >>>>>>>>>
> > >>>>>>>>> histogram.1 = 2820
> > >>>>>>>>>
> > >>>>>>>>> histogram.2 = 48202
> > >>>>>>>>>
> > >>>>>>>>> histogram.3 = 3820
> > >>>>>>>>>
> > >>>>>>>>> …
> > >>>>>>>>>
> > >>>>>>>>> histogram.totalBytes = 1780928732
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> It then checks if those attributes have already been added. If
> so, after calculating that histogram, it checks against the previous values
> (in the attributes). If they are the same, the FlowFile goes to ’success’.
> If they are different, it logs an error indicating the before/after value
> for any byte whose distribution was different, and it routes to failure.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> So, if for example, the first time through it sees 280,273
> bytes with a value of ‘0’, and the second times it only sees 12,001 then we
> know there were a bunch of 0’s previously that were updated to be some
> other value. And it includes the total number of bytes in case somehow we
> find that we’re reading too many bytes or not enough bytes or something
> like that. This should help narrow down what’s happening.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> -Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Attached is the flow I was using (now running yours and this
> one).  Curious if that one reproduces the issue for you as well.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I have your flow running and will keep it running for several
> days/week to see if I can reproduce.  Also of note please use your same
> test flow but use HashContent instead of crypto hash.  Curious if that
> matters for any reason...
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Still want to know more about your underlying storage system.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> You could also try updating nifi.properties and changing the
> following lines:
> > >>>>>>>>>
> > >>>>>>>>> nifi.flowfile.repository.always.sync=true
> > >>>>>>>>>
> > >>>>>>>>> nifi.content.repository.always.sync=true
> > >>>>>>>>>
> > >>>>>>>>> nifi.provenance.repository.always.sync=true
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> It will hurt performance but can be useful/necessary on
> certain storage subsystems.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Ignore "For the scenario where you can replicate this please
> share the flow.xml.gz for which it is reproducible."  I see the uploaded
> JSON
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> We asked about the underlying storage system.  You replied
> with some info but not the specifics.  Do you know precisely what the
> underlying storage is and how it is presented to the operating system?  For
> instance is it NFS or something similar?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I've setup a very similar flow at extremely high rates running
> for the past several days with no issue.  In my case though I know
> precisely what the config is and the disk setup is.  Didn't do anything
> special to be clear but still it is important to know.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> For the scenario where you can replicate this please share the
> flow.xml.gz for which it is reproducible.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> Joe
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Dear Joe and Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I have created a test flow without the SFTP processors, which
> doesn't create any errors. Therefore I created a new test flow where I use
> MergeContent and UnpackContent instead of the SFTP processors. This keeps
> all data internal in NiFi, but forces NiFi to write and read new files
> totally locally.
> > >>>>>>>>>
> > >>>>>>>>> My flow has been running for 7 days, and this morning there
> were 2 files where the sha256 was given another hash value than the
> original. I have set this flow up in another NiFi cluster only for testing,
> and the cluster is not doing anything else. It is using NiFi 1.14.0.
> > >>>>>>>>>
> > >>>>>>>>> So I can reproduce issues on different NiFi clusters and
> versions (1.13.2 and 1.14.0) where the calculation of a hash on content can
> give different outputs. It doesn't make any sense, but it happens. In all
> my cases the issue happens where the calculation of the hash content
> happens right after NiFi writes the content to the content repository. I
> don't know if there could be some kind of delay writing the content 100%
> before the next processors begin reading the content???
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Please see the attached test flow, and the previous mail with a
> PDF showing the lineage of a production file which also had issues. In the
> PDF, check steps 5 and 12.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Kind regards
> > >>>>>>>>>
> > >>>>>>>>> Jens M. Kofoed
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Oct 21, 2021 at 8:28 AM, Jens M. Kofoed <jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Joe,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> To start from the last mail :-)
> > >>>>>>>>>
> > >>>>>>>>> Each repository has its own disk, and I'm using ext4:
> > >>>>>>>>>
> > >>>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> > >>>>>>>>>
> > >>>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> > >>>>>>>>>
> > >>>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> My test flow WITH sftp looks like this:
> > >>>>>>>>>
> > >>>>>>>>> <image.png>
> > >>>>>>>>>
> > >>>>>>>>> And this flow has produced 1 error within 3 days. After many,
> many loops the file failed and went out via the "unmatched" output to the
> disabled UpdateAttribute, which does nothing - just keeps the failed
> flowfile in a queue. I enabled the UpdateAttribute and looped the file back
> to the CryptographicHashContent, and now it calculated the hash correctly
> again. But in this flow I have a FetchSFTP processor right before the
> hashing.
> > >>>>>>>>>
> > >>>>>>>>> Right now my flow is running without the 2 SFTP processors,
> and in the last 24 hours there have been no errors.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> About the Lineage:
> > >>>>>>>>>
> > >>>>>>>>> Is there a way to export all the lineage data? The export
> only generates an SVG file.
> > >>>>>>>>>
> > >>>>>>>>> This is only for the receiving NiFi, which internally
> calculates 2 different hashes on the same content with ca. 1 minute's
> delay. Attached is a PDF document with the lineage, the flow and all the
> relevant provenance information for each step in the lineage.
> > >>>>>>>>>
> > >>>>>>>>> The interesting steps are steps 5 and 12.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Can the issue be that data is not written 100% to disk
> between steps 4 and 5 in the flow?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Kind regards
> > >>>>>>>>>
> > >>>>>>>>> Jens M. Kofoed
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 11:49 PM, Joe Witt <joe.witt@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Also, what type of file system/storage system are you running
> NiFi on in this case?  We'll need to know this for the NiFi
> content/flowfile/provenance repositories. Is it NFS?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> And to further narrow this down
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> "I have a test flow, where a GenerateFlowfile has created 6x
> 1GB files
> > >>>>>>>>>
> > >>>>>>>>> (2 files per node) and next process was a hashcontent before
> it run
> > >>>>>>>>>
> > >>>>>>>>> into a test loop. Where files are uploaded via PutSFTP to a
> test
> > >>>>>>>>>
> > >>>>>>>>> server, and downloaded again and recalculated the hash. I have
> had one
> > >>>>>>>>>
> > >>>>>>>>> issue after 3 days of running."
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> So to be clear: with GenerateFlowFile making these files and
> then you looping the content, this is wholly and fully exclusively within
> the control of NiFi.  No Get/Fetch/Put-SFTP of any kind at all. By looping
> the same files over and over in NiFi itself, can you make this happen or
> not?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> "After fetching a FlowFile-stream file and unpacked it back
> into NiFi
> > >>>>>>>>>
> > >>>>>>>>> I calculate a sha256. 1 minutes later I recalculate the sha256
> on the
> > >>>>>>>>>
> > >>>>>>>>> exact same file. And got a new hash. That is what worry’s me.
> > >>>>>>>>>
> > >>>>>>>>> The fact that the same file can be recalculated and produce two
> > >>>>>>>>>
> > >>>>>>>>> different hashes, is very strange, but it happens. "
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> OK, so to confirm: you are saying that in each case this
> happens, you see it first compute the wrong hash, but then if you retry
> the same flowfile it then provides the correct hash?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Can you please also show/share the lineage history for such a
> flow file then?  It should have events for the initial hash, second hash,
> the unpacking, trace to the original stream, etc...
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Dear Mark and Joe
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I know my setup isn't normal for many people. But if we only
> look at my receive side, which the last mails are about, everything is
> happening on the same NiFi instance. It is the same 3-node NiFi cluster.
> > >>>>>>>>>
> > >>>>>>>>> After fetching a FlowFile-stream file and unpacking it back
> into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on
> the exact same file and get a new hash. That is what worries me.
> > >>>>>>>>>
> > >>>>>>>>> The fact that the same file can be recalculated and produce
> two different hashes is very strange, but it happens. Over the last 5
> months it has only happened 35-40 times.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I can understand if the file is not completely loaded and
> saved into the content repository before the hashing starts. But I believe
> that the unpack process doesn't forward the flowfile to the next processor
> before it is 100% finished unpacking and saving the new content to the
> repository.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I have a test flow, where a GenerateFlowFile has created 6x
> 1GB files (2 files per node) and the next processor was a HashContent
> before it ran into a test loop, where files are uploaded via PutSFTP to a
> test server, downloaded again, and the hash recalculated. I have had one
> issue after 3 days of running.
> > >>>>>>>>>
> > >>>>>>>>> Now the test flow is running without the Put/FetchSFTP
> processors.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Another problem is that I can't find any correlation to other
> events: not within NiFi, nor the server itself or VMware. If I could just
> find any other event which happens at the same time, I might be able to
> force some kind of event to trigger the issue.
> > >>>>>>>>>
> > >>>>>>>>> I have tried to force VMware to migrate a NiFi node to another
> host, forcing it to do a snapshot and deleting snapshots, but nothing can
> trigger an error.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I know it will be very, very difficult to reproduce. But I will
> set up multiple NiFi instances running different test flows to see if I can
> find any reason why it behaves as it does.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Kind Regards
> > >>>>>>>>>
> > >>>>>>>>> Jens M. Kofoed
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 20, 2021, at 4:39 PM, Mark Payne <markap14@hotmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks for sharing the images.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I tried to set up a test to reproduce the issue. I've had it
> running for quite some time, running through millions of iterations.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to
> the tune of hundreds of MB). I’ve been unable to reproduce an issue after
> millions of iterations.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> So far I cannot replicate. And since you’re pulling the data
> via SFTP and then unpacking, which preserves all original attributes from a
> different system, this can easily become confusing.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Recommend trying to reproduce with SFTP-related processors out
> of the picture, as Joe is mentioning. Either using GetFile/FetchFile or
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate
> an ‘initial hash’, copy that value to another attribute, and then loop,
> generating the hash and comparing against the original one. I’ll attach a
> flow that does this, but not sure if the email server will strip out the
> attachment or not.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> This way we remove any possibility of actual corruption
> between the two nifi instances. If we can still see corruption / different
> hashes within a single nifi instance, then it certainly warrants further
> investigation but i can’t see any issues so far.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> -Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Actually, is this current loop test contained within a single
> NiFi, and there you see corruption happen?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Joe
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> You have a very involved setup including other systems (non
> NiFi).  Have you removed those systems from the equation so you have more
> evidence to support your expectation that NiFi is doing something other
> than you expect?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Joe
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Hi
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Today I have another file which has been running through the
> retry loop one time. To test the processors and the algorithm, I added the
> HashContent processor and also added hashing by SHA-1.
> > >>>>>>>>>
> > >>>>>>>>> A file has been going through the system, and the SHA-1
> and SHA-256 are both different than expected. With a 1 minute delay the
> file goes back into the hashing content flow, and this time it
> calculates both hashes fine.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I don't believe that the hashing is buggy, but something is
> very, very strange. What can influence the processors/algorithm to
> calculate a different hash???
> > >>>>>>>>>
> > >>>>>>>>> All the input/output claim information is exactly the same. It
> is the same flow/content file going in a loop. It happens on all 3 nodes.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Any suggestions for where to dig?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Regards
> > >>>>>>>>>
> > >>>>>>>>> Jens M. Kofoed
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Oct 20, 2021 at 6:34 AM, Jens M. Kofoed <jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Hi Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks for replying and the suggestion to look at the content
> claim.
> > >>>>>>>>>
> > >>>>>>>>> These 3 pictures are from the first attempt:
> > >>>>>>>>>
> > >>>>>>>>> <image.png>   <image.png>   <image.png>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Yesterday I realized that the content was still in the
> archive, so I could Replay the file.
> > >>>>>>>>>
> > >>>>>>>>> <image.png>
> > >>>>>>>>>
> > >>>>>>>>> So here are the same pictures but for the replay, and as you
> can see, the Identifier, Offset and Size are all the same.
> > >>>>>>>>>
> > >>>>>>>>> <image.png>   <image.png>   <image.png>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In my flow, if the hash does not match my originally
> calculated hash, it goes into a retry loop. Here are the pictures for the
> 4th time the file went through:
> > >>>>>>>>>
> > >>>>>>>>> <image.png>   <image.png>   <image.png>
> > >>>>>>>>>
> > >>>>>>>>> Here the content Claim is all the same.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> It is very rare that we see these issues (<1 in 1,000,000
> files) and only with large files. Only once have I seen the error with a
> 110MB file; the other times the file sizes are above 800MB.
> > >>>>>>>>>
> > >>>>>>>>> This time it was a NiFi-Flowstream v3 file, which has been
> exported from one system and imported in another. But once the file has
> been imported, it is the same file inside NiFi and it stays at the same
> node, going through the same loop of processors multiple times, and in the
> end the CryptographicHashContent calculates a different SHA256 than it did
> earlier. This should not be possible!!! And that is what concerns me the
> most.
> > >>>>>>>>>
> > >>>>>>>>> What can influence the same processor to calculate 2 different
> sha256 on the exact same content???
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Regards
> > >>>>>>>>>
> > >>>>>>>>> Jens M. Kofoed
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Oct 19, 2021 at 4:51 PM, Mark Payne <markap14@hotmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Jens,
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In the two provenance events - one showing a hash of dd4cc…
> and the other showing f6f0….
> > >>>>>>>>>
> > >>>>>>>>> If you go to the Content tab, do they both show the same
> Content Claim? I.e., do the Input Claim / Output Claim show the same values
> for Container, Section, Identifier, Offset, and Size?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> -Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> [...]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> <Repro.json>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> <Try_to_recreate_Jens_Challenge.json>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>
> > >>>>
> > >
>

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

I think we're at a loss as to how to help you further with your
specific installation.  We have attempted to recreate the scenario
with no luck.  We've offered suggestions on experiments which would
help us narrow in, but you don't think that will help.

At this point we'll probably have to leave this thread here.  If you
used the forced sync properties we mentioned and it is still happening,
then you can be pretty sure the issue is with the JVM or the
virtual file system mechanism.

Thanks
Joe

On Wed, Nov 3, 2021 at 8:09 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>
> Hi Mark
>
> All the files in my test flow are 1GB files. But it happens in my production flow with different file sizes.
>
> When these issues have happened, I have the flowfile routed to an UpdateAttribute processor which is disabled, just to keep the file in a queue. Enabling the processor and sending the file back to a new hash calculation, the file is OK. So I don't think the test with backup and compare makes any sense to do.
>
> Regards
> Jens
>
> > On Nov 3, 2021, at 3:57 PM, Mark Payne <ma...@hotmail.com> wrote:
> >
> > So what I found interesting about the histogram output was that in each case, the input file was 1 GB. The number of bytes that differed between the 'good' and 'bad' runs was something like 500-700. But the values ranged significantly. There was no indication that the type of thing we've seen with NFS mounts was happening, where data was nulled out until received and then updated. If that had been the case, we'd have seen the NUL byte (or some other value) have a very significant change in the histogram, but we didn't see that.
> >
> > So a couple more ideas that I think can be useful.
> >
> > 1) Which garbage collector are you using? It’s configured in the bootstrap.conf file
> >
> > 2) We can try to definitively prove out whether the content on the disk is changing or if there’s an issue reading the content. To do this:
> >
> > 1. Stop all processors.
> > 2. Shutdown nifi
> > 3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this will delete all FlowFiles & content, so only do this on a dev/test system where you’re comfortable deleting it!)
> > 4. Start nifi
> > 5. Let exactly 1 FlowFile into your flow.
> > 6. While it is looping through, create a copy of your entire Content Repository: cp -r content_repository content_backup1; zip -r content_backup1.zip content_backup1
> > 7. Wait for the hashes to differ
> > 8. Create another copy of the Content Repository: cp -r content_repository content_backup2
> > 9. Find the files within the content_backup1 and content_backup2 and compare them to see if they are identical. Would recommend comparing them using each of the 3 methods: sha256, sha512, diff
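> >
> > For step 9, something along these lines should work (the section/claim path below is just a placeholder - use whatever file actually appears under the backups, at the same relative path in both):
> >
> > sha256sum content_backup1/<section>/<claim> content_backup2/<section>/<claim>
> > sha512sum content_backup1/<section>/<claim> content_backup2/<section>/<claim>
> > diff content_backup1/<section>/<claim> content_backup2/<section>/<claim>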
> >
> > This should make it pretty clear that either:
> > (1) the issue resides in the software: either NiFi or the JVM
> > (2) the issue resides outside of the software: the disk, the disk driver, the operating system, the VM hypervisor, etc.
> >
> > Thanks
> > -Mark
> >
> >> On Nov 3, 2021, at 10:44 AM, Joe Witt <jo...@gmail.com> wrote:
> >>
> >> Jens,
> >>
> >> 184 hours (7.6 days) in and zero issues.
> >>
> >> Will need to turn this off soon but wanted to give a final update.
> >> Looks great.  Given the information on your system there appears to be
> >> something we don't understand related to the virtual file system
> >> involved.
> >>
> >> Thanks
> >>
> >>> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>
> >>> Hi Mark
> >>>
> >>> Of course, sorry :-)  Looking at the error messages, I can see that only the histogram values with differences are listed. And all 3 have their first issue at histogram.9. I don't know what that means.
> >>>
> >>> /Jens
> >>>
> >>> Here are the error log:
> >>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
> >>> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
> >>> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
> >>> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
> >>> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
> >>> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
> >>> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
> >>> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
> >>> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
> >>> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
> >>> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
> >>> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
> >>> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
> >>> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
> >>> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
> >>> Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
> >>> Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
> >>> Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
> >>> Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
> >>> Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
> >>> Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
> >>> Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
> >>> Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
> >>> Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
> >>> Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
> >>> Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
> >>> Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
> >>> Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
> >>> Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
> >>> Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
> >>> Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
> >>> Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
> >>> Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
> >>> Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
> >>> Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
> >>> Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
> >>> Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
> >>> Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
> >>> Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
> >>> Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
> >>> Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
> >>> Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
> >>> Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
> >>> Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
> >>> Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
> >>> Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
> >>> Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
> >>> Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
> >>> Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
> >>> Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
> >>> Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
> >>> Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
> >>> Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
> >>> Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
> >>> Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
> >>> Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
> >>> Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
> >>> Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
> >>> Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
> >>> Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
> >>> Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
> >>> Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
> >>> Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
> >>> Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
> >>> Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
> >>> Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
> >>> Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
> >>> Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
> >>> Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
> >>> Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
> >>> Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
> >>> Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
> >>> Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
> >>> Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
> >>> Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
> >>> Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
> >>> Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
> >>> Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
> >>> Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
> >>> Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
> >>> Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
> >>> Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
> >>> Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
> >>> Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
> >>> Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
> >>> Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
> >>> Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059
> >>>
> >>> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
> >>> Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
> >>> Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
> >>> Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
> >>> Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
> >>> Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
> >>> Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
> >>> Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
> >>> Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
> >>> Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
> >>> Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
> >>> Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
> >>> Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
> >>> Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
> >>> Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
> >>> Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
> >>> Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
> >>> Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
> >>> Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
> >>> Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
> >>> Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
> >>> Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
> >>> Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
> >>> Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
> >>> Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
> >>> Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
> >>> Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
> >>> Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
> >>> Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
> >>> Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
> >>> Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
> >>> Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
> >>> Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
> >>> Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
> >>> Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
> >>> Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
> >>> Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
> >>> Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
> >>> Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
> >>> Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
> >>> Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
> >>> Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
> >>> Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
> >>> Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
> >>> Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
> >>> Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
> >>> Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
> >>> Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
> >>> Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
> >>> Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
> >>> Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
> >>> Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
> >>> Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
> >>> Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
> >>> Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
> >>> Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
> >>> Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
> >>> Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
> >>> Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
> >>> Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
> >>> Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
> >>> Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
> >>> Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
> >>> Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
> >>> Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
> >>> Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
> >>> Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
> >>> Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
> >>> Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
> >>> Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
> >>> Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
> >>> Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
> >>> Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
> >>> Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
> >>> Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
> >>> Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
> >>> Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
> >>> Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
> >>> Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
> >>> Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
> >>> Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
> >>> Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
> >>> Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
> >>> Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
> >>> Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
> >>> Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542
> >>>
> >>> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are differences in the histogram
> >>> Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
> >>> Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
> >>> Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
> >>> Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
> >>> Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
> >>> Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
> >>> Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
> >>> Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
> >>> Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
> >>> Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
> >>> Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
> >>> Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
> >>> Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
> >>> Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
> >>> Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
> >>> Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
> >>> Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
> >>> Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
> >>> Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
> >>> Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
> >>> Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
> >>> Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
> >>> Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
> >>> Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
> >>> Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
> >>> Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
> >>> Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
> >>> Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
> >>> Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
> >>> Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
> >>> Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
> >>> Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
> >>> Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
> >>> Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
> >>> Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
> >>> Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
> >>> Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
> >>> Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
> >>> Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
> >>> Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
> >>> Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
> >>> Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
> >>> Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
> >>> Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
> >>> Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
> >>> Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
> >>> Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
> >>> Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
> >>> Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
> >>> Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
> >>> Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
> >>> Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
> >>> Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
> >>> Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
> >>> Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
> >>> Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
> >>> Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
> >>> Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
> >>> Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
> >>> Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
> >>> Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
> >>> Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
> >>> Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
> >>> Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
> >>> Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
> >>> Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
> >>> Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
> >>> Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
> >>> Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
> >>> Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
> >>> Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
> >>> Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
> >>> Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
> >>> Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
> >>> Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
> >>> Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
> >>> Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
> >>> Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
> >>> Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
> >>> Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
> >>> Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
> >>> Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
> >>> Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
> >>> Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
> >>> Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
> >>> Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282
> >>>
> >>>> On Tue, Nov 2, 2021 at 3:41 PM Mark Payne <ma...@hotmail.com> wrote:
> >>>>
> >>>> Jens,
> >>>>
> >>>> The histograms, in and of themselves, are not very interesting. The interesting thing would be the difference in the histogram before & after the hash. Can you provide the ERROR level logs generated by the ExecuteScript? That’s what is of interest.
> >>>>
> >>>> Thanks
> >>>> -Mark
> >>>>
> >>>>
> >>>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>
> >>>> Hi Mark and Joe
> >>>>
> >>>> Yesterday morning I implemented Mark's script in my 2 test flows: one using sftp, the other MergeContent/UnpackContent. Both test flows are running on a test cluster with 3 nodes and NiFi 1.14.0
> >>>> The 1st flow with sftp has had 1 file going into the failure queue after about 16 hours.
> >>>> The 2nd flow has had 2 files going into the failure queue after about 15 and 17 hours.
> >>>>
> >>>> There is definitely something going wrong in my setup, but I can't figure out what.
> >>>>
> >>>> Information from file 1:
> >>>> histogram.0;0
> >>>> histogram.1;0
> >>>> histogram.10;11926720
> >>>> histogram.100;11927504
> >>>> histogram.101;11925396
> >>>> histogram.102;11929923
> >>>> histogram.103;11931596
> >>>> histogram.104;11929071
> >>>> histogram.105;11931365
> >>>> histogram.106;11928661
> >>>> histogram.107;11929864
> >>>> histogram.108;11931611
> >>>> histogram.109;11932758
> >>>> histogram.11;0
> >>>> histogram.110;11927893
> >>>> histogram.111;11933519
> >>>> histogram.112;11931392
> >>>> histogram.113;11928534
> >>>> histogram.114;11936879
> >>>> histogram.115;11932818
> >>>> histogram.116;11934767
> >>>> histogram.117;11929143
> >>>> histogram.118;11931854
> >>>> histogram.119;11926333
> >>>> histogram.12;0
> >>>> histogram.120;11928731
> >>>> histogram.121;11931149
> >>>> histogram.122;11926725
> >>>> histogram.123;0
> >>>> histogram.124;0
> >>>> histogram.125;0
> >>>> histogram.126;0
> >>>> histogram.127;0
> >>>> histogram.128;0
> >>>> histogram.129;0
> >>>> histogram.13;0
> >>>> histogram.130;0
> >>>> histogram.131;0
> >>>> histogram.132;0
> >>>> histogram.133;0
> >>>> histogram.134;0
> >>>> histogram.135;0
> >>>> histogram.136;0
> >>>> histogram.137;0
> >>>> histogram.138;0
> >>>> histogram.139;0
> >>>> histogram.14;0
> >>>> histogram.140;0
> >>>> histogram.141;0
> >>>> histogram.142;0
> >>>> histogram.143;0
> >>>> histogram.144;0
> >>>> histogram.145;0
> >>>> histogram.146;0
> >>>> histogram.147;0
> >>>> histogram.148;0
> >>>> histogram.149;0
> >>>> histogram.15;0
> >>>> histogram.150;0
> >>>> histogram.151;0
> >>>> histogram.152;0
> >>>> histogram.153;0
> >>>> histogram.154;0
> >>>> histogram.155;0
> >>>> histogram.156;0
> >>>> histogram.157;0
> >>>> histogram.158;0
> >>>> histogram.159;0
> >>>> histogram.16;0
> >>>> histogram.160;0
> >>>> histogram.161;0
> >>>> histogram.162;0
> >>>> histogram.163;0
> >>>> histogram.164;0
> >>>> histogram.165;0
> >>>> histogram.166;0
> >>>> histogram.167;0
> >>>> histogram.168;0
> >>>> histogram.169;0
> >>>> histogram.17;0
> >>>> histogram.170;0
> >>>> histogram.171;0
> >>>> histogram.172;0
> >>>> histogram.173;0
> >>>> histogram.174;0
> >>>> histogram.175;0
> >>>> histogram.176;0
> >>>> histogram.177;0
> >>>> histogram.178;0
> >>>> histogram.179;0
> >>>> histogram.18;0
> >>>> histogram.180;0
> >>>> histogram.181;0
> >>>> histogram.182;0
> >>>> histogram.183;0
> >>>> histogram.184;0
> >>>> histogram.185;0
> >>>> histogram.186;0
> >>>> histogram.187;0
> >>>> histogram.188;0
> >>>> histogram.189;0
> >>>> histogram.19;0
> >>>> histogram.190;0
> >>>> histogram.191;0
> >>>> histogram.192;0
> >>>> histogram.193;0
> >>>> histogram.194;0
> >>>> histogram.195;0
> >>>> histogram.196;0
> >>>> histogram.197;0
> >>>> histogram.198;0
> >>>> histogram.199;0
> >>>> histogram.2;0
> >>>> histogram.20;0
> >>>> histogram.200;0
> >>>> histogram.201;0
> >>>> histogram.202;0
> >>>> histogram.203;0
> >>>> histogram.204;0
> >>>> histogram.205;0
> >>>> histogram.206;0
> >>>> histogram.207;0
> >>>> histogram.208;0
> >>>> histogram.209;0
> >>>> histogram.21;0
> >>>> histogram.210;0
> >>>> histogram.211;0
> >>>> histogram.212;0
> >>>> histogram.213;0
> >>>> histogram.214;0
> >>>> histogram.215;0
> >>>> histogram.216;0
> >>>> histogram.217;0
> >>>> histogram.218;0
> >>>> histogram.219;0
> >>>> histogram.22;0
> >>>> histogram.220;0
> >>>> histogram.221;0
> >>>> histogram.222;0
> >>>> histogram.223;0
> >>>> histogram.224;0
> >>>> histogram.225;0
> >>>> histogram.226;0
> >>>> histogram.227;0
> >>>> histogram.228;0
> >>>> histogram.229;0
> >>>> histogram.23;0
> >>>> histogram.230;0
> >>>> histogram.231;0
> >>>> histogram.232;0
> >>>> histogram.233;0
> >>>> histogram.234;0
> >>>> histogram.235;0
> >>>> histogram.236;0
> >>>> histogram.237;0
> >>>> histogram.238;0
> >>>> histogram.239;0
> >>>> histogram.24;0
> >>>> histogram.240;0
> >>>> histogram.241;0
> >>>> histogram.242;0
> >>>> histogram.243;0
> >>>> histogram.244;0
> >>>> histogram.245;0
> >>>> histogram.246;0
> >>>> histogram.247;0
> >>>> histogram.248;0
> >>>> histogram.249;0
> >>>> histogram.25;0
> >>>> histogram.250;0
> >>>> histogram.251;0
> >>>> histogram.252;0
> >>>> histogram.253;0
> >>>> histogram.254;0
> >>>> histogram.255;0
> >>>> histogram.26;0
> >>>> histogram.27;0
> >>>> histogram.28;0
> >>>> histogram.29;0
> >>>> histogram.3;0
> >>>> histogram.30;0
> >>>> histogram.31;0
> >>>> histogram.32;11930422
> >>>> histogram.33;11934311
> >>>> histogram.34;11930459
> >>>> histogram.35;11924776
> >>>> histogram.36;11924186
> >>>> histogram.37;11928616
> >>>> histogram.38;11929474
> >>>> histogram.39;11929607
> >>>> histogram.4;0
> >>>> histogram.40;11928053
> >>>> histogram.41;11930402
> >>>> histogram.42;11926830
> >>>> histogram.43;11938138
> >>>> histogram.44;11932536
> >>>> histogram.45;11931053
> >>>> histogram.46;11930008
> >>>> histogram.47;11927747
> >>>> histogram.48;11936055
> >>>> histogram.49;11931471
> >>>> histogram.5;0
> >>>> histogram.50;11931921
> >>>> histogram.51;11929643
> >>>> histogram.52;11923847
> >>>> histogram.53;11927311
> >>>> histogram.54;11933754
> >>>> histogram.55;11925964
> >>>> histogram.56;11928872
> >>>> histogram.57;11931124
> >>>> histogram.58;11928474
> >>>> histogram.59;11925814
> >>>> histogram.6;0
> >>>> histogram.60;11933978
> >>>> histogram.61;11934136
> >>>> histogram.62;11932016
> >>>> histogram.63;23864588
> >>>> histogram.64;11924792
> >>>> histogram.65;11934789
> >>>> histogram.66;11933047
> >>>> histogram.67;11931899
> >>>> histogram.68;11935615
> >>>> histogram.69;11927249
> >>>> histogram.7;0
> >>>> histogram.70;11933276
> >>>> histogram.71;11927953
> >>>> histogram.72;11929275
> >>>> histogram.73;11930292
> >>>> histogram.74;11935428
> >>>> histogram.75;11930317
> >>>> histogram.76;11935737
> >>>> histogram.77;11932127
> >>>> histogram.78;11932344
> >>>> histogram.79;11932094
> >>>> histogram.8;0
> >>>> histogram.80;11930688
> >>>> histogram.81;11928415
> >>>> histogram.82;11931559
> >>>> histogram.83;11934192
> >>>> histogram.84;11927224
> >>>> histogram.85;11929491
> >>>> histogram.86;11930624
> >>>> histogram.87;11932201
> >>>> histogram.88;11930694
> >>>> histogram.89;11936439
> >>>> histogram.9;11933187
> >>>> histogram.90;11926445
> >>>> histogram.91;0
> >>>> histogram.92;0
> >>>> histogram.93;0
> >>>> histogram.94;11931596
> >>>> histogram.95;11929379
> >>>> histogram.96;0
> >>>> histogram.97;11928864
> >>>> histogram.98;11924738
> >>>> histogram.99;11930062
> >>>> histogram.totalBytes;1073741824
> >>>>
> >>>> File 2:
> >>>> histogram.0;0
> >>>> histogram.1;0
> >>>> histogram.10;11932402
> >>>> histogram.100;11927531
> >>>> histogram.101;11928454
> >>>> histogram.102;11934432
> >>>> histogram.103;11924623
> >>>> histogram.104;11934492
> >>>> histogram.105;11934585
> >>>> histogram.106;11928955
> >>>> histogram.107;11928651
> >>>> histogram.108;11930139
> >>>> histogram.109;11929325
> >>>> histogram.11;0
> >>>> histogram.110;11930486
> >>>> histogram.111;11933517
> >>>> histogram.112;11928334
> >>>> histogram.113;11927798
> >>>> histogram.114;11929222
> >>>> histogram.115;11932057
> >>>> histogram.116;11931182
> >>>> histogram.117;11933407
> >>>> histogram.118;11932709
> >>>> histogram.119;11931338
> >>>> histogram.12;0
> >>>> histogram.120;11933700
> >>>> histogram.121;11929803
> >>>> histogram.122;11930218
> >>>> histogram.123;0
> >>>> histogram.124;0
> >>>> histogram.125;0
> >>>> histogram.126;0
> >>>> histogram.127;0
> >>>> histogram.128;0
> >>>> histogram.129;0
> >>>> histogram.13;0
> >>>> histogram.130;0
> >>>> histogram.131;0
> >>>> histogram.132;0
> >>>> histogram.133;0
> >>>> histogram.134;0
> >>>> histogram.135;0
> >>>> histogram.136;0
> >>>> histogram.137;0
> >>>> histogram.138;0
> >>>> histogram.139;0
> >>>> histogram.14;0
> >>>> histogram.140;0
> >>>> histogram.141;0
> >>>> histogram.142;0
> >>>> histogram.143;0
> >>>> histogram.144;0
> >>>> histogram.145;0
> >>>> histogram.146;0
> >>>> histogram.147;0
> >>>> histogram.148;0
> >>>> histogram.149;0
> >>>> histogram.15;0
> >>>> histogram.150;0
> >>>> histogram.151;0
> >>>> histogram.152;0
> >>>> histogram.153;0
> >>>> histogram.154;0
> >>>> histogram.155;0
> >>>> histogram.156;0
> >>>> histogram.157;0
> >>>> histogram.158;0
> >>>> histogram.159;0
> >>>> histogram.16;0
> >>>> histogram.160;0
> >>>> histogram.161;0
> >>>> histogram.162;0
> >>>> histogram.163;0
> >>>> histogram.164;0
> >>>> histogram.165;0
> >>>> histogram.166;0
> >>>> histogram.167;0
> >>>> histogram.168;0
> >>>> histogram.169;0
> >>>> histogram.17;0
> >>>> histogram.170;0
> >>>> histogram.171;0
> >>>> histogram.172;0
> >>>> histogram.173;0
> >>>> histogram.174;0
> >>>> histogram.175;0
> >>>> histogram.176;0
> >>>> histogram.177;0
> >>>> histogram.178;0
> >>>> histogram.179;0
> >>>> histogram.18;0
> >>>> histogram.180;0
> >>>> histogram.181;0
> >>>> histogram.182;0
> >>>> histogram.183;0
> >>>> histogram.184;0
> >>>> histogram.185;0
> >>>> histogram.186;0
> >>>> histogram.187;0
> >>>> histogram.188;0
> >>>> histogram.189;0
> >>>> histogram.19;0
> >>>> histogram.190;0
> >>>> histogram.191;0
> >>>> histogram.192;0
> >>>> histogram.193;0
> >>>> histogram.194;0
> >>>> histogram.195;0
> >>>> histogram.196;0
> >>>> histogram.197;0
> >>>> histogram.198;0
> >>>> histogram.199;0
> >>>> histogram.2;0
> >>>> histogram.20;0
> >>>> histogram.200;0
> >>>> histogram.201;0
> >>>> histogram.202;0
> >>>> histogram.203;0
> >>>> histogram.204;0
> >>>> histogram.205;0
> >>>> histogram.206;0
> >>>> histogram.207;0
> >>>> histogram.208;0
> >>>> histogram.209;0
> >>>> histogram.21;0
> >>>> histogram.210;0
> >>>> histogram.211;0
> >>>> histogram.212;0
> >>>> histogram.213;0
> >>>> histogram.214;0
> >>>> histogram.215;0
> >>>> histogram.216;0
> >>>> histogram.217;0
> >>>> histogram.218;0
> >>>> histogram.219;0
> >>>> histogram.22;0
> >>>> histogram.220;0
> >>>> histogram.221;0
> >>>> histogram.222;0
> >>>> histogram.223;0
> >>>> histogram.224;0
> >>>> histogram.225;0
> >>>> histogram.226;0
> >>>> histogram.227;0
> >>>> histogram.228;0
> >>>> histogram.229;0
> >>>> histogram.23;0
> >>>> histogram.230;0
> >>>> histogram.231;0
> >>>> histogram.232;0
> >>>> histogram.233;0
> >>>> histogram.234;0
> >>>> histogram.235;0
> >>>> histogram.236;0
> >>>> histogram.237;0
> >>>> histogram.238;0
> >>>> histogram.239;0
> >>>> histogram.24;0
> >>>> histogram.240;0
> >>>> histogram.241;0
> >>>> histogram.242;0
> >>>> histogram.243;0
> >>>> histogram.244;0
> >>>> histogram.245;0
> >>>> histogram.246;0
> >>>> histogram.247;0
> >>>> histogram.248;0
> >>>> histogram.249;0
> >>>> histogram.25;0
> >>>> histogram.250;0
> >>>> histogram.251;0
> >>>> histogram.252;0
> >>>> histogram.253;0
> >>>> histogram.254;0
> >>>> histogram.255;0
> >>>> histogram.26;0
> >>>> histogram.27;0
> >>>> histogram.28;0
> >>>> histogram.29;0
> >>>> histogram.3;0
> >>>> histogram.30;0
> >>>> histogram.31;0
> >>>> histogram.32;11924458
> >>>> histogram.33;11934243
> >>>> histogram.34;11930696
> >>>> histogram.35;11925574
> >>>> histogram.36;11929198
> >>>> histogram.37;11928146
> >>>> histogram.38;11932505
> >>>> histogram.39;11929406
> >>>> histogram.4;0
> >>>> histogram.40;11930100
> >>>> histogram.41;11930867
> >>>> histogram.42;11930796
> >>>> histogram.43;11930796
> >>>> histogram.44;11921866
> >>>> histogram.45;11935682
> >>>> histogram.46;11930075
> >>>> histogram.47;11928169
> >>>> histogram.48;11933490
> >>>> histogram.49;11932174
> >>>> histogram.5;0
> >>>> histogram.50;11933255
> >>>> histogram.51;11934009
> >>>> histogram.52;11928361
> >>>> histogram.53;11927626
> >>>> histogram.54;11931611
> >>>> histogram.55;11930755
> >>>> histogram.56;11933823
> >>>> histogram.57;11922508
> >>>> histogram.58;11930384
> >>>> histogram.59;11929805
> >>>> histogram.6;0
> >>>> histogram.60;11930064
> >>>> histogram.61;11926761
> >>>> histogram.62;11927605
> >>>> histogram.63;23858926
> >>>> histogram.64;11929516
> >>>> histogram.65;11930217
> >>>> histogram.66;11930478
> >>>> histogram.67;11939855
> >>>> histogram.68;11927850
> >>>> histogram.69;11931154
> >>>> histogram.7;0
> >>>> histogram.70;11935374
> >>>> histogram.71;11930754
> >>>> histogram.72;11928304
> >>>> histogram.73;11931772
> >>>> histogram.74;11939417
> >>>> histogram.75;11930712
> >>>> histogram.76;11933331
> >>>> histogram.77;11931279
> >>>> histogram.78;11928276
> >>>> histogram.79;11930071
> >>>> histogram.8;0
> >>>> histogram.80;11927830
> >>>> histogram.81;11931213
> >>>> histogram.82;11930964
> >>>> histogram.83;11928973
> >>>> histogram.84;11934325
> >>>> histogram.85;11929658
> >>>> histogram.86;11924667
> >>>> histogram.87;11931100
> >>>> histogram.88;11930252
> >>>> histogram.89;11927281
> >>>> histogram.9;11932848
> >>>> histogram.90;11930398
> >>>> histogram.91;0
> >>>> histogram.92;0
> >>>> histogram.93;0
> >>>> histogram.94;11928720
> >>>> histogram.95;11928988
> >>>> histogram.96;0
> >>>> histogram.97;11931423
> >>>> histogram.98;11928181
> >>>> histogram.99;11935549
> >>>> histogram.totalBytes;1073741824
> >>>>
> >>>> File3:
> >>>> histogram.0;0
> >>>> histogram.1;0
> >>>> histogram.10;11930417
> >>>> histogram.100;11926739
> >>>> histogram.101;11930580
> >>>> histogram.102;11928210
> >>>> histogram.103;11935300
> >>>> histogram.104;11925804
> >>>> histogram.105;11931023
> >>>> histogram.106;11932342
> >>>> histogram.107;11929778
> >>>> histogram.108;11930098
> >>>> histogram.109;11930759
> >>>> histogram.11;0
> >>>> histogram.110;11934343
> >>>> histogram.111;11935775
> >>>> histogram.112;11933877
> >>>> histogram.113;11926675
> >>>> histogram.114;11929332
> >>>> histogram.115;11928876
> >>>> histogram.116;11927819
> >>>> histogram.117;11932657
> >>>> histogram.118;11933508
> >>>> histogram.119;11928808
> >>>> histogram.12;0
> >>>> histogram.120;11937532
> >>>> histogram.121;11926907
> >>>> histogram.122;11933942
> >>>> histogram.123;0
> >>>> histogram.124;0
> >>>> histogram.125;0
> >>>> histogram.126;0
> >>>> histogram.127;0
> >>>> histogram.128;0
> >>>> histogram.129;0
> >>>> histogram.13;0
> >>>> histogram.130;0
> >>>> histogram.131;0
> >>>> histogram.132;0
> >>>> histogram.133;0
> >>>> histogram.134;0
> >>>> histogram.135;0
> >>>> histogram.136;0
> >>>> histogram.137;0
> >>>> histogram.138;0
> >>>> histogram.139;0
> >>>> histogram.14;0
> >>>> histogram.140;0
> >>>> histogram.141;0
> >>>> histogram.142;0
> >>>> histogram.143;0
> >>>> histogram.144;0
> >>>> histogram.145;0
> >>>> histogram.146;0
> >>>> histogram.147;0
> >>>> histogram.148;0
> >>>> histogram.149;0
> >>>> histogram.15;0
> >>>> histogram.150;0
> >>>> histogram.151;0
> >>>> histogram.152;0
> >>>> histogram.153;0
> >>>> histogram.154;0
> >>>> histogram.155;0
> >>>> histogram.156;0
> >>>> histogram.157;0
> >>>> histogram.158;0
> >>>> histogram.159;0
> >>>> histogram.16;0
> >>>> histogram.160;0
> >>>> histogram.161;0
> >>>> histogram.162;0
> >>>> histogram.163;0
> >>>> histogram.164;0
> >>>> histogram.165;0
> >>>> histogram.166;0
> >>>> histogram.167;0
> >>>> histogram.168;0
> >>>> histogram.169;0
> >>>> histogram.17;0
> >>>> histogram.170;0
> >>>> histogram.171;0
> >>>> histogram.172;0
> >>>> histogram.173;0
> >>>> histogram.174;0
> >>>> histogram.175;0
> >>>> histogram.176;0
> >>>> histogram.177;0
> >>>> histogram.178;0
> >>>> histogram.179;0
> >>>> histogram.18;0
> >>>> histogram.180;0
> >>>> histogram.181;0
> >>>> histogram.182;0
> >>>> histogram.183;0
> >>>> histogram.184;0
> >>>> histogram.185;0
> >>>> histogram.186;0
> >>>> histogram.187;0
> >>>> histogram.188;0
> >>>> histogram.189;0
> >>>> histogram.19;0
> >>>> histogram.190;0
> >>>> histogram.191;0
> >>>> histogram.192;0
> >>>> histogram.193;0
> >>>> histogram.194;0
> >>>> histogram.195;0
> >>>> histogram.196;0
> >>>> histogram.197;0
> >>>> histogram.198;0
> >>>> histogram.199;0
> >>>> histogram.2;0
> >>>> histogram.20;0
> >>>> histogram.200;0
> >>>> histogram.201;0
> >>>> histogram.202;0
> >>>> histogram.203;0
> >>>> histogram.204;0
> >>>> histogram.205;0
> >>>> histogram.206;0
> >>>> histogram.207;0
> >>>> histogram.208;0
> >>>> histogram.209;0
> >>>> histogram.21;0
> >>>> histogram.210;0
> >>>> histogram.211;0
> >>>> histogram.212;0
> >>>> histogram.213;0
> >>>> histogram.214;0
> >>>> histogram.215;0
> >>>> histogram.216;0
> >>>> histogram.217;0
> >>>> histogram.218;0
> >>>> histogram.219;0
> >>>> histogram.22;0
> >>>> histogram.220;0
> >>>> histogram.221;0
> >>>> histogram.222;0
> >>>> histogram.223;0
> >>>> histogram.224;0
> >>>> histogram.225;0
> >>>> histogram.226;0
> >>>> histogram.227;0
> >>>> histogram.228;0
> >>>> histogram.229;0
> >>>> histogram.23;0
> >>>> histogram.230;0
> >>>> histogram.231;0
> >>>> histogram.232;0
> >>>> histogram.233;0
> >>>> histogram.234;0
> >>>> histogram.235;0
> >>>> histogram.236;0
> >>>> histogram.237;0
> >>>> histogram.238;0
> >>>> histogram.239;0
> >>>> histogram.24;0
> >>>> histogram.240;0
> >>>> histogram.241;0
> >>>> histogram.242;0
> >>>> histogram.243;0
> >>>> histogram.244;0
> >>>> histogram.245;0
> >>>> histogram.246;0
> >>>> histogram.247;0
> >>>> histogram.248;0
> >>>> histogram.249;0
> >>>> histogram.25;0
> >>>> histogram.250;0
> >>>> histogram.251;0
> >>>> histogram.252;0
> >>>> histogram.253;0
> >>>> histogram.254;0
> >>>> histogram.255;0
> >>>> histogram.26;0
> >>>> histogram.27;0
> >>>> histogram.28;0
> >>>> histogram.29;0
> >>>> histogram.3;0
> >>>> histogram.30;0
> >>>> histogram.31;0
> >>>> histogram.32;11929486
> >>>> histogram.33;11930737
> >>>> histogram.34;11931092
> >>>> histogram.35;11934488
> >>>> histogram.36;11927605
> >>>> histogram.37;11930735
> >>>> histogram.38;11932174
> >>>> histogram.39;11936180
> >>>> histogram.4;0
> >>>> histogram.40;11931666
> >>>> histogram.41;11927043
> >>>> histogram.42;11929044
> >>>> histogram.43;11934104
> >>>> histogram.44;11936337
> >>>> histogram.45;11935580
> >>>> histogram.46;11929598
> >>>> histogram.47;11934083
> >>>> histogram.48;11928858
> >>>> histogram.49;11931098
> >>>> histogram.5;0
> >>>> histogram.50;11930618
> >>>> histogram.51;11925429
> >>>> histogram.52;11929741
> >>>> histogram.53;11934160
> >>>> histogram.54;11931999
> >>>> histogram.55;11930465
> >>>> histogram.56;11926194
> >>>> histogram.57;11926386
> >>>> histogram.58;11924871
> >>>> histogram.59;11929331
> >>>> histogram.6;0
> >>>> histogram.60;11926951
> >>>> histogram.61;11928631
> >>>> histogram.62;11927549
> >>>> histogram.63;23856730
> >>>> histogram.64;11930288
> >>>> histogram.65;11931523
> >>>> histogram.66;11932821
> >>>> histogram.67;11932509
> >>>> histogram.68;11929613
> >>>> histogram.69;11928651
> >>>> histogram.7;0
> >>>> histogram.70;11929253
> >>>> histogram.71;11931521
> >>>> histogram.72;11925805
> >>>> histogram.73;11934833
> >>>> histogram.74;11928314
> >>>> histogram.75;11923854
> >>>> histogram.76;11930892
> >>>> histogram.77;11927528
> >>>> histogram.78;11932850
> >>>> histogram.79;11934471
> >>>> histogram.8;0
> >>>> histogram.80;11925707
> >>>> histogram.81;11929213
> >>>> histogram.82;11931334
> >>>> histogram.83;11936739
> >>>> histogram.84;11927855
> >>>> histogram.85;11931668
> >>>> histogram.86;11928609
> >>>> histogram.87;11931930
> >>>> histogram.88;11934341
> >>>> histogram.89;11927519
> >>>> histogram.9;11928004
> >>>> histogram.90;11933502
> >>>> histogram.91;0
> >>>> histogram.92;0
> >>>> histogram.93;0
> >>>> histogram.94;11932024
> >>>> histogram.95;11932693
> >>>> histogram.96;0
> >>>> histogram.97;11928428
> >>>> histogram.98;11933195
> >>>> histogram.99;11924273
> >>>> histogram.totalBytes;1073741824
> >>>>
> >>>> Kind regards
> >>>> Jens
> >>>>
> >>>>> On Sun, Oct 31, 2021 at 9:40 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>
> >>>>> Jens
> >>>>>
> >>>>> 118 hours in - still goood.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>
> >>>>>> Jens
> >>>>>>
> >>>>>> Update from hour 67.  Still lookin' good.
> >>>>>>
> >>>>>> Will advise.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 days before the first error occurred
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> On Thu, Oct 28, 2021 at 4:57 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Jens,
> >>>>>>>>
> >>>>>>>> Am 40+ hours in running both your flow and mine to reproduce.  So far
> >>>>>>>> neither has shown any sign of trouble.  Will keep running for another
> >>>>>>>> week or so if I can.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> The physical hosts with VMware are using VMFS, but the VMs running on the hosts can't see that.
> >>>>>>>>> But you asked about the underlying file system 😀 and since my first answer with the copy from the fstab file wasn't enough, I just wanted to give all the details 😁.
> >>>>>>>>>
> >>>>>>>>> If you create a VM for Windows you would probably use NTFS (on top of VMFS). For Linux: ext3, ext4, BTRFS, XFS and so on.
> >>>>>>>>>
> >>>>>>>>> All the partitions on my NiFi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don't use NFS
> >>>>>>>>>
> >>>>>>>>> Kind regards
> >>>>>>>>> Jens
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 27, 2021 at 5:47 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>> I don't quite follow the EXT4 usage on top of VMFS but the point here
> >>>>>>>>> is you'll ultimately need to truly understand your underlying storage
> >>>>>>>>> system and what sorts of guarantees it is giving you.  If linux/the
> >>>>>>>>> jvm/nifi think it has a typical EXT4 type block storage system to work
> >>>>>>>>> with it can only be safe/operate within those constraints.  I have no
> >>>>>>>>> idea about what VMFS brings to the table or the settings for it.
> >>>>>>>>>
> >>>>>>>>> The sync properties I shared previously might help force the issue of
> >>>>>>>>> ensuring a formal sync/flush cycle all the way through the disk has
> >>>>>>>>> occurred which we'd normally not do or need to do but again in some
> >>>>>>>>> cases offers a stronger guarantee in exchange for performance.
> >>>>>>>>>
> >>>>>>>>> In any case...Mark's path for you here will help identify what we're
> >>>>>>>>> dealing with and we can go from there.
> >>>>>>>>>
> >>>>>>>>> I am aware of significant usage of NiFi on VMWare configurations
> >>>>>>>>> without issue at high rates for many years so whatever it is here is
> >>>>>>>>> likely solvable.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
> >>>>>>>>>
> >>>>>>>>> I don't use NFS, but ext4. I will implement the script so we can check if that's the case here. But I think the issue might be after the processors write content to the repository.
> >>>>>>>>>
> >>>>>>>>> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The last flow is totally inside NiFi, excluding other network/server issues.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In both cases the CryptographicHashContent is right after a process which writes new content to the repository. But in one case a file in our production flow calculated a wrong hash 4 times with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
> >>>>>>>>>
> >>>>>>>>> Just like the case in steps 5 and 12 in the pdf file
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I will let you all know more later next week
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind regards
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 27, 2021 at 3:43 PM Mark Payne <ma...@hotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And the actual script:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> import org.apache.nifi.flowfile.FlowFile
> >>>>>>>>> import java.util.stream.Collectors
> >>>>>>>>>
> >>>>>>>>> // Collect any histogram.* attributes written by a previous pass through this processor
> >>>>>>>>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> >>>>>>>>>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
> >>>>>>>>>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
> >>>>>>>>>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
> >>>>>>>>>     return histogram;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> // Count how often each byte value (0-255) occurs in the FlowFile content
> >>>>>>>>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
> >>>>>>>>>     final Map<String, String> histogram = new HashMap<>();
> >>>>>>>>>     final int[] distribution = new int[256];
> >>>>>>>>>     Arrays.fill(distribution, 0);
> >>>>>>>>>
> >>>>>>>>>     long total = 0L;
> >>>>>>>>>     final byte[] buffer = new byte[8192];
> >>>>>>>>>     int len;
> >>>>>>>>>     while ((len = inStream.read(buffer)) > 0) {
> >>>>>>>>>         for (int i = 0; i < len; i++) {
> >>>>>>>>>             final int val = buffer[i] & 0xFF; // mask to 0-255 so the bucket index is explicit for byte values >= 128
> >>>>>>>>>             distribution[val]++;
> >>>>>>>>>             total++;
> >>>>>>>>>         }
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     for (int i = 0; i < 256; i++) {
> >>>>>>>>>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
> >>>>>>>>>     }
> >>>>>>>>>     histogram.put("histogram.totalBytes", String.valueOf(total));
> >>>>>>>>>
> >>>>>>>>>     return histogram;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> // Log every byte value whose count changed between the previous and the current pass
> >>>>>>>>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
> >>>>>>>>>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
> >>>>>>>>>     final Map<String, String> sorted = new TreeMap<>(previous)
> >>>>>>>>>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> >>>>>>>>>         final String key = entry.getKey();
> >>>>>>>>>         final String previousValue = entry.getValue();
> >>>>>>>>>         final String updatedValue = updated.get(entry.getKey())
> >>>>>>>>>
> >>>>>>>>>         if (!Objects.equals(previousValue, updatedValue)) {
> >>>>>>>>>             sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
> >>>>>>>>>         }
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     log.error(sb.toString());
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> def flowFile = session.get()
> >>>>>>>>> if (flowFile == null) {
> >>>>>>>>>     return
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> >>>>>>>>> Map<String, String> histogram = null;
> >>>>>>>>>
> >>>>>>>>> final InputStream inStream = session.read(flowFile);
> >>>>>>>>> try {
> >>>>>>>>>     histogram = createHistogram(flowFile, inStream);
> >>>>>>>>> } finally {
> >>>>>>>>>     inStream.close()
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> // If a previous histogram exists, any mismatch means the content read differently this time
> >>>>>>>>> if (!previousHistogram.isEmpty()) {
> >>>>>>>>>     if (previousHistogram.equals(histogram)) {
> >>>>>>>>>         log.info("Histograms match")
> >>>>>>>>>     } else {
> >>>>>>>>>         logHistogramDifferences(previousHistogram, histogram)
> >>>>>>>>>         session.transfer(flowFile, REL_FAILURE)
> >>>>>>>>>         return;
> >>>>>>>>>     }
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> flowFile = session.putAllAttributes(flowFile, histogram)
> >>>>>>>>> session.transfer(flowFile, REL_SUCCESS)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well, when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
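> >>>>>>>>>
> >>>>>>>>> To make concrete why even a tiny gap like that flips the hash completely, here is a standalone Groovy sketch; the size and offsets are made up for the example and have nothing to do with the real files:
> >>>>>>>>>
> >>>>>>>>> import java.security.MessageDigest
> >>>>>>>>>
> >>>>>>>>> byte[] filled = new byte[1024]
> >>>>>>>>> new Random(42).nextBytes(filled)   // the content as finally written
> >>>>>>>>> byte[] sparse = filled.clone()
> >>>>>>>>> sparse[501] = 0                    // bytes not yet received read back as NUL
> >>>>>>>>> sparse[502] = 0
> >>>>>>>>>
> >>>>>>>>> def sha256 = { byte[] data -> MessageDigest.getInstance("SHA-256").digest(data).encodeHex().toString() }
> >>>>>>>>> println sha256(filled)             // hash of the complete content
> >>>>>>>>> println sha256(sparse)             // a completely different hash for the gapped read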
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
> >>>>>>>>>
> >>>>>>>>> histogram.0 = 280273
> >>>>>>>>>
> >>>>>>>>> histogram.1 = 2820
> >>>>>>>>>
> >>>>>>>>> histogram.2 = 48202
> >>>>>>>>>
> >>>>>>>>> histogram.3 = 3820
> >>>>>>>>>
> >>>>>>>>> …
> >>>>>>>>>
> >>>>>>>>> histogram.totalBytes = 1780928732
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> -Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Still want to know more about your underlying storage system.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> You could also try updating nifi.properties and changing the following lines:
> >>>>>>>>>
> >>>>>>>>> nifi.flowfile.repository.always.sync=true
> >>>>>>>>>
> >>>>>>>>> nifi.content.repository.always.sync=true
> >>>>>>>>>
> >>>>>>>>> nifi.provenance.repository.always.sync=true
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It will hurt performance but can be useful/necessary on certain storage subsystems.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and the disk setup is.  Didn't do anything special, to be clear, but still it is important to know.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> Joe
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear Joe and Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have created a test flow without the sftp processors, which didn't produce any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal in NiFi, but forces NiFi to write and read new files completely locally.
> >>>>>>>>>
> >>>>>>>>> My flow has been running for 7 days, and this morning there were 2 files where the sha256 came out with a different hash value than the original. I have set this flow up in another NiFi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
> >>>>>>>>>
> >>>>>>>>> So I can reproduce issues on different NiFi clusters and versions (1.13.2 and 1.14.0) where calculating a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the hash calculation runs right after NiFi writes the content to the content repository. I don't know if there could be some kind of delay in writing the content 100% before the next processor begins reading it???
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Please see attach test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf check step 5 and 12.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 21, 2021 at 08:28, Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Joe,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> To start from the last mail :-)
> >>>>>>>>>
> >>>>>>>>> Each repository has its own disk, and I'm using ext4:
> >>>>>>>>>
> >>>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> >>>>>>>>>
> >>>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> >>>>>>>>>
> >>>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> My test flow WITH sftp looks like this:
> >>>>>>>>>
> >>>>>>>>> <image.png>
> >>>>>>>>>
> >>>>>>>>> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP processor right before the hashing.
> >>>>>>>>>
> >>>>>>>>> Right now my flow is running without the 2 sftp processors, and the last 24hours there has been no errors.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> About the Lineage:
> >>>>>>>>>
> >>>>>>>>> Is there a way to export all the lineage data? The export only generates an svg file.
> >>>>>>>>>
> >>>>>>>>> This is only for the receiving NiFi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf document with the lineage, the flow, and all the relevant provenance information for each step in the lineage.
> >>>>>>>>>
> >>>>>>>>> The interesting steps are step 5 and 12.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Could the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 23:49, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Also what type of file system/storage system are you running NiFi on
> >>>>>>>>>
> >>>>>>>>> in this case?  We'll need to know this for the NiFi
> >>>>>>>>>
> >>>>>>>>> content/flowfile/provenance repositories? Is it NFS?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And to further narrow this down
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> >>>>>>>>>
> >>>>>>>>> (2 files per node) and next process was a hashcontent before it run
> >>>>>>>>>
> >>>>>>>>> into a test loop. Where files are uploaded via PutSFTP to a test
> >>>>>>>>>
> >>>>>>>>> server, and downloaded again and recalculated the hash. I have had one
> >>>>>>>>>
> >>>>>>>>> issue after 3 days of running."
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So to be clear: with GenerateFlowFile making these files and then you
> >>>>>>>>>
> >>>>>>>>> looping the content, this is wholly and exclusively within the control
> >>>>>>>>>
> >>>>>>>>> of NiFi. No Get/Fetch/Put-SFTP of any kind at all. By looping the
> >>>>>>>>>
> >>>>>>>>> same files over and over in NiFi itself, can you make this happen or
> >>>>>>>>>
> >>>>>>>>> not?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> "After fetching a FlowFile-stream file and unpacked it back into NiFi
> >>>>>>>>>
> >>>>>>>>> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
> >>>>>>>>>
> >>>>>>>>> exact same file. And got a new hash. That is what worry’s me.
> >>>>>>>>>
> >>>>>>>>> The fact that the same file can be recalculated and produce two
> >>>>>>>>>
> >>>>>>>>> different hashes, is very strange, but it happens. "
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Ok, so to confirm: you are saying that in each case this happens, you see
> >>>>>>>>>
> >>>>>>>>> it first compute the wrong hash, but then if you retry the same
> >>>>>>>>>
> >>>>>>>>> flowfile it then provides the correct hash?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Can you please also show/share the lineage history for such a flow
> >>>>>>>>>
> >>>>>>>>> file then?  It should have events for the initial hash, second hash,
> >>>>>>>>>
> >>>>>>>>> the unpacking, trace to the original stream, etc...
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear Mark and Joe
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about: everything is happening on the same NiFi instance. It is the same 3-node NiFi cluster.
> >>>>>>>>>
> >>>>>>>>> After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file and get a new hash. That is what worries me.
> >>>>>>>>>
> >>>>>>>>> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn’t forward the flow file to the next processor before it is 100% finished unpacking and saving the new content to the repository.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node), and the next process was a hash-content calculation before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
> >>>>>>>>>
> >>>>>>>>> Now the test flow is running without the Put/Fetch sftp processors.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Another problem is that I can’t find any correlation to other events, neither within NiFi, nor on the server itself or in VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
> >>>>>>>>>
> >>>>>>>>> I have tried to force VMware to migrate a NiFi node to another host, forcing it to take a snapshot and deleting snapshots, but nothing can trigger an error.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind Regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Den 20. okt. 2021 kl. 16.39 skrev Mark Payne <ma...@hotmail.com>:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks for sharing the images.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I tried to setup a test to reproduce the issue. I’ve had it running for quite some time. Running through millions of iterations.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This way we remove any possibility of actual corruption between the two NiFi instances. If we can still see corruption / different hashes within a single NiFi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
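> >>>>>>>>>
> >>>>>>>>> Outside NiFi, the equivalent sanity check is simply hashing the same bytes repeatedly and comparing against the first result. A rough Java sketch of the loop that flow implements (file name and 1-minute delay are illustrative):
> >>>>>>>>>
> >>>>>>>>> import java.io.BufferedInputStream;
> >>>>>>>>> import java.io.FileInputStream;
> >>>>>>>>> import java.io.InputStream;
> >>>>>>>>> import java.security.MessageDigest;
> >>>>>>>>> import java.util.Arrays;
> >>>>>>>>>
> >>>>>>>>> public class HashLoop {
> >>>>>>>>>     public static void main(String[] args) throws Exception {
> >>>>>>>>>         byte[] initial = sha256("test.bin"); // the 'initial hash' attribute
> >>>>>>>>>         while (true) {
> >>>>>>>>>             Thread.sleep(60_000); // the 1-minute Wait step
> >>>>>>>>>             byte[] current = sha256("test.bin");
> >>>>>>>>>             if (!Arrays.equals(initial, current)) {
> >>>>>>>>>                 System.err.println("Hashes differ!"); // the 'unmatched' route
> >>>>>>>>>                 break;
> >>>>>>>>>             }
> >>>>>>>>>         }
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     static byte[] sha256(String file) throws Exception {
> >>>>>>>>>         MessageDigest md = MessageDigest.getInstance("SHA-256");
> >>>>>>>>>         try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
> >>>>>>>>>             byte[] buf = new byte[8192];
> >>>>>>>>>             int n;
> >>>>>>>>>             while ((n = in.read(buf)) != -1) {
> >>>>>>>>>                 md.update(buf, 0, n);
> >>>>>>>>>             }
> >>>>>>>>>         }
> >>>>>>>>>         return md.digest();
> >>>>>>>>>     }
> >>>>>>>>> }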
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> -Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Actually, is this current loop test contained within a single NiFi, and is that where you see the corruption happen?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Joe
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Joe
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm, I added the HashContent processor and also added hashing by SHA-1.
> >>>>>>>>>
> >>>>>>>>> A file has been going through the system, and both the SHA-1 and the SHA-256 are different than expected. With a 1 minute delay the file goes back into the hashing flow, and this time it calculates both hashes fine.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
> >>>>>>>>>
> >>>>>>>>> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Any suggestions for where to dig ?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 06:34, Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks for replying and for the suggestion to look at the content claim.
> >>>>>>>>>
> >>>>>>>>> These 3 pictures is from the first attempt:
> >>>>>>>>>
> >>>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Yesterday I realized that the content was still in the archive, so I could Replay the file.
> >>>>>>>>>
> >>>>>>>>> <image.png>
> >>>>>>>>>
> >>>>>>>>> So here are the same pictures but for the replay and as you can see the Identifier, offset and Size are all the same.
> >>>>>>>>>
> >>>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
> >>>>>>>>>
> >>>>>>>>> <image.png>   <image.png>   <image.png>
> >>>>>>>>>
> >>>>>>>>> Here the content Claim is all the same.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It is very rare that we see these issues (fewer than 1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes are above 800MB.
> >>>>>>>>>
> >>>>>>>>> This time it was a NiFi FlowFile Stream v3 file, which was exported from one system and imported into another. But once the file was imported, it is the same file inside NiFi and it stays on the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
> >>>>>>>>>
> >>>>>>>>> What can influence the same processor to calculate 2 different sha256 on the exact same content???
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 19, 2021 at 16:51, Mark Payne <ma...@hotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jens,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
> >>>>>>>>>
> >>>>>>>>> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
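> >>>>>>>>>
> >>>>>>>>> (Background, for anyone following along: a content claim is essentially a pointer, i.e. container/section/identifier/offset/size, into an append-only file in the content repository, so two events showing identical claims should be reading the exact same bytes on disk. A rough sketch, assuming the default file-system repository layout of <repo>/<section>/<identifier>:)
> >>>>>>>>>
> >>>>>>>>> import java.io.FileInputStream;
> >>>>>>>>> import java.io.IOException;
> >>>>>>>>> import java.io.InputStream;
> >>>>>>>>> import java.security.MessageDigest;
> >>>>>>>>>
> >>>>>>>>> public class HashClaim {
> >>>>>>>>>     // Several FlowFiles can share one repository file at different offsets;
> >>>>>>>>>     // a claim selects the byte range [offset, offset + size).
> >>>>>>>>>     static byte[] sha256OfClaim(String repo, String section, String id,
> >>>>>>>>>                                 long offset, long size) throws Exception {
> >>>>>>>>>         MessageDigest md = MessageDigest.getInstance("SHA-256");
> >>>>>>>>>         try (InputStream in = new FileInputStream(repo + "/" + section + "/" + id)) {
> >>>>>>>>>             long skipped = 0;
> >>>>>>>>>             while (skipped < offset) {
> >>>>>>>>>                 long s = in.skip(offset - skipped);
> >>>>>>>>>                 if (s <= 0) throw new IOException("EOF before claim offset");
> >>>>>>>>>                 skipped += s;
> >>>>>>>>>             }
> >>>>>>>>>             byte[] buf = new byte[8192];
> >>>>>>>>>             long remaining = size;
> >>>>>>>>>             int n;
> >>>>>>>>>             while (remaining > 0
> >>>>>>>>>                     && (n = in.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
> >>>>>>>>>                 md.update(buf, 0, n);
> >>>>>>>>>                 remaining -= n;
> >>>>>>>>>             }
> >>>>>>>>>         }
> >>>>>>>>>         return md.digest();
> >>>>>>>>>     }
> >>>>>>>>> }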
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> -Mark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear NIFI Users
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have posted this mail in the developers mailing list and just want to inform all of our about a very odd behavior we are facing.
> >>>>>>>>>
> >>>>>>>>> The background:
> >>>>>>>>>
> >>>>>>>>> We have data going between 2 different NIFI systems which has no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ----
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.
> >>>>>>>>>
> >>>>>>>>> I have created a Retry loop, where the file will go to a Wait process for delaying the file 1 minute and going back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to the retries_exceeded and goes to a disabled process just to be in a queue so I manually can look at it. This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.
> >>>>>>>>>
> >>>>>>>>> <image.png>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
> >>>>>>>>>
> >>>>>>>>> I have inspected different logs to see if I can find any correlation what happened at the same time as the file is going through my loop, but there are no event/task at that exact time.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> System 1:
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> >>>>>>>>>
> >>>>>>>>> The file is exported as a "FlowFile Stream, v3" to System 2
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> SYSTEM 2:
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>>
> >>>>>>>>> <image.png>
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 00:20:04.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 00:21:01.711 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> At 10/19/2021 06:07:43.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> >>>>>>>>>
> >>>>>>>>> <image.png>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> How on earth can this happen???
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind Regards
> >>>>>>>>>
> >>>>>>>>> Jens M. Kofoed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <Repro.json>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <Try_to_recreate_Jens_Challenge.json>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>
> >>>>
> >

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark

All the files in my testflow are 1GB files. But it happens in my production flow with different file sizes. 

When these issues have happened, I have the flowfile routed to an UpdateAttribute process which is disabled, just to keep the file in a queue. When I enable the process and send the file back to a new hash calculation, the file is OK. So I don’t think the test with backup and compare makes sense to do.

Regards 
Jens

> On Nov 3, 2021 at 15:57, Mark Payne <ma...@hotmail.com> wrote:
> 
> So what I found interesting about the histogram output was that in each case, the input file was 1 GB, and somewhere around 500-700 byte positions differed between the ‘good’ and ‘bad’ hashes, with the affected byte values ranging significantly. There was no indication of the type of thing we’ve seen with NFS mounts, where data is nulled out until received and then updated. If that had been the case we’d have seen the NUL byte (or some other value) show a very significant change in the histogram, but we didn’t see that.
> 
> So a couple more ideas that I think can be useful.
> 
> 1) Which garbage collector are you using? It’s configured in the bootstrap.conf file
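> 
> (For reference, the collector is selected by the java.arg.* entries in conf/bootstrap.conf. A typical NiFi 1.x install looks roughly like this, though the exact argument numbers vary by version, so treat this as an assumption and check your own file:)
> 
> # conf/bootstrap.conf
> java.arg.2=-Xms512m
> java.arg.3=-Xmx512m
> java.arg.13=-XX:+UseG1GC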
> 
> 2) We can try to definitively prove out whether the content on the disk is changing or if there’s an issue reading the content. To do this:
> 
> 1. Stop all processors.
> 2. Shutdown nifi
> 3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this will delete all FlowFiles & content, so only do this on a dev/test system where you’re comfortable deleting it!)
> 4. Start nifi
> 5. Let exactly 1 FlowFile into your flow.
> 6. While it is looping through, create a copy of your entire Content Repository: cp -r content_repository content_backup1; zip -r content_backup1.zip content_backup1
> 7. Wait for the hashes to differ
> 8. Create another copy of the Content Repository: cp -r content_repository content_backup2
> 9. Find the files within the content_backup1 and content_backup2 and compare them to see if they are identical. Would recommend comparing them using each of the 3 methods: sha256, sha512, diff (see the sketch below).
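> 
> A minimal sketch of step 9 in Java, assuming both backup directories sit in the working directory (the usual shell tools, sha256sum / sha512sum / diff, work just as well):
> 
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.security.MessageDigest;
> import java.util.List;
> import java.util.stream.Collectors;
> import java.util.stream.Stream;
> 
> public class CompareBackups {
>     public static void main(String[] args) throws Exception {
>         Path a = Paths.get("content_backup1");
>         Path b = Paths.get("content_backup2");
>         List<Path> files;
>         try (Stream<Path> walk = Files.walk(a)) {
>             files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
>         }
>         for (Path p : files) {
>             String ha = sha256(p);
>             String hb = sha256(b.resolve(a.relativize(p))); // same relative path in backup2
>             System.out.println(p + " -> " + (ha.equals(hb) ? "identical" : "DIFFERS"));
>         }
>     }
> 
>     static String sha256(Path p) throws Exception {
>         MessageDigest md = MessageDigest.getInstance("SHA-256");
>         md.update(Files.readAllBytes(p)); // fine for a test box; stream for huge files
>         StringBuilder sb = new StringBuilder();
>         for (byte x : md.digest()) sb.append(String.format("%02x", x));
>         return sb.toString();
>     }
> }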
> 
> This should make it pretty clear that either:
> (1) the issue resides in the software: either NiFi or the JVM
> (2) the issue resides outside of the software: the disk, the disk driver, the operating system, the VM hypervisor, etc.
> 
> Thanks
> -Mark
> 
>> On Nov 3, 2021, at 10:44 AM, Joe Witt <jo...@gmail.com> wrote:
>> 
>> Jens,
>> 
>> 184 hours (7.6 days) in and zero issues.
>> 
>> Will need to turn this off soon but wanted to give a final update.
>> Looks great.  Given the information on your system there appears to be
>> something we don't understand related to the virtual file system
>> involved or something.
>> 
>> Thanks
>> 
>>> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> 
>>> Hi Mark
>>> 
>>> Of course, sorry :-)  Looking at the error messages, I can see that only the histogram entries with differences are listed. And all 3 files have their first difference at histogram.9. I don't know what that means.
>>> 
>>> /Jens
>>> 
>>> Here are the error log:
>>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
>>> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
>>> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
>>> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
>>> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
>>> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
>>> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
>>> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
>>> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
>>> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
>>> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
>>> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
>>> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
>>> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
>>> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
>>> Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
>>> Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
>>> Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
>>> Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
>>> Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
>>> Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
>>> Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
>>> Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
>>> Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
>>> Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
>>> Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
>>> Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
>>> Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
>>> Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
>>> Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
>>> Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
>>> Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
>>> Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
>>> Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
>>> Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
>>> Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
>>> Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
>>> Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
>>> Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
>>> Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
>>> Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
>>> Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
>>> Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
>>> Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
>>> Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
>>> Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
>>> Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
>>> Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
>>> Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
>>> Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
>>> Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
>>> Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
>>> Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
>>> Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
>>> Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
>>> Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
>>> Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
>>> Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
>>> Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
>>> Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
>>> Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
>>> Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
>>> Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
>>> Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
>>> Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
>>> Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
>>> Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
>>> Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
>>> Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
>>> Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
>>> Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
>>> Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
>>> Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
>>> Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
>>> Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
>>> Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
>>> Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
>>> Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
>>> Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
>>> Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
>>> Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
>>> Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
>>> Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
>>> Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
>>> Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
>>> Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
>>> Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059
>>> 
>>> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
>>> Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
>>> Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
>>> Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
>>> Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
>>> Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
>>> Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
>>> Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
>>> Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
>>> Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
>>> Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
>>> Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
>>> Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
>>> Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
>>> Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
>>> Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
>>> Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
>>> Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
>>> Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
>>> Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
>>> Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
>>> Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
>>> Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
>>> Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
>>> Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
>>> Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
>>> Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
>>> Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
>>> Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
>>> Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
>>> Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
>>> Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
>>> Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
>>> Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
>>> Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
>>> Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
>>> Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
>>> Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
>>> Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
>>> Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
>>> Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
>>> Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
>>> Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
>>> Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
>>> Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
>>> Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
>>> Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
>>> Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
>>> Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
>>> Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
>>> Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
>>> Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
>>> Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
>>> Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
>>> Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
>>> Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
>>> Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
>>> Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
>>> Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
>>> Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
>>> Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
>>> Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
>>> Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
>>> Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
>>> Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
>>> Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
>>> Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
>>> Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
>>> Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
>>> Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
>>> Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
>>> Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
>>> Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
>>> Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
>>> Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
>>> Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
>>> Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
>>> Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
>>> Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
>>> Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
>>> Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
>>> Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
>>> Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
>>> Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
>>> Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
>>> Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542
>>> 
>>> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are differences in the histogram
>>> Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
>>> Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
>>> Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
>>> Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
>>> Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
>>> Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
>>> Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
>>> Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
>>> Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
>>> Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
>>> Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
>>> Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
>>> Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
>>> Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
>>> Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
>>> Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
>>> Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
>>> Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
>>> Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
>>> Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
>>> Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
>>> Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
>>> Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
>>> Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
>>> Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
>>> Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
>>> Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
>>> Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
>>> Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
>>> Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
>>> Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
>>> Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
>>> Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
>>> Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
>>> Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
>>> Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
>>> Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
>>> Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
>>> Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
>>> Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
>>> Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
>>> Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
>>> Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
>>> Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
>>> Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
>>> Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
>>> Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
>>> Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
>>> Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
>>> Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
>>> Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
>>> Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
>>> Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
>>> Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
>>> Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
>>> Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
>>> Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
>>> Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
>>> Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
>>> Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
>>> Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
>>> Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
>>> Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
>>> Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
>>> Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
>>> Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
>>> Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
>>> Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
>>> Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
>>> Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
>>> Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
>>> Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
>>> Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
>>> Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
>>> Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
>>> Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
>>> Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
>>> Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
>>> Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
>>> Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
>>> Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
>>> Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
>>> Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
>>> Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
>>> Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
>>> Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282
>>> 
>>>> On Tue, Nov 2, 2021 at 15:41, Mark Payne <ma...@hotmail.com> wrote:
>>>> 
>>>> Jens,
>>>> 
>>>> The histograms, in and of themselves, are not very interesting. The interesting thing would be the difference in the histogram before & after the hash. Can you provide the ERROR level logs generated by the ExecuteScript? That’s what is of interest.
>>>> 
>>>> Thanks
>>>> -Mark
>>>> 
>>>> 
>>>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>> 
>>>> Hi Mark and Joe
>>>> 
>>>> Yesterday morning I implemented Mark's script in my 2 test flows: one test flow using sftp, the other MergeContent/UnpackContent. Both test flows are running on a test cluster with 3 nodes and NiFi 1.14.0.
>>>> The 1st flow, with sftp, has had 1 file go into the failure queue after about 16 hours.
>>>> The 2nd flow has had 2 files go into the failure queue, after about 15 and 17 hours.
>>>> 
>>>> There is definitely something going wrong in my setup, but I can't figure out what.
>>>> 
>>>> Information from file 1:
>>>> histogram.0;0
>>>> histogram.1;0
>>>> histogram.10;11926720
>>>> histogram.100;11927504
>>>> histogram.101;11925396
>>>> histogram.102;11929923
>>>> histogram.103;11931596
>>>> histogram.104;11929071
>>>> histogram.105;11931365
>>>> histogram.106;11928661
>>>> histogram.107;11929864
>>>> histogram.108;11931611
>>>> histogram.109;11932758
>>>> histogram.11;0
>>>> histogram.110;11927893
>>>> histogram.111;11933519
>>>> histogram.112;11931392
>>>> histogram.113;11928534
>>>> histogram.114;11936879
>>>> histogram.115;11932818
>>>> histogram.116;11934767
>>>> histogram.117;11929143
>>>> histogram.118;11931854
>>>> histogram.119;11926333
>>>> histogram.12;0
>>>> histogram.120;11928731
>>>> histogram.121;11931149
>>>> histogram.122;11926725
>>>> histogram.123;0
>>>> histogram.124;0
>>>> histogram.125;0
>>>> histogram.126;0
>>>> histogram.127;0
>>>> histogram.128;0
>>>> histogram.129;0
>>>> histogram.13;0
>>>> histogram.130;0
>>>> histogram.131;0
>>>> histogram.132;0
>>>> histogram.133;0
>>>> histogram.134;0
>>>> histogram.135;0
>>>> histogram.136;0
>>>> histogram.137;0
>>>> histogram.138;0
>>>> histogram.139;0
>>>> histogram.14;0
>>>> histogram.140;0
>>>> histogram.141;0
>>>> histogram.142;0
>>>> histogram.143;0
>>>> histogram.144;0
>>>> histogram.145;0
>>>> histogram.146;0
>>>> histogram.147;0
>>>> histogram.148;0
>>>> histogram.149;0
>>>> histogram.15;0
>>>> histogram.150;0
>>>> histogram.151;0
>>>> histogram.152;0
>>>> histogram.153;0
>>>> histogram.154;0
>>>> histogram.155;0
>>>> histogram.156;0
>>>> histogram.157;0
>>>> histogram.158;0
>>>> histogram.159;0
>>>> histogram.16;0
>>>> histogram.160;0
>>>> histogram.161;0
>>>> histogram.162;0
>>>> histogram.163;0
>>>> histogram.164;0
>>>> histogram.165;0
>>>> histogram.166;0
>>>> histogram.167;0
>>>> histogram.168;0
>>>> histogram.169;0
>>>> histogram.17;0
>>>> histogram.170;0
>>>> histogram.171;0
>>>> histogram.172;0
>>>> histogram.173;0
>>>> histogram.174;0
>>>> histogram.175;0
>>>> histogram.176;0
>>>> histogram.177;0
>>>> histogram.178;0
>>>> histogram.179;0
>>>> histogram.18;0
>>>> histogram.180;0
>>>> histogram.181;0
>>>> histogram.182;0
>>>> histogram.183;0
>>>> histogram.184;0
>>>> histogram.185;0
>>>> histogram.186;0
>>>> histogram.187;0
>>>> histogram.188;0
>>>> histogram.189;0
>>>> histogram.19;0
>>>> histogram.190;0
>>>> histogram.191;0
>>>> histogram.192;0
>>>> histogram.193;0
>>>> histogram.194;0
>>>> histogram.195;0
>>>> histogram.196;0
>>>> histogram.197;0
>>>> histogram.198;0
>>>> histogram.199;0
>>>> histogram.2;0
>>>> histogram.20;0
>>>> histogram.200;0
>>>> histogram.201;0
>>>> histogram.202;0
>>>> histogram.203;0
>>>> histogram.204;0
>>>> histogram.205;0
>>>> histogram.206;0
>>>> histogram.207;0
>>>> histogram.208;0
>>>> histogram.209;0
>>>> histogram.21;0
>>>> histogram.210;0
>>>> histogram.211;0
>>>> histogram.212;0
>>>> histogram.213;0
>>>> histogram.214;0
>>>> histogram.215;0
>>>> histogram.216;0
>>>> histogram.217;0
>>>> histogram.218;0
>>>> histogram.219;0
>>>> histogram.22;0
>>>> histogram.220;0
>>>> histogram.221;0
>>>> histogram.222;0
>>>> histogram.223;0
>>>> histogram.224;0
>>>> histogram.225;0
>>>> histogram.226;0
>>>> histogram.227;0
>>>> histogram.228;0
>>>> histogram.229;0
>>>> histogram.23;0
>>>> histogram.230;0
>>>> histogram.231;0
>>>> histogram.232;0
>>>> histogram.233;0
>>>> histogram.234;0
>>>> histogram.235;0
>>>> histogram.236;0
>>>> histogram.237;0
>>>> histogram.238;0
>>>> histogram.239;0
>>>> histogram.24;0
>>>> histogram.240;0
>>>> histogram.241;0
>>>> histogram.242;0
>>>> histogram.243;0
>>>> histogram.244;0
>>>> histogram.245;0
>>>> histogram.246;0
>>>> histogram.247;0
>>>> histogram.248;0
>>>> histogram.249;0
>>>> histogram.25;0
>>>> histogram.250;0
>>>> histogram.251;0
>>>> histogram.252;0
>>>> histogram.253;0
>>>> histogram.254;0
>>>> histogram.255;0
>>>> histogram.26;0
>>>> histogram.27;0
>>>> histogram.28;0
>>>> histogram.29;0
>>>> histogram.3;0
>>>> histogram.30;0
>>>> histogram.31;0
>>>> histogram.32;11930422
>>>> histogram.33;11934311
>>>> histogram.34;11930459
>>>> histogram.35;11924776
>>>> histogram.36;11924186
>>>> histogram.37;11928616
>>>> histogram.38;11929474
>>>> histogram.39;11929607
>>>> histogram.4;0
>>>> histogram.40;11928053
>>>> histogram.41;11930402
>>>> histogram.42;11926830
>>>> histogram.43;11938138
>>>> histogram.44;11932536
>>>> histogram.45;11931053
>>>> histogram.46;11930008
>>>> histogram.47;11927747
>>>> histogram.48;11936055
>>>> histogram.49;11931471
>>>> histogram.5;0
>>>> histogram.50;11931921
>>>> histogram.51;11929643
>>>> histogram.52;11923847
>>>> histogram.53;11927311
>>>> histogram.54;11933754
>>>> histogram.55;11925964
>>>> histogram.56;11928872
>>>> histogram.57;11931124
>>>> histogram.58;11928474
>>>> histogram.59;11925814
>>>> histogram.6;0
>>>> histogram.60;11933978
>>>> histogram.61;11934136
>>>> histogram.62;11932016
>>>> histogram.63;23864588
>>>> histogram.64;11924792
>>>> histogram.65;11934789
>>>> histogram.66;11933047
>>>> histogram.67;11931899
>>>> histogram.68;11935615
>>>> histogram.69;11927249
>>>> histogram.7;0
>>>> histogram.70;11933276
>>>> histogram.71;11927953
>>>> histogram.72;11929275
>>>> histogram.73;11930292
>>>> histogram.74;11935428
>>>> histogram.75;11930317
>>>> histogram.76;11935737
>>>> histogram.77;11932127
>>>> histogram.78;11932344
>>>> histogram.79;11932094
>>>> histogram.8;0
>>>> histogram.80;11930688
>>>> histogram.81;11928415
>>>> histogram.82;11931559
>>>> histogram.83;11934192
>>>> histogram.84;11927224
>>>> histogram.85;11929491
>>>> histogram.86;11930624
>>>> histogram.87;11932201
>>>> histogram.88;11930694
>>>> histogram.89;11936439
>>>> histogram.9;11933187
>>>> histogram.90;11926445
>>>> histogram.91;0
>>>> histogram.92;0
>>>> histogram.93;0
>>>> histogram.94;11931596
>>>> histogram.95;11929379
>>>> histogram.96;0
>>>> histogram.97;11928864
>>>> histogram.98;11924738
>>>> histogram.99;11930062
>>>> histogram.totalBytes;1073741824
>>>> 
>>>> File 2:
>>>> histogram.0;0
>>>> histogram.1;0
>>>> histogram.10;11932402
>>>> histogram.100;11927531
>>>> histogram.101;11928454
>>>> histogram.102;11934432
>>>> histogram.103;11924623
>>>> histogram.104;11934492
>>>> histogram.105;11934585
>>>> histogram.106;11928955
>>>> histogram.107;11928651
>>>> histogram.108;11930139
>>>> histogram.109;11929325
>>>> histogram.11;0
>>>> histogram.110;11930486
>>>> histogram.111;11933517
>>>> histogram.112;11928334
>>>> histogram.113;11927798
>>>> histogram.114;11929222
>>>> histogram.115;11932057
>>>> histogram.116;11931182
>>>> histogram.117;11933407
>>>> histogram.118;11932709
>>>> histogram.119;11931338
>>>> histogram.12;0
>>>> histogram.120;11933700
>>>> histogram.121;11929803
>>>> histogram.122;11930218
>>>> histogram.123;0
>>>> histogram.124;0
>>>> histogram.125;0
>>>> histogram.126;0
>>>> histogram.127;0
>>>> histogram.128;0
>>>> histogram.129;0
>>>> histogram.13;0
>>>> histogram.130;0
>>>> histogram.131;0
>>>> histogram.132;0
>>>> histogram.133;0
>>>> histogram.134;0
>>>> histogram.135;0
>>>> histogram.136;0
>>>> histogram.137;0
>>>> histogram.138;0
>>>> histogram.139;0
>>>> histogram.14;0
>>>> histogram.140;0
>>>> histogram.141;0
>>>> histogram.142;0
>>>> histogram.143;0
>>>> histogram.144;0
>>>> histogram.145;0
>>>> histogram.146;0
>>>> histogram.147;0
>>>> histogram.148;0
>>>> histogram.149;0
>>>> histogram.15;0
>>>> histogram.150;0
>>>> histogram.151;0
>>>> histogram.152;0
>>>> histogram.153;0
>>>> histogram.154;0
>>>> histogram.155;0
>>>> histogram.156;0
>>>> histogram.157;0
>>>> histogram.158;0
>>>> histogram.159;0
>>>> histogram.16;0
>>>> histogram.160;0
>>>> histogram.161;0
>>>> histogram.162;0
>>>> histogram.163;0
>>>> histogram.164;0
>>>> histogram.165;0
>>>> histogram.166;0
>>>> histogram.167;0
>>>> histogram.168;0
>>>> histogram.169;0
>>>> histogram.17;0
>>>> histogram.170;0
>>>> histogram.171;0
>>>> histogram.172;0
>>>> histogram.173;0
>>>> histogram.174;0
>>>> histogram.175;0
>>>> histogram.176;0
>>>> histogram.177;0
>>>> histogram.178;0
>>>> histogram.179;0
>>>> histogram.18;0
>>>> histogram.180;0
>>>> histogram.181;0
>>>> histogram.182;0
>>>> histogram.183;0
>>>> histogram.184;0
>>>> histogram.185;0
>>>> histogram.186;0
>>>> histogram.187;0
>>>> histogram.188;0
>>>> histogram.189;0
>>>> histogram.19;0
>>>> histogram.190;0
>>>> histogram.191;0
>>>> histogram.192;0
>>>> histogram.193;0
>>>> histogram.194;0
>>>> histogram.195;0
>>>> histogram.196;0
>>>> histogram.197;0
>>>> histogram.198;0
>>>> histogram.199;0
>>>> histogram.2;0
>>>> histogram.20;0
>>>> histogram.200;0
>>>> histogram.201;0
>>>> histogram.202;0
>>>> histogram.203;0
>>>> histogram.204;0
>>>> histogram.205;0
>>>> histogram.206;0
>>>> histogram.207;0
>>>> histogram.208;0
>>>> histogram.209;0
>>>> histogram.21;0
>>>> histogram.210;0
>>>> histogram.211;0
>>>> histogram.212;0
>>>> histogram.213;0
>>>> histogram.214;0
>>>> histogram.215;0
>>>> histogram.216;0
>>>> histogram.217;0
>>>> histogram.218;0
>>>> histogram.219;0
>>>> histogram.22;0
>>>> histogram.220;0
>>>> histogram.221;0
>>>> histogram.222;0
>>>> histogram.223;0
>>>> histogram.224;0
>>>> histogram.225;0
>>>> histogram.226;0
>>>> histogram.227;0
>>>> histogram.228;0
>>>> histogram.229;0
>>>> histogram.23;0
>>>> histogram.230;0
>>>> histogram.231;0
>>>> histogram.232;0
>>>> histogram.233;0
>>>> histogram.234;0
>>>> histogram.235;0
>>>> histogram.236;0
>>>> histogram.237;0
>>>> histogram.238;0
>>>> histogram.239;0
>>>> histogram.24;0
>>>> histogram.240;0
>>>> histogram.241;0
>>>> histogram.242;0
>>>> histogram.243;0
>>>> histogram.244;0
>>>> histogram.245;0
>>>> histogram.246;0
>>>> histogram.247;0
>>>> histogram.248;0
>>>> histogram.249;0
>>>> histogram.25;0
>>>> histogram.250;0
>>>> histogram.251;0
>>>> histogram.252;0
>>>> histogram.253;0
>>>> histogram.254;0
>>>> histogram.255;0
>>>> histogram.26;0
>>>> histogram.27;0
>>>> histogram.28;0
>>>> histogram.29;0
>>>> histogram.3;0
>>>> histogram.30;0
>>>> histogram.31;0
>>>> histogram.32;11924458
>>>> histogram.33;11934243
>>>> histogram.34;11930696
>>>> histogram.35;11925574
>>>> histogram.36;11929198
>>>> histogram.37;11928146
>>>> histogram.38;11932505
>>>> histogram.39;11929406
>>>> histogram.4;0
>>>> histogram.40;11930100
>>>> histogram.41;11930867
>>>> histogram.42;11930796
>>>> histogram.43;11930796
>>>> histogram.44;11921866
>>>> histogram.45;11935682
>>>> histogram.46;11930075
>>>> histogram.47;11928169
>>>> histogram.48;11933490
>>>> histogram.49;11932174
>>>> histogram.5;0
>>>> histogram.50;11933255
>>>> histogram.51;11934009
>>>> histogram.52;11928361
>>>> histogram.53;11927626
>>>> histogram.54;11931611
>>>> histogram.55;11930755
>>>> histogram.56;11933823
>>>> histogram.57;11922508
>>>> histogram.58;11930384
>>>> histogram.59;11929805
>>>> histogram.6;0
>>>> histogram.60;11930064
>>>> histogram.61;11926761
>>>> histogram.62;11927605
>>>> histogram.63;23858926
>>>> histogram.64;11929516
>>>> histogram.65;11930217
>>>> histogram.66;11930478
>>>> histogram.67;11939855
>>>> histogram.68;11927850
>>>> histogram.69;11931154
>>>> histogram.7;0
>>>> histogram.70;11935374
>>>> histogram.71;11930754
>>>> histogram.72;11928304
>>>> histogram.73;11931772
>>>> histogram.74;11939417
>>>> histogram.75;11930712
>>>> histogram.76;11933331
>>>> histogram.77;11931279
>>>> histogram.78;11928276
>>>> histogram.79;11930071
>>>> histogram.8;0
>>>> histogram.80;11927830
>>>> histogram.81;11931213
>>>> histogram.82;11930964
>>>> histogram.83;11928973
>>>> histogram.84;11934325
>>>> histogram.85;11929658
>>>> histogram.86;11924667
>>>> histogram.87;11931100
>>>> histogram.88;11930252
>>>> histogram.89;11927281
>>>> histogram.9;11932848
>>>> histogram.90;11930398
>>>> histogram.91;0
>>>> histogram.92;0
>>>> histogram.93;0
>>>> histogram.94;11928720
>>>> histogram.95;11928988
>>>> histogram.96;0
>>>> histogram.97;11931423
>>>> histogram.98;11928181
>>>> histogram.99;11935549
>>>> histogram.totalBytes;1073741824
>>>> 
>>>> File3:
>>>> histogram.0;0
>>>> histogram.1;0
>>>> histogram.10;11930417
>>>> histogram.100;11926739
>>>> histogram.101;11930580
>>>> histogram.102;11928210
>>>> histogram.103;11935300
>>>> histogram.104;11925804
>>>> histogram.105;11931023
>>>> histogram.106;11932342
>>>> histogram.107;11929778
>>>> histogram.108;11930098
>>>> histogram.109;11930759
>>>> histogram.11;0
>>>> histogram.110;11934343
>>>> histogram.111;11935775
>>>> histogram.112;11933877
>>>> histogram.113;11926675
>>>> histogram.114;11929332
>>>> histogram.115;11928876
>>>> histogram.116;11927819
>>>> histogram.117;11932657
>>>> histogram.118;11933508
>>>> histogram.119;11928808
>>>> histogram.12;0
>>>> histogram.120;11937532
>>>> histogram.121;11926907
>>>> histogram.122;11933942
>>>> histogram.123;0
>>>> histogram.124;0
>>>> histogram.125;0
>>>> histogram.126;0
>>>> histogram.127;0
>>>> histogram.128;0
>>>> histogram.129;0
>>>> histogram.13;0
>>>> histogram.130;0
>>>> histogram.131;0
>>>> histogram.132;0
>>>> histogram.133;0
>>>> histogram.134;0
>>>> histogram.135;0
>>>> histogram.136;0
>>>> histogram.137;0
>>>> histogram.138;0
>>>> histogram.139;0
>>>> histogram.14;0
>>>> histogram.140;0
>>>> histogram.141;0
>>>> histogram.142;0
>>>> histogram.143;0
>>>> histogram.144;0
>>>> histogram.145;0
>>>> histogram.146;0
>>>> histogram.147;0
>>>> histogram.148;0
>>>> histogram.149;0
>>>> histogram.15;0
>>>> histogram.150;0
>>>> histogram.151;0
>>>> histogram.152;0
>>>> histogram.153;0
>>>> histogram.154;0
>>>> histogram.155;0
>>>> histogram.156;0
>>>> histogram.157;0
>>>> histogram.158;0
>>>> histogram.159;0
>>>> histogram.16;0
>>>> histogram.160;0
>>>> histogram.161;0
>>>> histogram.162;0
>>>> histogram.163;0
>>>> histogram.164;0
>>>> histogram.165;0
>>>> histogram.166;0
>>>> histogram.167;0
>>>> histogram.168;0
>>>> histogram.169;0
>>>> histogram.17;0
>>>> histogram.170;0
>>>> histogram.171;0
>>>> histogram.172;0
>>>> histogram.173;0
>>>> histogram.174;0
>>>> histogram.175;0
>>>> histogram.176;0
>>>> histogram.177;0
>>>> histogram.178;0
>>>> histogram.179;0
>>>> histogram.18;0
>>>> histogram.180;0
>>>> histogram.181;0
>>>> histogram.182;0
>>>> histogram.183;0
>>>> histogram.184;0
>>>> histogram.185;0
>>>> histogram.186;0
>>>> histogram.187;0
>>>> histogram.188;0
>>>> histogram.189;0
>>>> histogram.19;0
>>>> histogram.190;0
>>>> histogram.191;0
>>>> histogram.192;0
>>>> histogram.193;0
>>>> histogram.194;0
>>>> histogram.195;0
>>>> histogram.196;0
>>>> histogram.197;0
>>>> histogram.198;0
>>>> histogram.199;0
>>>> histogram.2;0
>>>> histogram.20;0
>>>> histogram.200;0
>>>> histogram.201;0
>>>> histogram.202;0
>>>> histogram.203;0
>>>> histogram.204;0
>>>> histogram.205;0
>>>> histogram.206;0
>>>> histogram.207;0
>>>> histogram.208;0
>>>> histogram.209;0
>>>> histogram.21;0
>>>> histogram.210;0
>>>> histogram.211;0
>>>> histogram.212;0
>>>> histogram.213;0
>>>> histogram.214;0
>>>> histogram.215;0
>>>> histogram.216;0
>>>> histogram.217;0
>>>> histogram.218;0
>>>> histogram.219;0
>>>> histogram.22;0
>>>> histogram.220;0
>>>> histogram.221;0
>>>> histogram.222;0
>>>> histogram.223;0
>>>> histogram.224;0
>>>> histogram.225;0
>>>> histogram.226;0
>>>> histogram.227;0
>>>> histogram.228;0
>>>> histogram.229;0
>>>> histogram.23;0
>>>> histogram.230;0
>>>> histogram.231;0
>>>> histogram.232;0
>>>> histogram.233;0
>>>> histogram.234;0
>>>> histogram.235;0
>>>> histogram.236;0
>>>> histogram.237;0
>>>> histogram.238;0
>>>> histogram.239;0
>>>> histogram.24;0
>>>> histogram.240;0
>>>> histogram.241;0
>>>> histogram.242;0
>>>> histogram.243;0
>>>> histogram.244;0
>>>> histogram.245;0
>>>> histogram.246;0
>>>> histogram.247;0
>>>> histogram.248;0
>>>> histogram.249;0
>>>> histogram.25;0
>>>> histogram.250;0
>>>> histogram.251;0
>>>> histogram.252;0
>>>> histogram.253;0
>>>> histogram.254;0
>>>> histogram.255;0
>>>> histogram.26;0
>>>> histogram.27;0
>>>> histogram.28;0
>>>> histogram.29;0
>>>> histogram.3;0
>>>> histogram.30;0
>>>> histogram.31;0
>>>> histogram.32;11929486
>>>> histogram.33;11930737
>>>> histogram.34;11931092
>>>> histogram.35;11934488
>>>> histogram.36;11927605
>>>> histogram.37;11930735
>>>> histogram.38;11932174
>>>> histogram.39;11936180
>>>> histogram.4;0
>>>> histogram.40;11931666
>>>> histogram.41;11927043
>>>> histogram.42;11929044
>>>> histogram.43;11934104
>>>> histogram.44;11936337
>>>> histogram.45;11935580
>>>> histogram.46;11929598
>>>> histogram.47;11934083
>>>> histogram.48;11928858
>>>> histogram.49;11931098
>>>> histogram.5;0
>>>> histogram.50;11930618
>>>> histogram.51;11925429
>>>> histogram.52;11929741
>>>> histogram.53;11934160
>>>> histogram.54;11931999
>>>> histogram.55;11930465
>>>> histogram.56;11926194
>>>> histogram.57;11926386
>>>> histogram.58;11924871
>>>> histogram.59;11929331
>>>> histogram.6;0
>>>> histogram.60;11926951
>>>> histogram.61;11928631
>>>> histogram.62;11927549
>>>> histogram.63;23856730
>>>> histogram.64;11930288
>>>> histogram.65;11931523
>>>> histogram.66;11932821
>>>> histogram.67;11932509
>>>> histogram.68;11929613
>>>> histogram.69;11928651
>>>> histogram.7;0
>>>> histogram.70;11929253
>>>> histogram.71;11931521
>>>> histogram.72;11925805
>>>> histogram.73;11934833
>>>> histogram.74;11928314
>>>> histogram.75;11923854
>>>> histogram.76;11930892
>>>> histogram.77;11927528
>>>> histogram.78;11932850
>>>> histogram.79;11934471
>>>> histogram.8;0
>>>> histogram.80;11925707
>>>> histogram.81;11929213
>>>> histogram.82;11931334
>>>> histogram.83;11936739
>>>> histogram.84;11927855
>>>> histogram.85;11931668
>>>> histogram.86;11928609
>>>> histogram.87;11931930
>>>> histogram.88;11934341
>>>> histogram.89;11927519
>>>> histogram.9;11928004
>>>> histogram.90;11933502
>>>> histogram.91;0
>>>> histogram.92;0
>>>> histogram.93;0
>>>> histogram.94;11932024
>>>> histogram.95;11932693
>>>> histogram.96;0
>>>> histogram.97;11928428
>>>> histogram.98;11933195
>>>> histogram.99;11924273
>>>> histogram.totalBytes;1073741824
>>>> 
>>>> Kind regards
>>>> Jens
>>>> 
>>>>> On Sun, Oct 31, 2021 at 21:40, Joe Witt <jo...@gmail.com> wrote:
>>>>> 
>>>>> Jens,
>>>>> 
>>>>> 118 hours in - still good.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>> 
>>>>>> Jens
>>>>>> 
>>>>>> Update from hour 67.  Still lookin' good.
>>>>>> 
>>>>>> Will advise.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Many, many thanks 🙏 Joe, for looking into this. My test flow was running for 6 days before the first error occurred
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>>> On Oct 28, 2021, at 16:57, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> Am 40+ hours in running both your flow and mine to reproduce.  So far
>>>>>>>> neither have shown any sign of trouble.  Will keep running for another
>>>>>>>> week or so if I can.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> The physical hosts with VMWare are using vmfs, but the vm machines running on the hosts can't see that.
>>>>>>>>> But you asked about the underlying file system 😀 and since my first answer with the copy from the fstab file wasn't enough, I just wanted to give all the details 😁.
>>>>>>>>> 
>>>>>>>>> If you create a vm for windows you would probably use NTFS (on top of vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on.
>>>>>>>>> 
>>>>>>>>> All the partitions at my nifi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don't use nfs.
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 27, 2021, at 17:47, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> I don't quite follow the EXT4 usage on top of VMFS but the point here
>>>>>>>>> is you'll ultimately need to truly understand your underlying storage
>>>>>>>>> system and what sorts of guarantees it is giving you.  If linux/the
>>>>>>>>> jvm/nifi think it has a typical EXT4 type block storage system to work
>>>>>>>>> with it can only be safe/operate within those constraints.  I have no
>>>>>>>>> idea about what VMFS brings to the table or the settings for it.
>>>>>>>>> 
>>>>>>>>> The sync properties I shared previously might help force the issue of
>>>>>>>>> ensuring a formal sync/flush cycle all the way through the disk has
>>>>>>>>> occurred which we'd normally not do or need to do but again in some
>>>>>>>>> cases offers a stronger guarantee in exchange for performance.
>>>>>>>>> 
>>>>>>>>> In any case...Mark's path for you here will help identify what we're
>>>>>>>>> dealing with and we can go from there.
>>>>>>>>> 
>>>>>>>>> I am aware of significant usage of NiFi on VMWare configurations
>>>>>>>>> without issue at high rates for many years so whatever it is here is
>>>>>>>>> likely solvable.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
>>>>>>>>> 
>>>>>>>>> I don't use NFS, but ext4. I will implement the script so we can check if that's the case here. But I think the issue might be after the processors write content to the repository.
>>>>>>>>> 
>>>>>>>>> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The last flow is totally inside nifi, excluding other network/server issues.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In both cases the CryptographicHashContent is right after a process which writes new content to the repository. But in one case a file in our production flow did calculate a wrong hash 4 times with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
>>>>>>>>> 
>>>>>>>>> Just like the case in step 5 and 12 in the pdf file
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I will let you all know more later next week
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> 
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 27, 2021, at 15:43, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And the actual script:
>>>>>>>>> 
>>>>>>>>> import org.apache.nifi.flowfile.FlowFile
>>>>>>>>> 
>>>>>>>>> import java.util.stream.Collectors
>>>>>>>>> 
>>>>>>>>> // Collect any histogram.* attributes written by a previous pass of this script.
>>>>>>>>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>>>>>>>>>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>>>>>>>>>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
>>>>>>>>>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
>>>>>>>>>     return histogram;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // Read the full content and count how many times each byte value (0-255) occurs.
>>>>>>>>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>>>>>>>>>     final Map<String, String> histogram = new HashMap<>();
>>>>>>>>>     final int[] distribution = new int[256];
>>>>>>>>>     Arrays.fill(distribution, 0);
>>>>>>>>> 
>>>>>>>>>     long total = 0L;
>>>>>>>>>     final byte[] buffer = new byte[8192];
>>>>>>>>>     int len;
>>>>>>>>>     while ((len = inStream.read(buffer)) > 0) {
>>>>>>>>>         for (int i = 0; i < len; i++) {
>>>>>>>>>             // Mask to 0-255: bytes are signed, so values >= 0x80 would
>>>>>>>>>             // otherwise produce a negative index.
>>>>>>>>>             final int val = buffer[i] & 0xFF;
>>>>>>>>>             distribution[val]++;
>>>>>>>>>             total++;
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> 
>>>>>>>>>     for (int i = 0; i < 256; i++) {
>>>>>>>>>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
>>>>>>>>>     }
>>>>>>>>>     histogram.put("histogram.totalBytes", String.valueOf(total));
>>>>>>>>> 
>>>>>>>>>     return histogram;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // Log, at ERROR level, every byte value whose count changed between the two passes.
>>>>>>>>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
>>>>>>>>>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
>>>>>>>>>     final Map<String, String> sorted = new TreeMap<>(previous)
>>>>>>>>>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>>>>>>>>>         final String key = entry.getKey();
>>>>>>>>>         final String previousValue = entry.getValue();
>>>>>>>>>         final String updatedValue = updated.get(entry.getKey())
>>>>>>>>> 
>>>>>>>>>         if (!Objects.equals(previousValue, updatedValue)) {
>>>>>>>>>             sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> 
>>>>>>>>>     log.error(sb.toString());
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> def flowFile = session.get()
>>>>>>>>> if (flowFile == null) {
>>>>>>>>>     return
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
>>>>>>>>> Map<String, String> histogram = null;
>>>>>>>>> 
>>>>>>>>> final InputStream inStream = session.read(flowFile);
>>>>>>>>> try {
>>>>>>>>>     histogram = createHistogram(flowFile, inStream);
>>>>>>>>> } finally {
>>>>>>>>>     inStream.close()
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // First pass: no previous histogram exists, so just stamp the attributes.
>>>>>>>>> // Later passes: compare and route to failure if the byte counts changed.
>>>>>>>>> if (!previousHistogram.isEmpty()) {
>>>>>>>>>     if (previousHistogram.equals(histogram)) {
>>>>>>>>>         log.info("Histograms match")
>>>>>>>>>     } else {
>>>>>>>>>         logHistogramDifferences(previousHistogram, histogram)
>>>>>>>>>         session.transfer(flowFile, REL_FAILURE)
>>>>>>>>>         return;
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> flowFile = session.putAllAttributes(flowFile, histogram)
>>>>>>>>> session.transfer(flowFile, REL_SUCCESS)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
>>>>>>>>> 
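For illustration, a minimal sketch of this "unwritten ranges read back as NUL" behavior, reproduced with a sparse file on an ordinary local file system; the commands and the /tmp path are just an example, using standard Unix tools:

# extend the file to 10 bytes without writing any data, leaving a hole
dd if=/dev/zero of=/tmp/hole-demo bs=1 count=0 seek=10 2>/dev/null
# the 11th byte "arrives" before bytes 1-10 ever do
printf 'A' >> /tmp/hole-demo
# the hole reads back as NUL (0x00) bytes, followed by the written byte
od -An -tx1 /tmp/hole-demo    # prints: 00 00 00 00 00 00 00 00 00 00 41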
>>>>>>>>> 
>>>>>>>>> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
>>>>>>>>> 
>>>>>>>>> histogram.0 = 280273
>>>>>>>>> 
>>>>>>>>> histogram.1 = 2820
>>>>>>>>> 
>>>>>>>>> histogram.2 = 48202
>>>>>>>>> 
>>>>>>>>> histogram.3 = 3820
>>>>>>>>> 
>>>>>>>>> …
>>>>>>>>> 
>>>>>>>>> histogram.totalBytes = 1780928732
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> -Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Still want to know more about your underlying storage system.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> You could also try updating nifi.properties and changing the following lines:
>>>>>>>>> 
>>>>>>>>> nifi.flowfile.repository.always.sync=true
>>>>>>>>> 
>>>>>>>>> nifi.content.repository.always.sync=true
>>>>>>>>> 
>>>>>>>>> nifi.provenance.repository.always.sync=true
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It will hurt performance but can be useful/necessary on certain storage subsystems.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I've setup a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and the disk setup is.  Didn't do anything special to be clear but still it is important to know.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> Joe
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Dear Joe and Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal in NIFI, but forces NIFI to write and read new files totally locally.
>>>>>>>>> 
>>>>>>>>> My flow has been running for 7 days, and this morning there were 2 files where the sha256 was given a different hash value than the original. I have set this flow up in another nifi cluster only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0
>>>>>>>>> 
>>>>>>>>> So I can reproduce the issue on different nifi clusters and versions (1.13.2 and 1.14.0) where calculating a hash on the same content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the hash calculation runs right after NIFI writes the content to the content repository. I don't know if there could be some kind of delay, so the content is not written 100% before the next processor begins reading it???
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Please see attach test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf check step 5 and 12.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Oct 21, 2021 at 08:28, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Joe,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> To start from the last mail :-)
>>>>>>>>> 
>>>>>>>>> All the repositories has it's own disk, and I'm using ext4
>>>>>>>>> 
>>>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>>>>>>> 
>>>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>>>>>>> 
>>>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> My test flow WITH sftp looks like this:
>>>>>>>>> 
>>>>>>>>> <image.png>
>>>>>>>>> 
>>>>>>>>> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and this time it calculated the hash correctly again. But in this flow I have a FetchSFTP process right before the hashing.
>>>>>>>>> 
>>>>>>>>> Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> About the Lineage:
>>>>>>>>> 
>>>>>>>>> Is there a way to export all the lineage data? The export only generates an svg file.
>>>>>>>>> 
>>>>>>>>> This is only for the receiving nifi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf document with the lineage, the flow and all the relevant Provenance information for each step in the lineage.
>>>>>>>>> 
>>>>>>>>> The interesting steps are step 5 and 12.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Could the issue be that data is not written 100% to disk between step 4 and 5 in the flow?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 23:49, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Also what type of file system/storage system are you running NiFi on
>>>>>>>>> 
>>>>>>>>> in this case?  We'll need to know this for the NiFi
>>>>>>>>> 
>>>>>>>>> content/flowfile/provenance repositories? Is it NFS?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And to further narrow this down
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>>>>>>>>> 
>>>>>>>>> (2 files per node) and next process was a hashcontent before it run
>>>>>>>>> 
>>>>>>>>> into a test loop. Where files are uploaded via PutSFTP to a test
>>>>>>>>> 
>>>>>>>>> server, and downloaded again and recalculated the hash. I have had one
>>>>>>>>> 
>>>>>>>>> issue after 3 days of running."
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So to be clear: with GenerateFlowFile making these files and you then looping the content, this is wholly and exclusively within the control of NiFI. No Get/Fetch/Put-SFTP of any kind at all. By looping the same files over and over in nifi itself, can you make this happen or not?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> "After fetching a FlowFile-stream file and unpacked it back into NiFi
>>>>>>>>> 
>>>>>>>>> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>>>>>>>>> 
>>>>>>>>> exact same file. And got a new hash. That is what worry’s me.
>>>>>>>>> 
>>>>>>>>> The fact that the same file can be recalculated and produce two
>>>>>>>>> 
>>>>>>>>> different hashes, is very strange, but it happens. "
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ok, so to confirm: you are saying that in each case this happens, you see it first compute the wrong hash, but then if you retry the same flowfile it then provides the correct hash?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Can you please also show/share the lineage history for such a flow
>>>>>>>>> 
>>>>>>>>> file then?  It should have events for the initial hash, second hash,
>>>>>>>>> 
>>>>>>>>> the unpacking, trace to the original stream, etc...
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Dear Mark and Joe
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I know my setup isn't normal for many people. But if we only look at my receive side, which the last mails are about: everything is happening at the same NIFI instance. It is the same 3 node NIFI cluster.
>>>>>>>>> 
>>>>>>>>> After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file. And get a new hash. That is what worries me.
>>>>>>>>> 
>>>>>>>>> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn't forward the flow file to the next process before it is 100% finished unpacking and saving the new content to the repository.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node), and the next process was a hashcontent before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
>>>>>>>>> 
>>>>>>>>> Now the test flow is running without the Put/Fetch sftp processors.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Another problem is that I can't find any correlation to other events, not within NIFI, nor in the server itself or VMWare. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
>>>>>>>>> 
>>>>>>>>> I have tried to force VMware to migrate a NiFi node to another host, forcing it to do a snapshot and deleting snapshots, but nothing can trigger an error.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind Regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 20, 2021, at 16:39, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks for sharing the images.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I tried to set up a test to reproduce the issue. I've had it running for quite some time, running through millions of iterations.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation but i can’t see any issues so far.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> -Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Actually is this current loop test contained within a single nifi and there you see corruption happen?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Joe
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Joe
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm, I added the HashContent processor and also added hashing by SHA-1.
>>>>>>>>> 
>>>>>>>>> A file has been going through the system, and both the SHA-1 and SHA-256 are different than expected. With a 1 minute delay the file goes back into the hashing content flow, and this time it calculates both hashes fine.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
>>>>>>>>> 
>>>>>>>>> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Any suggestions for where to dig ?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 06:34, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks for replying and the suggestion to look at the content Claim.
>>>>>>>>> 
>>>>>>>>> These 3 pictures is from the first attempt:
>>>>>>>>> 
>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yesterday I realized that the content was still in the archive, so I could Replay the file.
>>>>>>>>> 
>>>>>>>>> <image.png>
>>>>>>>>> 
>>>>>>>>> So here are the same pictures but for the replay and as you can see the Identifier, offset and Size are all the same.
>>>>>>>>> 
>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
>>>>>>>>> 
>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>> 
>>>>>>>>> Here the content Claim is all the same.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It is very rare that we see these issues, <1 : 1.000.000 files, and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes were above 800MB.
>>>>>>>>> 
>>>>>>>>> This time it was a Nifi-Flowstream v3 file, which has been exported from one system and imported in another. But once the file has been imported, it is the same file inside NIFI and it stays at the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
>>>>>>>>> 
>>>>>>>>> What can influence the same processor to calculate 2 different sha256 on the exact same content???
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 19, 2021 at 16:51, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
>>>>>>>>> 
>>>>>>>>> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> -Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Dear NIFI Users
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I have posted this mail in the developers mailing list and just want to inform all of our about a very odd behavior we are facing.
>>>>>>>>> 
>>>>>>>>> The background:
>>>>>>>>> 
>>>>>>>>> We have data going between 2 different NIFI systems which has no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ----
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.
>>>>>>>>> 
>>>>>>>>> I have created a Retry loop, where the file will go to a Wait process for delaying the file 1 minute and going back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to the retries_exceeded and goes to a disabled process just to be in a queue so I manually can look at it. This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.
>>>>>>>>> 
>>>>>>>>> <image.png>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
>>>>>>>>> 
>>>>>>>>> I have inspected different logs to see if I can find any correlation what happened at the same time as the file is going through my loop, but there are no event/task at that exact time.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> System 1:
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>>>> 
>>>>>>>>> The file is exported as a "FlowFile Stream, v3" to System 2
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> SYSTEM 2:
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>>> 
>>>>>>>>> <image.png>
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 00:20:04.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 00:21:01.711 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> At 10/19/2021 06:07:43.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>>>> 
>>>>>>>>> <image.png>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> How on earth can this happen???
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind Regards
>>>>>>>>> 
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> <Repro.json>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> <Try_to_recreate_Jens_Challenge.json>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>> 
>>>> 
> 

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
So what I found interesting about the histogram output was that in each case the input file was 1 GB, and the number of bytes that differed between the ‘good’ and ‘bad’ hashes was only something like 500-700. But the values of those differing bytes ranged significantly. There was no indication that the type of thing we’ve seen with NFS mounts was happening, where data is nulled out until received and then updated. If that had been the case we’d have seen the NUL byte (or some other single value) show a very significant change in the histogram, but we didn’t see that.
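
As a minimal sketch of how a 'before' and 'after' histogram dump of the same file can be compared outside NiFi, assuming the two dumps were saved to hypothetical files before.txt and after.txt in the key;value format shown earlier:

# print only the histogram.N lines whose counts changed between the two dumps
diff <(sort before.txt) <(sort after.txt)

This reports the same information the ExecuteScript logs at ERROR level.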

So a couple more ideas that I think can be useful.

1) Which garbage collector are you using? It’s configured in the bootstrap.conf file

2) We can try to definitively prove out whether the content on the disk is changing or if there’s an issue reading the content. To do this:

1. Stop all processors.
2. Shutdown nifi
3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this will delete all FlowFiles & content, so only do this on a dev/test system where you’re comfortable deleting it!)
4. Start nifi
5. Let exactly 1 FlowFile into your flow.
6. While it is looping through, create a copy of your entire Content Repository: cp -r content_repository content_backup1; zip -r content_backup1.zip content_backup1
7. Wait for the hashes to differ
8. Create another copy of the Content Repository: cp -r content_repository content_backup2
9. Find the files within content_backup1 and content_backup2 and compare them to see if they are identical. Would recommend comparing them using each of the 3 methods: sha256, sha512, diff (see the example commands below)
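
A minimal sketch of step 9, assuming both backup directories sit in NiFi's working directory and using only standard Unix tools:

# structural + byte-level comparison of the two backups
diff -r content_backup1 content_backup2
# per-claim checksums, sorted by file name so the two lists line up
(cd content_backup1 && find . -type f -exec sha256sum {} \; | sort -k 2) > backup1.sha256
(cd content_backup2 && find . -type f -exec sha256sum {} \; | sort -k 2) > backup2.sha256
diff backup1.sha256 backup2.sha256
# repeat the find with sha512sum for the third check

Any claim file that shows up in these diffs changed on disk between step 6 and step 8.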

This should make it pretty clear that either:
(1) the issue resides in the software: either NiFi or the JVM
(2) the issue resides outside of the software: the disk, the disk driver, the operating system, the VM hypervisor, etc.

Thanks
-Mark

> On Nov 3, 2021, at 10:44 AM, Joe Witt <jo...@gmail.com> wrote:
> 
> Jens,
> 
> 184 hours (7.6 days) in and zero issues.
> 
> Will need to turn this off soon but wanted to give a final update.
> Looks great.  Given the information on your system there appears to be
> something we don't understand related to the virtual file system
> involved or something.
> 
> Thanks
> 
> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> Hi Mark
>> 
>> Of course, sorry :-)  Looking at the error messages, I can see that only the histogram values with differences are listed. And all 3 have their first issue at histogram.9. Don't know what that means.
>> 
>> /Jens
>> 
>> Here are the error log:
>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
>> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
>> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
>> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
>> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
>> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
>> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
>> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
>> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
>> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
>> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
>> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
>> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
>> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
>> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
>> Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
>> Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
>> Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
>> Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
>> Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
>> Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
>> Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
>> Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
>> Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
>> Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
>> Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
>> Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
>> Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
>> Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
>> Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
>> Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
>> Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
>> Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
>> Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
>> Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
>> Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
>> Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
>> Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
>> Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
>> Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
>> Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
>> Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
>> Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
>> Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
>> Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
>> Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
>> Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
>> Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
>> Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
>> Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
>> Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
>> Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
>> Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
>> Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
>> Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
>> Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
>> Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
>> Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
>> Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
>> Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
>> Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
>> Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
>> Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
>> Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
>> Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
>> Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
>> Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
>> Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
>> Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
>> Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
>> Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
>> Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
>> Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
>> Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
>> Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
>> Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
>> Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
>> Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
>> Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
>> Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
>> Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
>> Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
>> Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
>> Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
>> Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
>> Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
>> Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059
>> 
>> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
>> Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
>> Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
>> Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
>> Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
>> Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
>> Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
>> Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
>> Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
>> Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
>> Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
>> Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
>> Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
>> Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
>> Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
>> Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
>> Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
>> Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
>> Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
>> Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
>> Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
>> Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
>> Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
>> Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
>> Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
>> Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
>> Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
>> Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
>> Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
>> Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
>> Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
>> Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
>> Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
>> Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
>> Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
>> Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
>> Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
>> Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
>> Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
>> Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
>> Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
>> Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
>> Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
>> Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
>> Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
>> Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
>> Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
>> Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
>> Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
>> Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
>> Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
>> Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
>> Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
>> Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
>> Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
>> Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
>> Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
>> Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
>> Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
>> Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
>> Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
>> Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
>> Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
>> Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
>> Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
>> Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
>> Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
>> Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
>> Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
>> Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
>> Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
>> Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
>> Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
>> Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
>> Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
>> Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
>> Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
>> Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
>> Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
>> Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
>> Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
>> Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
>> Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
>> Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
>> Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
>> Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542
>> 
>> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are differences in the histogram
>> Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
>> Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
>> Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
>> Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
>> Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
>> Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
>> Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
>> Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
>> Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
>> Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
>> Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
>> Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
>> Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
>> Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
>> Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
>> Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
>> Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
>> Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
>> Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
>> Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
>> Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
>> Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
>> Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
>> Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
>> Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
>> Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
>> Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
>> Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
>> Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
>> Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
>> Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
>> Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
>> Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
>> Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
>> Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
>> Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
>> Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
>> Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
>> Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
>> Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
>> Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
>> Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
>> Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
>> Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
>> Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
>> Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
>> Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
>> Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
>> Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
>> Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
>> Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
>> Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
>> Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
>> Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
>> Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
>> Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
>> Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
>> Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
>> Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
>> Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
>> Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
>> Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
>> Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
>> Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
>> Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
>> Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
>> Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
>> Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
>> Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
>> Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
>> Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
>> Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
>> Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
>> Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
>> Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
>> Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
>> Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
>> Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
>> Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
>> Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
>> Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
>> Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
>> Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
>> Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
>> Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
>> Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282
>> 
>> On Tue, Nov 2, 2021 at 3:41 PM Mark Payne <ma...@hotmail.com> wrote:
>>> 
>>> Jens,
>>> 
>>> The histograms, in and of themselves, are not very interesting. The interesting thing would be the difference in the histogram before & after the hash. Can you provide the ERROR level logs generated by the ExecuteScript? That’s what is of interest.
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>> 
>>> Hi Mark and Joe
>>> 
>>> Yesterday morning I implemented Mark's script in my 2 testflows. One testflow using sftp the other MergeContent/UnpackContent. Both testflow are running at a test cluster with 3 nodes and NIFI 1.14.0
>>> The 1st flow with sftp has had 1 file going into the failure queue after about 16 hours.
>>> The 2nd flow has had 2 files going into the failure queue after about 15 and 17 hours.
>>> 
>>> There is definitely something going wrong in my setup, but I can't figure out what.
>>> 
>>> Information from file 1:
>>> histogram.0;0
>>> histogram.1;0
>>> histogram.10;11926720
>>> histogram.100;11927504
>>> histogram.101;11925396
>>> histogram.102;11929923
>>> histogram.103;11931596
>>> histogram.104;11929071
>>> histogram.105;11931365
>>> histogram.106;11928661
>>> histogram.107;11929864
>>> histogram.108;11931611
>>> histogram.109;11932758
>>> histogram.11;0
>>> histogram.110;11927893
>>> histogram.111;11933519
>>> histogram.112;11931392
>>> histogram.113;11928534
>>> histogram.114;11936879
>>> histogram.115;11932818
>>> histogram.116;11934767
>>> histogram.117;11929143
>>> histogram.118;11931854
>>> histogram.119;11926333
>>> histogram.12;0
>>> histogram.120;11928731
>>> histogram.121;11931149
>>> histogram.122;11926725
>>> histogram.123;0
>>> histogram.124;0
>>> histogram.125;0
>>> histogram.126;0
>>> histogram.127;0
>>> histogram.128;0
>>> histogram.129;0
>>> histogram.13;0
>>> histogram.130;0
>>> histogram.131;0
>>> histogram.132;0
>>> histogram.133;0
>>> histogram.134;0
>>> histogram.135;0
>>> histogram.136;0
>>> histogram.137;0
>>> histogram.138;0
>>> histogram.139;0
>>> histogram.14;0
>>> histogram.140;0
>>> histogram.141;0
>>> histogram.142;0
>>> histogram.143;0
>>> histogram.144;0
>>> histogram.145;0
>>> histogram.146;0
>>> histogram.147;0
>>> histogram.148;0
>>> histogram.149;0
>>> histogram.15;0
>>> histogram.150;0
>>> histogram.151;0
>>> histogram.152;0
>>> histogram.153;0
>>> histogram.154;0
>>> histogram.155;0
>>> histogram.156;0
>>> histogram.157;0
>>> histogram.158;0
>>> histogram.159;0
>>> histogram.16;0
>>> histogram.160;0
>>> histogram.161;0
>>> histogram.162;0
>>> histogram.163;0
>>> histogram.164;0
>>> histogram.165;0
>>> histogram.166;0
>>> histogram.167;0
>>> histogram.168;0
>>> histogram.169;0
>>> histogram.17;0
>>> histogram.170;0
>>> histogram.171;0
>>> histogram.172;0
>>> histogram.173;0
>>> histogram.174;0
>>> histogram.175;0
>>> histogram.176;0
>>> histogram.177;0
>>> histogram.178;0
>>> histogram.179;0
>>> histogram.18;0
>>> histogram.180;0
>>> histogram.181;0
>>> histogram.182;0
>>> histogram.183;0
>>> histogram.184;0
>>> histogram.185;0
>>> histogram.186;0
>>> histogram.187;0
>>> histogram.188;0
>>> histogram.189;0
>>> histogram.19;0
>>> histogram.190;0
>>> histogram.191;0
>>> histogram.192;0
>>> histogram.193;0
>>> histogram.194;0
>>> histogram.195;0
>>> histogram.196;0
>>> histogram.197;0
>>> histogram.198;0
>>> histogram.199;0
>>> histogram.2;0
>>> histogram.20;0
>>> histogram.200;0
>>> histogram.201;0
>>> histogram.202;0
>>> histogram.203;0
>>> histogram.204;0
>>> histogram.205;0
>>> histogram.206;0
>>> histogram.207;0
>>> histogram.208;0
>>> histogram.209;0
>>> histogram.21;0
>>> histogram.210;0
>>> histogram.211;0
>>> histogram.212;0
>>> histogram.213;0
>>> histogram.214;0
>>> histogram.215;0
>>> histogram.216;0
>>> histogram.217;0
>>> histogram.218;0
>>> histogram.219;0
>>> histogram.22;0
>>> histogram.220;0
>>> histogram.221;0
>>> histogram.222;0
>>> histogram.223;0
>>> histogram.224;0
>>> histogram.225;0
>>> histogram.226;0
>>> histogram.227;0
>>> histogram.228;0
>>> histogram.229;0
>>> histogram.23;0
>>> histogram.230;0
>>> histogram.231;0
>>> histogram.232;0
>>> histogram.233;0
>>> histogram.234;0
>>> histogram.235;0
>>> histogram.236;0
>>> histogram.237;0
>>> histogram.238;0
>>> histogram.239;0
>>> histogram.24;0
>>> histogram.240;0
>>> histogram.241;0
>>> histogram.242;0
>>> histogram.243;0
>>> histogram.244;0
>>> histogram.245;0
>>> histogram.246;0
>>> histogram.247;0
>>> histogram.248;0
>>> histogram.249;0
>>> histogram.25;0
>>> histogram.250;0
>>> histogram.251;0
>>> histogram.252;0
>>> histogram.253;0
>>> histogram.254;0
>>> histogram.255;0
>>> histogram.26;0
>>> histogram.27;0
>>> histogram.28;0
>>> histogram.29;0
>>> histogram.3;0
>>> histogram.30;0
>>> histogram.31;0
>>> histogram.32;11930422
>>> histogram.33;11934311
>>> histogram.34;11930459
>>> histogram.35;11924776
>>> histogram.36;11924186
>>> histogram.37;11928616
>>> histogram.38;11929474
>>> histogram.39;11929607
>>> histogram.4;0
>>> histogram.40;11928053
>>> histogram.41;11930402
>>> histogram.42;11926830
>>> histogram.43;11938138
>>> histogram.44;11932536
>>> histogram.45;11931053
>>> histogram.46;11930008
>>> histogram.47;11927747
>>> histogram.48;11936055
>>> histogram.49;11931471
>>> histogram.5;0
>>> histogram.50;11931921
>>> histogram.51;11929643
>>> histogram.52;11923847
>>> histogram.53;11927311
>>> histogram.54;11933754
>>> histogram.55;11925964
>>> histogram.56;11928872
>>> histogram.57;11931124
>>> histogram.58;11928474
>>> histogram.59;11925814
>>> histogram.6;0
>>> histogram.60;11933978
>>> histogram.61;11934136
>>> histogram.62;11932016
>>> histogram.63;23864588
>>> histogram.64;11924792
>>> histogram.65;11934789
>>> histogram.66;11933047
>>> histogram.67;11931899
>>> histogram.68;11935615
>>> histogram.69;11927249
>>> histogram.7;0
>>> histogram.70;11933276
>>> histogram.71;11927953
>>> histogram.72;11929275
>>> histogram.73;11930292
>>> histogram.74;11935428
>>> histogram.75;11930317
>>> histogram.76;11935737
>>> histogram.77;11932127
>>> histogram.78;11932344
>>> histogram.79;11932094
>>> histogram.8;0
>>> histogram.80;11930688
>>> histogram.81;11928415
>>> histogram.82;11931559
>>> histogram.83;11934192
>>> histogram.84;11927224
>>> histogram.85;11929491
>>> histogram.86;11930624
>>> histogram.87;11932201
>>> histogram.88;11930694
>>> histogram.89;11936439
>>> histogram.9;11933187
>>> histogram.90;11926445
>>> histogram.91;0
>>> histogram.92;0
>>> histogram.93;0
>>> histogram.94;11931596
>>> histogram.95;11929379
>>> histogram.96;0
>>> histogram.97;11928864
>>> histogram.98;11924738
>>> histogram.99;11930062
>>> histogram.totalBytes;1073741824
>>> 
>>> File 2:
>>> histogram.0;0
>>> histogram.1;0
>>> histogram.10;11932402
>>> histogram.100;11927531
>>> histogram.101;11928454
>>> histogram.102;11934432
>>> histogram.103;11924623
>>> histogram.104;11934492
>>> histogram.105;11934585
>>> histogram.106;11928955
>>> histogram.107;11928651
>>> histogram.108;11930139
>>> histogram.109;11929325
>>> histogram.11;0
>>> histogram.110;11930486
>>> histogram.111;11933517
>>> histogram.112;11928334
>>> histogram.113;11927798
>>> histogram.114;11929222
>>> histogram.115;11932057
>>> histogram.116;11931182
>>> histogram.117;11933407
>>> histogram.118;11932709
>>> histogram.119;11931338
>>> histogram.12;0
>>> histogram.120;11933700
>>> histogram.121;11929803
>>> histogram.122;11930218
>>> histogram.123;0
>>> histogram.124;0
>>> histogram.125;0
>>> histogram.126;0
>>> histogram.127;0
>>> histogram.128;0
>>> histogram.129;0
>>> histogram.13;0
>>> histogram.130;0
>>> histogram.131;0
>>> histogram.132;0
>>> histogram.133;0
>>> histogram.134;0
>>> histogram.135;0
>>> histogram.136;0
>>> histogram.137;0
>>> histogram.138;0
>>> histogram.139;0
>>> histogram.14;0
>>> histogram.140;0
>>> histogram.141;0
>>> histogram.142;0
>>> histogram.143;0
>>> histogram.144;0
>>> histogram.145;0
>>> histogram.146;0
>>> histogram.147;0
>>> histogram.148;0
>>> histogram.149;0
>>> histogram.15;0
>>> histogram.150;0
>>> histogram.151;0
>>> histogram.152;0
>>> histogram.153;0
>>> histogram.154;0
>>> histogram.155;0
>>> histogram.156;0
>>> histogram.157;0
>>> histogram.158;0
>>> histogram.159;0
>>> histogram.16;0
>>> histogram.160;0
>>> histogram.161;0
>>> histogram.162;0
>>> histogram.163;0
>>> histogram.164;0
>>> histogram.165;0
>>> histogram.166;0
>>> histogram.167;0
>>> histogram.168;0
>>> histogram.169;0
>>> histogram.17;0
>>> histogram.170;0
>>> histogram.171;0
>>> histogram.172;0
>>> histogram.173;0
>>> histogram.174;0
>>> histogram.175;0
>>> histogram.176;0
>>> histogram.177;0
>>> histogram.178;0
>>> histogram.179;0
>>> histogram.18;0
>>> histogram.180;0
>>> histogram.181;0
>>> histogram.182;0
>>> histogram.183;0
>>> histogram.184;0
>>> histogram.185;0
>>> histogram.186;0
>>> histogram.187;0
>>> histogram.188;0
>>> histogram.189;0
>>> histogram.19;0
>>> histogram.190;0
>>> histogram.191;0
>>> histogram.192;0
>>> histogram.193;0
>>> histogram.194;0
>>> histogram.195;0
>>> histogram.196;0
>>> histogram.197;0
>>> histogram.198;0
>>> histogram.199;0
>>> histogram.2;0
>>> histogram.20;0
>>> histogram.200;0
>>> histogram.201;0
>>> histogram.202;0
>>> histogram.203;0
>>> histogram.204;0
>>> histogram.205;0
>>> histogram.206;0
>>> histogram.207;0
>>> histogram.208;0
>>> histogram.209;0
>>> histogram.21;0
>>> histogram.210;0
>>> histogram.211;0
>>> histogram.212;0
>>> histogram.213;0
>>> histogram.214;0
>>> histogram.215;0
>>> histogram.216;0
>>> histogram.217;0
>>> histogram.218;0
>>> histogram.219;0
>>> histogram.22;0
>>> histogram.220;0
>>> histogram.221;0
>>> histogram.222;0
>>> histogram.223;0
>>> histogram.224;0
>>> histogram.225;0
>>> histogram.226;0
>>> histogram.227;0
>>> histogram.228;0
>>> histogram.229;0
>>> histogram.23;0
>>> histogram.230;0
>>> histogram.231;0
>>> histogram.232;0
>>> histogram.233;0
>>> histogram.234;0
>>> histogram.235;0
>>> histogram.236;0
>>> histogram.237;0
>>> histogram.238;0
>>> histogram.239;0
>>> histogram.24;0
>>> histogram.240;0
>>> histogram.241;0
>>> histogram.242;0
>>> histogram.243;0
>>> histogram.244;0
>>> histogram.245;0
>>> histogram.246;0
>>> histogram.247;0
>>> histogram.248;0
>>> histogram.249;0
>>> histogram.25;0
>>> histogram.250;0
>>> histogram.251;0
>>> histogram.252;0
>>> histogram.253;0
>>> histogram.254;0
>>> histogram.255;0
>>> histogram.26;0
>>> histogram.27;0
>>> histogram.28;0
>>> histogram.29;0
>>> histogram.3;0
>>> histogram.30;0
>>> histogram.31;0
>>> histogram.32;11924458
>>> histogram.33;11934243
>>> histogram.34;11930696
>>> histogram.35;11925574
>>> histogram.36;11929198
>>> histogram.37;11928146
>>> histogram.38;11932505
>>> histogram.39;11929406
>>> histogram.4;0
>>> histogram.40;11930100
>>> histogram.41;11930867
>>> histogram.42;11930796
>>> histogram.43;11930796
>>> histogram.44;11921866
>>> histogram.45;11935682
>>> histogram.46;11930075
>>> histogram.47;11928169
>>> histogram.48;11933490
>>> histogram.49;11932174
>>> histogram.5;0
>>> histogram.50;11933255
>>> histogram.51;11934009
>>> histogram.52;11928361
>>> histogram.53;11927626
>>> histogram.54;11931611
>>> histogram.55;11930755
>>> histogram.56;11933823
>>> histogram.57;11922508
>>> histogram.58;11930384
>>> histogram.59;11929805
>>> histogram.6;0
>>> histogram.60;11930064
>>> histogram.61;11926761
>>> histogram.62;11927605
>>> histogram.63;23858926
>>> histogram.64;11929516
>>> histogram.65;11930217
>>> histogram.66;11930478
>>> histogram.67;11939855
>>> histogram.68;11927850
>>> histogram.69;11931154
>>> histogram.7;0
>>> histogram.70;11935374
>>> histogram.71;11930754
>>> histogram.72;11928304
>>> histogram.73;11931772
>>> histogram.74;11939417
>>> histogram.75;11930712
>>> histogram.76;11933331
>>> histogram.77;11931279
>>> histogram.78;11928276
>>> histogram.79;11930071
>>> histogram.8;0
>>> histogram.80;11927830
>>> histogram.81;11931213
>>> histogram.82;11930964
>>> histogram.83;11928973
>>> histogram.84;11934325
>>> histogram.85;11929658
>>> histogram.86;11924667
>>> histogram.87;11931100
>>> histogram.88;11930252
>>> histogram.89;11927281
>>> histogram.9;11932848
>>> histogram.90;11930398
>>> histogram.91;0
>>> histogram.92;0
>>> histogram.93;0
>>> histogram.94;11928720
>>> histogram.95;11928988
>>> histogram.96;0
>>> histogram.97;11931423
>>> histogram.98;11928181
>>> histogram.99;11935549
>>> histogram.totalBytes;1073741824
>>> 
>>> File3:
>>> histogram.0;0
>>> histogram.1;0
>>> histogram.10;11930417
>>> histogram.100;11926739
>>> histogram.101;11930580
>>> histogram.102;11928210
>>> histogram.103;11935300
>>> histogram.104;11925804
>>> histogram.105;11931023
>>> histogram.106;11932342
>>> histogram.107;11929778
>>> histogram.108;11930098
>>> histogram.109;11930759
>>> histogram.11;0
>>> histogram.110;11934343
>>> histogram.111;11935775
>>> histogram.112;11933877
>>> histogram.113;11926675
>>> histogram.114;11929332
>>> histogram.115;11928876
>>> histogram.116;11927819
>>> histogram.117;11932657
>>> histogram.118;11933508
>>> histogram.119;11928808
>>> histogram.12;0
>>> histogram.120;11937532
>>> histogram.121;11926907
>>> histogram.122;11933942
>>> histogram.123;0
>>> histogram.124;0
>>> histogram.125;0
>>> histogram.126;0
>>> histogram.127;0
>>> histogram.128;0
>>> histogram.129;0
>>> histogram.13;0
>>> histogram.130;0
>>> histogram.131;0
>>> histogram.132;0
>>> histogram.133;0
>>> histogram.134;0
>>> histogram.135;0
>>> histogram.136;0
>>> histogram.137;0
>>> histogram.138;0
>>> histogram.139;0
>>> histogram.14;0
>>> histogram.140;0
>>> histogram.141;0
>>> histogram.142;0
>>> histogram.143;0
>>> histogram.144;0
>>> histogram.145;0
>>> histogram.146;0
>>> histogram.147;0
>>> histogram.148;0
>>> histogram.149;0
>>> histogram.15;0
>>> histogram.150;0
>>> histogram.151;0
>>> histogram.152;0
>>> histogram.153;0
>>> histogram.154;0
>>> histogram.155;0
>>> histogram.156;0
>>> histogram.157;0
>>> histogram.158;0
>>> histogram.159;0
>>> histogram.16;0
>>> histogram.160;0
>>> histogram.161;0
>>> histogram.162;0
>>> histogram.163;0
>>> histogram.164;0
>>> histogram.165;0
>>> histogram.166;0
>>> histogram.167;0
>>> histogram.168;0
>>> histogram.169;0
>>> histogram.17;0
>>> histogram.170;0
>>> histogram.171;0
>>> histogram.172;0
>>> histogram.173;0
>>> histogram.174;0
>>> histogram.175;0
>>> histogram.176;0
>>> histogram.177;0
>>> histogram.178;0
>>> histogram.179;0
>>> histogram.18;0
>>> histogram.180;0
>>> histogram.181;0
>>> histogram.182;0
>>> histogram.183;0
>>> histogram.184;0
>>> histogram.185;0
>>> histogram.186;0
>>> histogram.187;0
>>> histogram.188;0
>>> histogram.189;0
>>> histogram.19;0
>>> histogram.190;0
>>> histogram.191;0
>>> histogram.192;0
>>> histogram.193;0
>>> histogram.194;0
>>> histogram.195;0
>>> histogram.196;0
>>> histogram.197;0
>>> histogram.198;0
>>> histogram.199;0
>>> histogram.2;0
>>> histogram.20;0
>>> histogram.200;0
>>> histogram.201;0
>>> histogram.202;0
>>> histogram.203;0
>>> histogram.204;0
>>> histogram.205;0
>>> histogram.206;0
>>> histogram.207;0
>>> histogram.208;0
>>> histogram.209;0
>>> histogram.21;0
>>> histogram.210;0
>>> histogram.211;0
>>> histogram.212;0
>>> histogram.213;0
>>> histogram.214;0
>>> histogram.215;0
>>> histogram.216;0
>>> histogram.217;0
>>> histogram.218;0
>>> histogram.219;0
>>> histogram.22;0
>>> histogram.220;0
>>> histogram.221;0
>>> histogram.222;0
>>> histogram.223;0
>>> histogram.224;0
>>> histogram.225;0
>>> histogram.226;0
>>> histogram.227;0
>>> histogram.228;0
>>> histogram.229;0
>>> histogram.23;0
>>> histogram.230;0
>>> histogram.231;0
>>> histogram.232;0
>>> histogram.233;0
>>> histogram.234;0
>>> histogram.235;0
>>> histogram.236;0
>>> histogram.237;0
>>> histogram.238;0
>>> histogram.239;0
>>> histogram.24;0
>>> histogram.240;0
>>> histogram.241;0
>>> histogram.242;0
>>> histogram.243;0
>>> histogram.244;0
>>> histogram.245;0
>>> histogram.246;0
>>> histogram.247;0
>>> histogram.248;0
>>> histogram.249;0
>>> histogram.25;0
>>> histogram.250;0
>>> histogram.251;0
>>> histogram.252;0
>>> histogram.253;0
>>> histogram.254;0
>>> histogram.255;0
>>> histogram.26;0
>>> histogram.27;0
>>> histogram.28;0
>>> histogram.29;0
>>> histogram.3;0
>>> histogram.30;0
>>> histogram.31;0
>>> histogram.32;11929486
>>> histogram.33;11930737
>>> histogram.34;11931092
>>> histogram.35;11934488
>>> histogram.36;11927605
>>> histogram.37;11930735
>>> histogram.38;11932174
>>> histogram.39;11936180
>>> histogram.4;0
>>> histogram.40;11931666
>>> histogram.41;11927043
>>> histogram.42;11929044
>>> histogram.43;11934104
>>> histogram.44;11936337
>>> histogram.45;11935580
>>> histogram.46;11929598
>>> histogram.47;11934083
>>> histogram.48;11928858
>>> histogram.49;11931098
>>> histogram.5;0
>>> histogram.50;11930618
>>> histogram.51;11925429
>>> histogram.52;11929741
>>> histogram.53;11934160
>>> histogram.54;11931999
>>> histogram.55;11930465
>>> histogram.56;11926194
>>> histogram.57;11926386
>>> histogram.58;11924871
>>> histogram.59;11929331
>>> histogram.6;0
>>> histogram.60;11926951
>>> histogram.61;11928631
>>> histogram.62;11927549
>>> histogram.63;23856730
>>> histogram.64;11930288
>>> histogram.65;11931523
>>> histogram.66;11932821
>>> histogram.67;11932509
>>> histogram.68;11929613
>>> histogram.69;11928651
>>> histogram.7;0
>>> histogram.70;11929253
>>> histogram.71;11931521
>>> histogram.72;11925805
>>> histogram.73;11934833
>>> histogram.74;11928314
>>> histogram.75;11923854
>>> histogram.76;11930892
>>> histogram.77;11927528
>>> histogram.78;11932850
>>> histogram.79;11934471
>>> histogram.8;0
>>> histogram.80;11925707
>>> histogram.81;11929213
>>> histogram.82;11931334
>>> histogram.83;11936739
>>> histogram.84;11927855
>>> histogram.85;11931668
>>> histogram.86;11928609
>>> histogram.87;11931930
>>> histogram.88;11934341
>>> histogram.89;11927519
>>> histogram.9;11928004
>>> histogram.90;11933502
>>> histogram.91;0
>>> histogram.92;0
>>> histogram.93;0
>>> histogram.94;11932024
>>> histogram.95;11932693
>>> histogram.96;0
>>> histogram.97;11928428
>>> histogram.98;11933195
>>> histogram.99;11924273
>>> histogram.totalBytes;1073741824
>>> 
>>> Kind regards
>>> Jens
>>> 
>>> On Sun, Oct 31, 2021 at 9:40 PM Joe Witt <jo...@gmail.com> wrote:
>>>> 
>>>> Jens
>>>> 
>>>> 118 hours in - still good.
>>>> 
>>>> Thanks
>>>> 
>>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>>>>> 
>>>>> Jens
>>>>> 
>>>>> Update from hour 67.  Still lookin' good.
>>>>> 
>>>>> Will advise.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>> 
>>>>>> Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 days before the first error occurred
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On Oct 28, 2021, at 4:57 PM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Jens,
>>>>>>> 
>>>>>>> Am 40+ hours in running both your flow and mine to reproduce.  So far
>>>>>>> neither have shown any sign of trouble.  Will keep running for another
>>>>>>> week or so if I can.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> The physical hosts with VMware are using VMFS, but the VMs running on the hosts can't see that.
>>>>>>>> But you asked about the underlying file system 😀 and since my first answer with the copy from the fstab file wasn't enough, I just wanted to give all the details 😁.
>>>>>>>> 
>>>>>>>> If you create a VM for Windows you would probably use NTFS (on top of VMFS); for Linux, ext3, ext4, BTRFS, XFS and so on.
>>>>>>>> 
>>>>>>>> All the partitions at my NiFi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don't use NFS.
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 27, 2021, at 5:47 PM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> I don't quite follow the EXT4 usage on top of VMFS but the point here
>>>>>>>> is you'll ultimately need to truly understand your underlying storage
>>>>>>>> system and what sorts of guarantees it is giving you.  If linux/the
>>>>>>>> jvm/nifi think it has a typical EXT4 type block storage system to work
>>>>>>>> with it can only be safe/operate within those constraints.  I have no
>>>>>>>> idea about what VMFS brings to the table or the settings for it.
>>>>>>>> 
>>>>>>>> The sync properties I shared previously might help force the issue of
>>>>>>>> ensuring a formal sync/flush cycle all the way through the disk has
>>>>>>>> occurred which we'd normally not do or need to do but again in some
>>>>>>>> cases offers a stronger guarantee in exchange for performance.
>>>>>>>> 
>>>>>>>> In any case...Mark's path for you here will help identify what we're
>>>>>>>> dealing with and we can go from there.
>>>>>>>> 
>>>>>>>> I am aware of significant usage of NiFi on VMWare configurations
>>>>>>>> without issue at high rates for many years so whatever it is here is
>>>>>>>> likely solvable.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
>>>>>>>> 
>>>>>>>> I don’t use NFS, but ext4. But I will implement the script so we can check if it’s the case here. But I think the issue might be after the processors writing content to the repository.
>>>>>>>> 
>>>>>>>> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The last flow is totally inside NiFi, excluding other network/server issues.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In both cases the CryptographicHashContent is right after a process which writes new content to the repository. But in one case a file in our production flow calculated a wrong hash 4 times with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
>>>>>>>> 
>>>>>>>> Just like the case in steps 5 and 12 in the pdf file
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I will let you all know more later next week
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> 
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 27, 2021, at 3:43 PM, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And the actual script:
>>>>>>>> 
>>>>>>>> import org.apache.nifi.flowfile.FlowFile
>>>>>>>> import java.util.stream.Collectors
>>>>>>>> 
>>>>>>>> // Collect any histogram.* attributes left behind by a previous pass through this processor.
>>>>>>>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>>>>>>>>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>>>>>>>>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
>>>>>>>>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
>>>>>>>>     return histogram;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> // Read the entire content stream and count the occurrences of each byte value (0-255).
>>>>>>>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>>>>>>>>     final Map<String, String> histogram = new HashMap<>();
>>>>>>>>     final int[] distribution = new int[256];
>>>>>>>>     Arrays.fill(distribution, 0);
>>>>>>>> 
>>>>>>>>     long total = 0L;
>>>>>>>>     final byte[] buffer = new byte[8192];
>>>>>>>>     int len;
>>>>>>>>     while ((len = inStream.read(buffer)) > 0) {
>>>>>>>>         for (int i = 0; i < len; i++) {
>>>>>>>>             final int val = buffer[i] & 0xFF;   // mask to an unsigned 0-255 value so every byte indexes its own bucket
>>>>>>>>             distribution[val]++;
>>>>>>>>             total++;
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     for (int i = 0; i < 256; i++) {
>>>>>>>>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
>>>>>>>>     }
>>>>>>>>     histogram.put("histogram.totalBytes", String.valueOf(total));
>>>>>>>> 
>>>>>>>>     return histogram;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> // Log, at ERROR level, every byte value whose count changed between the two passes.
>>>>>>>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
>>>>>>>>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
>>>>>>>>     final Map<String, String> sorted = new TreeMap<>(previous)
>>>>>>>>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>>>>>>>>         final String key = entry.getKey();
>>>>>>>>         final String previousValue = entry.getValue();
>>>>>>>>         final String updatedValue = updated.get(entry.getKey())
>>>>>>>> 
>>>>>>>>         if (!Objects.equals(previousValue, updatedValue)) {
>>>>>>>>             sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     log.error(sb.toString());
>>>>>>>> }
>>>>>>>> 
>>>>>>>> def flowFile = session.get()
>>>>>>>> if (flowFile == null) {
>>>>>>>>     return
>>>>>>>> }
>>>>>>>> 
>>>>>>>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
>>>>>>>> Map<String, String> histogram = null;
>>>>>>>> 
>>>>>>>> final InputStream inStream = session.read(flowFile);
>>>>>>>> try {
>>>>>>>>     histogram = createHistogram(flowFile, inStream);
>>>>>>>> } finally {
>>>>>>>>     inStream.close()
>>>>>>>> }
>>>>>>>> 
>>>>>>>> // Compare against the previous pass, if any; on a mismatch, log the differences and route to failure.
>>>>>>>> if (!previousHistogram.isEmpty()) {
>>>>>>>>     if (previousHistogram.equals(histogram)) {
>>>>>>>>         log.info("Histograms match")
>>>>>>>>     } else {
>>>>>>>>         logHistogramDifferences(previousHistogram, histogram)
>>>>>>>>         session.transfer(flowFile, REL_FAILURE)
>>>>>>>>         return;
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> flowFile = session.putAllAttributes(flowFile, histogram)
>>>>>>>> session.transfer(flowFile, REL_SUCCESS)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
>>>>>>>> 
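>>>>>>>> As a minimal illustration of that NUL-fill behavior, here is a small Groovy sketch that uses a sparse file on a local filesystem as a stand-in for the out-of-order NFS writes. It is not NiFi code, and the file name is made up:
>>>>>>>> 
>>>>>>>> import java.nio.ByteBuffer
>>>>>>>> import java.nio.channels.FileChannel
>>>>>>>> import java.nio.file.Files
>>>>>>>> import java.nio.file.StandardOpenOption
>>>>>>>> 
>>>>>>>> // Write two chunks out of order, leaving a 4-byte hole between them.
>>>>>>>> def path = Files.createTempFile("sparse", ".bin")
>>>>>>>> def ch = FileChannel.open(path, StandardOpenOption.WRITE)
>>>>>>>> ch.write(ByteBuffer.wrap("AAAA".bytes), 0L)   // bytes 0-3 arrive first
>>>>>>>> ch.write(ByteBuffer.wrap("BBBB".bytes), 8L)   // bytes 8-11 arrive before bytes 4-7
>>>>>>>> ch.close()
>>>>>>>> 
>>>>>>>> // The file is now 12 bytes long, and the unwritten gap reads back as NUL (0x00).
>>>>>>>> byte[] all = Files.readAllBytes(path)
>>>>>>>> assert all.length == 12
>>>>>>>> assert all[4..7].every { it == 0 }   // the "missing" bytes are zeros until filled in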
>>>>>>>> 
>>>>>>>> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
>>>>>>>> 
>>>>>>>> histogram.0 = 280273
>>>>>>>> 
>>>>>>>> histogram.1 = 2820
>>>>>>>> 
>>>>>>>> histogram.2 = 48202
>>>>>>>> 
>>>>>>>> histogram.3 = 3820
>>>>>>>> 
>>>>>>>> …
>>>>>>>> 
>>>>>>>> histogram.totalBytes = 1780928732
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second times it only sees 12,001 then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> -Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have your flow running and will keep it running for several days/weeks to see if I can reproduce.  Also of note, please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Still want to know more about your underlying storage system.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> You could also try updating nifi.properties and changing the following lines:
>>>>>>>> 
>>>>>>>> nifi.flowfile.repository.always.sync=true
>>>>>>>> 
>>>>>>>> nifi.content.repository.always.sync=true
>>>>>>>> 
>>>>>>>> nifi.provenance.repository.always.sync=true
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It will hurt performance but can be useful/necessary on certain storage subsystems.
>>>>>>>> 
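>>>>>>>> For context, roughly what those sync flags correspond to at the file-system level (a sketch under assumptions, not NiFi's actual repository code) is ending each repository write with an fsync-style barrier before the write is considered complete, along the lines of:
>>>>>>>> 
>>>>>>>> import java.nio.ByteBuffer
>>>>>>>> import java.nio.channels.FileChannel
>>>>>>>> import java.nio.file.Paths
>>>>>>>> import java.nio.file.StandardOpenOption
>>>>>>>> 
>>>>>>>> def ch = FileChannel.open(Paths.get("/tmp/example.dat"),   // illustrative path
>>>>>>>>         StandardOpenOption.CREATE, StandardOpenOption.WRITE)
>>>>>>>> ch.write(ByteBuffer.wrap("some bytes".bytes))
>>>>>>>> ch.force(true)   // block until data and metadata have reached the storage device
>>>>>>>> ch.close()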
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and the disk setup is.  Didn't do anything special, to be clear, but still it is important to know.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> Joe
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Dear Joe and Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal in NiFi, but forces NiFi to write and read new files totally locally.
>>>>>>>> 
>>>>>>>> My flow has been running for 7 days and this morning there were 2 files where the sha256 was given another hash value than the original. I have set this flow up in another nifi cluster only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
>>>>>>>> 
>>>>>>>> So I can reproduce issues at different NiFi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the calculation of the content hash happens right after NiFi writes the content to the content repository. I don't know if there could be some kind of delay before the content is 100% written and the next processors begin reading it???
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Please see the attached test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf, check steps 5 and 12.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Oct 21, 2021 at 8:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Joe,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> To start from the last mail :-)
>>>>>>>> 
>>>>>>>> All the repositories has it's own disk, and I'm using ext4
>>>>>>>> 
>>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>>>>>> 
>>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>>>>>> 
>>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>>>>> 
>>>>>>>> 
>>>>>>>> My test flow WITH sftp looks like this:
>>>>>>>> 
>>>>>>>> <image.png>
>>>>>>>> 
>>>>>>>> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which is doing nothing, just to keep the failed flowfile in a queue.  I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP process right before the hashing.
>>>>>>>> 
>>>>>>>> Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> About the Lineage:
>>>>>>>> 
>>>>>>>> Is there a way to export all the lineage data? The export only generates an svg file.
>>>>>>>> 
>>>>>>>> This is only for the receiving NiFi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
>>>>>>>> 
>>>>>>>> The interesting steps are 5 and 12.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 11:49 PM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Also what type of file system/storage system are you running NiFi on
>>>>>>>> 
>>>>>>>> in this case?  We'll need to know this for the NiFi
>>>>>>>> 
>>>>>>>> content/flowfile/provenance repositories? Is it NFS?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And to further narrow this down
>>>>>>>> 
>>>>>>>> 
>>>>>>>> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>>>>>>>> 
>>>>>>>> (2 files per node) and next process was a hashcontent before it run
>>>>>>>> 
>>>>>>>> into a test loop. Where files are uploaded via PutSFTP to a test
>>>>>>>> 
>>>>>>>> server, and downloaded again and recalculated the hash. I have had one
>>>>>>>> 
>>>>>>>> issue after 3 days of running."
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So to be clear with GenerateFlowFile making these files and then you
>>>>>>>> 
>>>>>>>> looping the content is wholly and fully exclusively within the control
>>>>>>>> 
>>>>>>>> of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping the
>>>>>>>> 
>>>>>>>> same files over and over in nifi itself you can make this happen or
>>>>>>>> 
>>>>>>>> cannot?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> "After fetching a FlowFile-stream file and unpacked it back into NiFi
>>>>>>>> 
>>>>>>>> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>>>>>>>> 
>>>>>>>> exact same file. And got a new hash. That is what worry’s me.
>>>>>>>> 
>>>>>>>> The fact that the same file can be recalculated and produce two
>>>>>>>> 
>>>>>>>> different hashes, is very strange, but it happens. "
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Ok so to confirm you are saying that in each case this happens you see
>>>>>>>> 
>>>>>>>> it first compute the wrong hash, but then if you retry the same
>>>>>>>> 
>>>>>>>> flowfile it then provides the correct hash?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Can you please also show/share the lineage history for such a flow
>>>>>>>> 
>>>>>>>> file then?  It should have events for the initial hash, second hash,
>>>>>>>> 
>>>>>>>> the unpacking, trace to the original stream, etc...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Dear Mark and Joe
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening at the same NiFi instance. It is the same 3 node NiFi cluster.
>>>>>>>> 
>>>>>>>> After fetching a FlowFile-stream file and unpacking it back into NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file. And got a new hash. That is what worries me.
>>>>>>>> 
>>>>>>>> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn't forward the flow file to the next process before it is 100% finished unpacking and saving the new content to the repository.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node) and the next process was a hash content calculation before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again and the hash recalculated. I have had one issue after 3 days of running.
>>>>>>>> 
>>>>>>>> Now the test flow is running without the Put/Fetch sftp processors.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Another problem is that I can’t find any correlation to other events. Not within NiFi, nor the server itself or VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
>>>>>>>> 
>>>>>>>> I have tried to force VMware to migrate a NiFi node to another host, forcing it to take a snapshot and deleting snapshots, but nothing can trigger an error.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I know it will be very very difficult to reproduce. But I will setup multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind Regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 20, 2021, at 4:39 PM, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks for sharing the images.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I tried to setup a test to reproduce the issue. I’ve had it running for quite some time. Running through millions of iterations.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
>>>>>>>> 
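>>>>>>>> A sketch of that loop in terms of processor configuration (assuming the SHA-256 result lands in the content_SHA-256 attribute; the initial.hash attribute name and the exact wiring are illustrative only):
>>>>>>>> 
>>>>>>>> GenerateFlowFile
>>>>>>>>   -> CryptographicHashContent (SHA-256)
>>>>>>>>   -> UpdateAttribute:   initial.hash = ${'content_SHA-256'}
>>>>>>>>   -> CryptographicHashContent (SHA-256, inside the loop)
>>>>>>>>   -> RouteOnAttribute:  matched = ${'content_SHA-256':equals(${initial.hash})}
>>>>>>>>        matched   -> back to the in-loop CryptographicHashContent
>>>>>>>>        unmatched -> hold in a queue for inspection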
>>>>>>>> 
>>>>>>>> This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation but i can’t see any issues so far.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> -Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Actually is this current loop test contained within a single nifi and there you see corruption happen?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Joe
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Joe
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm I added the HashContent processor and also added hashing by SHA-1.
>>>>>>>> 
>>>>>>>> A file has been going through the system, and both the SHA-1 and SHA-256 are different than expected. With a 1 minute delay the file goes back into the hashing content flow, and this time it calculates both hashes fine.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
>>>>>>>> 
>>>>>>>> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Any suggestions for where to dig ?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Oct 20, 2021 at 6:34 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks for replying and the suggestion to look at the content Claim.
>>>>>>>> 
>>>>>>>> These 3 pictures are from the first attempt:
>>>>>>>> 
>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yesterday I realized that the content was still in the archive, so I could Replay the file.
>>>>>>>> 
>>>>>>>> <image.png>
>>>>>>>> 
>>>>>>>> So here are the same pictures but for the replay, and as you can see the Identifier, Offset and Size are all the same.
>>>>>>>> 
>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
>>>>>>>> 
>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>> 
>>>>>>>> Here the content Claim is all the same.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It is very rare that we see these issues (<1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes were above 800MB.
>>>>>>>> 
>>>>>>>> This time it was a NiFi-Flowstream v3 file, which has been exported from one system and imported in another. But once the file has been imported it is the same file inside NiFi and it stays at the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
>>>>>>>> 
>>>>>>>> What can influence the same processor to calculate 2 different sha256 on the exact same content???
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Oct 19, 2021 at 4:51 PM Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
>>>>>>>> 
>>>>>>>> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> -Mark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Dear NIFI Users
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have posted this mail in the developers mailing list and just want to inform all of our about a very odd behavior we are facing.
>>>>>>>> 
>>>>>>>> The background:
>>>>>>>> 
>>>>>>>> We have data going between 2 different NIFI systems which has no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.
>>>>>>>> 
>>>>>>>> I have created a Retry loop, where the file will go to a Wait process for delaying the file 1 minute and going back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to the retries_exceeded and goes to a disabled process just to be in a queue so I manually can look at it. This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.
>>>>>>>> 
>>>>>>>> <image.png>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
>>>>>>>> 
>>>>>>>> I have inspected different logs to see if I can find any correlation what happened at the same time as the file is going through my loop, but there are no event/task at that exact time.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> System 1:
>>>>>>>> 
>>>>>>>> At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>>> 
>>>>>>>> The file is exported as a "FlowFile Stream, v3" to System 2
>>>>>>>> 
>>>>>>>> 
>>>>>>>> SYSTEM 2:
>>>>>>>> 
>>>>>>>> At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>> 
>>>>>>>> <image.png>
>>>>>>>> 
>>>>>>>> At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>> 
>>>>>>>> At 10/19/2021 00:20:04.376 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>> 
>>>>>>>> At 10/19/2021 00:21:01.711 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>>>>>>> 
>>>>>>>> 
>>>>>>>> At 10/19/2021 06:07:43.376 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>>>>>>> 
>>>>>>>> <image.png>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> How on earth can this happen???
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind Regards
>>>>>>>> 
>>>>>>>> Jens M. Kofoed
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> <Repro.json>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> <Try_to_recreate_Jens_Challenge.json>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>> 
>>> 


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

184 hours (7.6 days) in and zero issues.

Will need to turn this off soon but wanted to give a final update.
Looks great. Given the information on your system there appears to be
something we don't understand related to the virtual file system
involved or something.

Thanks

On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>
> Hi Mark
>
> Of course, sorry :-)  By looking at the error messages, I can see that only the histogram values with differences are listed. And all 3 have the first issue at histogram.9. Don't know what that means.
>
> /Jens
>
> Here are the error log:
> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
> Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
> Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
> Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
> Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
> Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
> Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
> Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
> Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
> Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
> Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
> Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
> Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
> Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
> Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
> Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
> Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
> Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
> Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
> Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
> Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
> Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
> Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
> Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
> Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
> Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
> Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
> Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
> Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
> Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
> Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
> Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
> Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
> Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
> Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
> Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
> Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
> Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
> Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
> Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
> Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
> Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
> Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
> Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
> Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
> Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
> Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
> Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
> Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
> Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
> Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
> Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
> Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
> Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
> Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
> Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
> Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
> Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
> Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
> Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
> Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
> Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
> Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
> Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
> Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
> Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
> Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
> Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
> Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
> Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
> Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
> Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
> Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059
>
> 2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are differences in the histogram
> Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
> Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
> Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
> Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
> Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
> Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
> Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
> Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
> Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
> Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
> Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
> Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
> Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
> Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
> Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
> Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
> Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
> Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
> Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
> Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
> Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
> Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
> Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
> Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
> Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
> Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
> Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
> Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
> Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
> Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
> Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
> Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
> Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
> Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
> Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
> Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
> Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
> Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
> Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
> Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
> Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
> Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
> Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
> Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
> Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
> Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
> Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
> Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
> Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
> Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
> Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
> Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
> Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
> Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
> Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
> Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
> Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
> Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
> Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
> Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
> Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
> Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
> Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
> Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
> Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
> Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
> Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
> Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
> Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
> Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
> Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
> Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
> Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
> Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
> Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
> Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
> Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
> Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
> Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
> Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
> Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
> Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
> Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
> Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
> Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542
>
> 2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10] org.apache.nifi.processors.script.ExecuteScript ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are differences in the histogram
> Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
> Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
> Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
> Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
> Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
> Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
> Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
> Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
> Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
> Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
> Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
> Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
> Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
> Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
> Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
> Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
> Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
> Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
> Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
> Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
> Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
> Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
> Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
> Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
> Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
> Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
> Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
> Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
> Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
> Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
> Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
> Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
> Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
> Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
> Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
> Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
> Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
> Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
> Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
> Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
> Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
> Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
> Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
> Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
> Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
> Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
> Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
> Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
> Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
> Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
> Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
> Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
> Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
> Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
> Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
> Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
> Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
> Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
> Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
> Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
> Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
> Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
> Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
> Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
> Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
> Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
> Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
> Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
> Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
> Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
> Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
> Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
> Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
> Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
> Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
> Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
> Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
> Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
> Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
> Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
> Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
> Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
> Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
> Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
> Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
> Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282
>
> On Tue, Nov 2, 2021 at 3:41 PM Mark Payne <ma...@hotmail.com> wrote:
>>
>> Jens,
>>
>> The histograms, in and of themselves, are not very interesting. The interesting thing would be the difference in the histogram before & after the hash. Can you provide the ERROR level logs generated by the ExecuteScript? That’s what is of interest.
>>
>> Thanks
>> -Mark
>>
>>
>> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>
>> Hi Mark and Joe
>>
>> Yesterday morning I implemented Mark's script in my 2 test flows: one test flow using SFTP, the other MergeContent/UnpackContent. Both test flows are running on a test cluster with 3 nodes and NiFi 1.14.0.
>> The 1st flow, with SFTP, has had 1 file go into the failure queue after about 16 hours.
>> The 2nd flow has had 2 files go into the failure queue after about 15 and 17 hours.
>>
>> There is definitely something going wrong in my setup, but I can't figure out what.
>>
>> Information from file 1:
>> histogram.0;0
>> histogram.1;0
>> histogram.10;11926720
>> histogram.100;11927504
>> histogram.101;11925396
>> histogram.102;11929923
>> histogram.103;11931596
>> histogram.104;11929071
>> histogram.105;11931365
>> histogram.106;11928661
>> histogram.107;11929864
>> histogram.108;11931611
>> histogram.109;11932758
>> histogram.11;0
>> histogram.110;11927893
>> histogram.111;11933519
>> histogram.112;11931392
>> histogram.113;11928534
>> histogram.114;11936879
>> histogram.115;11932818
>> histogram.116;11934767
>> histogram.117;11929143
>> histogram.118;11931854
>> histogram.119;11926333
>> histogram.12;0
>> histogram.120;11928731
>> histogram.121;11931149
>> histogram.122;11926725
>> histogram.123;0
>> histogram.124;0
>> histogram.125;0
>> histogram.126;0
>> histogram.127;0
>> histogram.128;0
>> histogram.129;0
>> histogram.13;0
>> histogram.130;0
>> histogram.131;0
>> histogram.132;0
>> histogram.133;0
>> histogram.134;0
>> histogram.135;0
>> histogram.136;0
>> histogram.137;0
>> histogram.138;0
>> histogram.139;0
>> histogram.14;0
>> histogram.140;0
>> histogram.141;0
>> histogram.142;0
>> histogram.143;0
>> histogram.144;0
>> histogram.145;0
>> histogram.146;0
>> histogram.147;0
>> histogram.148;0
>> histogram.149;0
>> histogram.15;0
>> histogram.150;0
>> histogram.151;0
>> histogram.152;0
>> histogram.153;0
>> histogram.154;0
>> histogram.155;0
>> histogram.156;0
>> histogram.157;0
>> histogram.158;0
>> histogram.159;0
>> histogram.16;0
>> histogram.160;0
>> histogram.161;0
>> histogram.162;0
>> histogram.163;0
>> histogram.164;0
>> histogram.165;0
>> histogram.166;0
>> histogram.167;0
>> histogram.168;0
>> histogram.169;0
>> histogram.17;0
>> histogram.170;0
>> histogram.171;0
>> histogram.172;0
>> histogram.173;0
>> histogram.174;0
>> histogram.175;0
>> histogram.176;0
>> histogram.177;0
>> histogram.178;0
>> histogram.179;0
>> histogram.18;0
>> histogram.180;0
>> histogram.181;0
>> histogram.182;0
>> histogram.183;0
>> histogram.184;0
>> histogram.185;0
>> histogram.186;0
>> histogram.187;0
>> histogram.188;0
>> histogram.189;0
>> histogram.19;0
>> histogram.190;0
>> histogram.191;0
>> histogram.192;0
>> histogram.193;0
>> histogram.194;0
>> histogram.195;0
>> histogram.196;0
>> histogram.197;0
>> histogram.198;0
>> histogram.199;0
>> histogram.2;0
>> histogram.20;0
>> histogram.200;0
>> histogram.201;0
>> histogram.202;0
>> histogram.203;0
>> histogram.204;0
>> histogram.205;0
>> histogram.206;0
>> histogram.207;0
>> histogram.208;0
>> histogram.209;0
>> histogram.21;0
>> histogram.210;0
>> histogram.211;0
>> histogram.212;0
>> histogram.213;0
>> histogram.214;0
>> histogram.215;0
>> histogram.216;0
>> histogram.217;0
>> histogram.218;0
>> histogram.219;0
>> histogram.22;0
>> histogram.220;0
>> histogram.221;0
>> histogram.222;0
>> histogram.223;0
>> histogram.224;0
>> histogram.225;0
>> histogram.226;0
>> histogram.227;0
>> histogram.228;0
>> histogram.229;0
>> histogram.23;0
>> histogram.230;0
>> histogram.231;0
>> histogram.232;0
>> histogram.233;0
>> histogram.234;0
>> histogram.235;0
>> histogram.236;0
>> histogram.237;0
>> histogram.238;0
>> histogram.239;0
>> histogram.24;0
>> histogram.240;0
>> histogram.241;0
>> histogram.242;0
>> histogram.243;0
>> histogram.244;0
>> histogram.245;0
>> histogram.246;0
>> histogram.247;0
>> histogram.248;0
>> histogram.249;0
>> histogram.25;0
>> histogram.250;0
>> histogram.251;0
>> histogram.252;0
>> histogram.253;0
>> histogram.254;0
>> histogram.255;0
>> histogram.26;0
>> histogram.27;0
>> histogram.28;0
>> histogram.29;0
>> histogram.3;0
>> histogram.30;0
>> histogram.31;0
>> histogram.32;11930422
>> histogram.33;11934311
>> histogram.34;11930459
>> histogram.35;11924776
>> histogram.36;11924186
>> histogram.37;11928616
>> histogram.38;11929474
>> histogram.39;11929607
>> histogram.4;0
>> histogram.40;11928053
>> histogram.41;11930402
>> histogram.42;11926830
>> histogram.43;11938138
>> histogram.44;11932536
>> histogram.45;11931053
>> histogram.46;11930008
>> histogram.47;11927747
>> histogram.48;11936055
>> histogram.49;11931471
>> histogram.5;0
>> histogram.50;11931921
>> histogram.51;11929643
>> histogram.52;11923847
>> histogram.53;11927311
>> histogram.54;11933754
>> histogram.55;11925964
>> histogram.56;11928872
>> histogram.57;11931124
>> histogram.58;11928474
>> histogram.59;11925814
>> histogram.6;0
>> histogram.60;11933978
>> histogram.61;11934136
>> histogram.62;11932016
>> histogram.63;23864588
>> histogram.64;11924792
>> histogram.65;11934789
>> histogram.66;11933047
>> histogram.67;11931899
>> histogram.68;11935615
>> histogram.69;11927249
>> histogram.7;0
>> histogram.70;11933276
>> histogram.71;11927953
>> histogram.72;11929275
>> histogram.73;11930292
>> histogram.74;11935428
>> histogram.75;11930317
>> histogram.76;11935737
>> histogram.77;11932127
>> histogram.78;11932344
>> histogram.79;11932094
>> histogram.8;0
>> histogram.80;11930688
>> histogram.81;11928415
>> histogram.82;11931559
>> histogram.83;11934192
>> histogram.84;11927224
>> histogram.85;11929491
>> histogram.86;11930624
>> histogram.87;11932201
>> histogram.88;11930694
>> histogram.89;11936439
>> histogram.9;11933187
>> histogram.90;11926445
>> histogram.91;0
>> histogram.92;0
>> histogram.93;0
>> histogram.94;11931596
>> histogram.95;11929379
>> histogram.96;0
>> histogram.97;11928864
>> histogram.98;11924738
>> histogram.99;11930062
>> histogram.totalBytes;1073741824
>>
>> File 2:
>> histogram.0;0
>> histogram.1;0
>> histogram.10;11932402
>> histogram.100;11927531
>> histogram.101;11928454
>> histogram.102;11934432
>> histogram.103;11924623
>> histogram.104;11934492
>> histogram.105;11934585
>> histogram.106;11928955
>> histogram.107;11928651
>> histogram.108;11930139
>> histogram.109;11929325
>> histogram.11;0
>> histogram.110;11930486
>> histogram.111;11933517
>> histogram.112;11928334
>> histogram.113;11927798
>> histogram.114;11929222
>> histogram.115;11932057
>> histogram.116;11931182
>> histogram.117;11933407
>> histogram.118;11932709
>> histogram.119;11931338
>> histogram.12;0
>> histogram.120;11933700
>> histogram.121;11929803
>> histogram.122;11930218
>> histogram.123;0
>> histogram.124;0
>> histogram.125;0
>> histogram.126;0
>> histogram.127;0
>> histogram.128;0
>> histogram.129;0
>> histogram.13;0
>> histogram.130;0
>> histogram.131;0
>> histogram.132;0
>> histogram.133;0
>> histogram.134;0
>> histogram.135;0
>> histogram.136;0
>> histogram.137;0
>> histogram.138;0
>> histogram.139;0
>> histogram.14;0
>> histogram.140;0
>> histogram.141;0
>> histogram.142;0
>> histogram.143;0
>> histogram.144;0
>> histogram.145;0
>> histogram.146;0
>> histogram.147;0
>> histogram.148;0
>> histogram.149;0
>> histogram.15;0
>> histogram.150;0
>> histogram.151;0
>> histogram.152;0
>> histogram.153;0
>> histogram.154;0
>> histogram.155;0
>> histogram.156;0
>> histogram.157;0
>> histogram.158;0
>> histogram.159;0
>> histogram.16;0
>> histogram.160;0
>> histogram.161;0
>> histogram.162;0
>> histogram.163;0
>> histogram.164;0
>> histogram.165;0
>> histogram.166;0
>> histogram.167;0
>> histogram.168;0
>> histogram.169;0
>> histogram.17;0
>> histogram.170;0
>> histogram.171;0
>> histogram.172;0
>> histogram.173;0
>> histogram.174;0
>> histogram.175;0
>> histogram.176;0
>> histogram.177;0
>> histogram.178;0
>> histogram.179;0
>> histogram.18;0
>> histogram.180;0
>> histogram.181;0
>> histogram.182;0
>> histogram.183;0
>> histogram.184;0
>> histogram.185;0
>> histogram.186;0
>> histogram.187;0
>> histogram.188;0
>> histogram.189;0
>> histogram.19;0
>> histogram.190;0
>> histogram.191;0
>> histogram.192;0
>> histogram.193;0
>> histogram.194;0
>> histogram.195;0
>> histogram.196;0
>> histogram.197;0
>> histogram.198;0
>> histogram.199;0
>> histogram.2;0
>> histogram.20;0
>> histogram.200;0
>> histogram.201;0
>> histogram.202;0
>> histogram.203;0
>> histogram.204;0
>> histogram.205;0
>> histogram.206;0
>> histogram.207;0
>> histogram.208;0
>> histogram.209;0
>> histogram.21;0
>> histogram.210;0
>> histogram.211;0
>> histogram.212;0
>> histogram.213;0
>> histogram.214;0
>> histogram.215;0
>> histogram.216;0
>> histogram.217;0
>> histogram.218;0
>> histogram.219;0
>> histogram.22;0
>> histogram.220;0
>> histogram.221;0
>> histogram.222;0
>> histogram.223;0
>> histogram.224;0
>> histogram.225;0
>> histogram.226;0
>> histogram.227;0
>> histogram.228;0
>> histogram.229;0
>> histogram.23;0
>> histogram.230;0
>> histogram.231;0
>> histogram.232;0
>> histogram.233;0
>> histogram.234;0
>> histogram.235;0
>> histogram.236;0
>> histogram.237;0
>> histogram.238;0
>> histogram.239;0
>> histogram.24;0
>> histogram.240;0
>> histogram.241;0
>> histogram.242;0
>> histogram.243;0
>> histogram.244;0
>> histogram.245;0
>> histogram.246;0
>> histogram.247;0
>> histogram.248;0
>> histogram.249;0
>> histogram.25;0
>> histogram.250;0
>> histogram.251;0
>> histogram.252;0
>> histogram.253;0
>> histogram.254;0
>> histogram.255;0
>> histogram.26;0
>> histogram.27;0
>> histogram.28;0
>> histogram.29;0
>> histogram.3;0
>> histogram.30;0
>> histogram.31;0
>> histogram.32;11924458
>> histogram.33;11934243
>> histogram.34;11930696
>> histogram.35;11925574
>> histogram.36;11929198
>> histogram.37;11928146
>> histogram.38;11932505
>> histogram.39;11929406
>> histogram.4;0
>> histogram.40;11930100
>> histogram.41;11930867
>> histogram.42;11930796
>> histogram.43;11930796
>> histogram.44;11921866
>> histogram.45;11935682
>> histogram.46;11930075
>> histogram.47;11928169
>> histogram.48;11933490
>> histogram.49;11932174
>> histogram.5;0
>> histogram.50;11933255
>> histogram.51;11934009
>> histogram.52;11928361
>> histogram.53;11927626
>> histogram.54;11931611
>> histogram.55;11930755
>> histogram.56;11933823
>> histogram.57;11922508
>> histogram.58;11930384
>> histogram.59;11929805
>> histogram.6;0
>> histogram.60;11930064
>> histogram.61;11926761
>> histogram.62;11927605
>> histogram.63;23858926
>> histogram.64;11929516
>> histogram.65;11930217
>> histogram.66;11930478
>> histogram.67;11939855
>> histogram.68;11927850
>> histogram.69;11931154
>> histogram.7;0
>> histogram.70;11935374
>> histogram.71;11930754
>> histogram.72;11928304
>> histogram.73;11931772
>> histogram.74;11939417
>> histogram.75;11930712
>> histogram.76;11933331
>> histogram.77;11931279
>> histogram.78;11928276
>> histogram.79;11930071
>> histogram.8;0
>> histogram.80;11927830
>> histogram.81;11931213
>> histogram.82;11930964
>> histogram.83;11928973
>> histogram.84;11934325
>> histogram.85;11929658
>> histogram.86;11924667
>> histogram.87;11931100
>> histogram.88;11930252
>> histogram.89;11927281
>> histogram.9;11932848
>> histogram.90;11930398
>> histogram.91;0
>> histogram.92;0
>> histogram.93;0
>> histogram.94;11928720
>> histogram.95;11928988
>> histogram.96;0
>> histogram.97;11931423
>> histogram.98;11928181
>> histogram.99;11935549
>> histogram.totalBytes;1073741824
>>
>> File3:
>> histogram.0;0
>> histogram.1;0
>> histogram.10;11930417
>> histogram.100;11926739
>> histogram.101;11930580
>> histogram.102;11928210
>> histogram.103;11935300
>> histogram.104;11925804
>> histogram.105;11931023
>> histogram.106;11932342
>> histogram.107;11929778
>> histogram.108;11930098
>> histogram.109;11930759
>> histogram.11;0
>> histogram.110;11934343
>> histogram.111;11935775
>> histogram.112;11933877
>> histogram.113;11926675
>> histogram.114;11929332
>> histogram.115;11928876
>> histogram.116;11927819
>> histogram.117;11932657
>> histogram.118;11933508
>> histogram.119;11928808
>> histogram.12;0
>> histogram.120;11937532
>> histogram.121;11926907
>> histogram.122;11933942
>> histogram.123;0
>> histogram.124;0
>> histogram.125;0
>> histogram.126;0
>> histogram.127;0
>> histogram.128;0
>> histogram.129;0
>> histogram.13;0
>> histogram.130;0
>> histogram.131;0
>> histogram.132;0
>> histogram.133;0
>> histogram.134;0
>> histogram.135;0
>> histogram.136;0
>> histogram.137;0
>> histogram.138;0
>> histogram.139;0
>> histogram.14;0
>> histogram.140;0
>> histogram.141;0
>> histogram.142;0
>> histogram.143;0
>> histogram.144;0
>> histogram.145;0
>> histogram.146;0
>> histogram.147;0
>> histogram.148;0
>> histogram.149;0
>> histogram.15;0
>> histogram.150;0
>> histogram.151;0
>> histogram.152;0
>> histogram.153;0
>> histogram.154;0
>> histogram.155;0
>> histogram.156;0
>> histogram.157;0
>> histogram.158;0
>> histogram.159;0
>> histogram.16;0
>> histogram.160;0
>> histogram.161;0
>> histogram.162;0
>> histogram.163;0
>> histogram.164;0
>> histogram.165;0
>> histogram.166;0
>> histogram.167;0
>> histogram.168;0
>> histogram.169;0
>> histogram.17;0
>> histogram.170;0
>> histogram.171;0
>> histogram.172;0
>> histogram.173;0
>> histogram.174;0
>> histogram.175;0
>> histogram.176;0
>> histogram.177;0
>> histogram.178;0
>> histogram.179;0
>> histogram.18;0
>> histogram.180;0
>> histogram.181;0
>> histogram.182;0
>> histogram.183;0
>> histogram.184;0
>> histogram.185;0
>> histogram.186;0
>> histogram.187;0
>> histogram.188;0
>> histogram.189;0
>> histogram.19;0
>> histogram.190;0
>> histogram.191;0
>> histogram.192;0
>> histogram.193;0
>> histogram.194;0
>> histogram.195;0
>> histogram.196;0
>> histogram.197;0
>> histogram.198;0
>> histogram.199;0
>> histogram.2;0
>> histogram.20;0
>> histogram.200;0
>> histogram.201;0
>> histogram.202;0
>> histogram.203;0
>> histogram.204;0
>> histogram.205;0
>> histogram.206;0
>> histogram.207;0
>> histogram.208;0
>> histogram.209;0
>> histogram.21;0
>> histogram.210;0
>> histogram.211;0
>> histogram.212;0
>> histogram.213;0
>> histogram.214;0
>> histogram.215;0
>> histogram.216;0
>> histogram.217;0
>> histogram.218;0
>> histogram.219;0
>> histogram.22;0
>> histogram.220;0
>> histogram.221;0
>> histogram.222;0
>> histogram.223;0
>> histogram.224;0
>> histogram.225;0
>> histogram.226;0
>> histogram.227;0
>> histogram.228;0
>> histogram.229;0
>> histogram.23;0
>> histogram.230;0
>> histogram.231;0
>> histogram.232;0
>> histogram.233;0
>> histogram.234;0
>> histogram.235;0
>> histogram.236;0
>> histogram.237;0
>> histogram.238;0
>> histogram.239;0
>> histogram.24;0
>> histogram.240;0
>> histogram.241;0
>> histogram.242;0
>> histogram.243;0
>> histogram.244;0
>> histogram.245;0
>> histogram.246;0
>> histogram.247;0
>> histogram.248;0
>> histogram.249;0
>> histogram.25;0
>> histogram.250;0
>> histogram.251;0
>> histogram.252;0
>> histogram.253;0
>> histogram.254;0
>> histogram.255;0
>> histogram.26;0
>> histogram.27;0
>> histogram.28;0
>> histogram.29;0
>> histogram.3;0
>> histogram.30;0
>> histogram.31;0
>> histogram.32;11929486
>> histogram.33;11930737
>> histogram.34;11931092
>> histogram.35;11934488
>> histogram.36;11927605
>> histogram.37;11930735
>> histogram.38;11932174
>> histogram.39;11936180
>> histogram.4;0
>> histogram.40;11931666
>> histogram.41;11927043
>> histogram.42;11929044
>> histogram.43;11934104
>> histogram.44;11936337
>> histogram.45;11935580
>> histogram.46;11929598
>> histogram.47;11934083
>> histogram.48;11928858
>> histogram.49;11931098
>> histogram.5;0
>> histogram.50;11930618
>> histogram.51;11925429
>> histogram.52;11929741
>> histogram.53;11934160
>> histogram.54;11931999
>> histogram.55;11930465
>> histogram.56;11926194
>> histogram.57;11926386
>> histogram.58;11924871
>> histogram.59;11929331
>> histogram.6;0
>> histogram.60;11926951
>> histogram.61;11928631
>> histogram.62;11927549
>> histogram.63;23856730
>> histogram.64;11930288
>> histogram.65;11931523
>> histogram.66;11932821
>> histogram.67;11932509
>> histogram.68;11929613
>> histogram.69;11928651
>> histogram.7;0
>> histogram.70;11929253
>> histogram.71;11931521
>> histogram.72;11925805
>> histogram.73;11934833
>> histogram.74;11928314
>> histogram.75;11923854
>> histogram.76;11930892
>> histogram.77;11927528
>> histogram.78;11932850
>> histogram.79;11934471
>> histogram.8;0
>> histogram.80;11925707
>> histogram.81;11929213
>> histogram.82;11931334
>> histogram.83;11936739
>> histogram.84;11927855
>> histogram.85;11931668
>> histogram.86;11928609
>> histogram.87;11931930
>> histogram.88;11934341
>> histogram.89;11927519
>> histogram.9;11928004
>> histogram.90;11933502
>> histogram.91;0
>> histogram.92;0
>> histogram.93;0
>> histogram.94;11932024
>> histogram.95;11932693
>> histogram.96;0
>> histogram.97;11928428
>> histogram.98;11933195
>> histogram.99;11924273
>> histogram.totalBytes;1073741824
>>
>> Kind regards
>> Jens
>>
>> On Sun, Oct 31, 2021 at 9:40 PM Joe Witt <jo...@gmail.com> wrote:
>>>
>>> Jens
>>>
>>> 118 hours in - still good.
>>>
>>> Thanks
>>>
>>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>>> >
>>> > Jens
>>> >
>>> > Update from hour 67.  Still lookin' good.
>>> >
>>> > Will advise.
>>> >
>>> > Thanks
>>> >
>>> > On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > >
>>> > > Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 days before the first error occurred
>>> > >
>>> > > Thanks
>>> > >
>>> > > > On Oct 28, 2021, at 4:57 PM, Joe Witt <jo...@gmail.com> wrote:
>>> > > >
>>> > > > Jens,
>>> > > >
>>> > > > Am 40+ hours in running both your flow and mine to reproduce.  So far
>>> > > > neither has shown any sign of trouble.  Will keep running for another
>>> > > > week or so if I can.
>>> > > >
>>> > > > Thanks
>>> > > >
>>> > > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >> The physical hosts with VMware use VMFS, but the VMs running on the hosts can't see that.
>>> > > >> But you asked about the underlying file system 😀 and since my first answer with the copy from the fstab file wasn't enough, I just wanted to give all the details 😁.
>>> > > >>
>>> > > >> If you create a VM for Windows you would probably use NTFS (on top of VMFS); for Linux, EXT3, EXT4, BTRFS, XFS and so on.
>>> > > >>
>>> > > >> All the partitions on my NiFi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don't use NFS.
>>> > > >>
>>> > > >> Kind regards
>>> > > >> Jens
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Oct 27, 2021, at 5:47 PM, Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >> I don't quite follow the EXT4 usage on top of VMFS, but the point here
>>> > > >> is you'll ultimately need to truly understand your underlying storage
>>> > > >> system and what sorts of guarantees it is giving you.  If Linux/the
>>> > > >> JVM/NiFi think they have a typical EXT4-type block storage system to work
>>> > > >> with, they can only be safe/operate within those constraints.  I have no
>>> > > >> idea about what VMFS brings to the table or the settings for it.
>>> > > >>
>>> > > >> The sync properties I shared previously might help force the issue of
>>> > > >> ensuring a formal sync/flush cycle all the way through the disk has
>>> > > >> occurred which we'd normally not do or need to do but again in some
>>> > > >> cases offers a stronger guarantee in exchange for performance.
>>> > > >>
>>> > > >> In any case...Mark's path for you here will help identify what we're
>>> > > >> dealing with and we can go from there.
>>> > > >>
>>> > > >> I am aware of significant usage of NiFi on VMWare configurations
>>> > > >> without issue at high rates for many years so whatever it is here is
>>> > > >> likely solvable.
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Hi Mark
>>> > > >>
>>> > > >>
>>> > > >> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
>>> > > >>
>>> > > >> I don't use NFS, but ext4. But I will implement the script so we can check whether that is the case here. I think the issue might occur right after a processor writes content to the repository.
>>> > > >>
>>> > > >> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
>>> > > >>
>>> > > >>
>>> > > >> Two other flows both produce errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compare. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compare. The last flow is totally inside NiFi, excluding other network/server issues.
>>> > > >>
>>> > > >>
>>> > > >> In both cases the CryptographicHashContent is right after a processor which writes new content to the repository. And in one case, a file in our production flow calculated a wrong hash 4 times, with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
>>> > > >>
>>> > > >> Just like the case in steps 5 and 12 in the pdf file.
>>> > > >>
>>> > > >>
>>> > > >> I will let you all know more later next week
>>> > > >>
>>> > > >>
>>> > > >> Kind regards
>>> > > >>
>>> > > >> Jens
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Oct 27, 2021, at 3:43 PM, Mark Payne <ma...@hotmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> And the actual script:
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> import org.apache.nifi.flowfile.FlowFile
>>> > > >>
>>> > > >>
>>> > > >> import java.util.stream.Collectors
>>> > > >>
>>> > > >>
>>> > > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>>> > > >>
>>> > > >>   final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>>> > > >>
>>> > > >>       .filter({ entry -> entry.getKey().startsWith("histogram.") })
>>> > > >>
>>> > > >>       .collect(Collectors.toMap({ entry -> entry.key}, { entry -> entry.value }))
>>> > > >>
>>> > > >>   return histogram;
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>>> > > >>
>>> > > >>   final Map<String, String> histogram = new HashMap<>();
>>> > > >>
>>> > > >>   final int[] distribution = new int[256];
>>> > > >>
>>> > > >>   Arrays.fill(distribution, 0);
>>> > > >>
>>> > > >>
>>> > > >>   long total = 0L;
>>> > > >>
>>> > > >>   final byte[] buffer = new byte[8192];
>>> > > >>
>>> > > >>   int len;
>>> > > >>
>>> > > >>   while ((len = inStream.read(buffer)) > 0) {
>>> > > >>
>>> > > >>       for (int i=0; i < len; i++) {
>>> > > >>
>>> > > >>           final int val = buffer[i] & 0xFF;  // mask to 0-255: a signed byte would give a negative array index for values >= 128
>>> > > >>
>>> > > >>           distribution[val]++;
>>> > > >>
>>> > > >>           total++;
>>> > > >>
>>> > > >>       }
>>> > > >>
>>> > > >>   }
>>> > > >>
>>> > > >>
>>> > > >>   for (int i=0; i < 256; i++) {
>>> > > >>
>>> > > >>       histogram.put("histogram." + i, String.valueOf(distribution[i]));
>>> > > >>
>>> > > >>   }
>>> > > >>
>>> > > >>   histogram.put("histogram.totalBytes", String.valueOf(total));
>>> > > >>
>>> > > >>
>>> > > >>   return histogram;
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
>>> > > >>
>>> > > >>   final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
>>> > > >>
>>> > > >>   final Map<String, String> sorted = new TreeMap<>(previous)
>>> > > >>
>>> > > >>   for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>>> > > >>
>>> > > >>       final String key = entry.getKey();
>>> > > >>
>>> > > >>       final String previousValue = entry.getValue();
>>> > > >>
>>> > > >>       final String updatedValue = updated.get(entry.getKey())
>>> > > >>
>>> > > >>
>>> > > >>       if (!Objects.equals(previousValue, updatedValue)) {
>>> > > >>
>>> > > >>           sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
>>> > > >>
>>> > > >>       }
>>> > > >>
>>> > > >>   }
>>> > > >>
>>> > > >>
>>> > > >>   log.error(sb.toString());
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> def flowFile = session.get()
>>> > > >>
>>> > > >> if (flowFile == null) {
>>> > > >>
>>> > > >>   return
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
>>> > > >>
>>> > > >> Map<String, String> histogram = null;
>>> > > >>
>>> > > >>
>>> > > >> final InputStream inStream = session.read(flowFile);
>>> > > >>
>>> > > >> try {
>>> > > >>
>>> > > >>   histogram = createHistogram(flowFile, inStream);
>>> > > >>
>>> > > >> } finally {
>>> > > >>
>>> > > >>   inStream.close()
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >> if (!previousHistogram.isEmpty()) {
>>> > > >>
>>> > > >>   if (previousHistogram.equals(histogram)) {
>>> > > >>
>>> > > >>       log.info("Histograms match")
>>> > > >>
>>> > > >>   } else {
>>> > > >>
>>> > > >>       logHistogramDifferences(previousHistogram, histogram)
>>> > > >>
>>> > > >>       session.transfer(flowFile, REL_FAILURE)
>>> > > >>
>>> > > >>       return;
>>> > > >>
>>> > > >>   }
>>> > > >>
>>> > > >> }
>>> > > >>
>>> > > >>
>>> > > >> flowFile = session.putAllAttributes(flowFile, histogram)
>>> > > >>
>>> > > >> session.transfer(flowFile, REL_SUCCESS)
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well, when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those missing bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
>>> > > >>
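>>> > > >> As a rough illustration of that symptom (just a sketch, separate from the histogram script, and the 'nul.count' attribute name is chosen only for illustration), an ExecuteScript with a Groovy body like the following could count NUL bytes in a FlowFile's content, since a burst of unexpected zeros is exactly what this out-of-order write behavior would produce:
>>> > > >>
>>> > > >> def flowFile = session.get()
>>> > > >> if (flowFile == null) return
>>> > > >>
>>> > > >> long nulCount = 0
>>> > > >> final InputStream inStream = session.read(flowFile)
>>> > > >> try {
>>> > > >>     final byte[] buffer = new byte[8192]
>>> > > >>     int len
>>> > > >>     // Count zero-valued bytes; a large count in non-sparse data would be suspicious
>>> > > >>     while ((len = inStream.read(buffer)) > 0) {
>>> > > >>         for (int i = 0; i < len; i++) {
>>> > > >>             if (buffer[i] == 0) nulCount++
>>> > > >>         }
>>> > > >>     }
>>> > > >> } finally {
>>> > > >>     inStream.close()
>>> > > >> }
>>> > > >> flowFile = session.putAttribute(flowFile, 'nul.count', String.valueOf(nulCount))
>>> > > >> session.transfer(flowFile, REL_SUCCESS)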
>>> > > >>
>>> > > >> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
>>> > > >>
>>> > > >>
>>> > > >> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
>>> > > >>
>>> > > >>
>>> > > >> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
>>> > > >>
>>> > > >> histogram.0 = 280273
>>> > > >>
>>> > > >> histogram.1 = 2820
>>> > > >>
>>> > > >> histogram.2 = 48202
>>> > > >>
>>> > > >> histogram.3 = 3820
>>> > > >>
>>> > > >> …
>>> > > >>
>>> > > >> histogram.totalBytes = 1780928732
>>> > > >>
>>> > > >>
>>> > > >> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
>>> > > >>
>>> > > >>
>>> > > >> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second times it only sees 12,001 then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >> -Mark
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens
>>> > > >>
>>> > > >>
>>> > > >> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >>
>>> > > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens
>>> > > >>
>>> > > >>
>>> > > >> I have your flow running and will keep it running for several days/weeks to see if I can reproduce.  Also of note: please use your same test flow but with HashContent instead of crypto hash.  Curious if that matters for any reason...
>>> > > >>
>>> > > >>
>>> > > >> Still want to know more about your underlying storage system.
>>> > > >>
>>> > > >>
>>> > > >> You could also try updating nifi.properties and changing the following lines:
>>> > > >>
>>> > > >> nifi.flowfile.repository.always.sync=true
>>> > > >>
>>> > > >> nifi.content.repository.always.sync=true
>>> > > >>
>>> > > >> nifi.provenance.repository.always.sync=true
>>> > > >>
>>> > > >>
>>> > > >> It will hurt performance but can be useful/necessary on certain storage subsystems.
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >>
>>> > > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>> > > >>
>>> > > >>
>>> > > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
>>> > > >>
>>> > > >>
>>> > > >> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case, though, I know precisely what the config is and what the disk setup is.  Didn't do anything special, to be clear, but still it is important to know.
>>> > > >>
>>> > > >>
>>> > > >> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >> Joe
>>> > > >>
>>> > > >>
>>> > > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Dear Joe and Mark
>>> > > >>
>>> > > >>
>>> > > >> I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal to NiFi, but forces NiFi to write and read new files totally locally.
>>> > > >>
>>> > > >> My flow has been running for 7 days, and this morning there were 2 files where the sha256 was given another hash value than the original. I have set this flow up on another NiFi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
>>> > > >>
>>> > > >> So I can reproduce issues on different NiFi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issues happen where the hash calculation occurs right after NiFi writes the content to the content repository. I don't know if there could be some kind of delay before the content is 100% written and the next processor begins reading it???
>>> > > >>
>>> > > >>
>>> > > >> Please see the attached test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf, check steps 5 and 12.
>>> > > >>
>>> > > >>
>>> > > >> Kind regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Thu, Oct 21, 2021 at 8:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Joe,
>>> > > >>
>>> > > >>
>>> > > >> To start from the last mail :-)
>>> > > >>
>>> > > >> All the repositories have their own disk, and I'm using ext4:
>>> > > >>
>>> > > >> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>> > > >>
>>> > > >> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>> > > >>
>>> > > >> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>> > > >>
>>> > > >>
>>> > > >> My test flow WITH sftp looks like this:
>>> > > >>
>>> > > >> <image.png>
>>> > > >>
>>> > > >> And this flow has produced 1 error within 3 days. After many many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue.  I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the correct hash value. But in this flow I have a FetchSFTP processor right before the hashing.
>>> > > >>
>>> > > >> Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.
>>> > > >>
>>> > > >>
>>> > > >> About the Lineage:
>>> > > >>
>>> > > >> Is there a way to export all the lineage data? The export only generates an svg file.
>>> > > >>
>>> > > >> This is only for the receiving NiFi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
>>> > > >>
>>> > > >> The interesting steps are steps 5 and 12.
>>> > > >>
>>> > > >>
>>> > > >> Could the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
>>> > > >>
>>> > > >>
>>> > > >> Kind regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 11:49 PM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> Also what type of file system/storage system are you running NiFi on
>>> > > >>
>>> > > >> in this case?  We'll need to know this for the NiFi
>>> > > >>
>>> > > >> content/flowfile/provenance repositories. Is it NFS?
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> And to further narrow this down
>>> > > >>
>>> > > >>
>>> > > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>>> > > >>
>>> > > >> (2 files per node) and next process was a hashcontent before it run
>>> > > >>
>>> > > >> into a test loop. Where files are uploaded via PutSFTP to a test
>>> > > >>
>>> > > >> server, and downloaded again and recalculated the hash. I have had one
>>> > > >>
>>> > > >> issue after 3 days of running."
>>> > > >>
>>> > > >>
>>> > > >> So to be clear: with GenerateFlowFile making these files and then you
>>> > > >>
>>> > > >> looping the content, this is wholly and fully exclusively within the control
>>> > > >>
>>> > > >> of NiFi.  No Get/Fetch/Put-SFTP of any kind at all. And by looping the
>>> > > >>
>>> > > >> same files over and over in NiFi itself, can you make this happen or
>>> > > >>
>>> > > >> not?
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> "After fetching a FlowFile-stream file and unpacked it back into NiFi
>>> > > >>
>>> > > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>>> > > >>
>>> > > >> exact same file. And got a new hash. That is what worry’s me.
>>> > > >>
>>> > > >> The fact that the same file can be recalculated and produce two
>>> > > >>
>>> > > >> different hashes, is very strange, but it happens. "
>>> > > >>
>>> > > >>
>>> > > >> Ok so to confirm you are saying that in each case this happens you see
>>> > > >>
>>> > > >> it first compute the wrong hash, but then if you retry the same
>>> > > >>
>>> > > >> flowfile it then provides the correct hash?
>>> > > >>
>>> > > >>
>>> > > >> Can you please also show/share the lineage history for such a flow
>>> > > >>
>>> > > >> file then?  It should have events for the initial hash, second hash,
>>> > > >>
>>> > > >> the unpacking, trace to the original stream, etc...
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Dear Mark and Joe
>>> > > >>
>>> > > >>
>>> > > >> I know my setup isn't normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening at the same NiFi instance. It is the same 3 node NiFi cluster.
>>> > > >>
>>> > > >> After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file, and get a new hash. That is what worries me.
>>> > > >>
>>> > > >> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
>>> > > >>
>>> > > >>
>>> > > >> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn't forward the flow file to the next processor before it is 100% finished unpacking and saving the new content to the repository.
>>> > > >>
>>> > > >>
>>> > > >> I have a test flow, where a GenerateFlowFile has created 6x 1GB files (2 files per node) and the next process was a hashcontent before it ran into a test loop, where files are uploaded via PutSFTP to a test server, and downloaded again and the hash recalculated. I have had one issue after 3 days of running.
>>> > > >>
>>> > > >> Now the test flow is running without the Put/Fetch sftp processors.
>>> > > >>
>>> > > >>
>>> > > >> Another problem is that I can't find any correlation to other events, not within NiFi, nor the server itself or VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
>>> > > >>
>>> > > >> I have tried to force VMware to migrate a NiFi node to another host, forcing it to take a snapshot and deleting snapshots, but nothing can trigger an error.
>>> > > >>
>>> > > >>
>>> > > >> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
>>> > > >>
>>> > > >>
>>> > > >> Kind Regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >> On Oct 20, 2021, at 4:39 PM, Mark Payne <ma...@hotmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> Thanks for sharing the images.
>>> > > >>
>>> > > >>
>>> > > >> I tried to set up a test to reproduce the issue. I’ve had it running for quite some time, running through millions of iterations.
>>> > > >>
>>> > > >>
>>> > > >> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
>>> > > >>
>>> > > >>
>>> > > >> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
>>> > > >>
>>> > > >>
>>> > > >> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
>>> > > >>
>>> > > >>
>>> > > >> This way we remove any possibility of actual corruption between the two NiFi instances. If we can still see corruption / different hashes within a single NiFi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
>>> > > >>
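>>> > > >> As a rough sketch of that loop (initial_hash is an arbitrary attribute
>>> > > >> name, and the expressions assume the content_SHA-256 attribute that
>>> > > >> CryptographicHashContent adds for SHA-256):
>>> > > >>
>>> > > >>   GenerateFlowFile
>>> > > >>     -> CryptographicHashContent (SHA-256)
>>> > > >>     -> UpdateAttribute: initial_hash = ${"content_SHA-256"}
>>> > > >>     -> CryptographicHashContent (SHA-256)   <-- the loop re-enters here
>>> > > >>     -> RouteOnAttribute, with one routing property:
>>> > > >>          matched = ${"content_SHA-256":equals(${initial_hash})}
>>> > > >>        matched   -> back to the second CryptographicHashContent
>>> > > >>        unmatched -> a holding queue for inspection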
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >> -Mark
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens
>>> > > >>
>>> > > >>
>>> > > >> Actually, is this current loop test contained within a single NiFi, and there you see corruption happen?
>>> > > >>
>>> > > >>
>>> > > >> Joe
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
>>> > > >>
>>> > > >>
>>> > > >> Joe
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Hi
>>> > > >>
>>> > > >>
>>> > > >> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm I added the HashContent processor and also added hashing by SHA-1.
>>> > > >>
>>> > > >> A file has been going through the system, and the SHA-1 and SHA-256 are both different than expected. With a 1 minute delay the file goes back into the hashing content flow, and this time it calculates both hashes fine.
>>> > > >>
>>> > > >>
>>> > > >> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
>>> > > >>
>>> > > >> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
>>> > > >>
>>> > > >>
>>> > > >> Any suggestions for where to dig ?
>>> > > >>
>>> > > >>
>>> > > >> Regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Wed, Oct 20, 2021 at 6:34 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Hi Mark
>>> > > >>
>>> > > >>
>>> > > >> Thanks for replying and for the suggestion to look at the Content Claim.
>>> > > >>
>>> > > >> These 3 pictures are from the first attempt:
>>> > > >>
>>> > > >> <image.png>   <image.png>   <image.png>
>>> > > >>
>>> > > >>
>>> > > >> Yesterday I realized that the content was still in the archive, so I could Replay the file.
>>> > > >>
>>> > > >> <image.png>
>>> > > >>
>>> > > >> So here are the same pictures, but for the replay, and as you can see the Identifier, Offset and Size are all the same.
>>> > > >>
>>> > > >> <image.png>   <image.png>   <image.png>
>>> > > >>
>>> > > >>
>>> > > >> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
>>> > > >>
>>> > > >> <image.png>   <image.png>   <image.png>
>>> > > >>
>>> > > >> Here the content Claim is all the same.
>>> > > >>
>>> > > >>
>>> > > >> It is very rare that we see these issues (fewer than 1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes were above 800MB.
>>> > > >>
>>> > > >> This time it was a NiFi-Flowstream v3 file, which has been exported from one system and imported in another. But once the file has been imported it is the same file inside NiFi and it stays on the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
>>> > > >>
>>> > > >> What can influence the same processor to calculate 2 different sha256 on the exact same content???
>>> > > >>
>>> > > >>
>>> > > >> Regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Tue, Oct 19, 2021 at 4:51 PM Mark Payne <ma...@hotmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Jens,
>>> > > >>
>>> > > >>
>>> > > >> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
>>> > > >>
>>> > > >> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
>>> > > >>
>>> > > >>
>>> > > >> Thanks
>>> > > >>
>>> > > >> -Mark
>>> > > >>
>>> > > >>
>>> > > >> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Dear NIFI Users
>>> > > >>
>>> > > >>
>>> > > >> I have posted this mail in the developers mailing list and just want to inform all of our about a very odd behavior we are facing.
>>> > > >>
>>> > > >> The background:
>>> > > >>
>>> > > >> We have data going between 2 different NIFI systems which has no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.
>>> > > >>
>>> > > >>
>>> > > >> ----
>>> > > >>
>>> > > >>
>>> > > >> Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.
>>> > > >>
>>> > > >> I have created a Retry loop, where the file will go to a Wait process for delaying the file 1 minute and going back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to the retries_exceeded and goes to a disabled process just to be in a queue so I manually can look at it. This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.
>>> > > >>
>>> > > >>
>>> > > >> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.
>>> > > >>
>>> > > >> <image.png>
>>> > > >>
>>> > > >>
>>> > > >> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
>>> > > >>
>>> > > >> I have inspected different logs to see if I can find any correlation what happened at the same time as the file is going through my loop, but there are no event/task at that exact time.
>>> > > >>
>>> > > >>
>>> > > >> System 1:
>>> > > >>
>>> > > >> At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>> > > >>
>>> > > >> The file is exported as a "FlowFile Stream, v3" to System 2
>>> > > >>
>>> > > >>
>>> > > >> SYSTEM 2:
>>> > > >>
>>> > > >> At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>> > > >>
>>> > > >> <image.png>
>>> > > >>
>>> > > >> At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>> > > >>
>>> > > >> At 10/19/2021 00:20:04.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>> > > >>
>>> > > >> At 10/19/2021 00:21:01.711 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>>> > > >>
>>> > > >>
>>> > > >> At 10/19/2021 06:07:43.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>>> > > >>
>>> > > >> <image.png>
>>> > > >>
>>> > > >>
>>> > > >> How on earth can this happen???
>>> > > >>
>>> > > >>
>>> > > >> Kind Regards
>>> > > >>
>>> > > >> Jens M. Kofoed
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> <Repro.json>
>>> > > >>
>>> > > >>
>>> > > >> <Try_to_recreate_Jens_Challenge.json>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>
>>

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark

Of course, sorry :-)  Looking at the error messages, I can see that only the
histogram values with differences are listed, and all 3 logs have their first
issue at histogram.9, i.e. the count of bytes with value 9. Don't know what
that means. Notably, histogram.totalBytes is not among the listed differences
in any of the three, so the total byte count was unchanged each time; only the
counts of individual byte values shifted.

/Jens

Here are the error logs:
2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10]
org.apache.nifi.processors.script.ExecuteScript
ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
differences in the histogram
Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
Byte Value: histogram.113, Previous Count: 11928534, New Count: 11928548
Byte Value: histogram.114, Previous Count: 11936879, New Count: 11936874
Byte Value: histogram.115, Previous Count: 11932818, New Count: 11932804
Byte Value: histogram.117, Previous Count: 11929143, New Count: 11929151
Byte Value: histogram.118, Previous Count: 11931854, New Count: 11931829
Byte Value: histogram.119, Previous Count: 11926333, New Count: 11926327
Byte Value: histogram.120, Previous Count: 11928731, New Count: 11928740
Byte Value: histogram.121, Previous Count: 11931149, New Count: 11931162
Byte Value: histogram.122, Previous Count: 11926725, New Count: 11926733
Byte Value: histogram.32, Previous Count: 11930422, New Count: 11930425
Byte Value: histogram.33, Previous Count: 11934311, New Count: 11934313
Byte Value: histogram.34, Previous Count: 11930459, New Count: 11930446
Byte Value: histogram.35, Previous Count: 11924776, New Count: 11924758
Byte Value: histogram.36, Previous Count: 11924186, New Count: 11924183
Byte Value: histogram.37, Previous Count: 11928616, New Count: 11928627
Byte Value: histogram.38, Previous Count: 11929474, New Count: 11929490
Byte Value: histogram.39, Previous Count: 11929607, New Count: 11929600
Byte Value: histogram.40, Previous Count: 11928053, New Count: 11928048
Byte Value: histogram.41, Previous Count: 11930402, New Count: 11930399
Byte Value: histogram.42, Previous Count: 11926830, New Count: 11926846
Byte Value: histogram.44, Previous Count: 11932536, New Count: 11932538
Byte Value: histogram.45, Previous Count: 11931053, New Count: 11931044
Byte Value: histogram.46, Previous Count: 11930008, New Count: 11930011
Byte Value: histogram.47, Previous Count: 11927747, New Count: 11927734
Byte Value: histogram.48, Previous Count: 11936055, New Count: 11936057
Byte Value: histogram.49, Previous Count: 11931471, New Count: 11931474
Byte Value: histogram.50, Previous Count: 11931921, New Count: 11931908
Byte Value: histogram.51, Previous Count: 11929643, New Count: 11929637
Byte Value: histogram.52, Previous Count: 11923847, New Count: 11923854
Byte Value: histogram.53, Previous Count: 11927311, New Count: 11927303
Byte Value: histogram.54, Previous Count: 11933754, New Count: 11933766
Byte Value: histogram.55, Previous Count: 11925964, New Count: 11925970
Byte Value: histogram.56, Previous Count: 11928872, New Count: 11928873
Byte Value: histogram.57, Previous Count: 11931124, New Count: 11931127
Byte Value: histogram.58, Previous Count: 11928474, New Count: 11928477
Byte Value: histogram.59, Previous Count: 11925814, New Count: 11925812
Byte Value: histogram.60, Previous Count: 11933978, New Count: 11933991
Byte Value: histogram.61, Previous Count: 11934136, New Count: 11934123
Byte Value: histogram.62, Previous Count: 11932016, New Count: 11932011
Byte Value: histogram.63, Previous Count: 23864588, New Count: 23864584
Byte Value: histogram.64, Previous Count: 11924792, New Count: 11924789
Byte Value: histogram.65, Previous Count: 11934789, New Count: 11934797
Byte Value: histogram.66, Previous Count: 11933047, New Count: 11933044
Byte Value: histogram.67, Previous Count: 11931899, New Count: 11931909
Byte Value: histogram.68, Previous Count: 11935615, New Count: 11935609
Byte Value: histogram.69, Previous Count: 11927249, New Count: 11927239
Byte Value: histogram.70, Previous Count: 11933276, New Count: 11933274
Byte Value: histogram.71, Previous Count: 11927953, New Count: 11927969
Byte Value: histogram.72, Previous Count: 11929275, New Count: 11929266
Byte Value: histogram.73, Previous Count: 11930292, New Count: 11930306
Byte Value: histogram.74, Previous Count: 11935428, New Count: 11935427
Byte Value: histogram.75, Previous Count: 11930317, New Count: 11930307
Byte Value: histogram.76, Previous Count: 11935737, New Count: 11935726
Byte Value: histogram.77, Previous Count: 11932127, New Count: 11932125
Byte Value: histogram.78, Previous Count: 11932344, New Count: 11932349
Byte Value: histogram.79, Previous Count: 11932094, New Count: 11932100
Byte Value: histogram.80, Previous Count: 11930688, New Count: 11930687
Byte Value: histogram.81, Previous Count: 11928415, New Count: 11928416
Byte Value: histogram.82, Previous Count: 11931559, New Count: 11931542
Byte Value: histogram.83, Previous Count: 11934192, New Count: 11934176
Byte Value: histogram.84, Previous Count: 11927224, New Count: 11927231
Byte Value: histogram.85, Previous Count: 11929491, New Count: 11929484
Byte Value: histogram.87, Previous Count: 11932201, New Count: 11932190
Byte Value: histogram.88, Previous Count: 11930694, New Count: 11930680
Byte Value: histogram.89, Previous Count: 11936439, New Count: 11936448
Byte Value: histogram.9, Previous Count: 11933187, New Count: 11933193
Byte Value: histogram.90, Previous Count: 11926445, New Count: 11926455
Byte Value: histogram.94, Previous Count: 11931596, New Count: 11931609
Byte Value: histogram.95, Previous Count: 11929379, New Count: 11929384
Byte Value: histogram.97, Previous Count: 11928864, New Count: 11928874
Byte Value: histogram.98, Previous Count: 11924738, New Count: 11924729
Byte Value: histogram.99, Previous Count: 11930062, New Count: 11930059

2021-11-01 22:10:02,765 ERROR [Timer-Driven Process Thread-9]
org.apache.nifi.processors.script.ExecuteScript
ExecuteScript[id=c7d3335b-1045-14ed-ffff-ffffa0d62c70] There are
differences in the histogram
Byte Value: histogram.10, Previous Count: 11932402, New Count: 11932407
Byte Value: histogram.100, Previous Count: 11927531, New Count: 11927541
Byte Value: histogram.101, Previous Count: 11928454, New Count: 11928430
Byte Value: histogram.102, Previous Count: 11934432, New Count: 11934439
Byte Value: histogram.103, Previous Count: 11924623, New Count: 11924633
Byte Value: histogram.104, Previous Count: 11934492, New Count: 11934474
Byte Value: histogram.105, Previous Count: 11934585, New Count: 11934591
Byte Value: histogram.106, Previous Count: 11928955, New Count: 11928948
Byte Value: histogram.108, Previous Count: 11930139, New Count: 11930140
Byte Value: histogram.109, Previous Count: 11929325, New Count: 11929321
Byte Value: histogram.110, Previous Count: 11930486, New Count: 11930478
Byte Value: histogram.111, Previous Count: 11933517, New Count: 11933508
Byte Value: histogram.112, Previous Count: 11928334, New Count: 11928339
Byte Value: histogram.114, Previous Count: 11929222, New Count: 11929213
Byte Value: histogram.116, Previous Count: 11931182, New Count: 11931188
Byte Value: histogram.117, Previous Count: 11933407, New Count: 11933402
Byte Value: histogram.118, Previous Count: 11932709, New Count: 11932705
Byte Value: histogram.120, Previous Count: 11933700, New Count: 11933708
Byte Value: histogram.121, Previous Count: 11929803, New Count: 11929801
Byte Value: histogram.122, Previous Count: 11930218, New Count: 11930220
Byte Value: histogram.32, Previous Count: 11924458, New Count: 11924469
Byte Value: histogram.33, Previous Count: 11934243, New Count: 11934248
Byte Value: histogram.34, Previous Count: 11930696, New Count: 11930700
Byte Value: histogram.35, Previous Count: 11925574, New Count: 11925577
Byte Value: histogram.36, Previous Count: 11929198, New Count: 11929187
Byte Value: histogram.37, Previous Count: 11928146, New Count: 11928143
Byte Value: histogram.38, Previous Count: 11932505, New Count: 11932510
Byte Value: histogram.39, Previous Count: 11929406, New Count: 11929412
Byte Value: histogram.40, Previous Count: 11930100, New Count: 11930098
Byte Value: histogram.41, Previous Count: 11930867, New Count: 11930872
Byte Value: histogram.42, Previous Count: 11930796, New Count: 11930793
Byte Value: histogram.43, Previous Count: 11930796, New Count: 11930789
Byte Value: histogram.44, Previous Count: 11921866, New Count: 11921865
Byte Value: histogram.45, Previous Count: 11935682, New Count: 11935699
Byte Value: histogram.46, Previous Count: 11930075, New Count: 11930073
Byte Value: histogram.47, Previous Count: 11928169, New Count: 11928165
Byte Value: histogram.48, Previous Count: 11933490, New Count: 11933478
Byte Value: histogram.49, Previous Count: 11932174, New Count: 11932180
Byte Value: histogram.50, Previous Count: 11933255, New Count: 11933239
Byte Value: histogram.51, Previous Count: 11934009, New Count: 11934013
Byte Value: histogram.52, Previous Count: 11928361, New Count: 11928367
Byte Value: histogram.53, Previous Count: 11927626, New Count: 11927627
Byte Value: histogram.54, Previous Count: 11931611, New Count: 11931617
Byte Value: histogram.55, Previous Count: 11930755, New Count: 11930746
Byte Value: histogram.56, Previous Count: 11933823, New Count: 11933824
Byte Value: histogram.57, Previous Count: 11922508, New Count: 11922510
Byte Value: histogram.58, Previous Count: 11930384, New Count: 11930362
Byte Value: histogram.59, Previous Count: 11929805, New Count: 11929820
Byte Value: histogram.60, Previous Count: 11930064, New Count: 11930055
Byte Value: histogram.61, Previous Count: 11926761, New Count: 11926762
Byte Value: histogram.62, Previous Count: 11927605, New Count: 11927604
Byte Value: histogram.63, Previous Count: 23858926, New Count: 23858913
Byte Value: histogram.64, Previous Count: 11929516, New Count: 11929512
Byte Value: histogram.65, Previous Count: 11930217, New Count: 11930223
Byte Value: histogram.66, Previous Count: 11930478, New Count: 11930481
Byte Value: histogram.67, Previous Count: 11939855, New Count: 11939858
Byte Value: histogram.68, Previous Count: 11927850, New Count: 11927852
Byte Value: histogram.69, Previous Count: 11931154, New Count: 11931175
Byte Value: histogram.70, Previous Count: 11935374, New Count: 11935369
Byte Value: histogram.71, Previous Count: 11930754, New Count: 11930751
Byte Value: histogram.72, Previous Count: 11928304, New Count: 11928318
Byte Value: histogram.73, Previous Count: 11931772, New Count: 11931766
Byte Value: histogram.74, Previous Count: 11939417, New Count: 11939426
Byte Value: histogram.75, Previous Count: 11930712, New Count: 11930718
Byte Value: histogram.76, Previous Count: 11933331, New Count: 11933346
Byte Value: histogram.77, Previous Count: 11931279, New Count: 11931272
Byte Value: histogram.78, Previous Count: 11928276, New Count: 11928290
Byte Value: histogram.79, Previous Count: 11930071, New Count: 11930067
Byte Value: histogram.80, Previous Count: 11927830, New Count: 11927825
Byte Value: histogram.81, Previous Count: 11931213, New Count: 11931206
Byte Value: histogram.82, Previous Count: 11930964, New Count: 11930958
Byte Value: histogram.83, Previous Count: 11928973, New Count: 11928966
Byte Value: histogram.84, Previous Count: 11934325, New Count: 11934331
Byte Value: histogram.85, Previous Count: 11929658, New Count: 11929654
Byte Value: histogram.86, Previous Count: 11924667, New Count: 11924666
Byte Value: histogram.87, Previous Count: 11931100, New Count: 11931106
Byte Value: histogram.88, Previous Count: 11930252, New Count: 11930248
Byte Value: histogram.89, Previous Count: 11927281, New Count: 11927299
Byte Value: histogram.9, Previous Count: 11932848, New Count: 11932851
Byte Value: histogram.90, Previous Count: 11930398, New Count: 11930399
Byte Value: histogram.94, Previous Count: 11928720, New Count: 11928715
Byte Value: histogram.95, Previous Count: 11928988, New Count: 11928977
Byte Value: histogram.97, Previous Count: 11931423, New Count: 11931426
Byte Value: histogram.98, Previous Count: 11928181, New Count: 11928184
Byte Value: histogram.99, Previous Count: 11935549, New Count: 11935542

2021-11-01 22:23:08,989 ERROR [Timer-Driven Process Thread-10]
org.apache.nifi.processors.script.ExecuteScript
ExecuteScript[id=24d13930-49e8-1062-9a2c-943118738138] There are
differences in the histogram
Byte Value: histogram.10, Previous Count: 11930417, New Count: 11930411
Byte Value: histogram.100, Previous Count: 11926739, New Count: 11926755
Byte Value: histogram.101, Previous Count: 11930580, New Count: 11930574
Byte Value: histogram.102, Previous Count: 11928210, New Count: 11928202
Byte Value: histogram.103, Previous Count: 11935300, New Count: 11935297
Byte Value: histogram.104, Previous Count: 11925804, New Count: 11925820
Byte Value: histogram.105, Previous Count: 11931023, New Count: 11931012
Byte Value: histogram.106, Previous Count: 11932342, New Count: 11932344
Byte Value: histogram.108, Previous Count: 11930098, New Count: 11930106
Byte Value: histogram.109, Previous Count: 11930759, New Count: 11930750
Byte Value: histogram.110, Previous Count: 11934343, New Count: 11934352
Byte Value: histogram.111, Previous Count: 11935775, New Count: 11935782
Byte Value: histogram.112, Previous Count: 11933877, New Count: 11933884
Byte Value: histogram.113, Previous Count: 11926675, New Count: 11926674
Byte Value: histogram.114, Previous Count: 11929332, New Count: 11929336
Byte Value: histogram.115, Previous Count: 11928876, New Count: 11928878
Byte Value: histogram.116, Previous Count: 11927819, New Count: 11927833
Byte Value: histogram.117, Previous Count: 11932657, New Count: 11932638
Byte Value: histogram.118, Previous Count: 11933508, New Count: 11933507
Byte Value: histogram.119, Previous Count: 11928808, New Count: 11928821
Byte Value: histogram.120, Previous Count: 11937532, New Count: 11937528
Byte Value: histogram.121, Previous Count: 11926907, New Count: 11926921
Byte Value: histogram.32, Previous Count: 11929486, New Count: 11929489
Byte Value: histogram.33, Previous Count: 11930737, New Count: 11930741
Byte Value: histogram.34, Previous Count: 11931092, New Count: 11931086
Byte Value: histogram.36, Previous Count: 11927605, New Count: 11927615
Byte Value: histogram.37, Previous Count: 11930735, New Count: 11930745
Byte Value: histogram.38, Previous Count: 11932174, New Count: 11932178
Byte Value: histogram.39, Previous Count: 11936180, New Count: 11936182
Byte Value: histogram.40, Previous Count: 11931666, New Count: 11931676
Byte Value: histogram.41, Previous Count: 11927043, New Count: 11927034
Byte Value: histogram.42, Previous Count: 11929044, New Count: 11929042
Byte Value: histogram.43, Previous Count: 11934104, New Count: 11934098
Byte Value: histogram.44, Previous Count: 11936337, New Count: 11936346
Byte Value: histogram.45, Previous Count: 11935580, New Count: 11935582
Byte Value: histogram.46, Previous Count: 11929598, New Count: 11929599
Byte Value: histogram.47, Previous Count: 11934083, New Count: 11934085
Byte Value: histogram.48, Previous Count: 11928858, New Count: 11928860
Byte Value: histogram.49, Previous Count: 11931098, New Count: 11931113
Byte Value: histogram.50, Previous Count: 11930618, New Count: 11930614
Byte Value: histogram.51, Previous Count: 11925429, New Count: 11925435
Byte Value: histogram.52, Previous Count: 11929741, New Count: 11929733
Byte Value: histogram.53, Previous Count: 11934160, New Count: 11934155
Byte Value: histogram.54, Previous Count: 11931999, New Count: 11931980
Byte Value: histogram.55, Previous Count: 11930465, New Count: 11930477
Byte Value: histogram.56, Previous Count: 11926194, New Count: 11926190
Byte Value: histogram.57, Previous Count: 11926386, New Count: 11926381
Byte Value: histogram.58, Previous Count: 11924871, New Count: 11924865
Byte Value: histogram.59, Previous Count: 11929331, New Count: 11929326
Byte Value: histogram.60, Previous Count: 11926951, New Count: 11926943
Byte Value: histogram.61, Previous Count: 11928631, New Count: 11928619
Byte Value: histogram.62, Previous Count: 11927549, New Count: 11927553
Byte Value: histogram.63, Previous Count: 23856730, New Count: 23856718
Byte Value: histogram.64, Previous Count: 11930288, New Count: 11930293
Byte Value: histogram.65, Previous Count: 11931523, New Count: 11931527
Byte Value: histogram.66, Previous Count: 11932821, New Count: 11932818
Byte Value: histogram.67, Previous Count: 11932509, New Count: 11932510
Byte Value: histogram.68, Previous Count: 11929613, New Count: 11929614
Byte Value: histogram.69, Previous Count: 11928651, New Count: 11928654
Byte Value: histogram.70, Previous Count: 11929253, New Count: 11929247
Byte Value: histogram.71, Previous Count: 11931521, New Count: 11931512
Byte Value: histogram.72, Previous Count: 11925805, New Count: 11925808
Byte Value: histogram.73, Previous Count: 11934833, New Count: 11934826
Byte Value: histogram.74, Previous Count: 11928314, New Count: 11928312
Byte Value: histogram.75, Previous Count: 11923854, New Count: 11923863
Byte Value: histogram.76, Previous Count: 11930892, New Count: 11930898
Byte Value: histogram.77, Previous Count: 11927528, New Count: 11927525
Byte Value: histogram.78, Previous Count: 11932850, New Count: 11932857
Byte Value: histogram.79, Previous Count: 11934471, New Count: 11934461
Byte Value: histogram.80, Previous Count: 11925707, New Count: 11925714
Byte Value: histogram.81, Previous Count: 11929213, New Count: 11929206
Byte Value: histogram.82, Previous Count: 11931334, New Count: 11931323
Byte Value: histogram.83, Previous Count: 11936739, New Count: 11936732
Byte Value: histogram.84, Previous Count: 11927855, New Count: 11927832
Byte Value: histogram.85, Previous Count: 11931668, New Count: 11931665
Byte Value: histogram.86, Previous Count: 11928609, New Count: 11928604
Byte Value: histogram.87, Previous Count: 11931930, New Count: 11931933
Byte Value: histogram.88, Previous Count: 11934341, New Count: 11934345
Byte Value: histogram.89, Previous Count: 11927519, New Count: 11927518
Byte Value: histogram.9, Previous Count: 11928004, New Count: 11928001
Byte Value: histogram.90, Previous Count: 11933502, New Count: 11933517
Byte Value: histogram.94, Previous Count: 11932024, New Count: 11932035
Byte Value: histogram.95, Previous Count: 11932693, New Count: 11932679
Byte Value: histogram.97, Previous Count: 11928428, New Count: 11928424
Byte Value: histogram.98, Previous Count: 11933195, New Count: 11933196
Byte Value: histogram.99, Previous Count: 11924273, New Count: 11924282

On Tue, Nov 2, 2021 at 3:41 PM Mark Payne <ma...@hotmail.com> wrote:

> Jens,
>
> The histograms, in and of themselves, are not very interesting. The
> interesting thing would be the difference in the histogram before & after
> the hash. Can you provide the ERROR level logs generated by the
> ExecuteScript? That’s what is of interest.
>
> Thanks
> -Mark
>
>
> On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
>
> Hi Mark and Joe
>
> Yesterday morning I implemented Mark's script in my 2 test flows, one
> test flow using SFTP, the other MergeContent/UnpackContent. Both test flows
> are running on a test cluster with 3 nodes and NiFi 1.14.0.
> The 1st flow, with SFTP, has had 1 file going into the failure queue after
> about 16 hours.
> The 2nd flow has had 2 files going into the failure queue after about 15
> and 17 hours.
>
> There is definitely something going wrong in my setup, but I can't
> figure out what.
>
> Information from file 1:
> histogram.0;0
> histogram.1;0
> histogram.10;11926720
> histogram.100;11927504
> histogram.101;11925396
> histogram.102;11929923
> histogram.103;11931596
> histogram.104;11929071
> histogram.105;11931365
> histogram.106;11928661
> histogram.107;11929864
> histogram.108;11931611
> histogram.109;11932758
> histogram.11;0
> histogram.110;11927893
> histogram.111;11933519
> histogram.112;11931392
> histogram.113;11928534
> histogram.114;11936879
> histogram.115;11932818
> histogram.116;11934767
> histogram.117;11929143
> histogram.118;11931854
> histogram.119;11926333
> histogram.12;0
> histogram.120;11928731
> histogram.121;11931149
> histogram.122;11926725
> histogram.123;0
> histogram.124;0
> histogram.125;0
> histogram.126;0
> histogram.127;0
> histogram.128;0
> histogram.129;0
> histogram.13;0
> histogram.130;0
> histogram.131;0
> histogram.132;0
> histogram.133;0
> histogram.134;0
> histogram.135;0
> histogram.136;0
> histogram.137;0
> histogram.138;0
> histogram.139;0
> histogram.14;0
> histogram.140;0
> histogram.141;0
> histogram.142;0
> histogram.143;0
> histogram.144;0
> histogram.145;0
> histogram.146;0
> histogram.147;0
> histogram.148;0
> histogram.149;0
> histogram.15;0
> histogram.150;0
> histogram.151;0
> histogram.152;0
> histogram.153;0
> histogram.154;0
> histogram.155;0
> histogram.156;0
> histogram.157;0
> histogram.158;0
> histogram.159;0
> histogram.16;0
> histogram.160;0
> histogram.161;0
> histogram.162;0
> histogram.163;0
> histogram.164;0
> histogram.165;0
> histogram.166;0
> histogram.167;0
> histogram.168;0
> histogram.169;0
> histogram.17;0
> histogram.170;0
> histogram.171;0
> histogram.172;0
> histogram.173;0
> histogram.174;0
> histogram.175;0
> histogram.176;0
> histogram.177;0
> histogram.178;0
> histogram.179;0
> histogram.18;0
> histogram.180;0
> histogram.181;0
> histogram.182;0
> histogram.183;0
> histogram.184;0
> histogram.185;0
> histogram.186;0
> histogram.187;0
> histogram.188;0
> histogram.189;0
> histogram.19;0
> histogram.190;0
> histogram.191;0
> histogram.192;0
> histogram.193;0
> histogram.194;0
> histogram.195;0
> histogram.196;0
> histogram.197;0
> histogram.198;0
> histogram.199;0
> histogram.2;0
> histogram.20;0
> histogram.200;0
> histogram.201;0
> histogram.202;0
> histogram.203;0
> histogram.204;0
> histogram.205;0
> histogram.206;0
> histogram.207;0
> histogram.208;0
> histogram.209;0
> histogram.21;0
> histogram.210;0
> histogram.211;0
> histogram.212;0
> histogram.213;0
> histogram.214;0
> histogram.215;0
> histogram.216;0
> histogram.217;0
> histogram.218;0
> histogram.219;0
> histogram.22;0
> histogram.220;0
> histogram.221;0
> histogram.222;0
> histogram.223;0
> histogram.224;0
> histogram.225;0
> histogram.226;0
> histogram.227;0
> histogram.228;0
> histogram.229;0
> histogram.23;0
> histogram.230;0
> histogram.231;0
> histogram.232;0
> histogram.233;0
> histogram.234;0
> histogram.235;0
> histogram.236;0
> histogram.237;0
> histogram.238;0
> histogram.239;0
> histogram.24;0
> histogram.240;0
> histogram.241;0
> histogram.242;0
> histogram.243;0
> histogram.244;0
> histogram.245;0
> histogram.246;0
> histogram.247;0
> histogram.248;0
> histogram.249;0
> histogram.25;0
> histogram.250;0
> histogram.251;0
> histogram.252;0
> histogram.253;0
> histogram.254;0
> histogram.255;0
> histogram.26;0
> histogram.27;0
> histogram.28;0
> histogram.29;0
> histogram.3;0
> histogram.30;0
> histogram.31;0
> histogram.32;11930422
> histogram.33;11934311
> histogram.34;11930459
> histogram.35;11924776
> histogram.36;11924186
> histogram.37;11928616
> histogram.38;11929474
> histogram.39;11929607
> histogram.4;0
> histogram.40;11928053
> histogram.41;11930402
> histogram.42;11926830
> histogram.43;11938138
> histogram.44;11932536
> histogram.45;11931053
> histogram.46;11930008
> histogram.47;11927747
> histogram.48;11936055
> histogram.49;11931471
> histogram.5;0
> histogram.50;11931921
> histogram.51;11929643
> histogram.52;11923847
> histogram.53;11927311
> histogram.54;11933754
> histogram.55;11925964
> histogram.56;11928872
> histogram.57;11931124
> histogram.58;11928474
> histogram.59;11925814
> histogram.6;0
> histogram.60;11933978
> histogram.61;11934136
> histogram.62;11932016
> histogram.63;23864588
> histogram.64;11924792
> histogram.65;11934789
> histogram.66;11933047
> histogram.67;11931899
> histogram.68;11935615
> histogram.69;11927249
> histogram.7;0
> histogram.70;11933276
> histogram.71;11927953
> histogram.72;11929275
> histogram.73;11930292
> histogram.74;11935428
> histogram.75;11930317
> histogram.76;11935737
> histogram.77;11932127
> histogram.78;11932344
> histogram.79;11932094
> histogram.8;0
> histogram.80;11930688
> histogram.81;11928415
> histogram.82;11931559
> histogram.83;11934192
> histogram.84;11927224
> histogram.85;11929491
> histogram.86;11930624
> histogram.87;11932201
> histogram.88;11930694
> histogram.89;11936439
> histogram.9;11933187
> histogram.90;11926445
> histogram.91;0
> histogram.92;0
> histogram.93;0
> histogram.94;11931596
> histogram.95;11929379
> histogram.96;0
> histogram.97;11928864
> histogram.98;11924738
> histogram.99;11930062
> histogram.totalBytes;1073741824
>
> File 2:
> histogram.0;0
> histogram.1;0
> histogram.10;11932402
> histogram.100;11927531
> histogram.101;11928454
> histogram.102;11934432
> histogram.103;11924623
> histogram.104;11934492
> histogram.105;11934585
> histogram.106;11928955
> histogram.107;11928651
> histogram.108;11930139
> histogram.109;11929325
> histogram.11;0
> histogram.110;11930486
> histogram.111;11933517
> histogram.112;11928334
> histogram.113;11927798
> histogram.114;11929222
> histogram.115;11932057
> histogram.116;11931182
> histogram.117;11933407
> histogram.118;11932709
> histogram.119;11931338
> histogram.12;0
> histogram.120;11933700
> histogram.121;11929803
> histogram.122;11930218
> histogram.123;0
> histogram.124;0
> histogram.125;0
> histogram.126;0
> histogram.127;0
> histogram.128;0
> histogram.129;0
> histogram.13;0
> histogram.130;0
> histogram.131;0
> histogram.132;0
> histogram.133;0
> histogram.134;0
> histogram.135;0
> histogram.136;0
> histogram.137;0
> histogram.138;0
> histogram.139;0
> histogram.14;0
> histogram.140;0
> histogram.141;0
> histogram.142;0
> histogram.143;0
> histogram.144;0
> histogram.145;0
> histogram.146;0
> histogram.147;0
> histogram.148;0
> histogram.149;0
> histogram.15;0
> histogram.150;0
> histogram.151;0
> histogram.152;0
> histogram.153;0
> histogram.154;0
> histogram.155;0
> histogram.156;0
> histogram.157;0
> histogram.158;0
> histogram.159;0
> histogram.16;0
> histogram.160;0
> histogram.161;0
> histogram.162;0
> histogram.163;0
> histogram.164;0
> histogram.165;0
> histogram.166;0
> histogram.167;0
> histogram.168;0
> histogram.169;0
> histogram.17;0
> histogram.170;0
> histogram.171;0
> histogram.172;0
> histogram.173;0
> histogram.174;0
> histogram.175;0
> histogram.176;0
> histogram.177;0
> histogram.178;0
> histogram.179;0
> histogram.18;0
> histogram.180;0
> histogram.181;0
> histogram.182;0
> histogram.183;0
> histogram.184;0
> histogram.185;0
> histogram.186;0
> histogram.187;0
> histogram.188;0
> histogram.189;0
> histogram.19;0
> histogram.190;0
> histogram.191;0
> histogram.192;0
> histogram.193;0
> histogram.194;0
> histogram.195;0
> histogram.196;0
> histogram.197;0
> histogram.198;0
> histogram.199;0
> histogram.2;0
> histogram.20;0
> histogram.200;0
> histogram.201;0
> histogram.202;0
> histogram.203;0
> histogram.204;0
> histogram.205;0
> histogram.206;0
> histogram.207;0
> histogram.208;0
> histogram.209;0
> histogram.21;0
> histogram.210;0
> histogram.211;0
> histogram.212;0
> histogram.213;0
> histogram.214;0
> histogram.215;0
> histogram.216;0
> histogram.217;0
> histogram.218;0
> histogram.219;0
> histogram.22;0
> histogram.220;0
> histogram.221;0
> histogram.222;0
> histogram.223;0
> histogram.224;0
> histogram.225;0
> histogram.226;0
> histogram.227;0
> histogram.228;0
> histogram.229;0
> histogram.23;0
> histogram.230;0
> histogram.231;0
> histogram.232;0
> histogram.233;0
> histogram.234;0
> histogram.235;0
> histogram.236;0
> histogram.237;0
> histogram.238;0
> histogram.239;0
> histogram.24;0
> histogram.240;0
> histogram.241;0
> histogram.242;0
> histogram.243;0
> histogram.244;0
> histogram.245;0
> histogram.246;0
> histogram.247;0
> histogram.248;0
> histogram.249;0
> histogram.25;0
> histogram.250;0
> histogram.251;0
> histogram.252;0
> histogram.253;0
> histogram.254;0
> histogram.255;0
> histogram.26;0
> histogram.27;0
> histogram.28;0
> histogram.29;0
> histogram.3;0
> histogram.30;0
> histogram.31;0
> histogram.32;11924458
> histogram.33;11934243
> histogram.34;11930696
> histogram.35;11925574
> histogram.36;11929198
> histogram.37;11928146
> histogram.38;11932505
> histogram.39;11929406
> histogram.4;0
> histogram.40;11930100
> histogram.41;11930867
> histogram.42;11930796
> histogram.43;11930796
> histogram.44;11921866
> histogram.45;11935682
> histogram.46;11930075
> histogram.47;11928169
> histogram.48;11933490
> histogram.49;11932174
> histogram.5;0
> histogram.50;11933255
> histogram.51;11934009
> histogram.52;11928361
> histogram.53;11927626
> histogram.54;11931611
> histogram.55;11930755
> histogram.56;11933823
> histogram.57;11922508
> histogram.58;11930384
> histogram.59;11929805
> histogram.6;0
> histogram.60;11930064
> histogram.61;11926761
> histogram.62;11927605
> histogram.63;23858926
> histogram.64;11929516
> histogram.65;11930217
> histogram.66;11930478
> histogram.67;11939855
> histogram.68;11927850
> histogram.69;11931154
> histogram.7;0
> histogram.70;11935374
> histogram.71;11930754
> histogram.72;11928304
> histogram.73;11931772
> histogram.74;11939417
> histogram.75;11930712
> histogram.76;11933331
> histogram.77;11931279
> histogram.78;11928276
> histogram.79;11930071
> histogram.8;0
> histogram.80;11927830
> histogram.81;11931213
> histogram.82;11930964
> histogram.83;11928973
> histogram.84;11934325
> histogram.85;11929658
> histogram.86;11924667
> histogram.87;11931100
> histogram.88;11930252
> histogram.89;11927281
> histogram.9;11932848
> histogram.90;11930398
> histogram.91;0
> histogram.92;0
> histogram.93;0
> histogram.94;11928720
> histogram.95;11928988
> histogram.96;0
> histogram.97;11931423
> histogram.98;11928181
> histogram.99;11935549
> histogram.totalBytes;1073741824
>
> File 3:
> histogram.0;0
> histogram.1;0
> histogram.10;11930417
> histogram.100;11926739
> histogram.101;11930580
> histogram.102;11928210
> histogram.103;11935300
> histogram.104;11925804
> histogram.105;11931023
> histogram.106;11932342
> histogram.107;11929778
> histogram.108;11930098
> histogram.109;11930759
> histogram.11;0
> histogram.110;11934343
> histogram.111;11935775
> histogram.112;11933877
> histogram.113;11926675
> histogram.114;11929332
> histogram.115;11928876
> histogram.116;11927819
> histogram.117;11932657
> histogram.118;11933508
> histogram.119;11928808
> histogram.12;0
> histogram.120;11937532
> histogram.121;11926907
> histogram.122;11933942
> histogram.123;0
> histogram.124;0
> histogram.125;0
> histogram.126;0
> histogram.127;0
> histogram.128;0
> histogram.129;0
> histogram.13;0
> histogram.130;0
> histogram.131;0
> histogram.132;0
> histogram.133;0
> histogram.134;0
> histogram.135;0
> histogram.136;0
> histogram.137;0
> histogram.138;0
> histogram.139;0
> histogram.14;0
> histogram.140;0
> histogram.141;0
> histogram.142;0
> histogram.143;0
> histogram.144;0
> histogram.145;0
> histogram.146;0
> histogram.147;0
> histogram.148;0
> histogram.149;0
> histogram.15;0
> histogram.150;0
> histogram.151;0
> histogram.152;0
> histogram.153;0
> histogram.154;0
> histogram.155;0
> histogram.156;0
> histogram.157;0
> histogram.158;0
> histogram.159;0
> histogram.16;0
> histogram.160;0
> histogram.161;0
> histogram.162;0
> histogram.163;0
> histogram.164;0
> histogram.165;0
> histogram.166;0
> histogram.167;0
> histogram.168;0
> histogram.169;0
> histogram.17;0
> histogram.170;0
> histogram.171;0
> histogram.172;0
> histogram.173;0
> histogram.174;0
> histogram.175;0
> histogram.176;0
> histogram.177;0
> histogram.178;0
> histogram.179;0
> histogram.18;0
> histogram.180;0
> histogram.181;0
> histogram.182;0
> histogram.183;0
> histogram.184;0
> histogram.185;0
> histogram.186;0
> histogram.187;0
> histogram.188;0
> histogram.189;0
> histogram.19;0
> histogram.190;0
> histogram.191;0
> histogram.192;0
> histogram.193;0
> histogram.194;0
> histogram.195;0
> histogram.196;0
> histogram.197;0
> histogram.198;0
> histogram.199;0
> histogram.2;0
> histogram.20;0
> histogram.200;0
> histogram.201;0
> histogram.202;0
> histogram.203;0
> histogram.204;0
> histogram.205;0
> histogram.206;0
> histogram.207;0
> histogram.208;0
> histogram.209;0
> histogram.21;0
> histogram.210;0
> histogram.211;0
> histogram.212;0
> histogram.213;0
> histogram.214;0
> histogram.215;0
> histogram.216;0
> histogram.217;0
> histogram.218;0
> histogram.219;0
> histogram.22;0
> histogram.220;0
> histogram.221;0
> histogram.222;0
> histogram.223;0
> histogram.224;0
> histogram.225;0
> histogram.226;0
> histogram.227;0
> histogram.228;0
> histogram.229;0
> histogram.23;0
> histogram.230;0
> histogram.231;0
> histogram.232;0
> histogram.233;0
> histogram.234;0
> histogram.235;0
> histogram.236;0
> histogram.237;0
> histogram.238;0
> histogram.239;0
> histogram.24;0
> histogram.240;0
> histogram.241;0
> histogram.242;0
> histogram.243;0
> histogram.244;0
> histogram.245;0
> histogram.246;0
> histogram.247;0
> histogram.248;0
> histogram.249;0
> histogram.25;0
> histogram.250;0
> histogram.251;0
> histogram.252;0
> histogram.253;0
> histogram.254;0
> histogram.255;0
> histogram.26;0
> histogram.27;0
> histogram.28;0
> histogram.29;0
> histogram.3;0
> histogram.30;0
> histogram.31;0
> histogram.32;11929486
> histogram.33;11930737
> histogram.34;11931092
> histogram.35;11934488
> histogram.36;11927605
> histogram.37;11930735
> histogram.38;11932174
> histogram.39;11936180
> histogram.4;0
> histogram.40;11931666
> histogram.41;11927043
> histogram.42;11929044
> histogram.43;11934104
> histogram.44;11936337
> histogram.45;11935580
> histogram.46;11929598
> histogram.47;11934083
> histogram.48;11928858
> histogram.49;11931098
> histogram.5;0
> histogram.50;11930618
> histogram.51;11925429
> histogram.52;11929741
> histogram.53;11934160
> histogram.54;11931999
> histogram.55;11930465
> histogram.56;11926194
> histogram.57;11926386
> histogram.58;11924871
> histogram.59;11929331
> histogram.6;0
> histogram.60;11926951
> histogram.61;11928631
> histogram.62;11927549
> histogram.63;23856730
> histogram.64;11930288
> histogram.65;11931523
> histogram.66;11932821
> histogram.67;11932509
> histogram.68;11929613
> histogram.69;11928651
> histogram.7;0
> histogram.70;11929253
> histogram.71;11931521
> histogram.72;11925805
> histogram.73;11934833
> histogram.74;11928314
> histogram.75;11923854
> histogram.76;11930892
> histogram.77;11927528
> histogram.78;11932850
> histogram.79;11934471
> histogram.8;0
> histogram.80;11925707
> histogram.81;11929213
> histogram.82;11931334
> histogram.83;11936739
> histogram.84;11927855
> histogram.85;11931668
> histogram.86;11928609
> histogram.87;11931930
> histogram.88;11934341
> histogram.89;11927519
> histogram.9;11928004
> histogram.90;11933502
> histogram.91;0
> histogram.92;0
> histogram.93;0
> histogram.94;11932024
> histogram.95;11932693
> histogram.96;0
> histogram.97;11928428
> histogram.98;11933195
> histogram.99;11924273
> histogram.totalBytes;1073741824
>
> Kind regards
> Jens
>
> On Sun, Oct 31, 2021 at 9:40 PM Joe Witt <jo...@gmail.com> wrote:
>
>> Jens
>>
>> 118 hours in - still good.
>>
>> Thanks
>>
>> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>> >
>> > Jens
>> >
>> > Update from hour 67.  Still lookin' good.
>> >
>> > Will advise.
>> >
>> > Thanks
>> >
>> > On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com>
>> wrote:
>> > >
>> > > Many many thanks 🙏 Joe for looking into this. My test flow was
>> running for 6 days before the first error occurred
>> > >
>> > > Thanks
>> > >
>> > > > On Oct 28, 2021, at 4:57 PM, Joe Witt <jo...@gmail.com> wrote:
>> > > >
>> > > > Jens,
>> > > >
>> > > > Am 40+ hours in running both your flow and mine to reproduce.  So
>> far
>> > > > neither has shown any sign of trouble.  Will keep running for
>> another
>> > > > week or so if I can.
>> > > >
>> > > > Thanks
>> > > >
>> > > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >> The physical hosts with VMware are using VMFS, but the VMs
>> running on the hosts can’t see that.
>> > > >> But you asked about the underlying file system 😀 and since my
>> first answer with the copy from the fstab file wasn’t enough I just wanted
>> to give all the details 😁.
>> > > >>
>> > > >> If you create a VM for Windows you would probably use NTFS (on top
>> of VMFS); for Linux, EXT3, EXT4, BTRFS, XFS and so on.
>> > > >>
>> > > >> All the partitions on my NiFi nodes are local devices (sda, sdb,
>> sdc and sdd) for each Linux machine. I don’t use NFS.
>> > > >>
>> > > >> Kind regards
>> > > >> Jens
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Oct 27, 2021, at 5:47 PM, Joe Witt <jo...@gmail.com> wrote:
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >> I don't quite follow the EXT4 usage on top of VMFS but the point
>> here
>> > > >> is you'll ultimately need to truly understand your underlying
>> storage
>> > > >> system and what sorts of guarantees it is giving you.  If linux/the
>> > > >> jvm/nifi think it has a typical EXT4 type block storage system to
>> work
>> > > >> with it can only be safe/operate within those constraints.  I have
>> no
>> > > >> idea about what VMFS brings to the table or the settings for it.
>> > > >>
>> > > >> The sync properties I shared previously might help force the issue
>> of
>> > > >> ensuring a formal sync/flush cycle all the way through the disk has
>> > > >> occurred which we'd normally not do or need to do but again in some
>> > > >> cases offers a stronger guarantee in exchange for performance.
>> > > >>
>> > > >> In any case...Mark's path for you here will help identify what
>> we're
>> > > >> dealing with and we can go from there.
>> > > >>
>> > > >> I am aware of significant usage of NiFi on VMWare configurations
>> > > >> without issue at high rates for many years so whatever it is here
>> is
>> > > >> likely solvable.
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Hi Mark
>> > > >>
>> > > >>
>> > > >> Thanks for the clarification. I will implement the script when I
>> return to the office on Monday next week (November 1st).
>> > > >>
>> > > >> I don’t use NFS, but ext4. But I will implement the script so we
>> can check if that’s the case here. I think the issue might be after the
>> processors write content to the repository.
>> > > >>
>> > > >> I have a test flow running for more than 2 weeks without any
>> errors. But this flow only calculates hashes and compares them.
>> > > >>
>> > > >>
>> > > >> Two other flows both create errors. One flow uses
>> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses
>> MergeContent->UnpackContent->CryptographicHashContent->compares. The last
>> flow is totally inside NiFi, excluding other network/server issues.
>> > > >>
>> > > >>
>> > > >> In both cases the CryptographicHashContent is right after a
>> process which writes new content to the repository. But in one case a file
>> in our production flow did calculate a wrong hash 4 times with a 1 minute
>> delay between each calculation. A few hours later I looped the file back
>> and this time it was OK.
>> > > >>
>> > > >> Just like the cases in steps 5 and 12 in the pdf file.
>> > > >>
>> > > >>
>> > > >> I will let you all know more later next week
>> > > >>
>> > > >>
>> > > >> Kind regards
>> > > >>
>> > > >> Jens
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Oct 27, 2021, at 3:43 PM, Mark Payne <markap14@hotmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> And the actual script:
>> > > >>
>> > > >>
>> > > >>
>> > > >> import org.apache.nifi.flowfile.FlowFile
>> > > >>
>> > > >>
>> > > >> import java.util.stream.Collectors
>> > > >>
>> > > >>
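>> > > >> // Gathers any histogram.* attributes left on the FlowFile by a previous pass through this script.
>> > > >>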
>> > > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>> > > >>
>> > > >>   final Map<String, String> histogram =
>> flowFile.getAttributes().entrySet().stream()
>> > > >>
>> > > >>       .filter({ entry -> entry.getKey().startsWith("histogram.") })
>> > > >>
>> > > >>       .collect(Collectors.toMap({ entry -> entry.key}, { entry ->
>> entry.value }))
>> > > >>
>> > > >>   return histogram;
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
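>> > > >> // Streams the content and counts how many times each byte value (0-255) occurs, returned as histogram.* entries.
>> > > >>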
>> > > >> Map<String, String> createHistogram(final FlowFile flowFile, final
>> InputStream inStream) {
>> > > >>
>> > > >>   final Map<String, String> histogram = new HashMap<>();
>> > > >>
>> > > >>   final int[] distribution = new int[256];
>> > > >>
>> > > >>   Arrays.fill(distribution, 0);
>> > > >>
>> > > >>
>> > > >>   long total = 0L;
>> > > >>
>> > > >>   final byte[] buffer = new byte[8192];
>> > > >>
>> > > >>   int len;
>> > > >>
>> > > >>   while ((len = inStream.read(buffer)) > 0) {
>> > > >>
>> > > >>       for (int i=0; i < len; i++) {
>> > > >>
>> > > >>           final int val = buffer[i] & 0xFF; // bytes are signed; mask keeps the index in 0-255
>> > > >>
>> > > >>           distribution[val]++;
>> > > >>
>> > > >>           total++;
>> > > >>
>> > > >>       }
>> > > >>
>> > > >>   }
>> > > >>
>> > > >>
>> > > >>   for (int i=0; i < 256; i++) {
>> > > >>
>> > > >>       histogram.put("histogram." + i,
>> String.valueOf(distribution[i]));
>> > > >>
>> > > >>   }
>> > > >>
>> > > >>   histogram.put("histogram.totalBytes", String.valueOf(total));
>> > > >>
>> > > >>
>> > > >>   return histogram;
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
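>> > > >> // Logs, at ERROR level, each byte value whose count changed between the previous and the new histogram.
>> > > >>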
>> > > >> void logHistogramDifferences(final Map<String, String> previous,
>> final Map<String, String> updated) {
>> > > >>
>> > > >>   final StringBuilder sb = new StringBuilder("There are
>> differences in the histogram\n");
>> > > >>
>> > > >>   final Map<String, String> sorted = new TreeMap<>(previous)
>> > > >>
>> > > >>   for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>> > > >>
>> > > >>       final String key = entry.getKey();
>> > > >>
>> > > >>       final String previousValue = entry.getValue();
>> > > >>
>> > > >>       final String updatedValue = updated.get(entry.getKey())
>> > > >>
>> > > >>
>> > > >>       if (!Objects.equals(previousValue, updatedValue)) {
>> > > >>
>> > > >>           sb.append("Byte Value: ").append(key).append(", Previous
>> Count: ").append(previousValue).append(", New Count:
>> ").append(updatedValue).append("\n");
>> > > >>
>> > > >>       }
>> > > >>
>> > > >>   }
>> > > >>
>> > > >>
>> > > >>   log.error(sb.toString());
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
>> > > >>
>> > > >> def flowFile = session.get()
>> > > >>
>> > > >> if (flowFile == null) {
>> > > >>
>> > > >>   return
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
>> > > >> final Map<String, String> previousHistogram =
>> getPreviousHistogram(flowFile)
>> > > >>
>> > > >> Map<String, String> histogram = null;
>> > > >>
>> > > >>
>> > > >> final InputStream inStream = session.read(flowFile);
>> > > >>
>> > > >> try {
>> > > >>
>> > > >>   histogram = createHistogram(flowFile, inStream);
>> > > >>
>> > > >> } finally {
>> > > >>
>> > > >>   inStream.close()
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
>> > > >> if (!previousHistogram.isEmpty()) {
>> > > >>
>> > > >>   if (previousHistogram.equals(histogram)) {
>> > > >>
>> > > >>       log.info("Histograms match")
>> > > >>
>> > > >>   } else {
>> > > >>
>> > > >>       logHistogramDifferences(previousHistogram, histogram)
>> > > >>
>> > > >>       session.transfer(flowFile, REL_FAILURE)
>> > > >>
>> > > >>       return;
>> > > >>
>> > > >>   }
>> > > >>
>> > > >> }
>> > > >>
>> > > >>
>> > > >> flowFile = session.putAllAttributes(flowFile, histogram)
>> > > >>
>> > > >> session.transfer(flowFile, REL_SUCCESS)
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> For a bit of background here, the reason that Joe and I have
>> expressed interest in NFS file systems is that the way the protocol works,
>> it is allowed to receive packets/chunks of the file out-of-order. So, what
>> happens is let’s say a 1 MB file is being written. The first 500 KB are
>> received. Then instead of the 501st KB it receives the 503rd KB. What
>> happens is that the size of the file on the file system becomes 503 KB. But
>> what about 501 & 502? Well when you read the data, the file system just
>> returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server
>> receives those bytes, it then goes back and fills in the proper bytes. So
>> if you’re running on NFS, it is possible for the contents of the file on
>> the underlying file system to change out from under you. It’s not clear to
>> me what other types of file system might do something similar.
>> > > >>
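>> > > >> To see that NUL-fill behavior outside NiFi, here is a small
>> > > >> hypothetical Groovy snippet (file name and contents are made up):
>> > > >> a file with a never-written gap reads back zeros for the gap, much
>> > > >> like a reader on NFS can observe before out-of-order chunks arrive.
>> > > >>
>> > > >> // Write bytes 0-3 and 8-11, leaving bytes 4-7 unwritten.
>> > > >> def f = new RandomAccessFile("/tmp/sparse-demo.bin", "rw")
>> > > >> f.write("AAAA".bytes)
>> > > >> f.seek(8)
>> > > >> f.write("BBBB".bytes)
>> > > >> // Read the whole 12-byte file back.
>> > > >> f.seek(0)
>> > > >> byte[] content = new byte[12]
>> > > >> f.readFully(content)
>> > > >> // The never-written gap comes back as NUL (0) bytes.
>> > > >> assert content[4..7].every { it == 0 }
>> > > >> f.close()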
>> > > >>
>> > > >> So, one thing that we can do is to find out whether or not the
>> contents of the underlying file have changed in some way, or if there’s
>> something else happening that could perhaps result in the hashes being
>> wrong. I’ve put together a script that should help diagnose this.
>> > > >>
>> > > >>
>> > > >> Can you insert an ExecuteScript processor either just before or
>> just after your CryptographicHashContent processor? Doesn’t really matter
>> whether it’s run just before or just after. I’ll attach the script here.
>> It’s a Groovy Script so you should be able to use ExecuteScript with Script
>> Engine = Groovy and the following script as the Script Body. No other
>> changes needed.
>> > > >>
>> > > >>
>> > > >> The way the script works, it reads in the contents of the
>> FlowFile, and then it builds up a histogram of all byte values (0-255) that
>> it sees in the contents, and then adds that as attributes. So it adds
>> attributes such as:
>> > > >>
>> > > >> histogram.0 = 280273
>> > > >>
>> > > >> histogram.1 = 2820
>> > > >>
>> > > >> histogram.2 = 48202
>> > > >>
>> > > >> histogram.3 = 3820
>> > > >>
>> > > >> …
>> > > >>
>> > > >> histogram.totalBytes = 1780928732
>> > > >>
>> > > >>
>> > > >> It then checks if those attributes have already been added. If so,
>> after calculating that histogram, it checks against the previous values (in
>> the attributes). If they are the same, the FlowFile goes to ’success’. If
>> they are different, it logs an error indicating the before/after value for
>> any byte whose distribution was different, and it routes to failure.
>> > > >>
>> > > >>
>> > > >> So, if for example, the first time through it sees 280,273 bytes
>> with a value of ‘0’, and the second times it only sees 12,001 then we know
>> there were a bunch of 0’s previously that were updated to be some other
>> value. And it includes the total number of bytes in case somehow we find
>> that we’re reading too many bytes or not enough bytes or something like
>> that. This should help narrow down what’s happening.
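
(For reference, when the histograms differ, the script reports an ERROR log
entry of this shape; the counts here are invented for illustration:)

There are differences in the histogram
Byte Value: histogram.0, Previous Count: 280273, New Count: 12001
Byte Value: histogram.32, Previous Count: 48202, New Count: 316474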
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >> -Mark
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Jens
>> > > >>
>> > > >>
>> > > >> Attached is the flow I was using (now running yours and this
>> one).  Curious if that one reproduces the issue for you as well.
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >>
>> > > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens
>> > > >>
>> > > >>
>> > > >> I have your flow running and will keep it running for several
>> days/week to see if I can reproduce.  Also of note please use your same
>> test flow but use HashContent instead of crypto hash.  Curious if that
>> matters for any reason...
>> > > >>
>> > > >>
>> > > >> Still want to know more about your underlying storage system.
>> > > >>
>> > > >>
>> > > >> You could also try updating nifi.properties and changing the
>> following lines:
>> > > >>
>> > > >> nifi.flowfile.repository.always.sync=true
>> > > >>
>> > > >> nifi.content.repository.always.sync=true
>> > > >>
>> > > >> nifi.provenance.repository.always.sync=true
>> > > >>
>> > > >>
>> > > >> It will hurt performance but can be useful/necessary on certain
>> storage subsystems.
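
(Editor's note: a rough illustration of the kind of guarantee the always.sync
settings ask for at the file level. This is an assumption for illustration,
not NiFi's actual repository code, and the path is hypothetical.)

import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Paths
import java.nio.file.StandardOpenOption

FileChannel.open(Paths.get('/contRepo01/example.claim'),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE).withCloseable { ch ->
    ch.write(ByteBuffer.wrap('content bytes'.getBytes('UTF-8')))
    ch.force(true) // like fsync(2): returns only after data and metadata are flushed
}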
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >>
>> > > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Ignore "For the scenario where you can replicate this please share
>> the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>> > > >>
>> > > >>
>> > > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> We asked about the underlying storage system.  You replied with
>> some info but not the specifics.  Do you know precisely what the underlying
>> storage is and how it is presented to the operating system?  For instance
>> is it NFS or something similar?
>> > > >>
>> > > >>
>> > > >> I've set up a very similar flow at extremely high rates running for
>> the past several days with no issue.  In my case though I know precisely
>> what the config is and the disk setup is.  Didn't do anything special to be
>> clear but still it is important to know.
>> > > >>
>> > > >>
>> > > >> For the scenario where you can replicate this please share the
>> flow.xml.gz for which it is reproducible.
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >> Joe
>> > > >>
>> > > >>
>> > > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Dear Joe and Mark
>> > > >>
>> > > >>
>> > > >> I have created a test flow without the sftp processors, which
>> doesn't create any errors. Therefore I created a new test flow where I use a
>> MergeContent and UnpackContent instead of the sftp processors. This keeps
>> all data internal in NIFI, but forces NIFI to write and read new files
>> totally locally.
>> > > >>
>> > > >> My flow has been running for 7 days and this morning there were
>> 2 files where the sha256 was given another hash value than the original. I
>> have set this flow up in another nifi cluster only for testing, and the
>> cluster is not doing anything else. It is using Nifi 1.14.0
>> > > >>
>> > > >> So I can reproduce issues at different nifi clusters and versions
>> (1.13.2 and 1.14.0) where the calculation of a hash on content can give
>> different outputs. It doesn't make any sense, but it happens. In all my
>> cases the issue happens where the calculation of the hashcontent happens
>> right after NIFI writes the content to the content repository. I don't know
>> if there could be some kind of delay writing the content 100% before the next
>> processors begin reading the content???
>> > > >>
>> > > >>
>> > > >> Please see attach test flow, and the previous mail with a pdf
>> showing the lineage of a production file which also had issues. In the pdf
>> check step 5 and 12.
>> > > >>
>> > > >>
>> > > >> Kind regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Thu, Oct 21, 2021 at 8:28 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Joe,
>> > > >>
>> > > >>
>> > > >> To start from the last mail :-)
>> > > >>
>> > > >> Each of the repositories has its own disk, and I'm using ext4
>> > > >>
>> > > >> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>> > > >>
>> > > >> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>> > > >>
>> > > >> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
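
(Editor's note: a quick way to double-check what the OS itself reports for
those mount points, e.g. from an ExecuteScript processor; /proc/mounts is
Linux-specific.)

new File('/proc/mounts').readLines()
    .findAll { line -> ['/nifiRepo', '/provRepo01', '/contRepo01'].any { line.contains(it) } }
    .each { println it } // fields: device, mount point, fs type, options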
>> > > >>
>> > > >>
>> > > >> My test flow WITH sftp looks like this:
>> > > >>
>> > > >> <image.png>
>> > > >>
>> > > >> And this flow has produced 1 error within 3 days. After many many
>> loops the file failed and went out via the "unmatched" output to the
>> disabled UpdateAttribute, which is doing nothing, just keeping the
>> failed flowfile in a queue. I enabled the UpdateAttribute and looped the
>> file back to the CryptographicHashContent, and now it calculated the hash
>> correctly again. But in this flow I have a FetchSFTP Process right before the
>> Hashing.
>> > > >>
>> > > >> Right now my flow is running without the 2 sftp processors, and in
>> the last 24 hours there have been no errors.
>> > > >>
>> > > >>
>> > > >> About the Lineage:
>> > > >>
>> > > >> Is there a way to export all the lineage data? The export only
>> generates a svg file.
>> > > >>
>> > > >> This is only for the receiving nifi, which internally calculates
>> 2 different hashes on the same content with ca. 1 minute's delay. Attached
>> is a pdf-document with the lineage, the flow and all the relevant
>> Provenance information for each step in the lineage.
>> > > >>
>> > > >> The interesting steps are step 5 and 12.
>> > > >>
>> > > >>
>> > > >> Can the issue be that data is not written 100% to disk between
>> step 4 and 5 in the flow?
>> > > >>
>> > > >>
>> > > >> Kind regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 11:49 PM Joe Witt <
>> joe.witt@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> Also what type of file system/storage system are you running NiFi
>> on
>> > > >>
>> > > >> in this case?  We'll need to know this for the NiFi
>> > > >>
>> > > >> content/flowfile/provenance repositories? Is it NFS?
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> And to further narrow this down
>> > > >>
>> > > >>
>> > > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB
>> files
>> > > >>
>> > > >> (2 files per node) and the next process was a hashcontent before it ran
>> > > >>
>> > > >> into a test loop, where files are uploaded via PutSFTP to a test
>> > > >>
>> > > >> server, and downloaded again and the hash recalculated. I have had
>> one
>> > > >>
>> > > >> issue after 3 days of running."
>> > > >>
>> > > >>
>> > > >> So to be clear with GenerateFlowFile making these files and then
>> you
>> > > >>
>> > > >> looping the content is wholly and fully exclusively within the
>> control
>> > > >>
>> > > >> of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping
>> the
>> > > >>
>> > > >> same files over and over in nifi itself you can make this happen or
>> > > >>
>> > > >> cannot?
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> "After fetching a FlowFile-stream file and unpacked it back into
>> NiFi
>> > > >>
>> > > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on
>> the
>> > > >>
>> > > >> exact same file. And got a new hash. That is what worry’s me.
>> > > >>
>> > > >> The fact that the same file can be recalculated and produce two
>> > > >>
>> > > >> different hashes, is very strange, but it happens. "
>> > > >>
>> > > >>
>> > > >> Ok so to confirm you are saying that in each case this happens you
>> see
>> > > >>
>> > > >> it first compute the wrong hash, but then if you retry the same
>> > > >>
>> > > >> flowfile it then provides the correct hash?
>> > > >>
>> > > >>
>> > > >> Can you please also show/share the lineage history for such a flow
>> > > >>
>> > > >> file then?  It should have events for the initial hash, second
>> hash,
>> > > >>
>> > > >> the unpacking, trace to the original stream, etc...
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Dear Mark and Joe
>> > > >>
>> > > >>
>> > > >> I know my setup isn’t normal for many people. But if we only look
>> at my receive side, which the last mails are about, everything is happening
>> at the same NIFI instance. It is the same 3 node NIFI cluster.
>> > > >>
>> > > >> After fetching a FlowFile-stream file and unpacking it back into
>> NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the
>> exact same file. And get a new hash. That is what worries me.
>> > > >>
>> > > >> The fact that the same file can be recalculated and produce two
>> different hashes, is very strange, but it happens. Over the last 5 months
>> it has only happened 35-40 times.
>> > > >>
>> > > >>
>> > > >> I can understand if the file is not completely loaded and saved
>> into the content repository before the hashing starts. But I believe that
>> the unpack process doesn’t forward the flow file to the next process before
>> it is 100% finished unpacking and saving the new content to the repository.
>> > > >>
>> > > >>
>> > > >> I have a test flow, where a GenerateFlowfile has created 6x 1GB
>> files (2 files per node) and the next process was a hashcontent before it ran
>> into a test loop, where files are uploaded via PutSFTP to a test server,
>> and downloaded again and the hash recalculated. I have had one issue after
>> 3 days of running.
>> > > >>
>> > > >> Now the test flow is running without the Put/Fetch sftp processors.
>> > > >>
>> > > >>
>> > > >> Another problem is that I can’t find any correlation to other
>> events. Not within NIFI, nor in the server itself or VMWare. If I could just
>> find any other event which happens at the same time, I might be able to
>> force some kind of event to trigger the issue.
>> > > >>
>> > > >> I have tried to force VMware to migrate a NiFi node to another
>> host. Forcing it to do a snapshot and deleting snapshots, but nothing can
>> trigger an error.
>> > > >>
>> > > >>
>> > > >> I know it will be very very difficult to reproduce. But I will
>> set up multiple NiFi instances running different test flows to see if I can
>> find any reason why it behaves as it does.
>> > > >>
>> > > >>
>> > > >> Kind Regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >> On Oct 20, 2021, at 4:39 PM, Mark Payne <markap14@hotmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> Thanks for sharing the images.
>> > > >>
>> > > >>
>> > > >> I tried to setup a test to reproduce the issue. I’ve had it
>> running for quite some time. Running through millions of iterations.
>> > > >>
>> > > >>
>> > > >> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the
>> tune of hundreds of MB). I’ve been unable to reproduce an issue after
>> millions of iterations.
>> > > >>
>> > > >>
>> > > >> So far I cannot replicate. And since you’re pulling the data via
>> SFTP and then unpacking, which preserves all original attributes from a
>> different system, this can easily become confusing.
>> > > >>
>> > > >>
>> > > >> Recommend trying to reproduce with SFTP-related processors out of
>> the picture, as Joe is mentioning. Either using GetFile/FetchFile or
>> GenerateFlowFile. Then immediately use CryptographicHashContent to generate
>> an ‘initial hash’, copy that value to another attribute, and then loop,
>> generating the hash and comparing against the original one. I’ll attach a
>> flow that does this, but not sure if the email server will strip out the
>> attachment or not.
>> > > >>
>> > > >>
>> > > >> This way we remove any possibility of actual corruption between
>> the two nifi instances. If we can still see corruption / different hashes
>> within a single nifi instance, then it certainly warrants further
>> investigation but i can’t see any issues so far.
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >> -Mark
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Jens
>> > > >>
>> > > >>
>> > > >> Actually is this current loop test contained within a single nifi
>> and there you see corruption happen?
>> > > >>
>> > > >>
>> > > >> Joe
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com>
>> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> You have a very involved setup including other systems (non
>> NiFi).  Have you removed those systems from the equation so you have more
>> evidence to support your expectation that NiFi is doing something other
>> than you expect?
>> > > >>
>> > > >>
>> > > >> Joe
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Hi
>> > > >>
>> > > >>
>> > > >> Today I have another file which has been running through the
>> retry loop one time. To test the processors and the algorithm I added the
>> HashContent processor and also added hashing by SHA-1.
>> > > >>
>> > > >> A file has been going through the system, and the SHA-1 and
>> SHA-256 are both different than expected. With a 1 minute delay the file
>> is going back into the hashing content flow and this time it calculates
>> both hashes fine.
>> > > >>
>> > > >>
>> > > >> I don't believe that the hashing is buggy, but something is very
>> very strange. What can influence the processors/algorithm to calculate a
>> different hash???
>> > > >>
>> > > >> All the input/output claim information is exactly the same. It is
>> the same flow/content file going in a loop. It happens on all 3 nodes.
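
(Editor's note: the digest step itself is deterministic. The sketch below, an
illustration rather than NiFi's actual processor code, shows what a SHA-256
content hash boils down to: identical input bytes must give an identical
hash, so a changed hash implies the bytes read back from the repository
changed.)

import java.security.MessageDigest

def sha256 = { InputStream inStream ->
    def digest = MessageDigest.getInstance('SHA-256')
    byte[] buffer = new byte[8192]
    int len
    while ((len = inStream.read(buffer)) > 0) {
        digest.update(buffer, 0, len)
    }
    digest.digest().collect { String.format('%02x', it) }.join()
}

// e.g. new File('some.dat').withInputStream { println sha256(it) }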
>> > > >>
>> > > >>
>> > > >> Any suggestions for where to dig ?
>> > > >>
>> > > >>
>> > > >> Regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Wed, Oct 20, 2021 at 6:34 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Hi Mark
>> > > >>
>> > > >>
>> > > >> Thanks for replying and the suggestion to look at the content
>> Claim.
>> > > >>
>> > > >> These 3 pictures is from the first attempt:
>> > > >>
>> > > >> <image.png>   <image.png>   <image.png>
>> > > >>
>> > > >>
>> > > >> Yesterday I realized that the content was still in the archive, so
>> I could Replay the file.
>> > > >>
>> > > >> <image.png>
>> > > >>
>> > > >> So here are the same pictures but for the replay and as you can
>> see the Identifier, offset and Size are all the same.
>> > > >>
>> > > >> <image.png>   <image.png>   <image.png>
>> > > >>
>> > > >>
>> > > >> In my flow if the hash does not match my original first calculated
>> hash, it goes into a retry loop. Here are the pictures for the 4th time the
>> file went through:
>> > > >>
>> > > >> <image.png>   <image.png>   <image.png>
>> > > >>
>> > > >> Here the content Claim is all the same.
>> > > >>
>> > > >>
>> > > >> It is very rare that we see these issues <1 : 1.000.000 files and
>> only with large files. Only once have I seen the error with a 110MB file,
>> the other times the file sizes are above 800MB.
>> > > >>
>> > > >> This time it was a Nifi-Flowstream v3 file, which has been
>> exported from one system and imported in another. But once the file has
>> been imported it is the same file inside NIFI and it stays at the same
>> node, going through the same loop of processors multiple times, and in the
>> end the CryptographicHashContent calculates a different SHA256 than it did
>> earlier. This should not be possible!!! And that is what concerns me the
>> most.
>> > > >>
>> > > >> What can influence the same processor to calculate 2 different
>> sha256 on the exact same content???
>> > > >>
>> > > >>
>> > > >> Regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Tue, Oct 19, 2021 at 4:51 PM Mark Payne <
>> markap14@hotmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Jens,
>> > > >>
>> > > >>
>> > > >> In the two provenance events - one showing a hash of dd4cc… and
>> the other showing f6f0….
>> > > >>
>> > > >> If you go to the Content tab, do they both show the same Content
>> Claim? I.e., do the Input Claim / Output Claim show the same values for
>> Container, Section, Identifier, Offset, and Size?
>> > > >>
>> > > >>
>> > > >> Thanks
>> > > >>
>> > > >> -Mark
>> > > >>
>> > > >>
>> > > >> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >>
>> > > >>
>> > > >> Dear NIFI Users
>> > > >>
>> > > >>
>> > > >> I have posted this mail in the developers mailing list and just
>> want to inform all of our about a very odd behavior we are facing.
>> > > >>
>> > > >> The background:
>> > > >>
>> > > >> We have data going between 2 different NIFI systems which has no
>> direct network access to each other. Therefore we calculate a SHA256 hash
>> value of the content at system 1, before the flowfile and data are combined
>> and saved as a "flowfile-stream-v3" pkg file. The file is then transported
>> to system 2, where the pkg file is unpacked and the flow can continue. To
>> be sure about file integrity we calculate a new sha256 at system 2. But
>> sometimes we see that the sha256 gets another value, which might suggest
>> the file was corrupted. But recalculating the sha256 again gives a new hash
>> value.
>> > > >>
>> > > >>
>> > > >> ----
>> > > >>
>> > > >>
>> > > >> Tonight I had yet another file which didn't match the expected
>> sha256 hash value. The content is a 1.7GB file and the Event Duration was
>> "00:00:17.539" to calculate the hash.
>> > > >>
>> > > >> I have created a Retry loop, where the file will go to a Wait
>> process for delaying the file 1 minute and going back to the
>> CryptographicHashContent for a new calculation. After 3 retries the file
>> goes to the retries_exceeded and goes to a disabled process just to be in a
>> queue so I manually can look at it. This morning I rerouted the file from
>> my retries_exceeded queue back to the CryptographicHashContent for a new
>> calculation and this time it calculated the correct hash value.
>> > > >>
>> > > >>
>> > > >> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very
>> strange is happening.
>> > > >>
>> > > >> <image.png>
>> > > >>
>> > > >>
>> > > >> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02
>> with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build
>> 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build
>> 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware
>> ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
>> > > >>
>> > > >> I have inspected different logs to see if I can find any
>> correlation what happened at the same time as the file is going through my
>> loop, but there are no event/task at that exact time.
>> > > >>
>> > > >>
>> > > >> System 1:
>> > > >>
>> > > >> At 10/19/2021 00:15:11.247 CEST my file is going through a
>> CryptographicHashContent: SHA256 value:
>> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>> > > >>
>> > > >> The file is exported as a "FlowFile Stream, v3" to System 2
>> > > >>
>> > > >>
>> > > >> SYSTEM 2:
>> > > >>
>> > > >> At 10/19/2021 00:18:10.528 CEST the file is going through a
>> CryptographicHashContent: SHA256 value:
>> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>> > > >>
>> > > >> <image.png>
>> > > >>
>> > > >> At 10/19/2021 00:19:08.996 CEST the file is going through the same
>> CryptographicHashContent at system 2: SHA256 value:
>> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>> > > >>
>> > > >> At 10/19/2021 00:20:04.376 CEST the file is going through the same
>> a CryptographicHashContent at system 2: SHA256 value:
>> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>> > > >>
>> > > >> At 10/19/2021 00:21:01.711 CEST the file is going through the same
>> a CryptographicHashContent at system 2: SHA256 value:
>> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
>> > > >>
>> > > >>
>> > > >> At 10/19/2021 06:07:43.376 CEST the file is going through the same
>> a CryptographicHashContent at system 2: SHA256 value:
>> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
>> > > >>
>> > > >> <image.png>
>> > > >>
>> > > >>
>> > > >> How on earth can this happen???
>> > > >>
>> > > >>
>> > > >> Kind Regards
>> > > >>
>> > > >> Jens M. Kofoed
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> <Repro.json>
>> > > >>
>> > > >>
>> > > >> <Try_to_recreate_Jens_Challenge.json>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>>
>
>

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
Jens,

The histograms, in and of themselves, are not very interesting. The interesting thing would be the difference in the histogram before & after the hash. Can you provide the ERROR level logs generated by the ExecuteScript? That’s what is of interest.

Thanks
-Mark
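
(Editor's note: one way to compare two such attribute dumps, a sketch in
Groovy. The file names are hypothetical, e.g. the key;value lists from Jens's
mail below saved to disk.)

def parse = { File f ->
    f.readLines().findAll { it.contains(';') }
     .collectEntries { def (k, v) = it.split(';', 2); [(k): v] }
}
def before = parse(new File('histogram-before.txt'))
def after = parse(new File('histogram-after.txt'))
(before.keySet() + after.keySet()).sort().each { k ->
    if (before[k] != after[k]) {
        println "${k}: before=${before[k]}, after=${after[k]}"
    }
}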


On Nov 2, 2021, at 1:35 AM, Jens M. Kofoed <jm...@gmail.com> wrote:

Hi Mark and Joe

Yesterday morning I implemented Mark's script in my 2 testflows. One testflow uses sftp, the other MergeContent/UnpackContent. Both testflows are running at a test cluster with 3 nodes and NIFI 1.14.0
The 1st flow with sftp has had 1 file going into the failure queue after about 16 hours.
The 2nd flow has had 2 files going into the failure queue after about 15 and 17 hours.

There is definitely something going wrong in my setup, but I can't figure out what.

Information from file 1:
histogram.0;0
histogram.1;0
histogram.10;11926720
histogram.100;11927504
histogram.101;11925396
histogram.102;11929923
histogram.103;11931596
histogram.104;11929071
histogram.105;11931365
histogram.106;11928661
histogram.107;11929864
histogram.108;11931611
histogram.109;11932758
histogram.11;0
histogram.110;11927893
histogram.111;11933519
histogram.112;11931392
histogram.113;11928534
histogram.114;11936879
histogram.115;11932818
histogram.116;11934767
histogram.117;11929143
histogram.118;11931854
histogram.119;11926333
histogram.12;0
histogram.120;11928731
histogram.121;11931149
histogram.122;11926725
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11930422
histogram.33;11934311
histogram.34;11930459
histogram.35;11924776
histogram.36;11924186
histogram.37;11928616
histogram.38;11929474
histogram.39;11929607
histogram.4;0
histogram.40;11928053
histogram.41;11930402
histogram.42;11926830
histogram.43;11938138
histogram.44;11932536
histogram.45;11931053
histogram.46;11930008
histogram.47;11927747
histogram.48;11936055
histogram.49;11931471
histogram.5;0
histogram.50;11931921
histogram.51;11929643
histogram.52;11923847
histogram.53;11927311
histogram.54;11933754
histogram.55;11925964
histogram.56;11928872
histogram.57;11931124
histogram.58;11928474
histogram.59;11925814
histogram.6;0
histogram.60;11933978
histogram.61;11934136
histogram.62;11932016
histogram.63;23864588
histogram.64;11924792
histogram.65;11934789
histogram.66;11933047
histogram.67;11931899
histogram.68;11935615
histogram.69;11927249
histogram.7;0
histogram.70;11933276
histogram.71;11927953
histogram.72;11929275
histogram.73;11930292
histogram.74;11935428
histogram.75;11930317
histogram.76;11935737
histogram.77;11932127
histogram.78;11932344
histogram.79;11932094
histogram.8;0
histogram.80;11930688
histogram.81;11928415
histogram.82;11931559
histogram.83;11934192
histogram.84;11927224
histogram.85;11929491
histogram.86;11930624
histogram.87;11932201
histogram.88;11930694
histogram.89;11936439
histogram.9;11933187
histogram.90;11926445
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11931596
histogram.95;11929379
histogram.96;0
histogram.97;11928864
histogram.98;11924738
histogram.99;11930062
histogram.totalBytes;1073741824

File 2:
histogram.0;0
histogram.1;0
histogram.10;11932402
histogram.100;11927531
histogram.101;11928454
histogram.102;11934432
histogram.103;11924623
histogram.104;11934492
histogram.105;11934585
histogram.106;11928955
histogram.107;11928651
histogram.108;11930139
histogram.109;11929325
histogram.11;0
histogram.110;11930486
histogram.111;11933517
histogram.112;11928334
histogram.113;11927798
histogram.114;11929222
histogram.115;11932057
histogram.116;11931182
histogram.117;11933407
histogram.118;11932709
histogram.119;11931338
histogram.12;0
histogram.120;11933700
histogram.121;11929803
histogram.122;11930218
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11924458
histogram.33;11934243
histogram.34;11930696
histogram.35;11925574
histogram.36;11929198
histogram.37;11928146
histogram.38;11932505
histogram.39;11929406
histogram.4;0
histogram.40;11930100
histogram.41;11930867
histogram.42;11930796
histogram.43;11930796
histogram.44;11921866
histogram.45;11935682
histogram.46;11930075
histogram.47;11928169
histogram.48;11933490
histogram.49;11932174
histogram.5;0
histogram.50;11933255
histogram.51;11934009
histogram.52;11928361
histogram.53;11927626
histogram.54;11931611
histogram.55;11930755
histogram.56;11933823
histogram.57;11922508
histogram.58;11930384
histogram.59;11929805
histogram.6;0
histogram.60;11930064
histogram.61;11926761
histogram.62;11927605
histogram.63;23858926
histogram.64;11929516
histogram.65;11930217
histogram.66;11930478
histogram.67;11939855
histogram.68;11927850
histogram.69;11931154
histogram.7;0
histogram.70;11935374
histogram.71;11930754
histogram.72;11928304
histogram.73;11931772
histogram.74;11939417
histogram.75;11930712
histogram.76;11933331
histogram.77;11931279
histogram.78;11928276
histogram.79;11930071
histogram.8;0
histogram.80;11927830
histogram.81;11931213
histogram.82;11930964
histogram.83;11928973
histogram.84;11934325
histogram.85;11929658
histogram.86;11924667
histogram.87;11931100
histogram.88;11930252
histogram.89;11927281
histogram.9;11932848
histogram.90;11930398
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11928720
histogram.95;11928988
histogram.96;0
histogram.97;11931423
histogram.98;11928181
histogram.99;11935549
histogram.totalBytes;1073741824

File 3:
histogram.0;0
histogram.1;0
histogram.10;11930417
histogram.100;11926739
histogram.101;11930580
histogram.102;11928210
histogram.103;11935300
histogram.104;11925804
histogram.105;11931023
histogram.106;11932342
histogram.107;11929778
histogram.108;11930098
histogram.109;11930759
histogram.11;0
histogram.110;11934343
histogram.111;11935775
histogram.112;11933877
histogram.113;11926675
histogram.114;11929332
histogram.115;11928876
histogram.116;11927819
histogram.117;11932657
histogram.118;11933508
histogram.119;11928808
histogram.12;0
histogram.120;11937532
histogram.121;11926907
histogram.122;11933942
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11929486
histogram.33;11930737
histogram.34;11931092
histogram.35;11934488
histogram.36;11927605
histogram.37;11930735
histogram.38;11932174
histogram.39;11936180
histogram.4;0
histogram.40;11931666
histogram.41;11927043
histogram.42;11929044
histogram.43;11934104
histogram.44;11936337
histogram.45;11935580
histogram.46;11929598
histogram.47;11934083
histogram.48;11928858
histogram.49;11931098
histogram.5;0
histogram.50;11930618
histogram.51;11925429
histogram.52;11929741
histogram.53;11934160
histogram.54;11931999
histogram.55;11930465
histogram.56;11926194
histogram.57;11926386
histogram.58;11924871
histogram.59;11929331
histogram.6;0
histogram.60;11926951
histogram.61;11928631
histogram.62;11927549
histogram.63;23856730
histogram.64;11930288
histogram.65;11931523
histogram.66;11932821
histogram.67;11932509
histogram.68;11929613
histogram.69;11928651
histogram.7;0
histogram.70;11929253
histogram.71;11931521
histogram.72;11925805
histogram.73;11934833
histogram.74;11928314
histogram.75;11923854
histogram.76;11930892
histogram.77;11927528
histogram.78;11932850
histogram.79;11934471
histogram.8;0
histogram.80;11925707
histogram.81;11929213
histogram.82;11931334
histogram.83;11936739
histogram.84;11927855
histogram.85;11931668
histogram.86;11928609
histogram.87;11931930
histogram.88;11934341
histogram.89;11927519
histogram.9;11928004
histogram.90;11933502
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11932024
histogram.95;11932693
histogram.96;0
histogram.97;11928428
histogram.98;11933195
histogram.99;11924273
histogram.totalBytes;1073741824
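
(Editor's note: a quick sanity check on dumps like the ones above, with a
hypothetical file name: the 256 per-byte counts should add up exactly to
histogram.totalBytes, here 1073741824 bytes, i.e. 1 GiB.)

def entries = new File('histogram-file1.txt').readLines()
    .findAll { it.contains(';') }
    .collectEntries { def (k, v) = it.split(';', 2); [(k): v] }
long sum = (0..255).sum { entries['histogram.' + it] as long }
assert sum == (entries['histogram.totalBytes'] as long)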

Kind regards
Jens

On Sun, Oct 31, 2021 at 9:40 PM Joe Witt <jo...@gmail.com> wrote:
Jens

118 hours in - still good.

Thanks

On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>
> Jens
>
> Update from hour 67.  Still lookin' good.
>
> Will advise.
>
> Thanks
>
> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >
> > Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 days before the first error occurred
> >
> > Thanks
> >
> > > On Oct 28, 2021, at 4:57 PM, Joe Witt <jo...@gmail.com> wrote:
> > >
> > > Jens,
> > >
> > > Am 40+ hours in running both your flow and mine to reproduce.  So far
> > > neither have shown any sign of trouble.  Will keep running for another
> > > week or so if I can.
> > >
> > > Thanks
> > >
> > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >> The physical hosts with VMWare are using vmfs, but the vm machines running at the hosts can’t see that.
> > >> But you asked about the underlying file system 😀 and since my first answer with the copy from the fstab file wasn’t enough I just wanted to give all the details 😁.
> > >>
> > >> If you create a vm for windows you would probably use NTFS (on top of vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on.
> > >>
> > >> All the partitions at my nifi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don’t use nfs
> > >>
> > >> Kind regards
> > >> Jens
> > >>
> > >>
> > >>
> > >> On Oct 27, 2021, at 5:47 PM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >> Jens,
> > >>
> > >> I don't quite follow the EXT4 usage on top of VMFS but the point here
> > >> is you'll ultimately need to truly understand your underlying storage
> > >> system and what sorts of guarantees it is giving you.  If linux/the
> > >> jvm/nifi think it has a typical EXT4 type block storage system to work
> > >> with it can only be safe/operate within those constraints.  I have no
> > >> idea about what VMFS brings to the table or the settings for it.
> > >>
> > >> The sync properties I shared previously might help force the issue of
> > >> ensuring a formal sync/flush cycle all the way through the disk has
> > >> occurred which we'd normally not do or need to do but again in some
> > >> cases offers a stronger guarantee in exchange for performance.
> > >>
> > >> In any case...Mark's path for you here will help identify what we're
> > >> dealing with and we can go from there.
> > >>
> > >> I am aware of significant usage of NiFi on VMWare configurations
> > >> without issue at high rates for many years so whatever it is here is
> > >> likely solvable.
> > >>
> > >> Thanks
> > >>
> > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi Mark
> > >>
> > >>
> > >> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
> > >>
> > >> I don’t use NFS, but ext4. But I will implement the script so we can check if that’s the case here. But I think the issue might be after the processors write content to the repository.
> > >>
> > >> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
> > >>
> > >>
> > >> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The last flow is totally inside nifi, excluding other network/server issues.
> > >>
> > >>
> > >> In both cases the CryptographicHashContent is right after a process which writes new content to the repository. But in one case a file in our production flow did calculate a wrong hash 4 times with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
> > >>
> > >> Just like the case in step 5 and 12 in the pdf file
> > >>
> > >>
> > >> I will let you all know more later next week
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 27, 2021, at 3:43 PM, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> And the actual script:
> > >>
> > >>
> > >>
> > >> import org.apache.nifi.flowfile.FlowFile
> > >>
> > >>
> > >> import java.util.stream.Collectors
> > >>
> > >>
> > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> > >>
> > >>   final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
> > >>
> > >>       .filter({ entry -> entry.getKey().startsWith("histogram.") })
> > >>
> > >>       .collect(Collectors.toMap({ entry -> entry.key}, { entry -> entry.value }))
> > >>
> > >>   return histogram;
> > >>
> > >> }
> > >>
> > >>
> > >> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
> > >>
> > >>   final Map<String, String> histogram = new HashMap<>();
> > >>
> > >>   final int[] distribution = new int[256];
> > >>
> > >>   Arrays.fill(distribution, 0);
> > >>
> > >>
> > >>   long total = 0L;
> > >>
> > >>   final byte[] buffer = new byte[8192];
> > >>
> > >>   int len;
> > >>
> > >>   while ((len = inStream.read(buffer)) > 0) {
> > >>
> > >>       for (int i=0; i < len; i++) {
> > >>
> > >>           final int val = buffer[i] & 0xFF;  // mask to 0-255 so high-bit bytes land in the right bucket explicitly
> > >>
> > >>           distribution[val]++;
> > >>
> > >>           total++;
> > >>
> > >>       }
> > >>
> > >>   }
> > >>
> > >>
> > >>   for (int i=0; i < 256; i++) {
> > >>
> > >>       histogram.put("histogram." + i, String.valueOf(distribution[i]));
> > >>
> > >>   }
> > >>
> > >>   histogram.put("histogram.totalBytes", String.valueOf(total));
> > >>
> > >>
> > >>   return histogram;
> > >>
> > >> }
> > >>
> > >>
> > >> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
> > >>
> > >>   final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
> > >>
> > >>   final Map<String, String> sorted = new TreeMap<>(previous)
> > >>
> > >>   for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> > >>
> > >>       final String key = entry.getKey();
> > >>
> > >>       final String previousValue = entry.getValue();
> > >>
> > >>       final String updatedValue = updated.get(entry.getKey())
> > >>
> > >>
> > >>       if (!Objects.equals(previousValue, updatedValue)) {
> > >>
> > >>           sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
> > >>
> > >>       }
> > >>
> > >>   }
> > >>
> > >>
> > >>   log.error(sb.toString());
> > >>
> > >> }
> > >>
> > >>
> > >>
> > >> def flowFile = session.get()
> > >>
> > >> if (flowFile == null) {
> > >>
> > >>   return
> > >>
> > >> }
> > >>
> > >>
> > >> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> > >>
> > >> Map<String, String> histogram = null;
> > >>
> > >>
> > >> final InputStream inStream = session.read(flowFile);
> > >>
> > >> try {
> > >>
> > >>   histogram = createHistogram(flowFile, inStream);
> > >>
> > >> } finally {
> > >>
> > >>   inStream.close()
> > >>
> > >> }
> > >>
> > >>
> > >> if (!previousHistogram.isEmpty()) {
> > >>
> > >>   if (previousHistogram.equals(histogram)) {
> > >>
> > >>       log.info("Histograms match")
> > >>
> > >>   } else {
> > >>
> > >>       logHistogramDifferences(previousHistogram, histogram)
> > >>
> > >>       session.transfer(flowFile, REL_FAILURE)
> > >>
> > >>       return;
> > >>
> > >>   }
> > >>
> > >> }
> > >>
> > >>
> > >> flowFile = session.putAllAttributes(flowFile, histogram)
> > >>
> > >> session.transfer(flowFile, REL_SUCCESS)
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
> > >>
> > >>
> > >> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
> > >>
> > >>
> > >> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
> > >>
> > >>
> > >> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
> > >>
> > >> histogram.0 = 280273
> > >>
> > >> histogram.1 = 2820
> > >>
> > >> histogram.2 = 48202
> > >>
> > >> histogram.3 = 3820
> > >>
> > >> …
> > >>
> > >> histogram.totalBytes = 1780928732
> > >>
> > >>
> > >> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
> > >>
> > >>
> > >> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second times it only sees 12,001 then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
> > >>
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
> > >>
> > >>
> > >> Still want to know more about your underlying storage system.
> > >>
> > >>
> > >> You could also try updating nifi.properties and changing the following lines:
> > >>
> > >> nifi.flowfile.repository.always.sync=true
> > >>
> > >> nifi.content.repository.always.sync=true
> > >>
> > >> nifi.provenance.repository.always.sync=true
> > >>
> > >>
> > >> It will hurt performance but can be useful/necessary on certain storage subsystems.
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
> > >>
> > >>
> > >> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and the disk setup is.  Didn't do anything special to be clear but still it is important to know.
> > >>
> > >>
> > >> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
> > >>
> > >>
> > >> Thanks
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Dear Joe and Mark
> > >>
> > >>
> > >> I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal in NIFI, but forces NIFI to write and read new files totally locally.
> > >>
> > >> My flow has been running for 7 days and this morning there were 2 files where the sha256 was given another hash value than the original. I have set this flow up in another nifi cluster only for testing, and the cluster is not doing anything else. It is using Nifi 1.14.0
> > >>
> > >> So I can reproduce issues at different nifi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the calculation of the hashcontent happens right after NIFI writes the content to the content repository. I don't know if there could be some kind of delay writing the content 100% before the next processors begin reading the content???
> > >>
> > >>
> > >> Please see attach test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf check step 5 and 12.
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >> On Thu, Oct 21, 2021 at 8:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Joe,
> > >>
> > >>
> > >> To start from the last mail :-)
> > >>
> > >> Each of the repositories has its own disk, and I'm using ext4
> > >>
> > >> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> > >>
> > >> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> > >>
> > >> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> > >>
> > >>
> > >> My test flow WITH sftp looks like this:
> > >>
> > >> <image.png>
> > >>
> > >> And this flow has produced 1 error within 3 days. After many many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which is doing nothing, just keeping the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP Process right before the Hashing.
> > >>
> > >> Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.
> > >>
> > >>
> > >> About the Lineage:
> > >>
> > >> Is there a way to export all the lineage data? The export only generates a svg file.
> > >>
> > >> This is only for the receiving nifi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf-document with the lineage, the flow and all the relevant Provenance information for each step in the lineage.
> > >>
> > >> The interesting steps are step 5 and 12.
> > >>
> > >>
> > >> Can the issue be that data is not written 100% to disk between step 4 and 5 in the flow?
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:49 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> Also what type of file system/storage system are you running NiFi on
> > >>
> > >> in this case?  We'll need to know this for the NiFi
> > >>
> > >> content/flowfile/provenance repositories? Is it NFS?
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> And to further narrow this down
> > >>
> > >>
> > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> > >>
> > >> (2 files per node) and the next process was a hashcontent before it ran
> > >>
> > >> into a test loop, where files are uploaded via PutSFTP to a test
> > >>
> > >> server, and downloaded again and the hash recalculated. I have had one
> > >>
> > >> issue after 3 days of running."
> > >>
> > >>
> > >> So to be clear with GenerateFlowFile making these files and then you
> > >>
> > >> looping the content is wholly and fully exclusively within the control
> > >>
> > >> of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping the
> > >>
> > >> same files over and over in nifi itself you can make this happen or
> > >>
> > >> cannot?
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> "After fetching a FlowFile-stream file and unpacked it back into NiFi
> > >>
> > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
> > >>
> > >> exact same file. And got a new hash. That is what worry’s me.
> > >>
> > >> The fact that the same file can be recalculated and produce two
> > >>
> > >> different hashes, is very strange, but it happens. "
> > >>
> > >>
> > >> Ok so to confirm you are saying that in each case this happens you see
> > >>
> > >> it first compute the wrong hash, but then if you retry the same
> > >>
> > >> flowfile it then provides the correct hash?
> > >>
> > >>
> > >> Can you please also show/share the lineage history for such a flow
> > >>
> > >> file then?  It should have events for the initial hash, second hash,
> > >>
> > >> the unpacking, trace to the original stream, etc...
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Dear Mark and Joe
> > >>
> > >>
> > >> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening at the same NIFI instance. It is the same 3 node NIFI cluster.
> > >>
> > >> After fetching a FlowFile-stream file and unpacking it back into NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file. And get a new hash. That is what worries me.
> > >>
> > >> The fact that the same file can be recalculated and produce two different hashes, is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
> > >>
> > >>
> > >> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn’t forward the flow file to the next process before it is 100% finished unpacking and saving the new content to the repository.
> > >>
> > >>
> > >> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node) and next process was a hashcontent before it run into a test loop. Where files are uploaded via PutSFTP to a test server, and downloaded again and recalculated the hash. I have had one issue after 3 days of running.
> > >>
> > >> Now the test flow is running without the Put/Fetch sftp processors.
> > >>
> > >>
> > >> Another problem is that I can’t find any correlation to other events. Not within NIFI, nor the server itself or VMWare. If I just could find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
> > >>
> > >> I have tried to force VMware to migrate a NiFi node to another host. Forcing it to do a snapshot and deleting snapshots, but nothing can trigger an error.
> > >>
> > >>
> > >> I know it will be very very difficult to reproduce. But I will setup multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
> > >>
> > >>
> > >> Kind Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >> On Oct 20, 2021 at 16:39, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> Thanks for sharing the images.
> > >>
> > >>
> > >> I tried to setup a test to reproduce the issue. I’ve had it running for quite some time. Running through millions of iterations.
> > >>
> > >>
> > >> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
> > >>
> > >>
> > >> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
> > >>
> > >>
> > >> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
> > >>
> > >>
> > >> This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation but i can’t see any issues so far.
> > >>
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> Actually is this current loop test contained within a single nifi and there you see corruption happen?
> > >>
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
> > >>
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi
> > >>
> > >>
> > >> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm I added the HashContent processor and also added hashing by SHA-1.
> > >>
> > >> A file has been going through the system, and both the SHA-1 and SHA-256 are different than expected. With a 1 minute delay the file is going back into the hashing content flow and this time it calculates both hashes fine.
> > >>
> > >>
> > >> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
> > >>
> > >> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
> > >>
> > >>
> > >> Any suggestions for where to dig ?
> > >>
> > >>
> > >> Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 06:34, Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi Mark
> > >>
> > >>
> > >> Thanks for replying and the suggestion to look at the content Claim.
> > >>
> > >> These 3 pictures is from the first attempt:
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >>
> > >> Yesterday I realized that the content was still in the archive, so I could Replay the file.
> > >>
> > >> <image.png>
> > >>
> > >> So here are the same pictures but for the replay and as you can see the Identifier, offset and Size are all the same.
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >>
> > >> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >> Here the content Claim is all the same.
> > >>
> > >>
> > >> It is very rare that we see these issues <1 : 1.000.000 files and only with large files. Only once have I seen the error with a 110MB file, the other times the file sizes are above 800MB.
> > >>
> > >> This time it was a Nifi-Flowstream v3 file, which has been exported from one system and imported in another. But while the file has been imported it is the same file inside NIFI and it stays at the same node. Going through the same loop of processors multiple times and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
> > >>
> > >> What can influence the same processor to calculate 2 different sha256 on the exact same content???
> > >>
> > >>
> > >> Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >> On Tue, Oct 19, 2021 at 16:51, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
> > >>
> > >> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
> > >>
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Dear NIFI Users
> > >>
> > >>
> > >> I have posted this mail in the developers mailing list and just want to inform all of our about a very odd behavior we are facing.
> > >>
> > >> The background:
> > >>
> > >> We have data going between 2 different NIFI systems which has no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.
> > >>
> > >>
> > >> ----
> > >>
> > >>
> > >> Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.
> > >>
> > >> I have created a Retry loop, where the file will go to a Wait process for delaying the file 1 minute and going back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to the retries_exceeded and goes to a disabled process just to be in a queue so I manually can look at it. This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.
> > >>
> > >>
> > >> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.
> > >>
> > >> <image.png>
> > >>
> > >>
> > >> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts.
> > >>
> > >> I have inspected different logs to see if I can find any correlation what happened at the same time as the file is going through my loop, but there are no event/task at that exact time.
> > >>
> > >>
> > >> System 1:
> > >>
> > >> At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> > >>
> > >> The file is exported as a "FlowFile Stream, v3" to System 2
> > >>
> > >>
> > >> SYSTEM 2:
> > >>
> > >> At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> > >>
> > >> <image.png>
> > >>
> > >> At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> > >>
> > >> At 10/19/2021 00:20:04.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> > >>
> > >> At 10/19/2021 00:21:01.711 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
> > >>
> > >>
> > >> At 10/19/2021 06:07:43.376 CEST the file is going through the same a CryptographicHashContent at system 2: SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
> > >>
> > >> <image.png>
> > >>
> > >>
> > >> How on earth can this happen???
> > >>
> > >>
> > >> Kind Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >>
> > >> <Repro.json>
> > >>
> > >>
> > >> <Try_to_recreate_Jens_Challenge.json>
> > >>
> > >>
> > >>
> > >>


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark and Joe

Yesterday morning I implemented Mark's script in my 2 testflows. One
testflow uses sftp, the other MergeContent/UnpackContent. Both testflows
are running on a test cluster with 3 nodes and NIFI 1.14.0.
The 1st flow with sftp has had 1 file going into the failure queue after
about 16 hours.
The 2nd flow has had 2 files going into the failure queue after about 15
and 17 hours.

There is definitely something going wrong in my setup, but I can't figure
out what.

Information from file 1:
histogram.0;0
histogram.1;0
histogram.10;11926720
histogram.100;11927504
histogram.101;11925396
histogram.102;11929923
histogram.103;11931596
histogram.104;11929071
histogram.105;11931365
histogram.106;11928661
histogram.107;11929864
histogram.108;11931611
histogram.109;11932758
histogram.11;0
histogram.110;11927893
histogram.111;11933519
histogram.112;11931392
histogram.113;11928534
histogram.114;11936879
histogram.115;11932818
histogram.116;11934767
histogram.117;11929143
histogram.118;11931854
histogram.119;11926333
histogram.12;0
histogram.120;11928731
histogram.121;11931149
histogram.122;11926725
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11930422
histogram.33;11934311
histogram.34;11930459
histogram.35;11924776
histogram.36;11924186
histogram.37;11928616
histogram.38;11929474
histogram.39;11929607
histogram.4;0
histogram.40;11928053
histogram.41;11930402
histogram.42;11926830
histogram.43;11938138
histogram.44;11932536
histogram.45;11931053
histogram.46;11930008
histogram.47;11927747
histogram.48;11936055
histogram.49;11931471
histogram.5;0
histogram.50;11931921
histogram.51;11929643
histogram.52;11923847
histogram.53;11927311
histogram.54;11933754
histogram.55;11925964
histogram.56;11928872
histogram.57;11931124
histogram.58;11928474
histogram.59;11925814
histogram.6;0
histogram.60;11933978
histogram.61;11934136
histogram.62;11932016
histogram.63;23864588
histogram.64;11924792
histogram.65;11934789
histogram.66;11933047
histogram.67;11931899
histogram.68;11935615
histogram.69;11927249
histogram.7;0
histogram.70;11933276
histogram.71;11927953
histogram.72;11929275
histogram.73;11930292
histogram.74;11935428
histogram.75;11930317
histogram.76;11935737
histogram.77;11932127
histogram.78;11932344
histogram.79;11932094
histogram.8;0
histogram.80;11930688
histogram.81;11928415
histogram.82;11931559
histogram.83;11934192
histogram.84;11927224
histogram.85;11929491
histogram.86;11930624
histogram.87;11932201
histogram.88;11930694
histogram.89;11936439
histogram.9;11933187
histogram.90;11926445
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11931596
histogram.95;11929379
histogram.96;0
histogram.97;11928864
histogram.98;11924738
histogram.99;11930062
histogram.totalBytes;1073741824

File 2:
histogram.0;0
histogram.1;0
histogram.10;11932402
histogram.100;11927531
histogram.101;11928454
histogram.102;11934432
histogram.103;11924623
histogram.104;11934492
histogram.105;11934585
histogram.106;11928955
histogram.107;11928651
histogram.108;11930139
histogram.109;11929325
histogram.11;0
histogram.110;11930486
histogram.111;11933517
histogram.112;11928334
histogram.113;11927798
histogram.114;11929222
histogram.115;11932057
histogram.116;11931182
histogram.117;11933407
histogram.118;11932709
histogram.119;11931338
histogram.12;0
histogram.120;11933700
histogram.121;11929803
histogram.122;11930218
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11924458
histogram.33;11934243
histogram.34;11930696
histogram.35;11925574
histogram.36;11929198
histogram.37;11928146
histogram.38;11932505
histogram.39;11929406
histogram.4;0
histogram.40;11930100
histogram.41;11930867
histogram.42;11930796
histogram.43;11930796
histogram.44;11921866
histogram.45;11935682
histogram.46;11930075
histogram.47;11928169
histogram.48;11933490
histogram.49;11932174
histogram.5;0
histogram.50;11933255
histogram.51;11934009
histogram.52;11928361
histogram.53;11927626
histogram.54;11931611
histogram.55;11930755
histogram.56;11933823
histogram.57;11922508
histogram.58;11930384
histogram.59;11929805
histogram.6;0
histogram.60;11930064
histogram.61;11926761
histogram.62;11927605
histogram.63;23858926
histogram.64;11929516
histogram.65;11930217
histogram.66;11930478
histogram.67;11939855
histogram.68;11927850
histogram.69;11931154
histogram.7;0
histogram.70;11935374
histogram.71;11930754
histogram.72;11928304
histogram.73;11931772
histogram.74;11939417
histogram.75;11930712
histogram.76;11933331
histogram.77;11931279
histogram.78;11928276
histogram.79;11930071
histogram.8;0
histogram.80;11927830
histogram.81;11931213
histogram.82;11930964
histogram.83;11928973
histogram.84;11934325
histogram.85;11929658
histogram.86;11924667
histogram.87;11931100
histogram.88;11930252
histogram.89;11927281
histogram.9;11932848
histogram.90;11930398
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11928720
histogram.95;11928988
histogram.96;0
histogram.97;11931423
histogram.98;11928181
histogram.99;11935549
histogram.totalBytes;1073741824

File 3:
histogram.0;0
histogram.1;0
histogram.10;11930417
histogram.100;11926739
histogram.101;11930580
histogram.102;11928210
histogram.103;11935300
histogram.104;11925804
histogram.105;11931023
histogram.106;11932342
histogram.107;11929778
histogram.108;11930098
histogram.109;11930759
histogram.11;0
histogram.110;11934343
histogram.111;11935775
histogram.112;11933877
histogram.113;11926675
histogram.114;11929332
histogram.115;11928876
histogram.116;11927819
histogram.117;11932657
histogram.118;11933508
histogram.119;11928808
histogram.12;0
histogram.120;11937532
histogram.121;11926907
histogram.122;11933942
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11929486
histogram.33;11930737
histogram.34;11931092
histogram.35;11934488
histogram.36;11927605
histogram.37;11930735
histogram.38;11932174
histogram.39;11936180
histogram.4;0
histogram.40;11931666
histogram.41;11927043
histogram.42;11929044
histogram.43;11934104
histogram.44;11936337
histogram.45;11935580
histogram.46;11929598
histogram.47;11934083
histogram.48;11928858
histogram.49;11931098
histogram.5;0
histogram.50;11930618
histogram.51;11925429
histogram.52;11929741
histogram.53;11934160
histogram.54;11931999
histogram.55;11930465
histogram.56;11926194
histogram.57;11926386
histogram.58;11924871
histogram.59;11929331
histogram.6;0
histogram.60;11926951
histogram.61;11928631
histogram.62;11927549
histogram.63;23856730
histogram.64;11930288
histogram.65;11931523
histogram.66;11932821
histogram.67;11932509
histogram.68;11929613
histogram.69;11928651
histogram.7;0
histogram.70;11929253
histogram.71;11931521
histogram.72;11925805
histogram.73;11934833
histogram.74;11928314
histogram.75;11923854
histogram.76;11930892
histogram.77;11927528
histogram.78;11932850
histogram.79;11934471
histogram.8;0
histogram.80;11925707
histogram.81;11929213
histogram.82;11931334
histogram.83;11936739
histogram.84;11927855
histogram.85;11931668
histogram.86;11928609
histogram.87;11931930
histogram.88;11934341
histogram.89;11927519
histogram.9;11928004
histogram.90;11933502
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11932024
histogram.95;11932693
histogram.96;0
histogram.97;11928428
histogram.98;11933195
histogram.99;11924273
histogram.totalBytes;1073741824
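
(Note: all three dumps report histogram.totalBytes;1073741824, i.e. exactly
1 GiB, consistent with the 1GB test files.)

To compare two of these dumps against each other, a minimal Groovy sketch
like the one below can be used - assuming each dump is saved as a plain text
file of key;value lines, one pair per line (the file names are placeholders
only). It prints every histogram key whose count differs, which is the same
comparison the script performs on the flowfile attributes.

// Read a dump of "key;value" lines into a map, keeping only histogram keys.
def readHistogram = { String path ->
    new File(path).readLines()
        .findAll { it.startsWith('histogram.') }
        .collectEntries { def parts = it.split(';', 2); [(parts[0]): parts[1]] }
}

def first  = readHistogram('file1-dump.txt')   // placeholder file names
def second = readHistogram('file2-dump.txt')

// Print only the byte values whose counts differ between the two dumps.
(first.keySet() + second.keySet()).sort().each { key ->
    if (first[key] != second[key]) {
        println "${key}: ${first[key]} vs ${second[key]}"
    }
}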

Kind regards
Jens

On Sun, Oct 31, 2021 at 21:40, Joe Witt <jo...@gmail.com> wrote:

> Jens
>
> 118 hours in - still goood.
>
> Thanks
>
> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
> >
> > Jens
> >
> > Update from hour 67.  Still lookin' good.
> >
> > Will advise.
> >
> > Thanks
> >
> > On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com>
> wrote:
> > >
> > > Many many thanks 🙏 Joe for looking into this. My test flow was
> running for 6 days before the first error occurred
> > >
> > > Thanks
> > >
> > > > On Thu, Oct 28, 2021 at 16:57, Joe Witt <jo...@gmail.com> wrote:
> > > >
> > > > Jens,
> > > >
> > > > Am 40+ hours in running both your flow and mine to reproduce.  So far
> > > > neither have shown any sign of trouble.  Will keep running for
> another
> > > > week or so if I can.
> > > >
> > > > Thanks
> > > >
> > > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >> The physical hosts with VMware are using VMFS, but the VM
> machines running on the hosts can’t see that.
> > > >> But you asked about the underlying file system 😀 and since my
> first answer with the copy from the fstab file wasn’t enough I just wanted
> to give all the details 😁.
> > > >>
> > > >> If you create a vm for windows you would probably use NTFS (on top
> of vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on.
> > > >>
> > > >> All the partitions at my nifi nodes, are local devices (sda, sdb,
> sdc and sdd) for each Linux machine. I don’t use nfs
> > > >>
> > > >> Kind regards
> > > >> Jens
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Oct 27, 2021 at 17:47, Joe Witt <jo...@gmail.com> wrote:
> > > >>
> > > >> Jens,
> > > >>
> > > >> I don't quite follow the EXT4 usage on top of VMFS but the point
> here
> > > >> is you'll ultimately need to truly understand your underlying
> storage
> > > >> system and what sorts of guarantees it is giving you.  If linux/the
> > > >> jvm/nifi think it has a typical EXT4 type block storage system to
> work
> > > >> with it can only be safe/operate within those constraints.  I have
> no
> > > >> idea about what VMFS brings to the table or the settings for it.
> > > >>
> > > >> The sync properties I shared previously might help force the issue
> of
> > > >> ensuring a formal sync/flush cycle all the way through the disk has
> > > >> occurred which we'd normally not do or need to do but again in some
> > > >> cases offers a stronger guarantee in exchange for performance.
> > > >>
> > > >> In any case...Mark's path for you here will help identify what we're
> > > >> dealing with and we can go from there.
> > > >>
> > > >> I am aware of significant usage of NiFi on VMWare configurations
> > > >> without issue at high rates for many years so whatever it is here is
> > > >> likely solvable.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Hi Mark
> > > >>
> > > >>
> > > >> Thanks for the clarification. I will implement the script when I
> return to the office on Monday next week (November 1st).
> > > >>
> > > >> I don’t use NFS, but ext4. But I will implement the script so we
> can check if it’s the case here. But I think the issue might be after the
> processors writing content to the repository.
> > > >>
> > > >> I have a test flow running for more than 2 weeks without any
> errors. But this flow only calculate hash and comparing.
> > > >>
> > > >>
> > > >> Two other flows both create errors. One flow uses
> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses
> MergeContent->UnpackContent->CryptographicHashContent->compares. The last
> flow is totally inside nifi, excluding other network/server issues.
> > > >>
> > > >>
> > > >> In both cases the CryptographicHashContent is right after a process
> which writes new content to the repository. But in one case a file in our
> production flow did calculate a wrong hash 4 times with a 1 minutes delay
> between each calculation. A few hours later I looped the file back and this
> time it was OK.
> > > >>
> > > >> Just like the case in step 5 and 12 in the pdf file
> > > >>
> > > >>
> > > >> I will let you all know more later next week
> > > >>
> > > >>
> > > >> Kind regards
> > > >>
> > > >> Jens
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Oct 27, 2021 at 15:43, Mark Payne <markap14@hotmail.com> wrote:
> > > >>
> > > >>
> > > >> And the actual script:
> > > >>
> > > >>
> > > >>
> > > >> import org.apache.nifi.flowfile.FlowFile
> > > >>
> > > >>
> > > >> import java.util.stream.Collectors
> > > >>
> > > >>
> > > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> > > >>
> > > >>   final Map<String, String> histogram =
> flowFile.getAttributes().entrySet().stream()
> > > >>
> > > >>       .filter({ entry -> entry.getKey().startsWith("histogram.") })
> > > >>
> > > >>       .collect(Collectors.toMap({ entry -> entry.key}, { entry ->
> entry.value }))
> > > >>
> > > >>   return histogram;
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> Map<String, String> createHistogram(final FlowFile flowFile, final
> InputStream inStream) {
> > > >>
> > > >>   final Map<String, String> histogram = new HashMap<>();
> > > >>
> > > >>   final int[] distribution = new int[256];
> > > >>
> > > >>   Arrays.fill(distribution, 0);
> > > >>
> > > >>
> > > >>   long total = 0L;
> > > >>
> > > >>   final byte[] buffer = new byte[8192];
> > > >>
> > > >>   int len;
> > > >>
> > > >>   while ((len = inStream.read(buffer)) > 0) {
> > > >>
> > > >>       for (int i=0; i < len; i++) {
> > > >>
> > > >>           final int val = buffer[i];
> > > >>
> > > >>           distribution[val]++;
> > > >>
> > > >>           total++;
> > > >>
> > > >>       }
> > > >>
> > > >>   }
> > > >>
> > > >>
> > > >>   for (int i=0; i < 256; i++) {
> > > >>
> > > >>       histogram.put("histogram." + i,
> String.valueOf(distribution[i]));
> > > >>
> > > >>   }
> > > >>
> > > >>   histogram.put("histogram.totalBytes", String.valueOf(total));
> > > >>
> > > >>
> > > >>   return histogram;
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> void logHistogramDifferences(final Map<String, String> previous,
> final Map<String, String> updated) {
> > > >>
> > > >>   final StringBuilder sb = new StringBuilder("There are differences
> in the histogram\n");
> > > >>
> > > >>   final Map<String, String> sorted = new TreeMap<>(previous)
> > > >>
> > > >>   for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> > > >>
> > > >>       final String key = entry.getKey();
> > > >>
> > > >>       final String previousValue = entry.getValue();
> > > >>
> > > >>       final String updatedValue = updated.get(entry.getKey())
> > > >>
> > > >>
> > > >>       if (!Objects.equals(previousValue, updatedValue)) {
> > > >>
> > > >>           sb.append("Byte Value: ").append(key).append(", Previous
> Count: ").append(previousValue).append(", New Count:
> ").append(updatedValue).append("\n");
> > > >>
> > > >>       }
> > > >>
> > > >>   }
> > > >>
> > > >>
> > > >>   log.error(sb.toString());
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >>
> > > >> def flowFile = session.get()
> > > >>
> > > >> if (flowFile == null) {
> > > >>
> > > >>   return
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> final Map<String, String> previousHistogram =
> getPreviousHistogram(flowFile)
> > > >>
> > > >> Map<String, String> histogram = null;
> > > >>
> > > >>
> > > >> final InputStream inStream = session.read(flowFile);
> > > >>
> > > >> try {
> > > >>
> > > >>   histogram = createHistogram(flowFile, inStream);
> > > >>
> > > >> } finally {
> > > >>
> > > >>   inStream.close()
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> if (!previousHistogram.isEmpty()) {
> > > >>
> > > >>   if (previousHistogram.equals(histogram)) {
> > > >>
> > > >>       log.info("Histograms match")
> > > >>
> > > >>   } else {
> > > >>
> > > >>       logHistogramDifferences(previousHistogram, histogram)
> > > >>
> > > >>       session.transfer(flowFile, REL_FAILURE)
> > > >>
> > > >>       return;
> > > >>
> > > >>   }
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> flowFile = session.putAllAttributes(flowFile, histogram)
> > > >>
> > > >> session.transfer(flowFile, REL_SUCCESS)
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> For a bit of background here, the reason that Joe and I have
> expressed interest in NFS file systems is that the way the protocol works,
> it is allowed to receive packets/chunks of the file out-of-order. So, what
> happens is let’s say a 1 MB file is being written. The first 500 KB are
> received. Then instead of the 501st KB it receives the 503rd KB. What
> happens is that the size of the file on the file system becomes 503 KB. But
> what about 501 & 502? Well when you read the data, the file system just
> returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server
> receives those bytes, it then goes back and fills in the proper bytes. So
> if you’re running on NFS, it is possible for the contents of the file on
> the underlying file system to change out from under you. It’s not clear to
> me what other types of file system might do something similar.
> > > >>
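A quick local illustration of the read-back behavior described above - a
Groovy sketch of the sparse-file analogue on an ordinary local file system,
not NFS itself, and not how NiFi writes content; it only demonstrates the
zero-fill gap semantics: writing past the current end of a file leaves a
gap, and reading the gap returns NUL (0) bytes until real data lands there.

def f = File.createTempFile('sparse-demo', '.bin')
def raf = new RandomAccessFile(f, 'rw')
try {
    byte[] first = new byte[500]
    Arrays.fill(first, (byte) 65)  // offsets 0-499 written normally ('A')
    raf.write(first)
    raf.seek(502)                  // skip offsets 500-501, like an out-of-order chunk
    raf.write(66)                  // offset 502 ('B') arrives "early"; length is now 503
    raf.seek(500)
    println raf.read()             // prints 0: the unwritten gap reads back as NUL
} finally {
    raf.close()
    f.delete()
}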
> > > >>
> > > >> So, one thing that we can do is to find out whether or not the
> contents of the underlying file have changed in some way, or if there’s
> something else happening that could perhaps result in the hashes being
> wrong. I’ve put together a script that should help diagnose this.
> > > >>
> > > >>
> > > >> Can you insert an ExecuteScript processor either just before or
> just after your CryptographicHashContent processor? Doesn’t really matter
> whether it’s run just before or just after. I’ll attach the script here.
> It’s a Groovy Script so you should be able to use ExecuteScript with Script
> Engine = Groovy and the following script as the Script Body. No other
> changes needed.
> > > >>
> > > >>
> > > >> The way the script works, it reads in the contents of the FlowFile,
> and then it builds up a histogram of all byte values (0-255) that it sees
> in the contents, and then adds that as attributes. So it adds attributes
> such as:
> > > >>
> > > >> histogram.0 = 280273
> > > >>
> > > >> histogram.1 = 2820
> > > >>
> > > >> histogram.2 = 48202
> > > >>
> > > >> histogram.3 = 3820
> > > >>
> > > >> …
> > > >>
> > > >> histogram.totalBytes = 1780928732
> > > >>
> > > >>
> > > >> It then checks if those attributes have already been added. If so,
> after calculating that histogram, it checks against the previous values (in
> the attributes). If they are the same, the FlowFile goes to ’success’. If
> they are different, it logs an error indicating the before/after value for
> any byte whose distribution was different, and it routes to failure.
> > > >>
> > > >>
> > > >> So, if for example, the first time through it sees 280,273 bytes
> with a value of ‘0’, and the second times it only sees 12,001 then we know
> there were a bunch of 0’s previously that were updated to be some other
> value. And it includes the total number of bytes in case somehow we find
> that we’re reading too many bytes or not enough bytes or something like
> that. This should help narrow down what’s happening.
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >> -Mark
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Jens
> > > >>
> > > >>
> > > >> Attached is the flow I was using (now running yours and this one).
> Curious if that one reproduces the issue for you as well.
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens
> > > >>
> > > >>
> > > >> I have your flow running and will keep it running for several
> days/week to see if I can reproduce.  Also of note please use your same
> test flow but use HashContent instead of crypto hash.  Curious if that
> matters for any reason...
> > > >>
> > > >>
> > > >> Still want to know more about your underlying storage system.
> > > >>
> > > >>
> > > >> You could also try updating nifi.properties and changing the
> following lines:
> > > >>
> > > >> nifi.flowfile.repository.always.sync=true
> > > >>
> > > >> nifi.content.repository.always.sync=true
> > > >>
> > > >> nifi.provenance.repository.always.sync=true
> > > >>
> > > >>
> > > >> It will hurt performance but can be useful/necessary on certain
> storage subsystems.
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Ignore "For the scenario where you can replicate this please share
> the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
> > > >>
> > > >>
> > > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> We asked about the underlying storage system.  You replied with
> some info but not the specifics.  Do you know precisely what the underlying
> storage is and how it is presented to the operating system?  For instance
> is it NFS or something similar?
> > > >>
> > > >>
> > > >> I've setup a very similar flow at extremely high rates running for
> the past several days with no issue.  In my case though I know precisely
> what the config is and the disk setup is.  Didn't do anything special to be
> clear but still it is important to know.
> > > >>
> > > >>
> > > >> For the scenario where you can replicate this please share the
> flow.xml.gz for which it is reproducible.
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >> Joe
> > > >>
> > > >>
> > > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Dear Joe and Mark
> > > >>
> > > >>
> > > >> I have created a test flow without the sftp processors, which don't
> create any errors. Therefore I created a new test flow where I use a
> MergeContent and UnpackContent instead of the sftp processors. This keeps
> all data internal in NIFI, but force NIFI to write and read new files
> totally local.
> > > >>
> > > >> My flow has been running for 7 days and this morning there were 2
> files where the sha256 was given another hash value than the original. I
> have set this flow up in another nifi cluster only for testing, and the
> cluster is not doing anything else. It is using Nifi 1.14.0
> > > >>
> > > >> So I can reproduce issues at different nifi clusters and versions
> (1.13.2 and 1.14.0) where the calculation of a hash on content can give
> different outputs. Is doesn't make any sense, but it happens. In all my
> cases the issues happens where the calculations of the hashcontent happens
> right after NIFI writes the content to the content repository. I don't know
> if there could be some kind of delay writing the content 100% before the next
> processors begin reading the content???
> > > >>
> > > >>
> > > >> Please see attach test flow, and the previous mail with a pdf
> showing the lineage of a production file which also had issues. In the pdf
> check step 5 and 12.
> > > >>
> > > >>
> > > >> Kind regards
> > > >>
> > > >> Jens M. Kofoed
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Oct 21, 2021 at 08:28, Jens M. Kofoed <jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Joe,
> > > >>
> > > >>
> > > >> To start from the last mail :-)
> > > >>
> > > >> All the repositories has it's own disk, and I'm using ext4
> > > >>
> > > >> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> > > >>
> > > >> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> > > >>
> > > >> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> > > >>
> > > >>
> > > >> My test flow WITH sftp looks like this:
> > > >>
> > > >> <image.png>
> > > >>
> > > >> And this flow has produced 1 error within 3 days. After many many
> loops the file failed and went out via the "unmatched" output to the
> disabled UpdateAttribute, which does nothing - it just keeps the
> failed flowfile in a queue. I enabled the UpdateAttribute and looped the
> file back to the CryptographicHashContent and now it calculated the hash
> correctly again. But in this flow I have a FetchSFTP Process right before
> the Hashing.
> > > >>
> > > >> Right now my flow is running without the 2 sftp processors, and the
> last 24 hours there have been no errors.
> > > >>
> > > >>
> > > >> About the Lineage:
> > > >>
> > > >> Is there a way to export all the lineage data? The export only
> generates an svg file.
> > > >>
> > > >> This is only for the receiving nifi, which internally calculates 2
> different hashes on the same content with ca. 1 minute delay. Attached is
> a pdf-document with the lineage, the flow and all the relevant Provenance
> information for each step in the lineage.
> > > >>
> > > >> The interesting steps are step 5 and 12.
> > > >>
> > > >>
> > > >> Can the issue be that data is not written 100% to disk between
> step 4 and 5 in the flow?
> > > >>
> > > >>
> > > >> Kind regards
> > > >>
> > > >> Jens M. Kofoed
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 23:49, Joe Witt <joe.witt@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> Also what type of file system/storage system are you running NiFi on
> > > >>
> > > >> in this case?  We'll need to know this for the NiFi
> > > >>
> > > >> content/flowfile/provenance repositories? Is it NFS?
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> And to further narrow this down
> > > >>
> > > >>
> > > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB
> files
> > > >>
> > > >> (2 files per node) and next process was a hashcontent before it run
> > > >>
> > > >> into a test loop. Where files are uploaded via PutSFTP to a test
> > > >>
> > > >> server, and downloaded again and recalculated the hash. I have had
> one
> > > >>
> > > >> issue after 3 days of running."
> > > >>
> > > >>
> > > >> So to be clear with GenerateFlowFile making these files and then you
> > > >>
> > > >> looping the content is wholly and fully exclusively within the
> control
> > > >>
> > > >> of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping
> the
> > > >>
> > > >> same files over and over in nifi itself you can make this happen or
> > > >>
> > > >> cannot?
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> "After fetching a FlowFile-stream file and unpacked it back into
> NiFi
> > > >>
> > > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on
> the
> > > >>
> > > >> exact same file. And got a new hash. That is what worry’s me.
> > > >>
> > > >> The fact that the same file can be recalculated and produce two
> > > >>
> > > >> different hashes, is very strange, but it happens. "
> > > >>
> > > >>
> > > >> Ok so to confirm you are saying that in each case this happens you
> see
> > > >>
> > > >> it first compute the wrong hash, but then if you retry the same
> > > >>
> > > >> flowfile it then provides the correct hash?
> > > >>
> > > >>
> > > >> Can you please also show/share the lineage history for such a flow
> > > >>
> > > >> file then?  It should have events for the initial hash, second hash,
> > > >>
> > > >> the unpacking, trace to the original stream, etc...
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Dear Mark and Joe
> > > >>
> > > >>
> > > >> I know my setup isn’t normal for many people. But if we only look
> at my receive side, which the last mails are about, everything is happening
> at the same NIFI instance. It is the same 3 node NIFI cluster.
> > > >>
> > > >> After fetching a FlowFile-stream file and unpacking it back into
> NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the
> exact same file. And get a new hash. That is what worries me.
> > > >>
> > > >> The fact that the same file can be recalculated and produce two
> different hashes, is very strange, but it happens. Over the last 5 months
> it has only happened 35-40 times.
> > > >>
> > > >>
> > > >> I can understand if the file is not completely loaded and saved
> into the content repository before the hashing starts. But I believe that
> the unpack process doesn’t forward the flow file to the next process before
> it is 100% finished unpacking and saving the new content to the repository.
> > > >>
> > > >>
> > > >> I have a test flow, where a GenerateFlowfile has created 6x 1GB
> files (2 files per node) and next process was a hashcontent before it run
> into a test loop. Where files are uploaded via PutSFTP to a test server,
> and downloaded again and recalculated the hash. I have had one issue after
> 3 days of running.
> > > >>
> > > >> Now the test flow is running without the Put/Fetch sftp processors.
> > > >>
> > > >>
> > > >> Another problem is that I can’t find any correlation to other
> events. Not within NIFI, nor the server itself or VMWare. If I just could
> find any other event which happens at the same time, I might be able to
> force some kind of event to trigger the issue.
> > > >>
> > > >> I have tried to force VMware to migrate a NiFi node to another
> host. Forcing it to do a snapshot and deleting snapshots, but nothing can
> trigger an error.
> > > >>
> > > >>
> > > >> I know it will be very very difficult to reproduce. But I will
> setup multiple NiFi instances running different test flows to see if I can
> find any reason why it behaves as it does.
> > > >>
> > > >>
> > > >> Kind Regards
> > > >>
> > > >> Jens M. Kofoed
> > > >>
> > > >>
> > > >> On Oct 20, 2021 at 16:39, Mark Payne <markap14@hotmail.com> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> Thanks for sharing the images.
> > > >>
> > > >>
> > > >> I tried to setup a test to reproduce the issue. I’ve had it running
> for quite some time. Running through millions of iterations.
> > > >>
> > > >>
> > > >> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the
> tune of hundreds of MB). I’ve been unable to reproduce an issue after
> millions of iterations.
> > > >>
> > > >>
> > > >> So far I cannot replicate. And since you’re pulling the data via
> SFTP and then unpacking, which preserves all original attributes from a
> different system, this can easily become confusing.
> > > >>
> > > >>
> > > >> Recommend trying to reproduce with SFTP-related processors out of
> the picture, as Joe is mentioning. Either using GetFile/FetchFile or
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate
> an ‘initial hash’, copy that value to another attribute, and then loop,
> generating the hash and comparing against the original one. I’ll attach a
> flow that does this, but not sure if the email server will strip out the
> attachment or not.
> > > >>
> > > >>
> > > >> This way we remove any possibility of actual corruption between the
> two nifi instances. If we can still see corruption / different hashes
> within a single nifi instance, then it certainly warrants further
> investigation but i can’t see any issues so far.
> > > >>
> > > >>
> > > >> Thanks
> > > >>
> > > >> -Mark
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Jens
> > > >>
> > > >>
> > > >> Actually is this current loop test contained within a single nifi
> and there you see corruption happen?
> > > >>
> > > >>
> > > >> Joe
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com>
> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> You have a very involved setup including other systems (non NiFi).
> Have you removed those systems from the equation so you have more evidence
> to support your expectation that NiFi is doing something other than you
> expect?
> > > >>
> > > >>
> > > >> Joe
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <
> jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Hi
> > > >>
> > > >>
> > > >> Today I have another file which has been running through the retry
> loop one time. To test the processors and the algorithm I added the
> HashContent processor and also added hashing by SHA-1.
> > > >>
> > > >> A file has been going through the system, and both the SHA-1 and
> SHA-256 are different than expected. With a 1 minute delay the file
> is going back into the hashing content flow and this time it calculates
> both hashes fine.
> > > >>
> > > >>
> > > >> I don't believe that the hashing is buggy, but something is very
> very strange. What can influence the processors/algorithm to calculate a
> different hash???
> > > >>
> > > >> All the input/output claim information is exactly the same. It is
> the same flow/content file going in a loop. It happens on all 3 nodes.
> > > >>
> > > >>
> > > >> Any suggestions for where to dig ?
> > > >>
> > > >>
> > > >> Regards
> > > >>
> > > >> Jens M. Kofoed
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Oct 20, 2021 at 06:34, Jens M. Kofoed <jmkofoed.ube@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Hi Mark
> > > >>
> > > >>
> > > >> Thanks for replying and the suggestion to look at the content
> Claim.
> > > >>
> > > >> These 3 pictures is from the first attempt:
> > > >>
> > > >> <image.png>   <image.png>   <image.png>
> > > >>
> > > >>
> > > >> Yesterday I realized that the content was still in the archive, so
> I could Replay the file.
> > > >>
> > > >> <image.png>
> > > >>
> > > >> So here are the same pictures but for the replay and as you can see
> the Identifier, offset and Size are all the same.
> > > >>
> > > >> <image.png>   <image.png>   <image.png>
> > > >>
> > > >>
> > > >> In my flow if the hash does not match my original first calculated
> hash, it goes into a retry loop. Here are the pictures for the 4th time the
> file went through:
> > > >>
> > > >> <image.png>   <image.png>   <image.png>
> > > >>
> > > >> Here the content Claim is all the same.
> > > >>
> > > >>
> > > >> It is very rare that we see these issues <1 : 1.000.000 files and
> only with large files. Only once have I seen the error with a 110MB file,
> the other times the file sizes are above 800MB.
> > > >>
> > > >> This time it was a Nifi-Flowstream v3 file, which has been exported
> from one system and imported in another. But while the file has been
> imported it is the same file inside NIFI and it stays at the same node.
> Going through the same loop of processors multiple times and in the end the
> CryptographicHashContent calculates a different SHA256 than it did earlier.
> This should not be possible!!! And that is what concerns me the most.
> > > >>
> > > >> What can influence the same processor to calculate 2 different
> sha256 on the exact same content???
> > > >>
> > > >>
> > > >> Regards
> > > >>
> > > >> Jens M. Kofoed
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Oct 19, 2021 at 16:51, Mark Payne <markap14@hotmail.com> wrote:
> > > >>
> > > >>
> > > >> Jens,
> > > >>
> > > >>
> > > >> In the two provenance events - one showing a hash of dd4cc… and the
> other showing f6f0….

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

118 hours in - still good.

Thanks

On Fri, Oct 29, 2021 at 10:22 AM Joe Witt <jo...@gmail.com> wrote:
>
> Jens
>
> Update from hour 67.  Still lookin' good.
>
> Will advise.
>
> Thanks
>
> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> >
> > Many, many thanks 🙏 Joe for looking into this. My test flow was running for 6 days before the first error occurred.
> >
> > Thanks
> >
> > > On 28 Oct 2021 at 16.57, Joe Witt <jo...@gmail.com> wrote:
> > >
> > > Jens,
> > >
> > > I'm 40+ hours in running both your flow and mine to reproduce.  So far
> > > neither have shown any sign of trouble.  Will keep running for another
> > > week or so if I can.
> > >
> > > Thanks
> > >
> > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >> The physical hosts with VMware use VMFS, but the VMs running on those hosts can’t see that.
> > >> But you asked about the underlying file system 😀 and since my first answer, with the copy from the fstab file, wasn’t enough, I just wanted to give all the details 😁.
> > >>
> > >> If you create a VM for Windows you would probably use NTFS (on top of VMFS); for Linux, EXT3, EXT4, BTRFS, XFS and so on.
> > >>
> > >> All the partitions at my NiFi nodes are local devices (sda, sdb, sdc and sdd) for each Linux machine. I don’t use NFS.
> > >>
> > >> Kind regards
> > >> Jens
> > >>
> > >>
> > >>
> > >> On 27 Oct 2021 at 17.47, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >> Jens,
> > >>
> > >> I don't quite follow the EXT4 usage on top of VMFS but the point here
> > >> is you'll ultimately need to truly understand your underlying storage
> > >> system and what sorts of guarantees it is giving you.  If linux/the
> > >> jvm/nifi think it has a typical EXT4 type block storage system to work
> > >> with it can only be safe/operate within those constraints.  I have no
> > >> idea about what VMFS brings to the table or the settings for it.
> > >>
> > >> The sync properties I shared previously might help force the issue of
> > >> ensuring a formal sync/flush cycle all the way through the disk has
> > >> occurred which we'd normally not do or need to do but again in some
> > >> cases offers a stronger guarantee in exchange for performance.
> > >>
> > >> In any case...Mark's path for you here will help identify what we're
> > >> dealing with and we can go from there.
> > >>
> > >> I am aware of significant usage of NiFi on VMWare configurations
> > >> without issue at high rates for many years so whatever it is here is
> > >> likely solvable.
> > >>
> > >> Thanks
> > >>
> > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi Mark
> > >>
> > >>
> > >> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
> > >>
> > >> I don’t use NFS, but ext4. I will still implement the script so we can check whether that’s the case here. But I think the issue arises right after a processor writes new content to the repository.
> > >>
> > >> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
> > >>
> > >>
> > >> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The latter flow is totally inside nifi, excluding other network/server issues.
> > >>
> > >>
> > >> In both cases the CryptographicHashContent is right after a processor which writes new content to the repository. But in one case a file in our production flow calculated a wrong hash 4 times, with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
> > >>
> > >> Just like the case in steps 5 and 12 in the pdf file.
> > >>
> > >>
> > >> I will let you all know more later next week
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens
> > >>
> > >>
> > >>
> > >>
> > >> On 27 Oct 2021 at 15.43, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> And the actual script:
> > >>
> > >>
> > >>
> > >> import org.apache.nifi.flowfile.FlowFile
> > >>
> > >> import java.util.stream.Collectors
> > >>
> > >> // Gather any histogram.* attributes stamped onto the FlowFile by a previous pass.
> > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
> > >>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
> > >>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
> > >>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
> > >>     return histogram
> > >> }
> > >>
> > >> // Count how often each byte value (0-255) occurs in the content stream.
> > >> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
> > >>     final Map<String, String> histogram = new HashMap<>()
> > >>     final int[] distribution = new int[256]
> > >>     Arrays.fill(distribution, 0)
> > >>
> > >>     long total = 0L
> > >>     final byte[] buffer = new byte[8192]
> > >>     int len
> > >>     while ((len = inStream.read(buffer)) > 0) {
> > >>         for (int i = 0; i < len; i++) {
> > >>             final int val = buffer[i] & 0xFF   // mask so bytes >= 0x80 land in bins 128-255 explicitly
> > >>             distribution[val]++
> > >>             total++
> > >>         }
> > >>     }
> > >>
> > >>     for (int i = 0; i < 256; i++) {
> > >>         histogram.put("histogram." + i, String.valueOf(distribution[i]))
> > >>     }
> > >>     histogram.put("histogram.totalBytes", String.valueOf(total))
> > >>
> > >>     return histogram
> > >> }
> > >>
> > >> // Log the before/after count for every byte value whose count changed.
> > >> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
> > >>     final StringBuilder sb = new StringBuilder("There are differences in the histogram\n")
> > >>     final Map<String, String> sorted = new TreeMap<>(previous)
> > >>     for (final Map.Entry<String, String> entry : sorted.entrySet()) {
> > >>         final String key = entry.getKey()
> > >>         final String previousValue = entry.getValue()
> > >>         final String updatedValue = updated.get(entry.getKey())
> > >>
> > >>         if (!Objects.equals(previousValue, updatedValue)) {
> > >>             sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n")
> > >>         }
> > >>     }
> > >>
> > >>     log.error(sb.toString())
> > >> }
> > >>
> > >> def flowFile = session.get()
> > >> if (flowFile == null) {
> > >>     return
> > >> }
> > >>
> > >> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
> > >> Map<String, String> histogram = null
> > >>
> > >> final InputStream inStream = session.read(flowFile)
> > >> try {
> > >>     histogram = createHistogram(flowFile, inStream)
> > >> } finally {
> > >>     inStream.close()
> > >> }
> > >>
> > >> // First pass just stamps the histogram attributes; later passes compare against them.
> > >> if (!previousHistogram.isEmpty()) {
> > >>     if (previousHistogram.equals(histogram)) {
> > >>         log.info("Histograms match")
> > >>     } else {
> > >>         logHistogramDifferences(previousHistogram, histogram)
> > >>         session.transfer(flowFile, REL_FAILURE)
> > >>         return
> > >>     }
> > >> }
> > >>
> > >> flowFile = session.putAllAttributes(flowFile, histogram)
> > >> session.transfer(flowFile, REL_SUCCESS)
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is, let’s say a 1 MB file is being written. The first 500 KB are received. Then, instead of the 501st KB, it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well, when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
> > >>
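A minimal sketch of the effect Mark describes, using a plain local sparse file rather than NFS (the file name is arbitrary): bytes inside a not-yet-written range read back as NUL, which is exactly what would poison a hash taken too early.

import java.io.RandomAccessFile

// Write 4 bytes at offset 1024 without ever writing bytes 0..1023.
def f = File.createTempFile("hole-demo", ".bin")
def raf = new RandomAccessFile(f, "rw")
raf.seek(1024)
raf.write("tail".getBytes())
raf.close()

byte[] content = f.bytes                    // file length is now 1028
assert content.length == 1028
assert content[0..1023].every { it == 0 }   // the unwritten range reads back as NUL (0)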
> > >>
> > >> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
> > >>
> > >>
> > >> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
> > >>
> > >>
> > >> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
> > >>
> > >> histogram.0 = 280273
> > >>
> > >> histogram.1 = 2820
> > >>
> > >> histogram.2 = 48202
> > >>
> > >> histogram.3 = 3820
> > >>
> > >> …
> > >>
> > >> histogram.totalBytes = 1780928732
> > >>
> > >>
> > >> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
> > >>
> > >>
> > >> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
> > >>
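With the counts from Mark's example, the script's failure branch would log something along these lines (the numbers are illustrative only):

There are differences in the histogram
Byte Value: histogram.0, Previous Count: 280273, New Count: 12001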
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
> > >>
> > >>
> > >> Still want to know more about your underlying storage system.
> > >>
> > >>
> > >> You could also try updating nifi.properties and changing the following lines:
> > >>
> > >> nifi.flowfile.repository.always.sync=true
> > >>
> > >> nifi.content.repository.always.sync=true
> > >>
> > >> nifi.provenance.repository.always.sync=true
> > >>
> > >>
> > >> It will hurt performance but can be useful/necessary on certain storage subsystems.
> > >>
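Roughly, these properties mean every repository write is followed by an explicit flush through to the device. A minimal sketch of the underlying JVM call, not NiFi's actual repository code (the file name is arbitrary):

import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Paths
import java.nio.file.StandardOpenOption

def channel = FileChannel.open(Paths.get("claim.bin"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)
channel.write(ByteBuffer.wrap(new byte[8192]))  // write a block of content
channel.force(true)   // flush both the data and the file metadata to the device
channel.close()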
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
> > >>
> > >>
> > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
> > >>
> > >>
> > >> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case, though, I know precisely what the config is and what the disk setup is.  To be clear, I didn't do anything special, but still it is important to know.
> > >>
> > >>
> > >> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
> > >>
> > >>
> > >> Thanks
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Dear Joe and Mark
> > >>
> > >>
> > >> I have created a test flow without the sftp processors, which didn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal in NIFI, but forces NIFI to write and read new files totally locally.
> > >>
> > >> My flow has been running for 7 days, and this morning there were 2 files where the sha256 came out with a different hash value than the original. I have set this flow up in another nifi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
> > >>
> > >> So I can reproduce issues at different nifi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the hash calculation occurs right after NIFI writes the content to the content repository. I don't know if there could be some kind of delay in writing the content 100% before the next processor begins reading the content???
> > >>
> > >>
> > >> Please see the attached test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf, check steps 5 and 12.
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >> On Thu, 21 Oct 2021 at 08.28, Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Joe,
> > >>
> > >>
> > >> To start from the last mail :-)
> > >>
> > >> Each repository has its own disk, and I'm using ext4:
> > >>
> > >> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> > >>
> > >> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> > >>
> > >> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
> > >>
> > >>
> > >> My test flow WITH sftp looks like this:
> > >>
> > >> <image.png>
> > >>
> > >> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it just keeps the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP processor right before the hashing.
> > >>
> > >> Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.
> > >>
> > >>
> > >> About the Lineage:
> > >>
> > >> Is there a way to export all the lineage data? The export only generates an svg file.
> > >>
> > >> This is only for the receiving nifi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a pdf document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
> > >>
> > >> The interesting steps are 5 and 12.
> > >>
> > >>
> > >> Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, 20 Oct 2021 at 23.49, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> Also, what type of file system/storage system are you running NiFi on in this case? We'll need to know this for the NiFi content/flowfile/provenance repositories. Is it NFS?
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> And to further narrow this down
> > >>
> > >>
> > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> > >>
> > >> (2 files per node) and next process was a hashcontent before it run
> > >>
> > >> into a test loop. Where files are uploaded via PutSFTP to a test
> > >>
> > >> server, and downloaded again and recalculated the hash. I have had one
> > >>
> > >> issue after 3 days of running."
> > >>
> > >>
> > >> So, to be clear: with GenerateFlowFile making these files and then you looping the content, this is wholly and exclusively within the control of NiFi. No Get/Fetch/Put-SFTP of any kind at all. By looping the same files over and over in nifi itself, can you make this happen or not?
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> "After fetching a FlowFile-stream file and unpacked it back into NiFi
> > >>
> > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
> > >>
> > >> exact same file. And got a new hash. That is what worry’s me.
> > >>
> > >> The fact that the same file can be recalculated and produce two
> > >>
> > >> different hashes, is very strange, but it happens. "
> > >>
> > >>
> > >> OK, so to confirm: you are saying that in each case this happens, you see it first compute the wrong hash, but then if you retry the same flowfile it provides the correct hash?
> > >>
> > >>
> > >> Can you please also show/share the lineage history for such a flowfile then? It should have events for the initial hash, second hash, the unpacking, trace to the original stream, etc...
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Dear Mark and Joe
> > >>
> > >>
> > >> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening at the same NIFI instance. It is the same 3 node NIFI cluster.
> > >>
> > >> After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file, and get a new hash. That is what worries me.
> > >>
> > >> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
> > >>
> > >>
> > >> I can understand it if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack processor doesn’t forward the flowfile to the next processor before it is 100% finished unpacking and saving the new content to the repository.
> > >>
> > >>
> > >> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node), and the next processor was a hashcontent before it ran into a test loop, where files are uploaded via PutSFTP to a test server, and downloaded again and the hash recalculated. I have had one issue after 3 days of running.
> > >>
> > >> Now the test flow is running without the Put/Fetch sftp processors.
> > >>
> > >>
> > >> Another problem is that I can’t find any correlation to other events, neither within NIFI nor on the server itself or VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
> > >>
> > >> I have tried to force VMware to migrate a NiFi node to another host, forcing it to take a snapshot and deleting snapshots, but nothing can trigger an error.
> > >>
> > >>
> > >> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
> > >>
> > >>
> > >> Kind Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >> On 20 Oct 2021 at 16.39, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> Thanks for sharing the images.
> > >>
> > >>
> > >> I tried to set up a test to reproduce the issue. I’ve had it running for quite some time, running through millions of iterations.
> > >>
> > >>
> > >> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
> > >>
> > >>
> > >> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
> > >>
> > >>
> > >> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
> > >>
> > >>
> > >> This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
> > >>
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens
> > >>
> > >>
> > >> Actually, is this current loop test contained within a single nifi, and is that where you see corruption happen?
> > >>
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
> > >>
> > >>
> > >> Joe
> > >>
> > >>
> > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi
> > >>
> > >>
> > >> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm I added the HashContent processor and also added hashing by SHA-1.
> > >>
> > >> The file has been going through the system, and the SHA-1 and SHA-256 are both different than expected. With a 1 minute delay the file goes back into the hashing content flow, and this time it calculates both hashes fine.
> > >>
> > >>
> > >> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
> > >>
> > >> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
> > >>
> > >>
> > >> Any suggestions for where to dig?
> > >>
> > >>
> > >> Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, 20 Oct 2021 at 06.34, Jens M. Kofoed <jm...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi Mark
> > >>
> > >>
> > >> Thanks for replying, and for the suggestion to look at the content claim.
> > >>
> > >> These 3 pictures are from the first attempt:
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >>
> > >> Yesterday I realized that the content was still in the archive, so I could Replay the file.
> > >>
> > >> <image.png>
> > >>
> > >> So here are the same pictures, but for the replay, and as you can see the Identifier, Offset and Size are all the same.
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >>
> > >> In my flow, if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
> > >>
> > >> <image.png>   <image.png>   <image.png>
> > >>
> > >> Here the content Claim is all the same.
> > >>
> > >>
> > >> It is very rare that we see these issues (<1 in 1.000.000 files), and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes are above 800MB.
> > >>
> > >> This time it was a Nifi-Flowstream v3 file, which has been exported from one system and imported in another. But once the file has been imported, it is the same file inside NIFI and it stays at the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
> > >>
> > >> What can influence the same processor to calculate 2 different sha256 on the exact same content???
> > >>
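One thing worth stating explicitly: SHA-256 is a pure function of its input bytes, so two different digests necessarily mean two different byte streams were read. A minimal sketch of that property (the sample string is arbitrary):

import java.security.MessageDigest

byte[] content = "exactly the same content".getBytes("UTF-8")
byte[] first  = MessageDigest.getInstance("SHA-256").digest(content)
byte[] second = MessageDigest.getInstance("SHA-256").digest(content)
assert MessageDigest.isEqual(first, second)   // always true for identical bytes
println first.encodeHex().toString()          // Groovy's hex encoding of the digest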
> > >>
> > >> Regards
> > >>
> > >> Jens M. Kofoed
> > >>
> > >>
> > >>
> > >> On Tue, 19 Oct 2021 at 16.51, Mark Payne <ma...@hotmail.com> wrote:
> > >>
> > >>
> > >> Jens,
> > >>
> > >>
> > >> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
> > >>
> > >> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
> > >>
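For readers outside NiFi internals: a content claim is essentially an address into the content repository. The class below is illustrative only; the field names mirror the provenance UI, not NiFi's actual API:

// Two provenance events whose claims match on all five fields refer to the
// exact same byte range on disk, which is why identical claims with different
// hashes point at the read path rather than at the stored content.
class ClaimAddress {
    String container    // which configured content repository root
    String section      // subdirectory bucket inside that container
    String identifier   // the claim file holding one or more flowfiles' content
    long offset         // where this flowfile's bytes start within that file
    long size           // how many bytes belong to this flowfile
}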
> > >>
> > >> Thanks
> > >>
> > >> -Mark
> > >>
> > >>
> > >>
> > >> <Repro.json>
> > >>
> > >>
> > >> <Try_to_recreate_Jens_Challenge.json>
> > >>
> > >>
> > >>
> > >>

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens

Update from hour 67.  Still lookin' good.

Will advise.

Thanks


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Many, many thanks 🙏 Joe, for looking into this. My test flow was running for 6 days before the first error occurred.

Thanks

> Den 28. okt. 2021 kl. 16.57 skrev Joe Witt <jo...@gmail.com>:
> 
> Jens,
> 
> Am 40+ hours into running both your flow and mine trying to reproduce.  So far
> neither has shown any sign of trouble.  Will keep running for another
> week or so if I can.
> 
> Thanks
> 
>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> The physical hosts with VMware are using VMFS, but the VMs running on the hosts can’t see that.
>> But you asked about the underlying file system 😀, and since my first answer with the copy from the fstab file wasn’t enough, I just wanted to give all the details 😁.
>> 
>> If you create a VM for Windows you would probably use NTFS (on top of VMFS). For Linux: EXT3, EXT4, BTRFS, XFS and so on.
>> 
>> All the partitions on my NiFi nodes are local devices (sda, sdb, sdc and sdd) on each Linux machine. I don’t use NFS.
>> 
>> Kind regards
>> Jens
>> 
>> 
>> 
>> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt <jo...@gmail.com>:
>> 
>> Jens,
>> 
>> I don't quite follow the EXT4 usage on top of VMFS, but the point here
>> is you'll ultimately need to truly understand your underlying storage
>> system and what sorts of guarantees it is giving you.  If Linux/the
>> JVM/NiFi thinks it has a typical EXT4-type block storage system to work
>> with, it can only safely operate within those constraints.  I have no
>> idea about what VMFS brings to the table or the settings for it.
>> 
>> The sync properties I shared previously might help force the issue of
>> ensuring a formal sync/flush cycle all the way through the disk has
>> occurred which we'd normally not do or need to do but again in some
>> cases offers a stronger guarantee in exchange for performance.
>> 
>> In any case...Mark's path for you here will help identify what we're
>> dealing with and we can go from there.
>> 
>> I am aware of significant usage of NiFi on VMWare configurations
>> without issue at high rates for many years so whatever it is here is
>> likely solvable.
>> 
>> Thanks
>> 
>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> 
>> Hi Mark
>> 
>> 
>> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
>> 
>> I don’t use NFS, but ext4. I will still implement the script so we can check if that’s the case here. But I think the issue might occur right after a processor writes new content to the repository.
>> 
>> I have a test flow running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
>> 
>> 
>> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compare. The other uses MergeContent->UnpackContent->CryptographicHashContent->compare. The last flow is totally inside NiFi, excluding other network/server issues.
>> 
>> 
>> In both cases the CryptographicHashContent is right after a process which writes new content to the repository. But in one case a file in our production flow calculated a wrong hash 4 times with a 1 minute delay between each calculation. A few hours later I looped the file back, and this time it was OK.
>> 
>> Just like the case in steps 5 and 12 in the PDF file.
>> 
>> 
>> I will let you all know more later next week
>> 
>> 
>> Kind regards
>> 
>> Jens
>> 
>> 
>> 
>> 
>> Den 27. okt. 2021 kl. 15.43 skrev Mark Payne <ma...@hotmail.com>:
>> 
>> 
>> And the actual script:
>> 
>> 
>> 
>> import org.apache.nifi.flowfile.FlowFile
>> 
>> 
>> import java.util.stream.Collectors
>> 
>> 
>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>> 
>>   final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>> 
>>       .filter({ entry -> entry.getKey().startsWith("histogram.") })
>> 
>>       .collect(Collectors.toMap({ entry -> entry.key}, { entry -> entry.value }))
>> 
>>   return histogram;
>> 
>> }
>> 
>> 
>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>> 
>>   final Map<String, String> histogram = new HashMap<>();
>> 
>>   final int[] distribution = new int[256];
>> 
>>   Arrays.fill(distribution, 0);
>> 
>> 
>>   long total = 0L;
>> 
>>   final byte[] buffer = new byte[8192];
>> 
>>   int len;
>> 
>>   while ((len = inStream.read(buffer)) > 0) {
>> 
>>       for (int i=0; i < len; i++) {
>> 
>>           final int val = buffer[i] & 0xFF; // mask to 0-255: Java/Groovy bytes are signed, so 0x80-0xFF would otherwise come out negative
>> 
>>           distribution[val]++;
>> 
>>           total++;
>> 
>>       }
>> 
>>   }
>> 
>> 
>>   for (int i=0; i < 256; i++) {
>> 
>>       histogram.put("histogram." + i, String.valueOf(distribution[i]));
>> 
>>   }
>> 
>>   histogram.put("histogram.totalBytes", String.valueOf(total));
>> 
>> 
>>   return histogram;
>> 
>> }
>> 
>> 
>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
>> 
>>   final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
>> 
>>   final Map<String, String> sorted = new TreeMap<>(previous)
>> 
>>   for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>> 
>>       final String key = entry.getKey();
>> 
>>       final String previousValue = entry.getValue();
>> 
>>       final String updatedValue = updated.get(entry.getKey())
>> 
>> 
>>       if (!Objects.equals(previousValue, updatedValue)) {
>> 
>>           sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
>> 
>>       }
>> 
>>   }
>> 
>> 
>>   log.error(sb.toString());
>> 
>> }
>> 
>> 
>> 
>> def flowFile = session.get()
>> 
>> if (flowFile == null) {
>> 
>>   return
>> 
>> }
>> 
>> 
>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
>> 
>> Map<String, String> histogram = null;
>> 
>> 
>> final InputStream inStream = session.read(flowFile);
>> 
>> try {
>> 
>>   histogram = createHistogram(flowFile, inStream);
>> 
>> } finally {
>> 
>>   inStream.close()
>> 
>> }
>> 
>> 
>> if (!previousHistogram.isEmpty()) {
>> 
>>   if (previousHistogram.equals(histogram)) {
>> 
>>       log.info("Histograms match")
>> 
>>   } else {
>> 
>>       logHistogramDifferences(previousHistogram, histogram)
>> 
>>       session.transfer(flowFile, REL_FAILURE)
>> 
>>       return;
>> 
>>   }
>> 
>> }
>> 
>> 
>> flowFile = session.putAllAttributes(flowFile, histogram)
>> 
>> session.transfer(flowFile, REL_SUCCESS)
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
>> 
>> 
>> Jens,
>> 
>> 
>> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
>> 
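>> As a rough standalone sketch of that effect (sizes, offsets and the Groovy code are illustrative, not NiFi internals): hashing the same buffer with and without the not-yet-filled gap yields two different digests.
>> 
>>     import java.security.MessageDigest
>> 
>>     def sha256 = { byte[] data -> MessageDigest.getInstance("SHA-256").digest(data).encodeHex().toString() }
>> 
>>     byte[] complete = new byte[1024 * 1024]    // the 1 MB file once fully written
>>     new Random(42).nextBytes(complete)
>> 
>>     // A read racing the writer: the 501st and 502nd KB are still holes and read back as NUL (byte 0)
>>     byte[] early = complete.clone()
>>     Arrays.fill(early, 500 * 1024, 502 * 1024, (byte) 0)
>> 
>>     assert sha256(early) != sha256(complete)   // same file on disk moments later, but a different hash when read too early
>> 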
>> 
>> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
>> 
>> 
>> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
>> 
>> 
>> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
>> 
>> histogram.0 = 280273
>> 
>> histogram.1 = 2820
>> 
>> histogram.2 = 48202
>> 
>> histogram.3 = 3820
>> 
>> …
>> 
>> histogram.totalBytes = 1780928732
>> 
>> 
>> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
>> 
>> 
>> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
>> 
>> 
>> Thanks
>> 
>> -Mark
>> 
>> 
>> 
>> 
>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens
>> 
>> 
>> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
>> 
>> 
>> Thanks
>> 
>> 
>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens
>> 
>> 
>> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
>> 
>> 
>> Still want to know more about your underlying storage system.
>> 
>> 
>> You could also try updating nifi.properties and changing the following lines:
>> 
>> nifi.flowfile.repository.always.sync=true
>> 
>> nifi.content.repository.always.sync=true
>> 
>> nifi.provenance.repository.always.sync=true
>> 
>> 
>> It will hurt performance but can be useful/necessary on certain storage subsystems.
>> 
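>> For intuition, the always.sync flags amount to forcing each write through to the device before moving on, conceptually like forcing the FileChannel after a write. A Groovy sketch of the concept only (illustrative path, not NiFi's actual repository code):
>> 
>>     import java.nio.ByteBuffer
>>     import java.nio.channels.FileChannel
>>     import java.nio.file.Paths
>>     import static java.nio.file.StandardOpenOption.*
>> 
>>     def ch = FileChannel.open(Paths.get("/contRepo01/example-claim.bin"), CREATE, WRITE)
>>     ch.write(ByteBuffer.wrap("some claim content".bytes))
>>     ch.force(true)   // block until data and metadata have reached the storage device
>>     ch.close()
>> 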
>> 
>> Thanks
>> 
>> 
>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>> 
>> 
>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens,
>> 
>> 
>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
>> 
>> 
>> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and what the disk setup is.  Didn't do anything special, to be clear, but still it is important to know.
>> 
>> 
>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>> 
>> 
>> Thanks
>> 
>> Joe
>> 
>> 
>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> 
>> Dear Joe and Mark
>> 
>> 
>> I have created a test flow without the SFTP processors, which doesn’t create any errors. Therefore I created a new test flow where I use MergeContent and UnpackContent instead of the SFTP processors. This keeps all data internal to NiFi, but forces NiFi to write and read new files completely locally.
>> 
>> My flow has been running for 7 days, and this morning there were 2 files where the sha256 was given another hash value than the original. I have set this flow up in another NiFi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
>> 
>> So I can reproduce issues on different NiFi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn’t make any sense, but it happens. In all my cases the issues happen where the calculation of the hash content happens right after NiFi writes the content to the content repository. I don’t know if there could be some kind of delay writing the content 100% before the next processors begin reading the content???
>> 
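>> One way to probe that hypothesis outside NiFi would be to write a large file, hash it immediately, hash it again after a delay, and compare. A rough Groovy sketch (path, sizes and the 1 minute delay are illustrative):
>> 
>>     import java.security.MessageDigest
>> 
>>     def hashFile = { File f ->
>>         def md = MessageDigest.getInstance("SHA-256")
>>         f.withInputStream { ins ->
>>             byte[] buf = new byte[8192]
>>             int n
>>             while ((n = ins.read(buf)) > 0) { md.update(buf, 0, n) }
>>         }
>>         md.digest().encodeHex().toString()
>>     }
>> 
>>     def f = new File("/contRepo01/raw-write-test.bin")
>>     f.withOutputStream { out ->
>>         byte[] chunk = new byte[1024 * 1024]
>>         new Random().nextBytes(chunk)
>>         1024.times { out.write(chunk) }   // ~1 GB, the same ballpark as the failing files
>>     }
>> 
>>     def first = hashFile(f)
>>     Thread.sleep(60000)                   // mimic the 1 minute Wait in the retry loop
>>     def second = hashFile(f)
>>     assert first == second : "read-after-write mismatch: ${first} vs ${second}"
>> 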
>> 
>> Please see the attached test flow, and the previous mail with a PDF showing the lineage of a production file which also had issues. In the PDF, check steps 5 and 12.
>> 
>> 
>> Kind regards
>> 
>> Jens M. Kofoed
>> 
>> 
>> 
>> Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed <jm...@gmail.com>:
>> 
>> 
>> Joe,
>> 
>> 
>> To start from the last mail :-)
>> 
>> Each repository has its own disk, and I'm using ext4:
>> 
>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>> 
>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>> 
>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>> 
>> 
>> My test flow WITH sftp looks like this:
>> 
>> <image.png>
>> 
>> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP process right before the hashing.
>> 
>> Right now my flow is running without the 2 SFTP processors, and in the last 24 hours there have been no errors.
>> 
>> 
>> About the Lineage:
>> 
>> Is there a way to export all the lineage data? The export only generates an SVG file.
>> 
>> This is only for the receiving NiFi, which internally calculates 2 different hashes on the same content with ca. 1 minute’s delay. Attached is a PDF document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
>> 
>> The interesting steps are steps 5 and 12.
>> 
>> 
>> Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
>> 
>> 
>> Kind regards
>> 
>> Jens M. Kofoed
>> 
>> 
>> 
>> 
>> Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <jo...@gmail.com>:
>> 
>> 
>> Jens,
>> 
>> 
>> Also, what type of file system/storage system are you running NiFi on
>> 
>> in this case?  We'll need to know this for the NiFi
>> 
>> content/flowfile/provenance repositories. Is it NFS?
>> 
>> 
>> Thanks
>> 
>> 
>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens,
>> 
>> 
>> And to further narrow this down
>> 
>> 
>> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>> 
>> (2 files per node) and next process was a hashcontent before it run
>> 
>> into a test loop. Where files are uploaded via PutSFTP to a test
>> 
>> server, and downloaded again and recalculated the hash. I have had one
>> 
>> issue after 3 days of running."
>> 
>> 
>> So to be clear: with GenerateFlowFile making these files and then you
>> 
>> looping the content, this is wholly and fully exclusively within the control
>> 
>> of NiFi.  No Get/Fetch/Put-SFTP of any kind at all. So by looping the
>> 
>> same files over and over in nifi itself, can you make this happen or
>> 
>> not?
>> 
>> 
>> Thanks
>> 
>> 
>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens,
>> 
>> 
>> "After fetching a FlowFile-stream file and unpacked it back into NiFi
>> 
>> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>> 
>> exact same file. And got a new hash. That is what worry’s me.
>> 
>> The fact that the same file can be recalculated and produce two
>> 
>> different hashes, is very strange, but it happens. "
>> 
>> 
>> Ok, so to confirm: you are saying that in each case this happens, you see
>> 
>> it first compute the wrong hash, but then if you retry the same
>> 
>> flowfile it then provides the correct hash?
>> 
>> 
>> Can you please also show/share the lineage history for such a flow
>> 
>> file then?  It should have events for the initial hash, second hash,
>> 
>> the unpacking, trace to the original stream, etc...
>> 
>> 
>> Thanks
>> 
>> 
>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> 
>> Dear Mark and Joe
>> 
>> 
>> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about: everything is happening on the same NiFi instance. It is the same 3 node NiFi cluster.
>> 
>> After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file and get a new hash. That is what worries me.
>> 
>> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
>> 
>> 
>> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn’t forward the flowfile to the next process before it is 100% finished unpacking and saving the new content to the repository.
>> 
>> 
>> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node) and the next process was a hashcontent before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
>> 
>> Now the test flow is running without the Put/Fetch sftp processors.
>> 
>> 
>> Another problem is that I can’t find any correlation to other events. Not within NiFi, nor the server itself or VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
>> 
>> I have tried to force VMware to migrate a NiFi node to another host, forcing it to do a snapshot and deleting snapshots, but nothing can trigger an error.
>> 
>> 
>> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
>> 
>> 
>> Kind Regards
>> 
>> Jens M. Kofoed
>> 
>> 
>> Den 20. okt. 2021 kl. 16.39 skrev Mark Payne <ma...@hotmail.com>:
>> 
>> 
>> Jens,
>> 
>> 
>> Thanks for sharing the images.
>> 
>> 
>> I tried to setup a test to reproduce the issue. I’ve had it running for quite some time. Running through millions of iterations.
>> 
>> 
>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
>> 
>> 
>> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
>> 
>> 
>> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
>> 
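>> For the compare step, one option is RouteOnAttribute with an Expression Language property along these lines (attribute names are assumptions: initial.hash being wherever the first hash was copied, and content_SHA-256 being the attribute CryptographicHashContent writes):
>> 
>>     ${content_SHA-256:equals(${initial.hash})}
>> 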
>> 
>> This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
>> 
>> 
>> Thanks
>> 
>> -Mark
>> 
>> 
>> 
>> 
>> 
>> 
>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens
>> 
>> 
>> Actually is this current loop test contained within a single nifi and there you see corruption happen?
>> 
>> 
>> Joe
>> 
>> 
>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
>> 
>> 
>> Jens,
>> 
>> 
>> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
>> 
>> 
>> Joe
>> 
>> 
>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> 
>> Hi
>> 
>> 
>> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm, I added the HashContent processor and also added hashing by SHA-1.
>> 
>> A file has been going through the system, and both the SHA-1 and SHA-256 are different than expected. With a 1 minute delay the file goes back into the hashing content flow, and this time it calculates both hashes fine.
>> 
>> 
>> I don't believe that the hashing is buggy, but something is very, very strange. What can influence the processors/algorithm to calculate a different hash???
>> 
>> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
>> 
>> 
>> Any suggestions for where to dig ?
>> 
>> 
>> Regards
>> 
>> Jens M. Kofoed
>> 
>> 
>> 
>> 
>> Den ons. 20. okt. 2021 kl. 06.34 skrev Jens M. Kofoed <jm...@gmail.com>:
>> 
>> 
>> Hi Mark
>> 
>> 
>> Thanks for replaying and the suggestion to look at the content Claim.
>> 
>> These 3 pictures are from the first attempt:
>> 
>> <image.png>   <image.png>   <image.png>
>> 
>> 
>> Yesterday I realized that the content was still in the archive, so I could Replay the file.
>> 
>> <image.png>
>> 
>> So here are the same pictures but for the replay and as you can see the Identifier, offset and Size are all the same.
>> 
>> <image.png>   <image.png>   <image.png>
>> 
>> 
>> In my flow, if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures from the 4th time the file went through:
>> 
>> <image.png>   <image.png>   <image.png>
>> 
>> Here the content Claim is all the same.
>> 
>> 
>> It is very rare that we see these issues (fewer than 1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes were above 800MB.
>> 
>> This time it was a NiFi-Flowstream v3 file, which was exported from one system and imported in another. But once the file has been imported, it is the same file inside NiFi and it stays on the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
>> 
>> What can influence the same processor to calculate 2 different sha256 on the exact same content???
>> 
>> 
>> Regards
>> 
>> Jens M. Kofoed
>> 
>> 
>> 
>> Den tir. 19. okt. 2021 kl. 16.51 skrev Mark Payne <ma...@hotmail.com>:
>> 
>> 
>> Jens,
>> 
>> 
>> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
>> 
>> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
>> 
>> 
>> Thanks
>> 
>> -Mark
>> 
>> 
>> 
>> <Repro.json>
>> 
>> 
>> <Try_to_recreate_Jens_Challenge.json>
>> 
>> 
>> 
>> 

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

Am 40+ hours into running both your flow and mine trying to reproduce.  So far
neither has shown any sign of trouble.  Will keep running for another
week or so if I can.

Thanks


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
The physical hosts with VMware are using VMFS, but the VMs running on the hosts can’t see that.
But you asked about the underlying file system 😀, and since my first answer with the copy from the fstab file wasn’t enough, I just wanted to give all the details 😁.

If you create a VM for Windows you would probably use NTFS (on top of VMFS). For Linux: EXT3, EXT4, BTRFS, XFS and so on.

All the partitions on my NiFi nodes are local devices (sda, sdb, sdc and sdd) on each Linux machine. I don’t use NFS.

Kind regards 
Jens 



> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt <jo...@gmail.com>:
> 
> Jens,
> 
> I don't quite follow the EXT4 usage on top of VMFS but the point here
> is you'll ultimately need to truly understand your underlying storage
> system and what sorts of guarantees it is giving you.  If linux/the
> jvm/nifi think it has a typical EXT4 type block storage system to work
> with it can only be safe/operate within those constraints.  I have no
> idea about what VMFS brings to the table or the settings for it.
> 
> The sync properties I shared previously might help force the issue of
> ensuring a formal sync/flush cycle all the way through the disk has
> occurred which we'd normally not do or need to do but again in some
> cases offers a stronger guarantee in exchange for performance.
> 
> In any case...Mark's path for you here will help identify what we're
> dealing with and we can go from there.
> 
> I am aware of significant usage of NiFi on VMWare configurations
> without issue at high rates for many years so whatever it is here is
> likely solvable.
> 
> Thanks
> 
>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>> 
>> Hi Mark
>> 
>> Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
>> I don’t use NFS, but ext4. I will implement the script anyway so we can check whether that is the case here. But I think the issue might lie after the processors write content to the repository.
>> I have a test flow that has been running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.
>> 
>> Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The latter flow runs totally inside NiFi, excluding other network/server issues.
>> 
>> In both cases the CryptographicHashContent is right after a processor which writes new content to the repository. In one case a file in our production flow calculated a wrong hash 4 times, with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
>> Just like the case in steps 5 and 12 in the PDF file.
>> 
>> I will let you all know more later next week
>> 
>> Kind regards
>> Jens
>> 
>> 
>> 
>> On 27 Oct 2021, at 15.43, Mark Payne <ma...@hotmail.com> wrote:
>> 
>> And the actual script:
>> 
>> 
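>> // Summary of the script below: build a histogram of byte values (0-255) from
>> // the FlowFile content and store it in "histogram." attributes; on a later
>> // pass, compare the fresh histogram against the stored attributes and route
>> // to failure if the counts differ (i.e. the content changed).
>> 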
>> import org.apache.nifi.flowfile.FlowFile
>> 
>> import java.util.stream.Collectors
>> 
>> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>>    final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>>        .filter({ entry -> entry.getKey().startsWith("histogram.") })
>>        .collect(Collectors.toMap({ entry -> entry.key}, { entry -> entry.value }))
>>    return histogram;
>> }
>> 
>> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>>    final Map<String, String> histogram = new HashMap<>();
>>    final int[] distribution = new int[256];
>>    Arrays.fill(distribution, 0);
>> 
>>    long total = 0L;
>>    final byte[] buffer = new byte[8192];
>>    int len;
>>    while ((len = inStream.read(buffer)) > 0) {
>>        for (int i=0; i < len; i++) {
>>            final int val = buffer[i] & 0xFF;  // mask to 0-255: bytes are signed in Java, so a negative value would index outside the array
>>            distribution[val]++;
>>            total++;
>>        }
>>    }
>> 
>>    for (int i=0; i < 256; i++) {
>>        histogram.put("histogram." + i, String.valueOf(distribution[i]));
>>    }
>>    histogram.put("histogram.totalBytes", String.valueOf(total));
>> 
>>    return histogram;
>> }
>> 
>> void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
>>    final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
>>    final Map<String, String> sorted = new TreeMap<>(previous)
>>    for (final Map.Entry<String, String> entry : sorted.entrySet()) {
>>        final String key = entry.getKey();
>>        final String previousValue = entry.getValue();
>>        final String updatedValue = updated.get(entry.getKey())
>> 
>>        if (!Objects.equals(previousValue, updatedValue)) {
>>            sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
>>        }
>>    }
>> 
>>    log.error(sb.toString());
>> }
>> 
>> 
>> def flowFile = session.get()
>> if (flowFile == null) {
>>    return
>> }
>> 
>> final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
>> Map<String, String> histogram = null;
>> 
>> final InputStream inStream = session.read(flowFile);
>> try {
>>    histogram = createHistogram(flowFile, inStream);
>> } finally {
>>    inStream.close()
>> }
>> 
>> if (!previousHistogram.isEmpty()) {
>>    if (previousHistogram.equals(histogram)) {
>>        log.info("Histograms match")
>>    } else {
>>        logHistogramDifferences(previousHistogram, histogram)
>>        session.transfer(flowFile, REL_FAILURE)
>>        return;
>>    }
>> }
>> 
>> flowFile = session.putAllAttributes(flowFile, histogram)
>> session.transfer(flowFile, REL_SUCCESS)
>> 
>> 
>> 
>> 
>> 
>> 
>> On Oct 27, 2021, at 9:43 AM, Mark Payne <ma...@hotmail.com> wrote:
>> 
>> Jens,
>> 
>> For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well, when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
>> 
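>> As a small illustration of that failure mode (a standalone Groovy sketch, not NiFi code; the temp file and chunk sizes are made up), a file written with a not-yet-filled hole reads back NUL bytes, so hashing it early yields a different digest than hashing the finished file:
>> 
>> import java.security.MessageDigest
>> 
>> String sha256(final byte[] data) {
>>     MessageDigest.getInstance("SHA-256").digest(data).encodeHex().toString()
>> }
>> 
>> // Write two chunks but leave a hole at bytes 4-7, the way an out-of-order
>> // NFS write would; the hole reads back as NUL (0x00) bytes.
>> def file = File.createTempFile("hole-demo", ".bin")
>> def raf = new RandomAccessFile(file, "rw")
>> raf.write("AAAA".bytes)
>> raf.seek(8)
>> raf.write("CCCC".bytes)
>> raf.close()
>> println "early hash: " + sha256(file.bytes)
>> 
>> // Later the missing chunk is filled in; the very same file now hashes differently.
>> raf = new RandomAccessFile(file, "rw")
>> raf.seek(4)
>> raf.write("BBBB".bytes)
>> raf.close()
>> println "final hash: " + sha256(file.bytes)
>> 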
>> So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.
>> 
>> Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.
>> 
>> The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
>> histogram.0 = 280273
>> histogram.1 = 2820
>> histogram.2 = 48202
>> histogram.3 = 3820
>> …
>> histogram.totalBytes = 1780928732
>> 
>> It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.
>> 
>> So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>> On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com> wrote:
>> 
>> Jens
>> 
>> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
>> 
>> Thanks
>> 
>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>>> 
>>> Jens
>>> 
>>> I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...
>>> 
>>> Still want to know more about your underlying storage system.
>>> 
>>> You could also try updating nifi.properties and changing the following lines:
>>> nifi.flowfile.repository.always.sync=true
>>> nifi.content.repository.always.sync=true
>>> nifi.provenance.repository.always.sync=true
>>> 
>>> It will hurt performance but can be useful/necessary on certain storage subsystems.
>>> 
>>> Thanks
>>> 
>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>>>> 
>>>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>> 
>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>>>>> 
>>>>> Jens,
>>>>> 
>>>>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?
>>>>> 
>>>>> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case though I know precisely what the config is and the disk setup is.  Didn't do anything special to be clear but still it is important to know.
>>>>> 
>>>>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>>>>> 
>>>>> Thanks
>>>>> Joe
>>>>> 
>>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>> 
>>>>>> Dear Joe and Mark
>>>>>> 
>>>>>> I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal to NiFi, but forces NiFi to write and read new files entirely locally.
>>>>>> My flow has been running for 7 days, and this morning there were 2 files where the sha256 was given a different hash value than the original. I have set this flow up in another NiFi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
>>>>>> So I can reproduce issues on different NiFi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the hash calculation occurs right after NiFi writes the content to the content repository. I don't know if there could be some kind of delay before the content is 100% written and the next processors begin reading it???
>>>>>> 
>>>>>> Please see attach test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf check step 5 and 12.
>>>>>> 
>>>>>> Kind regards
>>>>>> Jens M. Kofoed
>>>>>> 
>>>>>> 
>>>>>>> On Thu, 21 Oct 2021 at 08.28, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Joe,
>>>>>>> 
>>>>>>> To start from the last mail :-)
>>>>>>> Each of the repositories has its own disk, and I'm using ext4:
>>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>>>> 
>>>>>>> My test flow WITH sftp looks like this:
>>>>>>> <image.png>
>>>>>>> And this flow has produced 1 error within 3 days. After many, many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP processor right before the hashing.
>>>>>>> Right now my flow is running without the 2 sftp processors, and the last 24hours there has been no errors.
>>>>>>> 
>>>>>>> About the Lineage:
>>>>>>> Is there a way to export all the lineage data? The export only generates an SVG file.
>>>>>>> This is only for the receiving NiFi, which internally calculated 2 different hashes on the same content with ca. 1 minute delay. Attached is a PDF document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
>>>>>>> The interesting steps are steps 5 and 12.
>>>>>>> 
>>>>>>> Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
>>>>>>> 
>>>>>>> Kind regards
>>>>>>> Jens M. Kofoed
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Wed, 20 Oct 2021 at 23.49, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> Also what type of file system/storage system are you running NiFi on
>>>>>>>> in this case?  We'll need to know this for the NiFi
>>>>>>>> content/flowfile/provenance repositories? Is it NFS?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Jens,
>>>>>>>>> 
>>>>>>>>> And to further narrow this down
>>>>>>>>> 
>>>>>>>>> "I have a test flow where a GenerateFlowFile has created 6x 1GB files
>>>>>>>>> (2 files per node), and the next processor was a HashContent before it ran
>>>>>>>>> into a test loop, where files are uploaded via PutSFTP to a test
>>>>>>>>> server, downloaded again, and the hash recalculated. I have had one
>>>>>>>>> issue after 3 days of running."
>>>>>>>>> 
>>>>>>>>> So to be clear: with GenerateFlowFile making these files and then you
>>>>>>>>> looping the content, this is wholly and fully exclusively within the control
>>>>>>>>> of NiFi.  No Get/Fetch/Put-SFTP of any kind at all. By looping the
>>>>>>>>> same files over and over in NiFi itself, can you make this happen or
>>>>>>>>> not?
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>>> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Jens,
>>>>>>>>>> 
>>>>>>>>>> "After fetching a FlowFile-stream file and unpacking it back into NiFi
>>>>>>>>>> I calculate a sha256. 1 minute later I recalculate the sha256 on the
>>>>>>>>>> exact same file, and get a new hash. That is what worries me.
>>>>>>>>>> The fact that the same file can be recalculated and produce two
>>>>>>>>>> different hashes is very strange, but it happens."
>>>>>>>>>> 
>>>>>>>>>> Ok so to confirm you are saying that in each case this happens you see
>>>>>>>>>> it first compute the wrong hash, but then if you retry the same
>>>>>>>>>> flowfile it then provides the correct hash?
>>>>>>>>>> 
>>>>>>>>>> Can you please also show/share the lineage history for such a flow
>>>>>>>>>> file then?  It should have events for the initial hash, second hash,
>>>>>>>>>> the unpacking, trace to the original stream, etc...
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Dear Mark and Joe
>>>>>>>>>>> 
>>>>>>>>>>> I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening on the same NiFi instance. It is the same 3 node NiFi cluster.
>>>>>>>>>>> After fetching a FlowFile-stream file and unpacking it back into NiFi I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file, and get a new hash. That is what worries me.
>>>>>>>>>>> The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
>>>>>>>>>>> 
>>>>>>>>>>> I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn’t forward the flow file to the next processor before it is 100% finished unpacking and saving the new content to the repository.
>>>>>>>>>>> 
>>>>>>>>>>> I have a test flow where a GenerateFlowFile has created 6x 1GB files (2 files per node), and the next processor was a HashContent before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
>>>>>>>>>>> Now the test flow is running without the Put/Fetch sftp processors.
>>>>>>>>>>> 
>>>>>>>>>>> Another problem is that I can’t find any correlation to other events, neither within NiFi, nor the server itself or VMware. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
>>>>>>>>>>> I have tried to force VMware to migrate a NiFi node to another host, forcing it to take a snapshot and deleting snapshots, but nothing can trigger an error.
>>>>>>>>>>> 
>>>>>>>>>>> I know it will be very, very difficult to reproduce. But I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
>>>>>>>>>>> 
>>>>>>>>>>> Kind Regards
>>>>>>>>>>> Jens M. Kofoed
>>>>>>>>>>> 
>>>>>>>>>>> On 20 Oct 2021, at 16.39, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Jens,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for sharing the images.
>>>>>>>>>>> 
>>>>>>>>>>> I tried to set up a test to reproduce the issue. I’ve had it running for quite some time, running through millions of iterations.
>>>>>>>>>>> 
>>>>>>>>>>> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
>>>>>>>>>>> 
>>>>>>>>>>> So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
>>>>>>>>>>> 
>>>>>>>>>>> Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
>>>>>>>>>>> 
>>>>>>>>>>> This way we remove any possibility of actual corruption between the two NiFi instances. If we can still see corruption / different hashes within a single NiFi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> -Mark
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Jens
>>>>>>>>>>> 
>>>>>>>>>>> Actually, is this current loop test contained within a single NiFi, and is that where you see the corruption happen?
>>>>>>>>>>> 
>>>>>>>>>>> Joe
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Jens,
>>>>>>>>>>> 
>>>>>>>>>>> You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
>>>>>>>>>>> 
>>>>>>>>>>> Joe
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> 
>>>>>>>>>>> Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm I added the HashContent processor and also added hashing by SHA-1.
>>>>>>>>>>> The file has been going through the system, and both the SHA-1 and the SHA-256 are different than expected. With a 1 minute delay the file goes back into the hashing flow, and this time it calculates both hashes fine.
>>>>>>>>>>> 
>>>>>>>>>>> I don't believe that the hashing is buggy, but something is very very strange. What can influence the processors/algorithm to calculate a different hash???
>>>>>>>>>>> All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
>>>>>>>>>>> 
>>>>>>>>>>> Any suggestions for where to dig ?
>>>>>>>>>>> 
>>>>>>>>>>> Regards
>>>>>>>>>>> Jens M. Kofoed
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, 20 Oct 2021 at 06.34, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Mark
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for replying and for the suggestion to look at the content claim.
>>>>>>>>>>> These 3 pictures are from the first attempt:
>>>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>>>> 
>>>>>>>>>>> Yesterday I realized that the content was still in the archive, so I could Replay the file.
>>>>>>>>>>> <image.png>
>>>>>>>>>>> So here are the same pictures but for the replay and as you can see the Identifier, offset and Size are all the same.
>>>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>>>> 
>>>>>>>>>>> In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
>>>>>>>>>>> <image.png>   <image.png>   <image.png>
>>>>>>>>>>> Here the content Claim is all the same.
>>>>>>>>>>> 
>>>>>>>>>>> It is very rare that we see these issues (fewer than 1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110MB file; the other times the file sizes were above 800MB.
>>>>>>>>>>> This time it was a NiFi-Flowstream v3 file, which was exported from one system and imported in another. But once the file has been imported it is the same file inside NiFi, and it stays on the same node, going through the same loop of processors multiple times; in the end the CryptographicHashContent calculates a different SHA256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
>>>>>>>>>>> What can influence the same processor to calculate 2 different sha256 on the exact same content???
>>>>>>>>>>> 
>>>>>>>>>>> Regards
>>>>>>>>>>> Jens M. Kofoed
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 19 Oct 2021 at 16.51, Mark Payne <ma...@hotmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Jens,
>>>>>>>>>>> 
>>>>>>>>>>> In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
>>>>>>>>>>> If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> -Mark
>>>>>>>>>>> 
>> 
>> <Try_to_recreate_Jens_Challenge.json>
>> 
>> 
>> 

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

I don't quite follow the EXT4 usage on top of VMFS, but the point here
is you'll ultimately need to truly understand your underlying storage
system and what sorts of guarantees it is giving you.  If Linux/the
JVM/NiFi thinks it has a typical ext4-type block storage system to work
with, it can only be safe/operate within those constraints.  I have no
idea what VMFS brings to the table or the settings for it.

The sync properties I shared previously might help force the issue of
ensuring a formal sync/flush cycle all the way through the disk has
occurred which we'd normally not do or need to do but again in some
cases offers a stronger guarantee in exchange for performance.
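
As a rough sketch of what that sync/flush cycle means at the JVM level
(plain Groovy, not NiFi internals; the file path is made up), the writer
flushes its own buffers and then forces the bytes through to the device
before any reader computes a hash:

import java.security.MessageDigest

def file = new File("/tmp/sync-demo.bin")   // hypothetical demo path
def out = new FileOutputStream(file)
out.write(new byte[1024 * 1024])            // write some content
out.flush()                                 // flush JVM buffers to the OS
out.getFD().sync()                          // force the OS to write through to the device
out.close()

// Only after the sync is it reasonable to assume a reader sees the final bytes.
println MessageDigest.getInstance("SHA-256").digest(file.bytes).encodeHex().toString()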

In any case...Mark's path for you here will help identify what we're
dealing with and we can go from there.

I am aware of significant usage of NiFi on VMWare configurations
without issue at high rates for many years so whatever it is here is
likely solvable.

Thanks


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark

Thanks for the clarification. I will implement the script when I return to the office on Monday next week (November 1st).
I don’t use NFS, but ext4. I will implement the script anyway so we can check whether that is the case here. But I think the issue might lie after the processors write content to the repository.
I have a test flow that has been running for more than 2 weeks without any errors. But this flow only calculates hashes and compares them.

Two other flows both create errors. One flow uses PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses MergeContent->UnpackContent->CryptographicHashContent->compares. The latter flow runs totally inside NiFi, excluding other network/server issues.
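
For reference, the "compares" step is essentially the following (a minimal ExecuteScript sketch in Groovy; the attribute name hash.initial is arbitrary, just for illustration):

import java.security.MessageDigest

def flowFile = session.get()
if (flowFile == null) return

// Recompute the SHA-256 of the content.
def digest = MessageDigest.getInstance("SHA-256")
def stream = session.read(flowFile)
try {
    byte[] buffer = new byte[8192]
    int len
    while ((len = stream.read(buffer)) > 0) {
        digest.update(buffer, 0, len)
    }
} finally {
    stream.close()
}
def hash = digest.digest().encodeHex().toString()

// The first pass stores the hash; later passes compare against it.
def expected = flowFile.getAttribute('hash.initial')
if (expected == null) {
    flowFile = session.putAttribute(flowFile, 'hash.initial', hash)
    session.transfer(flowFile, REL_SUCCESS)
} else if (expected == hash) {
    session.transfer(flowFile, REL_SUCCESS)
} else {
    log.error("Content hash changed: expected ${expected} but got ${hash}")
    session.transfer(flowFile, REL_FAILURE)
}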

In both cases the CryptographicHashContent is right after a processor which writes new content to the repository. In one case a file in our production flow calculated a wrong hash 4 times, with a 1 minute delay between each calculation. A few hours later I looped the file back and this time it was OK.
Just like the case in steps 5 and 12 in the PDF file.

I will let you all know more later next week 

Kind regards 
Jens




Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
And the actual script:



import org.apache.nifi.flowfile.FlowFile

import java.util.stream.Collectors

Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
    final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
        .filter({ entry -> entry.getKey().startsWith("histogram.") })
        .collect(Collectors.toMap({ entry -> entry.key}, { entry -> entry.value }))
    return histogram;
}

Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
    final Map<String, String> histogram = new HashMap<>();
    final int[] distribution = new int[256];
    Arrays.fill(distribution, 0);

    long total = 0L;
    final byte[] buffer = new byte[8192];
    int len;
    while ((len = inStream.read(buffer)) > 0) {
        for (int i=0; i < len; i++) {
            final int val = buffer[i] & 0xFF;  // mask to 0..255: Java bytes are signed, so without the mask values >= 0x80 would rely on Groovy's negative-index wraparound
            distribution[val]++;
            total++;
        }
    }

    for (int i=0; i < 256; i++) {
        histogram.put("histogram." + i, String.valueOf(distribution[i]));
    }
    histogram.put("histogram.totalBytes", String.valueOf(total));

    return histogram;
}

void logHistogramDifferences(final Map<String, String> previous, final Map<String, String> updated) {
    final StringBuilder sb = new StringBuilder("There are differences in the histogram\n");
    final Map<String, String> sorted = new TreeMap<>(previous)
    for (final Map.Entry<String, String> entry : sorted.entrySet()) {
        final String key = entry.getKey();
        final String previousValue = entry.getValue();
        final String updatedValue = updated.get(entry.getKey())

        if (!Objects.equals(previousValue, updatedValue)) {
            sb.append("Byte Value: ").append(key).append(", Previous Count: ").append(previousValue).append(", New Count: ").append(updatedValue).append("\n");
        }
    }

    log.error(sb.toString());
}


def flowFile = session.get()
if (flowFile == null) {
    return
}

final Map<String, String> previousHistogram = getPreviousHistogram(flowFile)
Map<String, String> histogram = null;

final InputStream inStream = session.read(flowFile);
try {
    histogram = createHistogram(flowFile, inStream);
} finally {
    inStream.close()
}

if (!previousHistogram.isEmpty()) {
    if (previousHistogram.equals(histogram)) {
        log.info("Histograms match")
    } else {
        logHistogramDifferences(previousHistogram, histogram)
        session.transfer(flowFile, REL_FAILURE)
        return;
    }
}

flowFile = session.putAllAttributes(flowFile, histogram)
session.transfer(flowFile, REL_SUCCESS)
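
A quick standalone sanity check of the histogram routine (an illustrative aside, not part of the script itself; it assumes the createHistogram function above is in scope):

def data = [0, 0, 0xFF, 65] as byte[]
def hist = createHistogram(null, new ByteArrayInputStream(data))  // the flowFile argument is unused
assert hist['histogram.0'] == '2'          // two NUL bytes
assert hist['histogram.255'] == '1'        // 0xFF counts as 255, not -1
assert hist['histogram.65'] == '1'
assert hist['histogram.totalBytes'] == '4'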








Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
Jens,

For a bit of background here, the reason that Joe and I have expressed interest in NFS file systems is that the way the protocol works, it is allowed to receive packets/chunks of the file out-of-order. So, what happens is let’s say a 1 MB file is being written. The first 500 KB are received. Then instead of the 501st KB it receives the 503rd KB. What happens is that the size of the file on the file system becomes 503 KB. But what about 501 & 502? Well, when you read the data, the file system just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS server receives those bytes, it then goes back and fills in the proper bytes. So if you’re running on NFS, it is possible for the contents of the file on the underlying file system to change out from under you. It’s not clear to me what other types of file system might do something similar.
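
To make the NUL-fill behavior concrete, here is a minimal Groovy sketch. It reproduces the effect with a local sparse file, which is only an illustrative stand-in for what the NFS client does, not NFS itself:

def f = File.createTempFile("sparse-demo", ".bin")
def raf = new RandomAccessFile(f, "rw")
raf.seek(500L)     // skip bytes 0..499 without writing them, like an out-of-order write
raf.write(0x41)    // a single 'A' lands at offset 500; file length becomes 501
raf.close()

def bytes = f.bytes
assert bytes.length == 501
assert bytes[0] == 0 && bytes[499] == 0  // the unwritten "hole" reads back as NUL bytes
assert bytes[500] == 0x41
f.delete()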

So, one thing that we can do is to find out whether or not the contents of the underlying file have changed in some way, or if there’s something else happening that could perhaps result in the hashes being wrong. I’ve put together a script that should help diagnose this.

Can you insert an ExecuteScript processor either just before or just after your CryptographicHashContent processor? Doesn’t really matter whether it’s run just before or just after. I’ll attach the script here. It’s a Groovy Script so you should be able to use ExecuteScript with Script Engine = Groovy and the following script as the Script Body. No other changes needed.

The way the script works, it reads in the contents of the FlowFile, and then it builds up a histogram of all byte values (0-255) that it sees in the contents, and then adds that as attributes. So it adds attributes such as:
histogram.0 = 280273
histogram.1 = 2820
histogram.2 = 48202
histogram.3 = 3820
…
histogram.totalBytes = 1780928732

It then checks if those attributes have already been added. If so, after calculating that histogram, it checks against the previous values (in the attributes). If they are the same, the FlowFile goes to ’success’. If they are different, it logs an error indicating the before/after value for any byte whose distribution was different, and it routes to failure.

So, if for example, the first time through it sees 280,273 bytes with a value of ‘0’, and the second time it only sees 12,001, then we know there were a bunch of 0’s previously that were updated to be some other value. And it includes the total number of bytes in case somehow we find that we’re reading too many bytes or not enough bytes or something like that. This should help narrow down what’s happening.
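
Illustratively, given the logging format in the script above, such a mismatch would show up in the log along these lines (the numbers here are made up):

There are differences in the histogram
Byte Value: histogram.0, Previous Count: 280273, New Count: 12001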

Thanks
-Mark



On Oct 26, 2021, at 6:25 PM, Joe Witt <jo...@gmail.com>> wrote:

Jens

Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.

Thanks

On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com>> wrote:
Jens

I have your flow running and will keep it running for several days/week to see if I can reproduce.  Also of note please use your same test flow but use HashContent instead of crypto hash.  Curious if that matters for any reason...

Still want to know more about your underlying storage system.

You could also try updating nifi.properties and changing the following lines:
nifi.flowfile.repository.always.sync=true
nifi.content.repository.always.sync=true
nifi.provenance.repository.always.sync=true

It will hurt performance but can be useful/necessary on certain storage subsystems.

Thanks

On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com>> wrote:
Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON

On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com>> wrote:
Jens,

We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance is it NFS or something similar?

I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case, though, I know precisely what the config is and what the disk setup is.  Didn't do anything special, to be clear, but still it is important to know.

For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.

Thanks
Joe

On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com>> wrote:
Dear Joe and Mark

I have created a test flow without the sftp processors, which doesn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the sftp processors. This keeps all data internal to NIFI, but forces NIFI to write and read new files completely locally.
My flow has been running for 7 days, and this morning there were 2 files where the sha256 had been given another hash value than the original. I have set this flow up in another nifi cluster only for testing, and the cluster is not doing anything else. It is using Nifi 1.14.0
So I can reproduce issues at different nifi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issue happens where the calculation of the hash content happens right after NIFI writes the content to the content repository. I don't know if there could be some kind of delay writing the content 100% before the next processors begin reading the content???

Please see attach test flow, and the previous mail with a pdf showing the lineage of a production file which also had issues. In the pdf check step 5 and 12.

Kind regards
Jens M. Kofoed


Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed <jm...@gmail.com>>:
Joe,

To start from the last mail :-)
Each repository has its own disk, and I'm using ext4
/dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
/dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
/dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0

My test flow WITH sftp looks like this:
<image.png>
And this flow has produced 1 error within 3 days. After many many loops the file failed and went out via the "unmatched" output to the disabled UpdateAttribute, which is doing nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP process right before the hashing.
Right now my flow is running without the 2 sftp processors, and in the last 24 hours there have been no errors.

About the Lineage:
Is there a way to export all the lineage data? The export only generates an svg file.
This is only for the receiving nifi, which internally calculates 2 different hashes on the same content with ca. 1 minute delay. Attached is a pdf document with the lineage, the flow and all the relevant provenance information for each step in the lineage.
The interesting steps are 5 and 12.

Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?

Kind regards
Jens M. Kofoed



Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <jo...@gmail.com>>:
Jens,

Also what type of file system/storage system are you running NiFi on
in this case?  We'll need to know this for the NiFi
content/flowfile/provenance repositories? Is it NFS?

Thanks

On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com>> wrote:
>
> Jens,
>
> And to further narrow this down
>
> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> (2 files per node) and next process was a hashcontent before it run
> into a test loop. Where files are uploaded via PutSFTP to a test
> server, and downloaded again and recalculated the hash. I have had one
> issue after 3 days of running."
>
> So to be clear with GenerateFlowFile making these files and then you
> looping the content is wholly and fully exclusively within the control
> of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping the
> same files over and over in nifi itself you can make this happen or
> cannot?
>
> Thanks
>
> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com>> wrote:
> >
> > Jens,
> >
> > "After fetching a FlowFile-stream file and unpacked it back into NiFi
> > I calculate a sha256. 1 minutes later I recalculate the sha256 on the
> > exact same file. And got a new hash. That is what worry’s me.
> > The fact that the same file can be recalculated and produce two
> > different hashes, is very strange, but it happens. "
> >
> > Ok so to confirm you are saying that in each case this happens you see
> > it first compute the wrong hash, but then if you retry the same
> > flowfile it then provides the correct hash?
> >
> > Can you please also show/share the lineage history for such a flow
> > file then?  It should have events for the initial hash, second hash,
> > the unpacking, trace to the original stream, etc...
> >
> > Thanks
> >
> > On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <jm...@gmail.com>> wrote:
> > >
> > > Dear Mark and Joe
> > >
> > > I know my setup isn’t normal for many people. But if we only look at my receiving side, which the last mails are about: everything is happening at the same NIFI instance. It is the same 3 node NIFI cluster.
> > > After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a sha256. 1 minute later I recalculate the sha256 on the exact same file and get a new hash. That is what worries me.
> > > The fact that the same file can be recalculated and produce two different hashes is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
> > >
> > > I can understand it if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn't forward the flow file to the next process before it is 100% finished unpacking and saving the new content to the repository.
> > >
> > > I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 files per node) and the next process was a hash content before it ran into a test loop, where files are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
> > > Now the test flow is running without the Put/Fetch sftp processors.
> > >
> > > Another problem is that I can’t find any correlation to other events. Not within NIFI, nor the server itself or VMWare. If I could just find any other event which happens at the same time, I might be able to force some kind of event to trigger the issue.
> > > I have tried to force VMware to migrate a NiFi node to another host, forcing it to do a snapshot and deleting snapshots, but nothing can trigger an error.
> > >
> > > I know it will be very very difficult to reproduce. But I will setup multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.
> > >
> > > Kind Regards
> > > Jens M. Kofoed
> > >
> > > On 20 Oct 2021 at 16.39, Mark Payne <ma...@hotmail.com> wrote:
> > >
> > > Jens,
> > >
> > > Thanks for sharing the images.
> > >
> > > I tried to set up a test to reproduce the issue. I've had it running for quite some time, running through millions of iterations.
> > >
> > > I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.
> > >
> > > So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.
> > >
> > > Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.
> > >
> > > This way we remove any possibility of actual corruption between the two nifi instances. If we can still see corruption / different hashes within a single nifi instance, then it certainly warrants further investigation but i can’t see any issues so far.
> > >
> > > Thanks
> > > -Mark
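
Mark's suggested repro can be sketched outside NiFi as a minimal
stand-alone Java program (an illustration only, not the attached flow or
NiFi's implementation; the file-path argument is a placeholder for any
large test file):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class HashLoopRepro {

        // Stream the file through SHA-256 and return the digest as hex.
        static String sha256(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            Path file = Paths.get(args[0]);  // placeholder: any large test file
            String initial = sha256(file);   // the 'initial hash' attribute
            for (int i = 1; ; i++) {
                String current = sha256(file);
                if (!current.equals(initial)) {
                    System.out.println("Mismatch on iteration " + i + ": "
                            + initial + " vs " + current);
                    return;
                }
            }
        }
    }

If the mismatch ever prints against an unchanged file on healthy local
storage, the same bytes are not coming back on every read, which is
exactly the situation being discussed in this thread.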
> > >
> > >
> > >
> > >
> > >
> > > On Oct 20, 2021, at 10:21 AM, Joe Witt <jo...@gmail.com> wrote:
> > >
> > > Jens
> > >
> > > Actually is this current loop test contained within a single nifi and there you see corruption happen?
> > >
> > > Joe
> > >
> > > On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <jo...@gmail.com> wrote:
> > >
> > > Jens,
> > >
> > > You have a very involved setup including other systems (non NiFi).  Have you removed those systems from the equation so you have more evidence to support your expectation that NiFi is doing something other than you expect?
> > >
> > > Joe
> > >
> > > On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <jm...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > > Today I have another file which has been running through the retry loop one time. To test the processors and the algorithm, I added the HashContent processor and also added hashing by SHA-1.
> > > A file has been going through the system, and the SHA-1 and SHA-256 are both different than expected. With a 1-minute delay the file goes back into the hashing flow, and this time it calculates both hashes fine.
> > >
> > > I don't believe that the hashing is buggy, but something is very, very strange. What can influence the processors/algorithm to calculate a different hash???
> > > All the input/output claim information is exactly the same. It is the same flow/content file going in a loop. It happens on all 3 nodes.
> > >
> > > Any suggestions for where to dig ?
> > >
> > > Regards
> > > Jens M. Kofoed
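
That SHA-1 and SHA-256 changed together is itself a clue: if one pass over
the content feeds both digests, a bug in a single hash implementation
cannot explain both values moving at once. A minimal single-pass sketch in
plain Java (an illustration, not the processors' actual code):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class DualDigest {
        public static void main(String[] args) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    sha1.update(buf, 0, n);    // the exact same bytes feed
                    sha256.update(buf, 0, n);  // both digests
                }
            }
            System.out.println("SHA-1:   " + toHex(sha1.digest()));
            System.out.println("SHA-256: " + toHex(sha256.digest()));
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }

If both outputs change between runs on the same unchanged file, the bytes
that were read changed, which points at the storage/read path rather than
at the hashing code.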
> > >
> > >
> > >
> > > On Wed, 20 Oct 2021 at 06.34, Jens M. Kofoed <jm...@gmail.com> wrote:
> > >
> > > Hi Mark
> > >
> > > Thanks for replying, and for the suggestion to look at the content claim.
> > > These 3 pictures are from the first attempt:
> > > <image.png>   <image.png>   <image.png>
> > >
> > > Yesterday I realized that the content was still in the archive, so I could Replay the file.
> > > <image.png>
> > > So here are the same pictures, but for the replay, and as you can see the Identifier, Offset, and Size are all the same.
> > > <image.png>   <image.png>   <image.png>
> > >
> > > In my flow if the hash does not match my original first calculated hash, it goes into a retry loop. Here are the pictures for the 4th time the file went through:
> > > <image.png>   <image.png>   <image.png>
> > > Here the content Claim is all the same.
> > >
> > > It is very rare that we see these issues (<1 in 1,000,000 files) and only with large files. Only once have I seen the error with a 110 MB file; the other times the file sizes are above 800 MB.
> > > This time it was a NiFi FlowFile Stream v3 file, which has been exported from one system and imported into another. But once the file has been imported, it is the same file inside NiFi and it stays at the same node, going through the same loop of processors multiple times, and in the end the CryptographicHashContent calculates a different SHA-256 than it did earlier. This should not be possible!!! And that is what concerns me the most.
> > > What can influence the same processor to calculate 2 different SHA-256 values on the exact same content???
> > >
> > > Regards
> > > Jens M. Kofoed
> > >
> > >
> > > On Tue, 19 Oct 2021 at 16.51, Mark Payne <ma...@hotmail.com> wrote:
> > >
> > > Jens,
> > >
> > > In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
> > > If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
> > >
> > > Thanks
> > > -Mark
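
For readers unfamiliar with the repository model: a flowfile does not
carry its bytes, it carries a pointer into the content repository. The
fields Mark lists describe that pointer, roughly of this shape (an
illustrative stand-in, not NiFi's actual class):

    // Illustrative shape of a content-claim reference.
    public class ContentClaimRef {
        final String container;   // which content repository container
        final String section;     // subdirectory within the container
        final String identifier;  // the claim file that holds the bytes
        final long offset;        // where this flowfile's content starts
        final long size;          // how many bytes belong to it

        ContentClaimRef(String container, String section, String identifier,
                        long offset, long size) {
            this.container = container;
            this.section = section;
            this.identifier = identifier;
            this.offset = offset;
            this.size = size;
        }
    }

If two hashing events show identical values for all five fields, both
reads dereferenced the same stored bytes, so differing digests mean the
reads themselves returned different data.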
> > >
> > > <Repro.json>
<Try_to_recreate_Jens_Challenge.json>


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Joe

Many thanks for looking into this.
In one of my previous mails I wrote all the information from the fstab file. The file system for my Linux servers is EXT4. The underlying file system for VMware is VMFS v6.

The test flow where I use the MergeContent/UnpackContent is running at 2 different clusters. One is version 1.13.2 and the other is 1.14.0, and both produce errors.

What worries me are steps 5 and 12 from my PDF showing my production flow and the lineage of a file. That file is handled entirely by the same NiFi node and never leaves the system.

In my case it takes about 10 seconds to calculate the hash of a 1 GB file and 20-30 seconds for the MergeContent/UnpackContent.

Kind regards 
Jens

> On 27 Oct 2021 at 00.25, Joe Witt <jo...@gmail.com> wrote:
> 
> Jens
> 
> Attached is the flow I was using (now running yours and this one).  Curious if that one reproduces the issue for you as well.
> 
> Thanks
> 
>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <jo...@gmail.com> wrote:
>> Jens
>> 
>> I have your flow running and will keep it running for several days/weeks to see if I can reproduce.  Also, of note: please use your same test flow but with HashContent instead of CryptographicHashContent.  Curious if that matters for any reason...
>> 
>> Still want to know more about your underlying storage system.
>> 
>> You could also try updating nifi.properties and changing the following lines:
>> nifi.flowfile.repository.always.sync=true
>> nifi.content.repository.always.sync=true
>> nifi.provenance.repository.always.sync=true
>> 
>> It will hurt performance but can be useful/necessary on certain storage subsystems.
>> 
>> Thanks
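
What the always.sync properties request is, roughly, an fsync after each
repository write: the bytes are not considered written until they are on
stable storage rather than only in the OS page cache. A tiny plain-Java
illustration of that behaviour (a sketch, not NiFi's repository code; the
file name is a placeholder):

    import java.io.FileOutputStream;

    public class SyncedWrite {
        public static void main(String[] args) throws Exception {
            try (FileOutputStream out = new FileOutputStream("claim.bin")) {
                out.write(new byte[]{1, 2, 3});  // stand-in for claim content
                out.flush();                     // push JVM buffers to the OS
                out.getFD().sync();              // fsync: block until the bytes
                                                 // reach stable storage
            }
        }
    }

On most local file systems the page cache alone already gives consistent
read-after-write, which is why these flags mainly matter on network or
layered storage subsystems.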
>> 
>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:
>>> Ignore "For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>> 
>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>>>> Jens,
>>>> 
>>>> We asked about the underlying storage system.  You replied with some info but not the specifics.  Do you know precisely what the underlying storage is and how it is presented to the operating system?  For instance, is it NFS or something similar?
>>>> 
>>>> I've set up a very similar flow at extremely high rates running for the past several days with no issue.  In my case, though, I know precisely what the config and the disk setup are.  I didn't do anything special, to be clear, but still it is important to know.
>>>> 
>>>> For the scenario where you can replicate this please share the flow.xml.gz for which it is reproducible.
>>>> 
>>>> Thanks
>>>> Joe
>>>> 
>>>>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>> Dear Joe and Mark
>>>>> 
>>>>> I have created a test flow without the SFTP processors, which doesn't create any errors. Therefore I created a new test flow where I use a MergeContent and UnpackContent instead of the SFTP processors. This keeps all data internal in NiFi, but forces NiFi to write and read new files completely locally.
>>>>> My flow has been running for 7 days, and this morning there were 2 files where the SHA-256 was given another hash value than the original. I have set this flow up in another NiFi cluster used only for testing, and the cluster is not doing anything else. It is using NiFi 1.14.0.
>>>>> So I can reproduce issues at different NiFi clusters and versions (1.13.2 and 1.14.0) where the calculation of a hash on content can give different outputs. It doesn't make any sense, but it happens. In all my cases the issues happen where the calculation of the hash happens right after NiFi writes the content to the content repository. I don't know if there could be some kind of delay in writing the content 100% before the next processor begins reading the content???
>>>>>
>>>>> Please see the attached test flow, and the previous mail with a PDF showing the lineage of a production file which also had issues. In the PDF, check steps 5 and 12.
>>>>> 
>>>>> Kind regards
>>>>> Jens M. Kofoed
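
Jens' question about a delay between writing and reading can be probed
outside NiFi with a small harness that writes fresh content and
immediately hashes it back, comparing against the digest of the in-memory
bytes (a sketch under that assumption; the file name and the 64 MB size
are placeholders):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;
    import java.util.Random;

    public class WriteThenHash {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get("probe.bin");        // placeholder scratch file
            byte[] data = new byte[64 * 1024 * 1024];  // placeholder size
            new Random(42).nextBytes(data);
            String expected = toHex(
                    MessageDigest.getInstance("SHA-256").digest(data));
            for (int i = 1; ; i++) {
                try (OutputStream out = Files.newOutputStream(file)) {
                    out.write(data);                   // note: no explicit fsync
                }
                String actual = sha256(file);          // read back immediately
                if (!actual.equals(expected)) {
                    System.out.println("Iteration " + i + ": " + actual
                            + " != " + expected);
                    return;
                }
            }
        }

        static String sha256(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
            }
            return toHex(md.digest());
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }

On a correct local file system this loop should run forever without
printing; a mismatch would demonstrate the read-after-write problem being
hypothesised here.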
>>>>> 
>>>>> 
>>>>>> On Thu, 21 Oct 2021 at 08.28, Jens M. Kofoed <jm...@gmail.com> wrote:
>>>>>> Joe,
>>>>>> 
>>>>>> To start from the last mail :-)
>>>>>> Each repository has its own disk, and I'm using ext4:
>>>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>>> 
>>>>>> My test flow WITH SFTP looks like this:
>>>>>> [image: image.png]
>>>>>> And this flow has produced 1 error within 3 days. After many, many loops the file fails and went out via the "unmatched" output to the disabled UpdateAttribute, which does nothing; it is just there to keep the failed flowfile in a queue. I enabled the UpdateAttribute and looped the file back to the CryptographicHashContent, and now it calculated the hash correctly again. But in this flow I have a FetchSFTP processor right before the hashing.
>>>>>> Right now my flow is running without the 2 SFTP processors, and in the last 24 hours there have been no errors.
>>>>>> 
>>>>>> About the Lineage:
>>>>>> Is there a way to export all the lineage data? The export only generates an SVG file.
>>>>>> This is only for the receiving NiFi, which internally calculates 2 different hashes on the same content with ca. 1 minute's delay. Attached is a PDF document with the lineage, the flow, and all the relevant provenance information for each step in the lineage.
>>>>>> The interesting steps are steps 5 and 12.
>>>>>>
>>>>>> Can the issue be that data is not written 100% to disk between steps 4 and 5 in the flow?
>>>>>> 
>>>>>> Kind regards
>>>>>> Jens M. Kofoed

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens

Attached is the flow I was using (now running yours and this one).  Curious
if that one reproduces the issue for you as well.

Thanks


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens

I have your flow running and will keep it running for several days/weeks to
see if I can reproduce.  Also, of note: please use your same test flow but
with HashContent instead of CryptographicHashContent.  Curious if that
matters for any reason...

Still want to know more about your underlying storage system.

You could also try updating nifi.properties and changing the following
lines:
nifi.flowfile.repository.always.sync=true
nifi.content.repository.always.sync=true
nifi.provenance.repository.always.sync=true

It will hurt performance but can be useful/necessary on certain storage
subsystems.

Thanks

On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <jo...@gmail.com> wrote:

> Ignore "For the scenario where you can replicate this please share the
> flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>
> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <jo...@gmail.com> wrote:
>
>> Jens,
>>
>> We asked about the underlying storage system.  You replied with some info
>> but not the specifics.  Do you know precisely what the underlying storage
>> is and how it is presented to the operating system?  For instance is it NFS
>> or something similar?
>>
>> I've setup a very similar flow at extremely high rates running for the
>> past several days with no issue.  In my case though I know precisely what
>> the config is and the disk setup is.  Didn't do anything special to be
>> clear but still it is important to know.
>>
>> For the scenario where you can replicate this please share the
>> flow.xml.gz for which it is reproducible.
>>
>> Thanks
>> Joe
>>
>> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <jm...@gmail.com>
>> wrote:
>>
>>> Dear Joe and Mark
>>>
>>> I have created a test flow without the sftp processors, which don't
>>> create any errors. Therefore I created a new test flow where I use a
>>> MergeContent and UnpackContent instead of the sftp processors. This keeps
>>> all data internal in NIFI, but force NIFI to write and read new files
>>> totally local.
>>> My flow have been running for 7 days and this morning there where 2
>>> files where the sha256 has been given another has value than original. I
>>> have set this flow up in another nifi cluster only for testing, and the
>>> cluster is not doing anything else. It is using Nifi 1.14.0
>>> So I can reproduce issues at different nifi clusters and versions
>>> (1.13.2 and 1.14.0) where the calculation of a hash on content can give
>>> different outputs. Is doesn't make any sense, but it happens. In all my
>>> cases the issues happens where the calculations of the hashcontent happens
>>> right after NIFI writes the content to the content repository. I don't know
>>> if there cut be some kind of delay writing the content 100% before the next
>>> processors begin reading the content???
>>>
>>> Please see attach test flow, and the previous mail with a pdf showing
>>> the lineage of a production file which also had issues. In the pdf check
>>> step 5 and 12.
>>>
>>> Kind regards
>>> Jens M. Kofoed
>>>
>>>
>>> Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed <
>>> jmkofoed.ube@gmail.com>:
>>>
>>>> Joe,
>>>>
>>>> To start from the last mail :-)
>>>> All the repositories has it's own disk, and I'm using ext4
>>>> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
>>>> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
>>>> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>>>>
>>>> My test flow WITH sftp looks like this:
>>>> [image: image.png]
>>>> And this flow has produced 1 error within 3 days. After many many loops
>>>> the file fails and went out via the "unmatched" output to  the disabled
>>>> UpdateAttribute, which is doing nothing. Just for keeping the failed
>>>> flowfile in a queue.  I enabled the UpdateAttribute and looped the file
>>>> back to the CryptographicHashContent and now it calculated the hash correct
>>>> again. But in this flow I have a FetchSFTP Process right before the Hashing.
>>>> Right now my flow is running without the 2 sftp processors, and the
>>>> last 24hours there has been no errors.
>>>>
>>>> About the Lineage:
>>>> Are there a way to export all the lineage data? The export only
>>>> generate a svg file.
>>>> This is only for the receiving nifi which is internally calculate 2
>>>> different hashes on the same content with ca. 1 minutes delay. Attached is
>>>> a pdf-document with the lineage, the flow and all the relevant Provenance
>>>> information's for each step in the lineage.
>>>> The interesting steps are step 5 and 12.
>>>>
>>>> Can the issues be that data is not written 100% to disk between step 4
>>>> and 5 in the flow?
>>>>
>>>> Kind regards
>>>> Jens M. Kofoed

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Ignore "For the scenario where you can replicate this please share the
flow.xml.gz for which it is reproducible."  I see the uploaded JSON


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

> We asked about the underlying storage system.  You replied with some info
> but not the specifics.  Do you know precisely what the underlying storage
> is and how it is presented to the operating system?  For instance, is it
> NFS or something similar?
>
> I've set up a very similar flow running at extremely high rates for the
> past several days with no issue.  In my case, though, I know precisely
> what the config and the disk setup are.  I didn't do anything special, to
> be clear, but it is still important to know.
>
> For the scenario where you can replicate this, please share the
> flow.xml.gz for which it is reproducible.

Thanks
Joe
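
For context on why the storage question matters: on a plain local ext4
volume, reads and writes on the same host go through the same OS page
cache, so a completed write is visible to an immediately following read
whether or not it has been flushed to the physical disk; with NFS or
similar network mounts, client-side caching can weaken that guarantee. A
minimal standalone Java sketch of that read-after-write check (the path,
size and delay below are illustrative assumptions, not details from the
thread):

    import java.io.InputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.security.MessageDigest;
    import java.util.Random;

    public class WriteThenHash {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get("/contRepo01/visibility-test.bin"); // hypothetical path
            byte[] data = new byte[256 * 1024 * 1024];                // 256 MB of random bytes
            new Random().nextBytes(data);

            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // Deliberately no ch.force(true): flushing affects crash
                // durability, not what a reader on the same host should see.
            }

            String first = sha256(file);  // hash immediately after the write
            Thread.sleep(60_000L);        // wait ~1 minute, like the retry loop
            String second = sha256(file); // hash the same file again
            System.out.println(first.equals(second)
                    ? "stable: " + first
                    : "MISMATCH: " + first + " vs " + second);
        }

        static String sha256(Path p) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (InputStream in = Files.newInputStream(p)) {
                byte[] buf = new byte[8192];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    md.update(buf, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

On a coherent local filesystem the two digests should always match; a
mismatch from a sketch like this would point at the storage layer rather
than at NiFi.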


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Dear Joe and Mark

I have created a test flow without the sftp processors, which didn't
produce any errors. Therefore I created a new test flow where I use a
MergeContent and an UnpackContent instead of the sftp processors. This
keeps all data internal in NIFI, but forces NIFI to write and read new
files completely locally.
My flow has been running for 7 days, and this morning there were 2 files
where the sha256 was given a different hash value than the original. I
have set this flow up in another nifi cluster used only for testing, and
the cluster is not doing anything else. It is using Nifi 1.14.0.
So I can reproduce issues at different nifi clusters and versions (1.13.2
and 1.14.0) where the calculation of a hash on content can give different
outputs. It doesn't make any sense, but it happens. In all my cases the
issues happen where the calculation of the hash happens right after NIFI
writes the content to the content repository. I don't know if there could
be some kind of delay in writing the content 100% before the next
processors begin reading the content???

Please see the attached test flow, and the previous mail with a pdf
showing the lineage of a production file which also had issues. In the
pdf, check steps 5 and 12.
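
The NiFi-free version of this test can be approximated with a short
standalone Java harness: generate a large file once, record an initial
hash, then re-read and re-hash it on a delay, flagging any change. The
path, file size and delay below are illustrative assumptions, and the
hashing uses the JDK's MessageDigest/DigestInputStream, presumably the
same primitive the NiFi hash processors rely on:

    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.util.Random;

    public class ReHashLoop {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get("/tmp/rehash-test.bin"); // hypothetical path
            try (OutputStream out = Files.newOutputStream(file)) {
                byte[] chunk = new byte[8 * 1024 * 1024];
                Random rnd = new Random();
                for (int i = 0; i < 128; i++) {            // 128 x 8 MB = 1 GB
                    rnd.nextBytes(chunk);
                    out.write(chunk);
                }
            }
            String expected = sha256(file);                // the "initial hash"
            for (long i = 1; ; i++) {
                Thread.sleep(60_000L);                     // 1-minute Wait step
                String actual = sha256(file);
                if (!actual.equals(expected)) {
                    System.out.println("iteration " + i + " MISMATCH: "
                            + actual + " != " + expected);
                }
            }
        }

        static String sha256(Path p) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (DigestInputStream in =
                     new DigestInputStream(Files.newInputStream(p), md)) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // the digest is updated as a side effect of reading
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

If a harness like this ever logs a mismatch on the same hosts, the
corruption reproduces with none of NiFi's session or claim handling
involved; if it never does, suspicion shifts back to how the repositories
are read.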

Kind regards
Jens M. Kofoed


On Thu, Oct 21, 2021 at 08:28 Jens M. Kofoed <jmkofoed.ube@gmail.com> wrote:

> Joe,
>
> To start from the last mail :-)
> Each repository has its own disk, and I'm using ext4:
> /dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
> /dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
> /dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
>
> My test flow WITH sftp looks like this:
> [image: image.png]
> And this flow has produced 1 error within 3 days. After many, many loops
> the file failed and went out via the "unmatched" output to the disabled
> UpdateAttribute, which does nothing; it is just there to keep the failed
> flowfile in a queue. I enabled the UpdateAttribute and looped the file
> back to the CryptographicHashContent, and now it calculated the hash
> correctly again. But in this flow I have a FetchSFTP process right before
> the hashing.
> Right now my flow is running without the 2 sftp processors, and in the
> last 24 hours there have been no errors.
>
> About the lineage:
> Is there a way to export all the lineage data? The export only generates
> an svg file.
> This is only for the receiving nifi, which internally calculates 2
> different hashes on the same content with roughly a 1-minute delay.
> Attached is a pdf document with the lineage, the flow and all the
> relevant provenance information for each step in the lineage.
> The interesting steps are steps 5 and 12.
>
> Could the issue be that data is not fully written to disk between steps 4
> and 5 in the flow?
>
> Kind regards
> Jens M. Kofoed
>
>
>
> Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <jo...@gmail.com>:
>
>> Jens,
>>
>> Also what type of file system/storage system are you running NiFi on
>> in this case?  We'll need to know this for the NiFi
>> content/flowfile/provenance repositories? Is it NFS?
>>
>> Thanks
>>
>> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <jo...@gmail.com> wrote:
>> >
>> > Jens,
>> >
>> > And to further narrow this down
>> >
>> > "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
>> > (2 files per node) and next process was a hashcontent before it run
>> > into a test loop. Where files are uploaded via PutSFTP to a test
>> > server, and downloaded again and recalculated the hash. I have had one
>> > issue after 3 days of running."
>> >
>> > So to be clear with GenerateFlowFile making these files and then you
>> > looping the content is wholly and fully exclusively within the control
>> > of NiFI.  No Get/Fetch/Put-SFTP of any kind at all. In by looping the
>> > same files over and over in nifi itself you can make this happen or
>> > cannot?
>> >
>> > Thanks
>> >
>> > On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <jo...@gmail.com> wrote:
>> > >
>> > > Jens,
>> > >
>> > > "After fetching a FlowFile-stream file and unpacked it back into NiFi
>> > > I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>> > > exact same file. And got a new hash. That is what worry’s me.
>> > > The fact that the same file can be recalculated and produce two
>> > > different hashes, is very strange, but it happens. "
>> > >
>> > > Ok so to confirm you are saying that in each case this happens you see
>> > > it first compute the wrong hash, but then if you retry the same
>> > > flowfile it then provides the correct hash?
>> > >
>> > > Can you please also show/share the lineage history for such a flow
>> > > file then?  It should have events for the initial hash, second hash,
>> > > the unpacking, trace to the original stream, etc...
>> > >
>> > > Thanks
>> > >
>> > > On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <
>> jmkofoed.ube@gmail.com> wrote:
>> > > >
>> > > > Dear Mark and Joe
>> > > >
>> > > > I know my setup isn’t normal for many people. But if we only looks
>> at my receive side, which the last mails is about. Every thing is happening
>> at the same NIFI instance. It is the same 3 node NIFI cluster.
>> > > > After fetching a FlowFile-stream file and unpacked it back into
>> NiFi I calculate a sha256. 1 minutes later I recalculate the sha256 on the
>> exact same file. And got a new hash. That is what worry’s me.
>> > > > The fact that the same file can be recalculated and produce two
>> different hashes, is very strange, but it happens. Over the last 5 months
>> it have only happen 35-40 times.
>> > > >
>> > > > I can understand if the file is not completely loaded and saved
>> into the content repository before the hashing starts. But I believe that
>> the unpack process don’t forward the flow file to the next process before
>> it is 100% finish unpacking and saving the new content to the repository.
>> > > >
>> > > > I have a test flow, where a GenerateFlowfile has created 6x 1GB
>> files (2 files per node) and next process was a hashcontent before it run
>> into a test loop. Where files are uploaded via PutSFTP to a test server,
>> and downloaded again and recalculated the hash. I have had one issue after
>> 3 days of running.
>> > > > Now the test flow is running without the Put/Fetch sftp processors.
>> > > >
>> > > > Another problem is that I can’t find any correlation to other
>> events. Not within NIFI, nor the server itself or VMWare. If I just could
>> find any other event which happens at the same time, I might be able to
>> force some kind of event to trigger the issue.
>> > > > I have tried to force VMware to migrate a NiFi node to another
>> host. Forcing it to do a snapshot and deleting snapshots, but nothing can
>> trigger and error.
>> > > >
>> > > > I know it will be very very difficult to reproduce. But I will
>> setup multiple NiFi instances running different test flows to see if I can
>> find any reason why it behaves as it does.
>> > > >
>> > > > Kind Regards
>> > > > Jens M. Kofoed

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Joe,

To start from the last mail :-)
Each repository has its own disk, and I'm using ext4:
/dev/VG_b/LV_b    /nifiRepo    ext4    defaults,noatime    0 0
/dev/VG_c/LV_c    /provRepo01    ext4    defaults,noatime    0 0
/dev/VG_d/LV_d    /contRepo01    ext4    defaults,noatime    0 0
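
For reference, these mounts map to the repository paths in nifi.properties
roughly like this (a sketch from memory, so the subdirectory names are an
assumption and may differ from the actual config):

nifi.flowfile.repository.directory=/nifiRepo/flowfile_repository
nifi.content.repository.directory.default=/contRepo01/content_repository
nifi.provenance.repository.directory.default=/provRepo01/provenance_repository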

My test flow WITH sftp looks like this:
[image: image.png]
And this flow has produced 1 error within 3 days. After many, many loops
the file failed and went out via the "unmatched" output to the disabled
UpdateAttribute, which does nothing; it is just there to keep the failed
flowfile in a queue. I enabled the UpdateAttribute and looped the file
back to the CryptographicHashContent, and now it calculated the hash
correctly again. But in this flow I have a FetchSFTP processor right
before the hashing.
Right now my flow is running without the 2 sftp processors, and in the
last 24 hours there have been no errors.
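
For reference, the compare step in the loop is a plain RouteOnAttribute
("Route to Property name" strategy); a minimal sketch of the routing
property, with hash.initial standing in for whatever attribute the first
calculated hash was copied into (that attribute name is illustrative, not
the exact config):

matched : ${content_SHA-256:equals(${hash.initial})}

content_SHA-256 is the attribute CryptographicHashContent writes for the
SHA-256 algorithm; any flowfile failing the check leaves via the unmatched
relationship, which is what feeds the retry loop.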

About the lineage:
Is there a way to export all the lineage data? The export only generates
an SVG file.
This is only for the receiving NiFi, which internally calculates 2
different hashes on the same content with ca. 1 minute's delay. Attached
is a PDF document with the lineage, the flow and all the relevant
provenance information for each step in the lineage.
The interesting steps are steps 5 and 12.

Could the issue be that data is not written 100% to disk between steps 4
and 5 in the flow?
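
One way to test that hypothesis outside NiFi would be to hash the same
file twice from two independent streams; a minimal standalone Java sketch
(the path argument is hypothetical, e.g. any large claim file on the
content repository disk):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class HashTwice {
    static String sha256(Path p) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(p), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* digest is updated as we read */ }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path p = Paths.get(args[0]);   // e.g. a file on the content repo disk
        String first = sha256(p);      // first pass, its own stream
        String second = sha256(p);     // second pass, its own stream
        System.out.println(first.equals(second)
                ? "stable: " + first
                : "MISMATCH: " + first + " vs " + second);
    }
}

If this ever reports MISMATCH on an idle file, the problem sits below NiFi
(page cache, storage or hypervisor); if it never does, the cause is more
likely in how the content claim is read.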

Kind regards
Jens M. Kofoed

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

Also, what type of file system/storage system are you running NiFi on
in this case? We'll need to know this for the NiFi
content/flowfile/provenance repositories. Is it NFS?

Thanks

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

And to further narrow this down

"I have a test flow, where a GenerateFlowfile has created 6x 1GB files
(2 files per node) and next process was a hashcontent before it run
into a test loop. Where files are uploaded via PutSFTP to a test
server, and downloaded again and recalculated the hash. I have had one
issue after 3 days of running."

So to be clear: with GenerateFlowFile making these files and you then
looping the content, everything is wholly and exclusively within the
control of NiFi. No Get/Fetch/Put-SFTP of any kind at all. By looping the
same files over and over in NiFi itself, can you make this happen or
not?

Thanks

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

"After fetching a FlowFile-stream file and unpacked it back into NiFi
I calculate a sha256. 1 minutes later I recalculate the sha256 on the
exact same file. And got a new hash. That is what worry’s me.
The fact that the same file can be recalculated and produce two
different hashes, is very strange, but it happens. "

OK, so to confirm: you are saying that in each case this happens, you see
it first compute the wrong hash, but then if you retry the same
flowfile it provides the correct hash?

Can you please also show/share the lineage history for such a flow
file then?  It should have events for the initial hash, second hash,
the unpacking, trace to the original stream, etc...

Thanks

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Dear Mark and Joe

I know my setup isn’t normal for many people. But if we only look at my receive side, which the last mails are about, everything is happening at the same NiFi instance. It is the same 3-node NiFi cluster.
After fetching a FlowFile-stream file and unpacking it back into NiFi, I calculate a SHA-256. One minute later I recalculate the SHA-256 on the exact same file and get a new hash. That is what worries me.
The fact that the same file can be rehashed and produce two different values is very strange, but it happens. Over the last 5 months it has only happened 35-40 times.
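(For reference, the hash step is conceptually nothing more than streaming the content through a digest — a minimal Java 8 sketch, not the actual CryptographicHashContent source; a local file path stands in here for the flowfile content:)

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class StreamHash {
        // Stream the content in 64 KB chunks and digest only the bytes
        // actually read, much as a processor reads a flowfile's content.
        static String sha256(String path) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[65536];
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            // Two passes over unchanged bytes must print the same digest;
            // a difference means the bytes read from disk differed.
            System.out.println(sha256(args[0]));
            System.out.println(sha256(args[0]));
        }
    }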

I can understand if the file is not completely loaded and saved into the content repository before the hashing starts. But I believe that the unpack process doesn’t forward the flowfile to the next processor before it is 100% finished unpacking and saving the new content to the repository.

I have a test flow where a GenerateFlowFile has created 6x 1GB files (2 files per node); the next processor was a HashContent before the files ran into a test loop, where they are uploaded via PutSFTP to a test server, downloaded again, and the hash recalculated. I have had one issue after 3 days of running.
Now the test flow is running without the Put/Fetch SFTP processors.

Another problem is that I can’t find any correlation to other events, neither within NiFi nor in the server itself or VMware. If I could just find any other event that happens at the same time, I might be able to force some kind of event to trigger the issue.
I have tried to force VMware to migrate a NiFi node to another host, and to take and delete snapshots, but nothing triggers an error.

I know it will be very, very difficult to reproduce, but I will set up multiple NiFi instances running different test flows to see if I can find any reason why it behaves as it does.

Kind Regards
Jens M. Kofoed


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
Jens,

Thanks for sharing the images.

I tried to set up a test to reproduce the issue. I’ve had it running for quite some time, running through millions of iterations.

I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of hundreds of MB). I’ve been unable to reproduce an issue after millions of iterations.

So far I cannot replicate. And since you’re pulling the data via SFTP and then unpacking, which preserves all original attributes from a different system, this can easily become confusing.

Recommend trying to reproduce with SFTP-related processors out of the picture, as Joe is mentioning. Either using GetFile/FetchFile or GenerateFlowFile. Then immediately use CryptographicHashContent to generate an ‘initial hash’, copy that value to another attribute, and then loop, generating the hash and comparing against the original one. I’ll attach a flow that does this, but not sure if the email server will strip out the attachment or not.

This way we remove any possibility of actual corruption between the two NiFi instances. If we can still see corruption / different hashes within a single NiFi instance, then it certainly warrants further investigation, but I can’t see any issues so far.
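(In plain Java terms, that test amounts to the loop below — a minimal sketch, assuming a local temp file stands in for the generated flowfile; it runs until a mismatch appears, so stop it manually otherwise:)

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.Random;

    public class HashLoop {
        public static void main(String[] args) throws Exception {
            // Stand-in for GenerateFlowFile: write ~100 MB of random bytes once.
            Path file = Files.createTempFile("hash-loop", ".bin");
            byte[] chunk = new byte[1 << 20];
            Random rnd = new Random();
            try (OutputStream out = Files.newOutputStream(file)) {
                for (int i = 0; i < 100; i++) {
                    rnd.nextBytes(chunk);
                    out.write(chunk);
                }
            }
            String initial = sha256(file);   // the 'initial hash' attribute
            for (long i = 1; ; i++) {        // the loop back to the hash step
                String again = sha256(file);
                if (!again.equals(initial)) {
                    System.err.println("MISMATCH at iteration " + i + ": " + again);
                    return;
                }
            }
        }

        static String sha256(Path p) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[65536];
            try (InputStream in = Files.newInputStream(p)) {
                int n;
                while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        }
    }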

Thanks
-Mark






Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens

Actually, is this current loop test contained within a single NiFi, and
is that where you see corruption happen?

Joe


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Joe Witt <jo...@gmail.com>.
Jens,

You have a very involved setup including other systems (non NiFi).  Have
you removed those systems from the equation so you have more evidence to
support your expectation that NiFi is doing something other than you expect?

Joe


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi

Today I have another file which has been running through the retry loop
one time. To test the processors and the algorithm I added the HashContent
processor and also added hashing by SHA-1.
The file has been going through the system, and both the SHA-1 and SHA-256
are different than expected. With a 1 minute delay the file goes back into
the hashing content flow, and this time it calculates both hashes fine.
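(Both digests are fed from the same read buffer, so if SHA-1 and SHA-256
both differ from their expected values, the input bytes themselves must
have changed — a minimal sketch, assuming a local file as input:)

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class DualHash {
        public static void main(String[] args) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[65536];
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    // One read, two digests: both see identical bytes.
                    sha1.update(buf, 0, n);
                    sha256.update(buf, 0, n);
                }
            }
            System.out.println("SHA-1:   " + toHex(sha1.digest()));
            System.out.println("SHA-256: " + toHex(sha256.digest()));
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }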

I don't believe that the hashing is buggy, but something is very very
strange. What can influence the processors/algorithm to calculate a
different hash???
All the input/output claim information is exactly the same. It is the same
flow/content file going in a loop. It happens on all 3 nodes.

Any suggestions for where to dig ?

Regards
Jens M. Kofoed




Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Mark

Thanks for replying and the suggestion to look at the content claim.
These 3 pictures are from the first attempt:
[image: image.png]   [image: image.png]   [image: image.png]

Yesterday I realized that the content was still in the archive, so I could
replay the file.
[image: image.png]
So here are the same pictures but for the replay, and as you can see the
Identifier, Offset and Size are all the same.
[image: image.png]   [image: image.png]   [image: image.png]

In my flow if the hash does not match my original first calculated hash, it
goes into a retry loop. Here are the pictures for the 4th time the file
went through:
[image: image.png]   [image: image.png]   [image: image.png]
Here the content Claim is all the same.

It is very rare that we see these issues (<1 in 1,000,000 files) and only
with large files. Only once have I seen the error with a 110MB file; the
other times the file sizes are above 800MB.
This time it was a NiFi-Flowstream v3 file, which has been exported from
one system and imported in another. But while the file has been imported it
is the same file inside NiFi, and it stays at the same node. It goes through
the same loop of processors multiple times, and in the end the
CryptographicHashContent calculates a different SHA256 than it did earlier.
This should not be possible!!! And that is what concerns me the most.
What can influence the same processor to calculate 2 different SHA256
values on the exact same content???

Regards
Jens M. Kofoed



Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

Posted by Mark Payne <ma...@hotmail.com>.
Jens,

In the two provenance events - one showing a hash of dd4cc… and the other showing f6f0….
If you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
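(A content claim is essentially a pointer into the content repository — container, section, file identifier, plus an offset and size — so hashing a claim conceptually means hashing exactly that byte range. A hypothetical illustration of that idea, not NiFi's internal API:)

    import java.io.EOFException;
    import java.io.RandomAccessFile;
    import java.security.MessageDigest;

    public class ClaimHash {
        // Hash 'size' bytes starting at 'offset' in a claim file: two events
        // reporting identical claim coordinates should hash identical bytes.
        public static void main(String[] args) throws Exception {
            long offset = Long.parseLong(args[1]);
            long size = Long.parseLong(args[2]);
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
                raf.seek(offset);
                byte[] buf = new byte[65536];
                long remaining = size;
                while (remaining > 0) {
                    int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                    if (n < 0) throw new EOFException("claim truncated");
                    md.update(buf, 0, n);
                    remaining -= n;
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            System.out.println(hex);
        }
    }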

Thanks
-Mark
