Posted to dev@nifi.apache.org by "Jens M. Kofoed" <jm...@gmail.com> on 2021/10/12 05:35:51 UTC

File corruption with Put/Fetch SFTP

Dear Developers

We are seeing corrupted files after using PutSFTP and
FetchSFTP in NiFi 1.13.2 with openjdk version "1.8.0_292", OpenJDK Runtime
Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit
Server VM (build 25.292-b10, mixed mode), running on Ubuntu Server 20.04.

We have a flow between two separate systems where we use PutSFTP to export
data from one NiFi instance to a data diode and FetchSFTP to grab the data
on the other end. To make sure data is not corrupted, we calculate a SHA-256
on each side and transfer the flowfile metadata in a separate file. In rare
cases we have seen that the SHA-256 does not match on both sides, and we are
investigating where the errors happen. We see 2 errors. When we manually
calculate a SHA-256 on both sides of the diode the file is OK, and we have
found that the errors happen between NiFi and the SFTP servers. It
can happen on both sides.
So for testing I created this little flow:
GenerateFlowFile (size 100MB) (Run once) ->
CryptographicHashContent (SHA256) ->
UpdateAttribute ( hash.root = ${content_SHA-256} , iteration=1) ->
PutSFTP ->
FetchSFTP ->
CryptographicHashContent (SHA256) ->
RouteOnAttribute (compare hash.root vs. content_SHA-256)
    If unmatched ->
        Goes to a disabled processor that holds the corrupted file in a queue
    If matched ->
        UpdateAttribute ( iteration = ${iteration:plus(1)} ) -> loops back to PutSFTP
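
For anyone who wants to try the same idea outside NiFi, the loop above can be sketched roughly as follows. This is only an illustration under stated assumptions: `round_trip` is a hypothetical stand-in that uses local file copies where the flow uses PutSFTP and FetchSFTP, and the file names are made up. Swapping the two copies for real SFTP calls would be needed to exercise the actual transfer path.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip(current, workdir):
    """Hypothetical stand-in for PutSFTP then FetchSFTP, using local copies."""
    remote = os.path.join(workdir, "remote.bin")    # assumed "server" file
    fetched = os.path.join(workdir, "fetched.bin")  # assumed downloaded file
    shutil.copyfile(current, remote)
    shutil.copyfile(remote, fetched)
    return fetched


def first_mismatch(source, iterations, workdir):
    """Return the 1-based iteration where the hash first differs, else None."""
    root_hash = sha256_of(source)  # plays the role of hash.root
    current = source
    for iteration in range(1, iterations + 1):
        current = round_trip(current, workdir)
        if sha256_of(current) != root_hash:
            return iteration
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.bin")
        with open(source, "wb") as handle:
            handle.write(os.urandom(1024 * 1024))  # 1 MiB here, not 100 MB
        print(first_mismatch(source, 10, workdir))
```

As in the flow, the fetched file is fed back into the next iteration, so a single corrupted transfer would be detected on the pass where it occurs.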

After 8992 iterations the file was corrupted. To test whether the error is in
the calculation of the SHA-256, I have a copy of the flow without the
PutSFTP/FetchSFTP processors, which has not produced any errors yet.

These errors are very rare: millions of files go
through without any issues, but sometimes it happens, which is not good.

Can anyone please help? Maybe by setting up the same test to see if you
also get a corrupted file after some days.

Kind regards
Jens M. Kofoed

Re: File corruption with Put/Fetch SFTP

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Many many thanks 🙏 Joe, that makes my flow a lot simpler. 

Thanks 
Jens


Re: File corruption with Put/Fetch SFTP

Posted by Joe Witt <jo...@gmail.com>.
Jens

If you use MergeContent [1] you can create streams of flowfile bundles
(attributes/content serialized together) in groups of 1 or more.  Then
on the other end you can use UnpackContent [2]

Thanks
Joe

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.MergeContent/index.html
[2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.UnpackContent/index.html


Re: File corruption with Put/Fetch SFTP

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Dear Joe

Regarding your point 5: this is almost what I'm doing. Last night,
on my phone, I "just wrote" that we created a hash file. What I'm actually
doing is converting the flowfile's attributes to JSON.
Is there a way for NiFi to export the complete flowfile (attributes and
content) into one file, which we can import again on the other side? Right
now I do it in 2 steps.
Below is a short description of my flow for transferring data between
systems where we can't use S2S.
At low side:
get data ->
  CryptographicHashContent ->
    UpdateAttribute: original.filename = ${filename}, rootHash=${content_SHA-256} ->
      UpdateAttribute: filename=${UUID()} ->
        PutSFTP ->
          AttributesToJSON: Destination=flowfile-content ->
            UpdateAttribute: filename=${filename:append('.flowfile')} ->
              PutSFTP

At high side:
ListSFTP: File filter Regex = .*\.flowfile ->
  FetchSFTP ->
    ExecuteScript: (converting json data into attributes) ->
      UpdateAttribute: filename = ${filename:substringBefore('.flowfile')} ->
        FetchSFTP ->
          CryptographicHashContent ->
            RouteOnAttribute: Hash_OK = ${rootHash:equals(${content_SHA-256})} ->
              Hash_OK -> following production flow
              Unmatched -> Error flow
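
The high-side check in this flow (read the JSON sidecar, then compare its hash with the fetched content) could look roughly like the sketch below when done outside NiFi. It is a standalone illustration, not the ExecuteScript body used in the flow. The attribute name `rootHash` comes from the flow above, and `verify_against_sidecar` is a hypothetical helper name.

```python
import hashlib
import json


def verify_against_sidecar(content: bytes, sidecar_json: str) -> bool:
    """Compare the content's SHA-256 with the rootHash stored in the sidecar."""
    attributes = json.loads(sidecar_json)
    expected = attributes["rootHash"].lower()
    actual = hashlib.sha256(content).hexdigest()
    return actual == expected


if __name__ == "__main__":
    payload = b"example payload"
    sidecar = json.dumps({
        "original.filename": "example.txt",  # made-up value for illustration
        "rootHash": hashlib.sha256(payload).hexdigest(),
    })
    print(verify_against_sidecar(payload, sidecar))         # matching content
    print(verify_against_sidecar(payload + b"x", sidecar))  # altered content
```

Note that the content and the sidecar are still two separate files here, so both must arrive complete before the comparison means anything.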

Kind regards
Jens


Re: File corruption with Put/Fetch SFTP

Posted by Joe Witt <jo...@gmail.com>.
Jens

For such a setup the very specific details matter, and here there are a
lot of details.  It isn't easy for me to sort through this, so I'll
keep it high level based on my experience in very similar
situations/setups:

1. I'd generally trust SFTP to be awesome and damn near failure proof
in itself.  I'd focus on other things.
2. I'd generally trust that network transfer is bulletproof against data
packet corruption and not think that is a problem, especially
since SFTP and the various protocols employed here (including NiFi)
offer certain guarantees themselves.
3. I'd be suspicious of one-way transfer/guard devices creating issues.
I'd remove that and try to reproduce the problem.
4. In Linux a cp/mv is not atomic, as I understand it, if the data spans
file systems, so you could potentially have partially written data
scenarios here.
5. I'd be careful to avoid multiple-file scenarios such as original
content and the sha256.  Instead, if the low side is a NiFi and the
high side is a NiFi, I'd have the low-side NiFi write out flowfiles and pass
those over the guard device.  Why?  Because this gives you your
original content AND the flowfile attributes (where I'd have the
sha256).  On the high-side NiFi I'd unpack that flowfile and ensure
the content matches the stated sha256.
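
To make point 5 concrete, here is a rough sketch of the single-file idea: content and attributes travel together in one archive, and the receiver checks the content against the hash stored alongside it. This is only an illustration of the principle, assuming a plain tar archive with made-up member names (`attributes.json`, `content`). It is not NiFi's own flowfile packaging format, and `pack` and `unpack` are hypothetical helpers.

```python
import hashlib
import io
import json
import tarfile


def pack(content: bytes, attributes: dict) -> bytes:
    """Bundle content and attributes (plus a SHA-256) into one tar archive."""
    attrs = dict(attributes, sha256=hashlib.sha256(content).hexdigest())
    members = (
        ("attributes.json", json.dumps(attrs).encode("utf-8")),
        ("content", content),
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack(bundle: bytes):
    """Return (content, attributes); raise ValueError on a hash mismatch."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as archive:
        attrs = json.loads(archive.extractfile("attributes.json").read())
        content = archive.extractfile("content").read()
    if hashlib.sha256(content).hexdigest() != attrs["sha256"]:
        raise ValueError("content does not match the stated sha256")
    return content, attrs


if __name__ == "__main__":
    bundle = pack(b"hello diode", {"filename": "a.txt"})
    content, attrs = unpack(bundle)
    print(content, attrs["filename"])
```

Because there is only one file to move, the receiver never has to pair a data file with a separately delivered hash file.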

Joe

On Tue, Oct 12, 2021 at 12:25 PM Jens M. Kofoed <jm...@gmail.com> wrote:
>
> Hi Joe
>
> I know what you are thinking, but that’s not the case.
> Check my very short description of my test flow.
> In my loop the PutSFTP processor is using default settings, which means it uploads files as .filename and renames them when done. The next processor is FetchSFTP, which fetches the file as filename. If PutSFTP has not finished uploading the file, the file still has the wrong name and the flowfile will not pass from PutSFTP to FetchSFTP, so FetchSFTP can’t fetch it. So that is not what is happening in my test flow.
>
> In our production flow, after NiFi gets its data it calculates the SHA-256, uploads the data to an SFTP server as .filename and renames it when done (default settings for PutSFTP). Next it creates a new file containing the hash value and saves it as filename.sha256.
> On that SFTP server a bash script looks for non-hidden files every 2 seconds with an ls command. If there are files, the bash script does a cp filename /archive/filename and sends the data to server 3 via a data diode. On the other side another NiFi server reads filename.sha256, reads in the hash value and reads in the original data. It calculates a new SHA-256 and compares the two hashes.
> Yesterday there was a corruption again, and we checked the file on the first SFTP server, where the first NiFi saved it after creating the first hash. Running sha256sum on /archive/filename produced a different hash than NiFi’s. So after the PutSFTP and a Linux cp command the file was corrupted.
> We have seen these issues in fewer than 1 file per 1,000,000. But we do see them.
> Now we are trying to investigate what causes the issue. Therefore I created the small test flow, and already after nearly 9000 iterations of the loop the file was corrupted just by being uploaded and downloaded again.
>
> Are we facing a network issue where a data packet is corrupted?
> Are there very rare cases where the SFTP implementation does something wrong?
> We don’t know yet, but we are running more tests on different systems to narrow it down.
>
> Kind regards
> Jens M. Kofoed
>
> > On 12 Oct 2021, at 19:39, Joe Witt <jo...@gmail.com> wrote:
> >
> > Hello
> >
> > How does nifi grab the data from the file system?  It sounds like it is
> > doing partial reads due to a competing consumer (data still being written)
> > scenario.
> >
> > Thanks

Re: File corruption with Put/Fetch SFTP

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Joe

I know what you are thinking, but that is not the case here.
Please check the short description of my test flow.
In my loop, PutSFTP uses its default settings, which means it uploads the file as .filename and renames it to filename when the upload is done. The next processor is FetchSFTP, which fetches the file as filename. Until PutSFTP has finished uploading, the file still has the wrong (hidden) name and the flowfile is not passed on from PutSFTP to FetchSFTP, so FetchSFTP cannot fetch a partially written file. A competing consumer is therefore not the cause in my test flow.
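
For readers unfamiliar with the pattern, the hidden-name-then-rename idea described above can be sketched locally in Python. This is only an illustration on a local directory, not NiFi's or any SFTP server's implementation; the function names publish_atomically and visible_files are hypothetical.

```python
import os
from pathlib import Path


def publish_atomically(directory, name, payload):
    """Write payload as .<name>, then rename it to <name> once complete."""
    directory = Path(directory)
    hidden = directory / ("." + name)
    final = directory / name
    with open(hidden, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before renaming
    os.replace(hidden, final)      # rename within the same directory
    return final


def visible_files(directory):
    """List regular files whose names do not start with a dot."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
```

A consumer that only ever looks at visible_files() should never pick up a file that is still being written, which is the assumption the test flow relies on.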

In our production flow, after NiFi gets its data it calculates the SHA256, uploads the data to an SFTP server as .filename and renames it when done (the default settings for PutSFTP). Next it creates a new file containing the hash value and saves it as filename.sha256.
On that SFTP server a bash script looks for non-hidden files every 2 seconds with an ls command. If there are files, the script does a cp filename /archive/filename and sends the data to server 3 via a data diode. On the other side, another NiFi server reads filename.sha256, reads in the hash value and reads in the original data, then calculates a new SHA256 and compares the two hashes.
Yesterday there was a corruption again, and we checked the file on the first SFTP server, where the first NiFi saved it after creating the first hash. Running sha256sum on /archive/filename produced a different hash than NiFi's. So the file was already corrupted after the PutSFTP and a Linux cp command.
We have seen these issues in fewer than 1 in 1,000,000 files. But we do see them.
Now we are trying to find out what causes the issue. Therefore I created the small test flow, and already after nearly 9000 iterations of the loop the file was corrupted just by being uploaded and downloaded again.
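
The check described above (recompute the SHA256 and compare it with the value stored in filename.sha256) could be reproduced outside NiFi with a small Python sketch like the following. It assumes the sidecar file holds the hex digest as its first whitespace-separated token; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_sidecar(data_path):
    """Compare a file's digest with the one stored in <name>.sha256.

    Assumption: the sidecar's first token is the hex digest.
    Returns (matches, expected, actual).
    """
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".sha256")
    expected = sidecar.read_text().split()[0].lower()
    actual = sha256_of_file(data_path)
    return expected == actual, expected, actual
```

Running this at each hop (after the upload, after the cp, after the diode) would show at which step the digest first changes.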

Are we facing a network issue where a data packet is corrupted?
Are there very rare cases where the SFTP implementation does something wrong?
We don’t know yet, but we are running more tests, on different systems, to narrow it down.
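
One way to narrow this down, assuming both the good and the corrupted copy of a file are still available, is to list the byte ranges where they differ: a single flipped bit points in a different direction than a whole missing or zeroed block. A minimal Python sketch (the function name differing_ranges is hypothetical):

```python
def differing_ranges(path_a, path_b, chunk_size=64 * 1024):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    Bytes present in only one file (length mismatch) count as differing.
    """
    ranges = []
    start = None  # start offset of the current run of differing bytes
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            longest = max(len(a), len(b))
            if a == b:
                # Identical chunk: close any run carried over from before.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                for i in range(longest):
                    same = i < len(a) and i < len(b) and a[i] == b[i]
                    if not same and start is None:
                        start = offset + i
                    elif same and start is not None:
                        ranges.append((start, offset + i))
                        start = None
            offset += longest
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

The sizes and alignment of the reported ranges (for example, whether they fall on multiples of a buffer size) may hint at whether the network, the disk or the SFTP layer is involved.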

Kind regards 
Jens M. Kofoed 


Re: File corruption with Put/Fetch SFTP

Posted by Joe Witt <jo...@gmail.com>.
Hello

How does nifi grab the data from the file system?  It sounds like it is
doing partial reads due to a competing consumer (data still being written)
scenario.

Thanks
