Posted to users@nifi.apache.org by abcfd abcdd <of...@gmail.com> on 2020/04/22 03:35:23 UTC

Bug reports about working with VolatileContentRepository and VolatileFlowFileRepository

Hello,

Our team has worked with NiFi for over a year. Our scenario involves
processing 3-5 billion records with NiFi, and we found that
WriteAheadFlowFileRepository and FileSystemRepository could not keep up
with that load, so we put the data to be consumed on tmpfs and chose
VolatileFlowFileRepository and VolatileContentRepository to reduce I/O
cost and avoid the write-ahead log (WAL); in our scenario the data can
safely be thrown away when backpressure occurs or NiFi restarts.

However, we have found three problems when working with
VolatileFlowFileRepository and VolatileContentRepository.
1. VolatileContentRepository
With maxSize = 100 MB and blockSize = 2 KB there should be 51,200
"slots". Writing 1 KB at a time, 102,400 writes should fit, since
100 MB = 102,400 KB, but the 51,201st 1 KB write fails with
"java.io.IOException: Content Repository is out of space". It appears
that each ContentClaim consumes a whole 2 KB block even though only
1 KB is written, so the blocks are exhausted after 51,200 writes.
Here is the JUnit test I wrote:

import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.nifi.controller.repository.claim.ContentClaim;
import org.apache.nifi.util.NiFiProperties;
import org.junit.Test;

@Test
public void test() throws IOException {
    System.setProperty(NiFiProperties.PROPERTIES_FILE_PATH,
            TestVolatileContentRepository.class.getResource("/conf/nifi.properties").getFile());
    final Map<String, String> addProps = new HashMap<>();
    addProps.put(VolatileContentRepository.BLOCK_SIZE_PROPERTY, "2 KB");
    final NiFiProperties nifiProps = NiFiProperties.createBasicNiFiProperties(null, addProps);
    final VolatileContentRepository contentRepo = new VolatileContentRepository(nifiProps);
    // claimManager is a ResourceClaimManager field initialized in the test's @Before method
    contentRepo.initialize(claimManager);
    // Expected: 100 MB / 1 KB = 102,400 writes; observed: blocks are exhausted
    // after 51,200, so the 51,201st write throws "Content Repository is out of space"
    for (int idx = 0; idx < 51201; ++idx) {
        final ContentClaim claim = contentRepo.create(true);
        try (final OutputStream out = contentRepo.write(claim)) {
            final byte[] oneKb = new byte[1024];
            Arrays.fill(oneKb, (byte) 55);
            out.write(oneKb);
        }
    }
}

2. VolatileFlowFileRepository
When backpressure occurs, FileSystemSwapManager swaps FlowFiles out to
disk whenever the swap queue size exceeds 10,000. The swap-out process
works, but swap-in does not: when FileSystemSwapManager tries to swap
FlowFiles back in from disk, VolatileFlowFileRepository never
"acknowledges" the FlowFiles that were swapped out, and the warning
"Cannot swap in FlowFiles from location..." is logged, because the
implementation of "isValidSwapLocationSuffix" in
VolatileFlowFileRepository always returns FALSE.
The queue then still appears FULL in the NiFi UI and the upstream
processor stays stuck; FileSystemSwapManager apparently still "thinks"
these FlowFiles have not been consumed.
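
To illustrate, here is a minimal sketch (not the actual NiFi source;
the class name and comments are ours) of the check described above.
Because it rejects every suffix, FileSystemSwapManager can never
restore what it swapped out:

// Sketch only, based on the behavior described above.
public class VolatileFlowFileRepositorySketch {
    // VolatileFlowFileRepository reportedly implements this check as a
    // constant FALSE, so FileSystemSwapManager treats every swap location
    // as invalid and logs "Cannot swap in FlowFiles from location...".
    public boolean isValidSwapLocationSuffix(final String swapLocationSuffix) {
        return false;
    }
}

A fix would presumably either accept the suffixes that
FileSystemSwapManager writes or avoid swapping entirely when this
repository is used.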

3. We also found that NiFi cannot stay up for more than a week even
when we use WriteAheadFlowFileRepository and FileSystemRepository. NiFi
got stuck, processed no data, and wrote no output to nifi-app.log. We
restarted NiFi and it went back to normal, but we do not know what
happened.

Many thanks

Re: Bug reports about working with VolatileContentRepository and VolatileFlowFileRepository

Posted by zhangxinchen <of...@gmail.com>.
Thanks Mark,

I have posted these issues in Jira. 
https://issues.apache.org/jira/projects/NIFI/issues/NIFI-7388?filter=allissues

Your suggestion is very helpful; we have switched back to WriteAheadFlowFileRepository and FileSystemRepository for now, and we look forward to a release that fixes these issues.

We are running NiFi 1.11.4. The environment is JDK 8u141, Docker 18.04.0-ce, and Kubernetes v1.13.3.
We suspect it may be related to the JVM code cache, so we have increased the code cache size to 256 MB and enabled code cache flushing. We will keep watching this issue and, following your suggestion, gather a thread dump when the hang occurs.
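
For reference, options like these can be set in conf/bootstrap.conf
(the argument indexes are illustrative; we assume the standard HotSpot
flags):

# conf/bootstrap.conf
java.arg.20=-XX:ReservedCodeCacheSize=256m
java.arg.21=-XX:+UseCodeCacheFlushing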

ZhangXinchen

On 2020/04/22 14:51:51, Mark Payne <ma...@hotmail.com> wrote: 
> [quoted message snipped; Mark's full reply appears below]

Re: Bug reports about working with VolatileContentRepository and VolatileFlowFileRepository

Posted by Mark Payne <ma...@hotmail.com>.
Hello,

Thanks for reporting the issues with the Volatile Content & FlowFile Repositories. These definitely sound like bugs. Do you mind filing a Jira [1] for these?
If you’d like to store everything in memory, though, my recommendation would honestly be to use a RAM disk rather than the volatile repositories: the standard repositories are much more widely used and therefore extremely well tested, while the volatile implementations are much less so.
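
For example (the mount point and size are illustrative; the property
keys are the standard ones in conf/nifi.properties):

# Mount a RAM disk
mount -t tmpfs -o size=8g tmpfs /mnt/nifi-ram

# conf/nifi.properties -- point the standard repositories at it
nifi.flowfile.repository.directory=/mnt/nifi-ram/flowfile_repository
nifi.content.repository.directory.default=/mnt/nifi-ram/content_repository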

In terms of the last issue, in which NiFi becomes stuck after a week: what version of NiFi are you running? I would recommend gathering a thread dump (bin/nifi.sh dump dump1.txt) when this occurs and providing the thread dump so that it can be analyzed to determine what’s happening.

Thanks
-Mark


[1] https://issues.apache.org/jira/projects/NIFI



On Apr 21, 2020, at 11:35 PM, abcfd abcdd <of...@gmail.com> wrote:

> [original message snipped; see the full text at the top of this thread]