You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/06/27 10:56:34 UTC
[GitHub] [flink] StephanEwen commented on issue #8880: [FLINK-12986] [network] Fix instability of BoundedBlockingSubpartition under memory pressure.

StephanEwen commented on issue #8880: [FLINK-12986] [network] Fix instability of BoundedBlockingSubpartition under memory pressure.
URL: https://github.com/apache/flink/pull/8880#issuecomment-506298639
 
 
   @zhijiangW Thanks, these are very good comments. Some answers in line.
   
   > Maybe we should not always use the FILE_MMAP type as default. It could give an option to config or system auto selects the proper type internally based on some factors. E.g. via automatically checking whether use 64 bit JRE, if not only small size file fitting into virtual address space could use MMAP. For 32 bit JRE, FILE_FILE might be a better choice.
   
   That is a fair point. We could do that.
   Just out of curiosity: Do you know how many people use 32bit JVMs still?
   
   >n Java has currently the limitation of not being able to unmap files from user code. Due to this bug ("http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038") in Sun's JRE, the files are unmapped when GC releases the byte buffers. We could supply the workaround mentioned in the bug report (see {@link #setUseUnmap}), which may fail on non-Oracle/OpenJDK JVMs. It forcefully unmaps the buffer on close by using an undocumented internal cleanup functionality. We could also check automatically whether this way is supported, then it might also be another factor to decide whether use FILE_MMAP by default.
   
   The code explicitly unmaps files using a utility from Netty, see https://github.com/apache/flink/pull/8880/files#diff-94d133764660d5d8372ca24c08257e25R138
   This should work, the JVM actually segfaults if you use the buffer after that.
   
   > Currently we use Integer.MAX_VALUE as every mmap region size. If we do not want to make it configurable because users might also do not know how to tune it, we might automatically select different size based on whether 32 or 64 bit JRE. Because on 32 bit platform the address space can be very fragmented, so large size might not be mapped. Using a lower region size makes the implementation a little bit slower, but the chance is slower that mmap fails. As I know, lucene uses the size of 1 << 30 for 64 bit, and 1 << 28 for 32 bit.
   
   If we use `FILE` mode for 32 bit JVMs, it seems that this would be solved already.
   
   > In current FILE_MMAP implementation, we would always mmap a region after writing Integer.MAX_VALUE data size. This is simple way because we do not need to consider the partial data margin. But if the produced partitions could not be consumed in time limited by resource, I am not sure whether this pre-mmap has other downsides. E.g. one map produces 5000 subpartitions, but only one reduce task could be deployed at a time limited by resource. Another option is lazy mmap that means establishing the mmap after the partition is requested. But it seems complicated for handling the region margin to avoid partial header or data.
   
   Do we know if the eager mapping has downsides? There is no eager pre-fetching, so there should be no wasted I/O. It would keep the virtual memory page table occupied for longer (is that an issue?) and it might incentivize the kernel to keep that data cached in memory longer, and evict other non-mapped files from the OS file cache instead (sounds like an advantage, could it have a downside?).
   
   Eager mapping simplifies the implementation because it avoids the need to synchronize the lazy mapping when creating multiple readers concurrently.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services