Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/11/04 16:00:14 UTC

[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

URL: https://github.com/apache/flink/pull/9501#issuecomment-549420908
 
 
   Sorry for the late response, @tillrohrmann.
   
   Comparing the current implementations of `ByteBufferUtils` and `MemorySegment`, I think the most fundamental methods (`putPrimitive`, `getPrimitive`, `compareTo`, `copyTo`, etc.) are already covered by `MemorySegment`. I also agree it's no big deal to drop support for platforms that don't support unaligned memory access (so we don't need the branches for the `Unaligned` case in `ByteBufferUtils`). Overall, it's definitely workable to replace `ByteBuffer` usage with `MemorySegment`.
   
   Regarding the effort required for the merge, there are two main aspects from my point of view:
   
   1. We may still need some modifications to `MemorySegment`. This conflicts to some extent with our original intention of making the spill-able backend a standalone module to avoid impact on existing components, so I'm a little concerned.
   
   2. In the current design we map the file segment into memory (as a `MappedByteBuffer`) and use it as the base of a "Chunk", which is further split into "Buckets" for finer-grained usage, so everything in flight is a `ByteBuffer`. Replacing `ByteBuffer` with `MemorySegment` requires many interface changes in `CopyOnWriteSkipListStateMap` and `SpaceAllocator`.
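
To make the Chunk/Bucket layout in point 2 more concrete, here is a minimal JDK-only sketch of mapping a file region and slicing it into equal-sized views. It is not the code from this PR: the class name `ChunkSketch`, the method names, and the sizes are hypothetical, and the real `SpaceAllocator` may organize buckets differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; names and sizes are assumptions, not taken from the PR.
final class ChunkSketch {

    /** Maps chunkSize bytes of the file starting at offset and splits them into equal-sized buckets. */
    public static ByteBuffer[] mapChunkIntoBuckets(Path file, long offset, int chunkSize, int bucketSize)
            throws IOException {
        if (bucketSize <= 0 || chunkSize % bucketSize != 0) {
            throw new IllegalArgumentException("chunkSize must be a multiple of a positive bucketSize");
        }
        try (FileChannel channel =
                FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The "Chunk": one memory-mapped region of the file.
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_WRITE, offset, chunkSize);

            // The "Buckets": independent views that share the chunk's memory.
            ByteBuffer[] buckets = new ByteBuffer[chunkSize / bucketSize];
            for (int i = 0; i < buckets.length; i++) {
                ByteBuffer view = chunk.duplicate();
                view.limit((i + 1) * bucketSize);
                view.position(i * bucketSize);
                buckets[i] = view.slice();
            }
            return buckets;
        }
    }

    /** Creates a zero-filled 64-byte temporary file and maps it as four 16-byte buckets. */
    public static ByteBuffer[] demo() {
        try {
            Path file = Files.createTempFile("chunk-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[64]);
            return mapChunkIntoBuckets(file, 0L, 64, 16);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer[] buckets = demo();
        buckets[1].putInt(0, 42); // writes land in the mapped file region
        System.out.println(buckets.length + " buckets, bucket 1 holds " + buckets[1].getInt(0));
    }
}
```

Because every bucket is itself a `ByteBuffer`, any interface that hands out or consumes buckets is typed on `ByteBuffer`, which is why switching to `MemorySegment` touches so many signatures.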
   
   I have made a new commit to address @StephanEwen's comments on the value serializer. I also tried to modify the key serializer to use `DataOutputSerializer` and `DataInputDeserializer`, but found it may be less efficient: unless we change the existing design of position manipulation in `DataOutputSerializer` (currently we can only set the position after data has been written, and "holes" are not allowed), it requires two additional `System.arraycopy` calls. Attached is the patch I made as a PoC. Please check it and let me know your thoughts. Thanks.
   
   [use_data_input_output_serializer_in_skiplist_key_serializer.patch.txt](https://github.com/apache/flink/files/3804829/use_data_input_output_serializer_in_skiplist_key_serializer.patch.txt)
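
To illustrate the "holes" limitation mentioned above, here is a minimal JDK-only sketch comparing two ways of writing a length-prefixed record. It does not use Flink's `DataOutputSerializer`; the class `LengthPrefixSketch` and its methods are hypothetical, and `DataOutputStream` merely stands in for an append-only output. The number of copies shown is specific to this sketch and is not a measurement of the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; not the serializer code from the PR or the patch.
final class LengthPrefixSketch {

    /** Random-access buffer: leave a 4-byte hole, write the payload, then backfill the length. */
    public static byte[] withHole(long[] fields) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + fields.length * Long.BYTES);
        buf.position(Integer.BYTES);                      // skip the hole for the length prefix
        for (long f : fields) {
            buf.putLong(f);                               // payload goes directly into its final place
        }
        buf.putInt(0, buf.position() - Integer.BYTES);    // absolute write fills the hole
        return buf.array();
    }

    /** Append-only output: the payload is serialized elsewhere first, then copied behind the length. */
    public static byte[] appendOnly(long[] fields) {
        try {
            ByteArrayOutputStream tmp = new ByteArrayOutputStream();
            DataOutputStream tmpOut = new DataOutputStream(tmp);
            for (long f : fields) {
                tmpOut.writeLong(f);
            }
            tmpOut.flush();
            byte[] payload = tmp.toByteArray();           // extra copy out of the temporary buffer

            ByteArrayOutputStream result = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(result);
            out.writeInt(payload.length);                 // the length is known only now
            out.write(payload);                           // extra copy into the final output
            out.flush();
            return result.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] fields = {1L, 2L, 3L};
        System.out.println(Arrays.equals(withHole(fields), appendOnly(fields)));
    }
}
```

Both methods produce the same big-endian byte layout; they differ only in how many times the payload bytes are moved before reaching the final buffer.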
   
   
