Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/06/11 03:01:47 UTC

[GitHub] [flink] JingsongLi opened a new pull request #8682: [FLINK-12796][table-planner-blink] Introduce BaseArray and BaseMap to reduce conversion overhead to blink

JingsongLi opened a new pull request #8682: [FLINK-12796][table-planner-blink] Introduce BaseArray and BaseMap to reduce conversion overhead to blink
URL: https://github.com/apache/flink/pull/8682
 
 
   ## What is the purpose of the change
   
   Currently, in Flink's internal data format, the only array type is BinaryArray and the only map type is BinaryMap. If a user writes a UDAF that takes arrays as parameters and returns arrays, values are converted frequently between Java arrays and BinaryArrays. Each conversion copies the entire array, which is very time-consuming.
   To avoid this copy during conversion, BaseArray and BaseMap are introduced as internal formats.
   BaseArray is the parent of GenericArray and BinaryArray and provides the read and write operations on an array.
   GenericArray is a wrapper class around a Java array whose elements are stored in the internal data format.
   Conversion can be avoided when the element type is a primitive type or a type whose internal and external formats are the same (for details, see DataFormatConverters).
   In our benchmark, the performance of a UDAF using primitive arrays improved by a factor of 10.
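
   The relationship described above can be illustrated with a minimal, self-contained sketch. This is not the code from this pull request: the class bodies, the method names (`numElements`, `getLong`, `toLongArray`, `fromLongArray`) and the `ByteBuffer`-backed layout are simplified assumptions chosen only to show why wrapping a Java array avoids the copy that a binary format requires.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch only; all names and layouts here are assumptions, not Flink's real classes. */
public class BaseArraySketch {

    /** Hypothetical common parent: callers read elements without knowing the storage. */
    public interface BaseArray {
        int numElements();
        long getLong(int pos);
        long[] toLongArray();
    }

    /** Wraps an existing Java array; construction and conversion back are O(1), no copy. */
    public static final class GenericArray implements BaseArray {
        private final long[] values;

        public GenericArray(long[] values) {
            this.values = values;
        }

        public int numElements() { return values.length; }
        public long getLong(int pos) { return values[pos]; }
        // Returns the wrapped array itself, so no elements are copied.
        public long[] toLongArray() { return values; }
    }

    /** Stand-in for a binary format: every conversion copies all elements. */
    public static final class BinaryArray implements BaseArray {
        private final ByteBuffer buffer;
        private final int size;

        private BinaryArray(ByteBuffer buffer, int size) {
            this.buffer = buffer;
            this.size = size;
        }

        public static BinaryArray fromLongArray(long[] values) {
            ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
            for (int i = 0; i < values.length; i++) {
                buf.putLong(i * Long.BYTES, values[i]); // element-by-element copy in
            }
            return new BinaryArray(buf, values.length);
        }

        public int numElements() { return size; }
        public long getLong(int pos) { return buffer.getLong(pos * Long.BYTES); }

        public long[] toLongArray() {
            long[] out = new long[size];
            for (int i = 0; i < size; i++) {
                out[i] = getLong(i); // element-by-element copy out
            }
            return out;
        }
    }

    /** Code written against the parent type works with either representation. */
    public static long sum(BaseArray array) {
        long total = 0;
        for (int i = 0; i < array.numElements(); i++) {
            total += array.getLong(i);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] data = {1L, 2L, 3L};
        BaseArray generic = new GenericArray(data);
        BaseArray binary = BinaryArray.fromLongArray(data);
        System.out.println(sum(generic) == sum(binary));     // same values either way
        System.out.println(generic.toLongArray() == data);   // same reference, nothing copied
        System.out.println(binary.toLongArray() == data);    // a fresh copy was made
    }
}
```

   In this sketch, an aggregate function that receives and returns `long[]` values on every call would pay two full copies per call with the binary representation and none with the wrapper, which is the kind of overhead the change aims to remove.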
   
   ## Brief change log
   
   Introduce BaseArray, GenericArray.
   Introduce BaseMap, GenericMap.
   Modify serializers.
   Modify ScalarOperatorGens.
   Modify DataFormatConverters.
   
   ## Verifying this change
   
   Unit tests (UT).
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes)
     - If yes, how is the feature documented? (JavaDocs)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services