Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/23 18:14:08 UTC

[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805121867


   @mcvsubbu 
   > Any reason we are restricting the trailer (or footer) to have only key-value pairs? We don't need to place that restriction as long as the length is also encoded up front. It can be any serialized object, right?
   
   You are right, it can be any serialized object, but restricting the footer to KV pairs has the following benefits:
   
   * Any object can be added as a KV pair: (key, serialized_object). So it is easy to add a new section to the footer in the future.
   * The keys of all KV pairs in the footer are defined in an enum, so the order of the KV pairs is deterministic when the footer is serialized. This makes every KV pair positional/locatable, so we can replace the value of a given key in the footer even after serialization.
   * If we want to add a new object to the data table and are OK with putting it into the footer as a KV pair, we don't need to bump up the version.
   
   Here is the pseudocode for serializing/de-serializing the footer:
   ```
   enum footerkeys {
       k0,
       k1,
       k2,
   }
   
   String[] footerkeysToStr = new String[]{
       "k0",
       "k1",
       "k2",
   }
   
   // Write every value in enum order as (length, bytes), so the layout is deterministic.
   function byte[] serializeFooter() {
       byte[] bytes;
       for (key in footerkeys) {
           String data = encode_to_str(value_of_key(key));
           bytes = append(bytes, len(data.toBytes()));
           bytes = append(bytes, data.toBytes());
       }
       return bytes;
   }
   
   // Read one (length, bytes) entry per known key; values[i] belongs to the i-th key.
   function String[] deSerializeFooter(byte[] bytes) {
       String[] values = new String[len(footerkeys)];
       for (int i = 0; i < len(footerkeys); i++) {
           int data_len = bytes.nextInt();
           values[i] = bytes.nextBytesOfLen(data_len);
       }
       return values;
   }
   
   // If value i is a complex object instead of a string, we can deserialize it further:
   String[] footerKVpairs = deSerializeFooter(bytes);
   Object_i = deserialize(footerKVpairs[i].toBytes());
   ```
   So, if we want to add a new object to the footer, we add it as a KV pair. As long as its key is added as the last entry of the enum, an old broker reads only the keys it knows and ignores the extra one, so the change is backward-compatible.
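
To make the backward-compatibility argument concrete, here is a minimal runnable Java sketch of the pseudocode above. It is not the PR's actual implementation: the class name `FooterSketch`, the enums `KeyV1`/`KeyV2`, the 4-byte length prefix, and the UTF-8 encoding are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FooterSketch {
  // Hypothetical key sets: the "old" broker knows two keys, the "new" server writes three.
  // New keys are only ever appended at the end of the enum.
  enum KeyV1 { K0, K1 }
  enum KeyV2 { K0, K1, K2 }

  /** Writes each value, in key order, as a 4-byte length followed by its UTF-8 bytes. */
  static byte[] serializeFooter(String[] valuesInKeyOrder) {
    byte[][] encoded = new byte[valuesInKeyOrder.length][];
    int size = 0;
    for (int i = 0; i < encoded.length; i++) {
      encoded[i] = valuesInKeyOrder[i].getBytes(StandardCharsets.UTF_8);
      size += Integer.BYTES + encoded[i].length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(size);
    for (byte[] value : encoded) {
      buffer.putInt(value.length);
      buffer.put(value);
    }
    return buffer.array();
  }

  /** Reads exactly keyCount entries; entries appended by a newer writer are left unread. */
  static String[] deserializeFooter(byte[] bytes, int keyCount) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    String[] values = new String[keyCount];
    for (int i = 0; i < keyCount; i++) {
      byte[] value = new byte[buffer.getInt()];
      buffer.get(value);
      values[i] = new String(value, StandardCharsets.UTF_8);
    }
    return values;
  }

  public static void main(String[] args) {
    // A new server writes three values; an old broker only reads the two keys it knows.
    byte[] footer = serializeFooter(new String[] {"a", "bb", "ccc"});
    System.out.println(String.join(",", deserializeFooter(footer, KeyV1.values().length)));
    System.out.println(String.join(",", deserializeFooter(footer, KeyV2.values().length)));
  }
}
```

Running `main` prints `a,bb` for the old reader and `a,bb,ccc` for the new one: the old reader stops after the keys in its own enum, which is why appending a key at the end does not require a version bump in this scheme.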
   
   If we instead let the footer contain not only KV pairs but also other arbitrary serializable objects:
   ```
   +------------------------------------+
   |                                    |
   |    serializable object 1           |
   |                                    |
   +------------------------------------+
   |                                    |
   |    serializable object 2           |
   |                                    |
   +------------------------------------+
   |                                    |
   |    KV pairs                        |
   |                                    |
   +------------------------------------+
   
   ```
   It's not extensible: if we want to add a serializable_object_3 between serializable_object_2 and the KV pairs, we need to bump up the version. (And if we bump the version anyway, we can also add it in the middle of the data table, not necessarily in the footer.)
   
   That's the reason I prefer that the footer contain only KV pairs: if we want to add a new simple section to the data table without bumping up the version, we add it as a KV pair to the footer. If we want to add a new, very complex section or re-arrange the current sections, we add it in the middle of the data table and bump up the version.
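
The "positional/locatable" benefit from the list above can be sketched in the same spirit. This is only an illustration under assumptions that are mine, not the PR's: entries are stored as a 4-byte length plus UTF-8 bytes, and a value is patched in place only when its encoded length does not change (for example a fixed-width, zero-padded number). The names `FooterPatchSketch`, `encode`, `offsetOf`, and `replaceInPlace` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FooterPatchSketch {
  /** Builds a footer of (4-byte length, UTF-8 bytes) entries in the given order. */
  static byte[] encode(String... values) {
    int size = 0;
    for (String value : values) {
      size += Integer.BYTES + value.getBytes(StandardCharsets.UTF_8).length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(size);
    for (String value : values) {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      buffer.putInt(bytes.length);
      buffer.put(bytes);
    }
    return buffer.array();
  }

  /** Finds where the entry for keyOrdinal starts by skipping over the earlier entries. */
  static int offsetOf(byte[] footer, int keyOrdinal) {
    ByteBuffer buffer = ByteBuffer.wrap(footer);
    for (int i = 0; i < keyOrdinal; i++) {
      int length = buffer.getInt();
      buffer.position(buffer.position() + length);
    }
    return buffer.position();
  }

  /** Overwrites the value of keyOrdinal in the serialized bytes; the encoded length must match. */
  static void replaceInPlace(byte[] footer, int keyOrdinal, String newValue) {
    byte[] encoded = newValue.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buffer = ByteBuffer.wrap(footer);
    buffer.position(offsetOf(footer, keyOrdinal));
    int oldLength = buffer.getInt();
    if (oldLength != encoded.length) {
      throw new IllegalArgumentException("in-place replacement needs the same encoded length");
    }
    buffer.put(encoded); // writes through to the wrapped footer array
  }

  public static void main(String[] args) {
    byte[] footer = encode("0000", "bb");
    replaceInPlace(footer, 0, "1234");
    int valueStart = offsetOf(footer, 0) + Integer.BYTES;
    System.out.println(new String(footer, valueStart, 4, StandardCharsets.UTF_8));
  }
}
```

Because the key order is fixed by the enum, the entry for a key can be found by walking the length prefixes, without parsing any of the values. A replacement with a different length would require re-serializing the footer instead.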
    

