Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/08/04 05:40:33 UTC

[GitHub] [orc] guiyanakuang commented on pull request #810: ORC-904: Use Map for userMetadata in ReaderImpl

guiyanakuang commented on pull request #810:
URL: https://github.com/apache/orc/pull/810#issuecomment-892378836


   > @guiyanakuang . Technically, this looks reasonable because the Java `WriterImpl` ensures that there are no duplicate keys, at least in the Java layer (writer and reader).
   > 
   > ```java
   > private final Map<String, ByteString> userMetadata = new TreeMap<>();
   > ```
   > 
   > However, it seems that the `ORC Footer` spec itself does not assume that the user metadata items have unique keys. I'm wondering whether we already state this unique-key assumption somewhere else. Otherwise, this PR might not be safe with other ORC writers.
   > 
   > ```proto
   > message Footer {
   >   optional uint64 headerLength = 1;
   >   optional uint64 contentLength = 2;
   >   repeated StripeInformation stripes = 3;
   >   repeated Type types = 4;
   >   repeated UserMetadataItem metadata = 5;
   > ```
   > 
   > cc @omalley , @pgaref , @wgtmac , @williamhyun
   
   @dongjoon-hyun The additional commits address the compatibility problem:
   
   ```java
       private void lazyInitCache() {
         if (metadataCache == null) {
           metadataCache = new TreeMap<>();
           for(OrcProto.UserMetadataItem item: innerUserMetadata) {
             metadataCache.putIfAbsent(item.getName(), item.getValue());
           }
         }
       }
   ```
   With `metadataCache.putIfAbsent(item.getName(), item.getValue());`, a key that appears more than once keeps only its first value, which is the same result the earlier loop traversal gave.
   `getMetadataKeys` is unchanged and can still return non-unique keys.
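
   A minimal sketch of the first-value-wins behavior, assuming a simplified list of string pairs in place of `OrcProto.UserMetadataItem`. The class `FirstValueWinsSketch` and its helper names are hypothetical and not part of ORC; it only compares a linear scan with a `putIfAbsent` cache on input that contains a duplicated key:

   ```java
   import java.util.AbstractMap.SimpleEntry;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   public class FirstValueWinsSketch {

     // Hypothetical stand-in for the repeated metadata items in a footer
     // written by a writer that does not enforce unique keys.
     static List<Map.Entry<String, String>> sampleItems() {
       return Arrays.asList(
           new SimpleEntry<>("owner", "alice"),
           new SimpleEntry<>("codec", "zlib"),
           new SimpleEntry<>("owner", "bob"));
     }

     // Linear scan: returns the value of the first item whose key matches.
     static String scanLookup(List<Map.Entry<String, String>> items, String key) {
       for (Map.Entry<String, String> item : items) {
         if (item.getKey().equals(key)) {
           return item.getValue();
         }
       }
       return null;
     }

     // Cache built with putIfAbsent: later duplicates never overwrite
     // the value stored for the first occurrence of a key.
     static Map<String, String> buildCache(List<Map.Entry<String, String>> items) {
       Map<String, String> cache = new TreeMap<>();
       for (Map.Entry<String, String> item : items) {
         cache.putIfAbsent(item.getKey(), item.getValue());
       }
       return cache;
     }

     // Key listing taken from the raw items, so duplicates are preserved.
     static List<String> rawKeys(List<Map.Entry<String, String>> items) {
       List<String> keys = new ArrayList<>();
       for (Map.Entry<String, String> item : items) {
         keys.add(item.getKey());
       }
       return keys;
     }

     public static void main(String[] args) {
       List<Map.Entry<String, String>> items = sampleItems();
       System.out.println(scanLookup(items, "owner"));      // alice
       System.out.println(buildCache(items).get("owner"));  // alice
       System.out.println(rawKeys(items));                  // [owner, codec, owner]
     }
   }
   ```

   Both lookups agree on `alice` for the duplicated key, while the raw key listing still reports `owner` twice.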

