Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/09/29 04:49:43 UTC

[GitHub] [hive] amansinha100 commented on a diff in pull request #3628: HIVE-26320: Deserialize Parquet VARCHAR and CHAR types appropriately

amansinha100 commented on code in PR #3628:
URL: https://github.com/apache/hive/pull/3628#discussion_r983060737


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -91,11 +93,61 @@ public class ParquetHiveSerDe extends AbstractSerDe implements SchemaInference {
 
   private ObjectInspector objInspector;
   private ParquetHiveRecord parquetRow;
+  private ObjectInspectorConverters.Converter converter;
 
   public ParquetHiveSerDe() {
     parquetRow = new ParquetHiveRecord();
   }
 
+  // Recursively check if CHAR or VARCHAR types are used
+  private boolean needsConversion(TypeInfo type) {

Review Comment:
   Thinking about the number of times needsConversion() and the subsequent convert() would be called for a table scan that reads, say, N Parquet files, each with m row groups. Doing this work once per file or per row group would be fine, but doing it on a per-row basis would be a performance hit.
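
   For readers following the thread, here is a minimal sketch of what a recursive CHAR/VARCHAR check over Hive's serde2 TypeInfo API could look like. It is not the PR's actual implementation; the class name CharVarcharTypeCheck and the helper anyNeedsConversion are hypothetical, introduced only for illustration.

   import java.util.List;

   import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
   import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
   import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
   import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
   import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
   import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
   import org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo;

   // Hypothetical sketch, not the code in PR #3628.
   public final class CharVarcharTypeCheck {

     private CharVarcharTypeCheck() {
     }

     // Recursively walk a TypeInfo tree and report whether a CHAR or VARCHAR
     // type appears at any nesting level (list element, map key/value,
     // struct field, or union alternative).
     public static boolean needsConversion(TypeInfo type) {
       switch (type.getCategory()) {
         case PRIMITIVE:
           PrimitiveCategory pc = ((PrimitiveTypeInfo) type).getPrimitiveCategory();
           return pc == PrimitiveCategory.CHAR || pc == PrimitiveCategory.VARCHAR;
         case LIST:
           return needsConversion(((ListTypeInfo) type).getListElementTypeInfo());
         case MAP:
           MapTypeInfo map = (MapTypeInfo) type;
           return needsConversion(map.getMapKeyTypeInfo())
               || needsConversion(map.getMapValueTypeInfo());
         case STRUCT:
           return anyNeedsConversion(((StructTypeInfo) type).getAllStructFieldTypeInfos());
         case UNION:
           return anyNeedsConversion(((UnionTypeInfo) type).getAllUnionObjectTypeInfos());
         default:
           return false;
       }
     }

     private static boolean anyNeedsConversion(List<TypeInfo> types) {
       for (TypeInfo t : types) {
         if (needsConversion(t)) {
           return true;
         }
       }
       return false;
     }
   }

   Because the result depends only on the table schema, one way to address the per-row concern would be to evaluate a check like this once when the SerDe is initialized and cache it in a boolean field, so the per-row path only tests a flag before deciding whether to invoke the converter. Whether the PR takes that approach is for the author to confirm.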



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
