Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/14 14:56:07 UTC

[GitHub] [hudi] WilliamShine opened a new issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

WilliamShine opened a new issue #4816:
URL: https://github.com/apache/hudi/issues/4816


   Running this SQL in Flink throws an NPE:
   
   CREATE TABLE `account`(
     `_hoodie_commit_time` string, 
     `_hoodie_commit_seqno` string, 
     `_hoodie_record_key` string, 
     `_hoodie_partition_path` string, 
     `_hoodie_file_name` string, 
     `_ts_ms` bigint, 
     `_op` string, 
     `_hoodie_is_deleted` boolean, 
     `id` int, 
     `val` int, 
     `created_at` bigint, 
     `hh` string,
     `dt` string,
     PRIMARY KEY (`id`) NOT ENFORCED)
   PARTITIONED BY (`dt`)
   WITH (
     'connector' = 'hudi',
     'path' = 's3://de-hive-test/ods_test_debezium_nick.db/test_ods_monitor1',
     'table.type' = 'MERGE_ON_READ'
   );
   CREATE TABLE if not exists `printTable` (
     `_hoodie_commit_time` string, 
     `_hoodie_commit_seqno` string, 
     `_hoodie_record_key` string, 
     `_hoodie_partition_path` string, 
     `_hoodie_file_name` string, 
     `_ts_ms` bigint, 
     `_op` string, 
     `_hoodie_is_deleted` boolean, 
     `id` int, 
     `val` int, 
     `created_at` bigint, 
     `hh` string,
     `dt` string
   ) WITH (
     'connector' = 'print'
   );
   INSERT INTO printTable SELECT * FROM account;
   
   
   Why does MergeOnReadInputFormat.getRequiredPosWithCommitTime add `_hoodie_commit_time` to the schema fields?
   
   If the SQL already selects the `_hoodie_commit_time` column, the schema becomes `_hoodie_commit_time`, `_hoodie_commit_time`, `_hoodie_commit_seqno`, ..., so in ParquetColumnarRowSplitReader.nextBatch, columnReaders[i].readToVector(num, writableVectors[i]) runs for that column twice.
   
   As a result, pageReader.readPage in AbstractColumnReader.readToVector reads the `_hoodie_commit_time` column twice, but in Parquet 1.11, ColumnChunkPageReadStore.readPage takes the next page with `DataPage compressedPage = compressedPages.poll();`.
   
   On the second read, compressedPage is null, which eventually causes the NPE.
   
   Can you tell me why the `_hoodie_commit_time` column is added by default?
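A minimal Java model of the failure described above, assuming (as the issue states) that every column reader for the same column shares one page queue whose readPage() is a poll(). `DuplicateColumnReadSketch`, `Page`, `PageQueue`, `freshStore` and `readBatch` are hypothetical stand-ins, not the real Flink or Parquet classes. Reading each field once succeeds; listing `_hoodie_commit_time` twice makes the second poll() return null, and dereferencing that page throws a NullPointerException.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of the failure described in the issue; these are NOT the
// real Flink/Parquet classes, only simplified stand-ins with the same behavior.
public class DuplicateColumnReadSketch {

    // Stand-in for a Parquet data page: just the number of rows it holds.
    record Page(int rows) {}

    // Stand-in for one column chunk's page reader. As described in the issue,
    // readPage() polls a queue, so each page is handed out only once.
    static final class PageQueue {
        private final Queue<Page> pages = new ArrayDeque<>();

        PageQueue(Page... initial) {
            Collections.addAll(pages, initial);
        }

        Page readPage() {
            return pages.poll(); // null once this chunk's pages are consumed
        }
    }

    // One page queue per column chunk, shared by every reader of that column.
    static Map<String, PageQueue> freshStore() {
        return Map.of(
                "_hoodie_commit_time", new PageQueue(new Page(100)),
                "_hoodie_commit_seqno", new PageQueue(new Page(100)),
                "id", new PageQueue(new Page(100)));
    }

    // Reads one page per required field, like one column reader per schema field.
    static int readBatch(Map<String, PageQueue> store, List<String> requiredFields) {
        int rows = 0;
        for (String field : requiredFields) {
            Page page = store.get(field).readPage();
            rows += page.rows(); // NPE if this column's page was already consumed
        }
        return rows;
    }

    public static void main(String[] args) {
        // Each field once: every read gets a page.
        System.out.println(readBatch(freshStore(),
                List.of("_hoodie_commit_time", "_hoodie_commit_seqno", "id")));

        // `_hoodie_commit_time` twice: the second poll() returns null.
        try {
            readBatch(freshStore(),
                    List.of("_hoodie_commit_time", "_hoodie_commit_time", "_hoodie_commit_seqno", "id"));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on the duplicated column");
        }
    }
}
```

The shared map entry is the key design point of the model: because both readers for `_hoodie_commit_time` look up the same queue, the first one drains it and the second one gets nothing.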
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4816:
URL: https://github.com/apache/hudi/issues/4816#issuecomment-1067587422


   @WilliamShine if your query has been answered, can we close the GitHub issue? If not, can you please follow up?
   Thanks!





[GitHub] [hudi] nsivabalan commented on issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4816:
URL: https://github.com/apache/hudi/issues/4816#issuecomment-1073050165


   Closing the issue for now. Feel free to re-open it if you need more assistance.





[GitHub] [hudi] nsivabalan closed issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4816:
URL: https://github.com/apache/hudi/issues/4816


   





[GitHub] [hudi] nsivabalan commented on issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4816:
URL: https://github.com/apache/hudi/issues/4816#issuecomment-1047293509


   @danny0405 @leesf: can you loop in someone to assist here, please?





[GitHub] [hudi] danny0405 commented on issue #4816: [SUPPORT] hudi flink sql suport for _hoodie_commit_time Column

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #4816:
URL: https://github.com/apache/hudi/issues/4816#issuecomment-1047478211


   The metadata field is added for time-based filtering: for streaming reads, we need to filter the data stream for each read batch.
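A hedged Java sketch of the two points in this thread: the commit-time column is read so that each streaming batch can be filtered by instant time, and it only needs to be added to the required positions when the query has not already selected it (the duplicate read is what triggers the NPE reported in the issue). The class, `requiredPosWithCommitTime`, `filterBatch`, `Row`, the `(start, end]` range, and the fixed-width instant-time format are all assumptions for illustration, not Hudi's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hudi's MergeOnReadInputFormat: the commit-time
// column is read so each batch can be filtered by instant time, and it is
// added to the required positions only when the query has not selected it.
public class CommitTimeFilterSketch {

    static final String COMMIT_TIME_FIELD = "_hoodie_commit_time";

    // A row reduced to the two values this sketch needs.
    record Row(String commitTime, String recordKey) {}

    // Returns the field positions to read, prepending the commit-time position
    // only if it is not already among the selected positions.
    static int[] requiredPosWithCommitTime(List<String> tableFields, int[] selectedPos) {
        int commitTimePos = tableFields.indexOf(COMMIT_TIME_FIELD);
        if (commitTimePos < 0) {
            throw new IllegalArgumentException("schema has no " + COMMIT_TIME_FIELD);
        }
        for (int pos : selectedPos) {
            if (pos == commitTimePos) {
                return selectedPos.clone(); // already selected: do not read it twice
            }
        }
        int[] result = new int[selectedPos.length + 1];
        result[0] = commitTimePos;
        System.arraycopy(selectedPos, 0, result, 1, selectedPos.length);
        return result;
    }

    // Keeps rows committed in (startInstant, endInstant]. Assumes instant times
    // are fixed-width timestamp strings, so string order matches time order.
    static List<Row> filterBatch(List<Row> batch, String startInstant, String endInstant) {
        List<Row> kept = new ArrayList<>();
        for (Row row : batch) {
            String t = row.commitTime();
            if (t.compareTo(startInstant) > 0 && t.compareTo(endInstant) <= 0) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With table fields `_hoodie_commit_time`, `_hoodie_commit_seqno`, `id`, selecting only `id` (position 2) yields positions `{0, 2}`, while a `SELECT *` that already includes position 0 is returned unchanged, so each column is read exactly once.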

