Posted to commits@hudi.apache.org by "pete91z (via GitHub)" <gi...@apache.org> on 2023/03/10 10:42:34 UTC

[GitHub] [hudi] pete91z commented on issue #8118: [SUPPORT] error in run_sync_tool.sh

pete91z commented on issue #8118:
URL: https://github.com/apache/hudi/issues/8118#issuecomment-1463613215

   The workaround I'm using at the moment is to create the table in spark-sql without the TBLPROPERTIES clause:
   
   CREATE EXTERNAL TABLE IF NOT EXISTS persis.tempstream_hudi (
     _hoodie_commit_time string,
     _hoodie_commit_seqno string,
     _hoodie_record_key string,
     _hoodie_partition_path string,
     _hoodie_file_name string,
     id string,
     reading bigint,
     record_ts string
   )
   PARTITIONED BY (location int)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   WITH SERDEPROPERTIES (
     'hoodie.query.as.ro.table'='false',
     'path'='s3a:///Airflow/DEV/LANDING/tempstream_hudi_2/'
   )
   STORED AS
     INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION 's3a:///Airflow/DEV/LANDING/tempstream_hudi_2/';
   
   and then add the partitions manually:
   
   alter table tempstream_hudi add if not exists partition(location=1) LOCATION 's3a:///Airflow/DEV/LANDING/tempstream_hudi_2/1';
   
   alter table tempstream_hudi add if not exists partition(location=2) LOCATION 's3a:///Airflow/DEV/LANDING/tempstream_hudi_2/2';
   
   alter table tempstream_hudi add if not exists partition(location=3) LOCATION 's3a:///Airflow/DEV/LANDING/tempstream_hudi_2/3';
   
   However, accessing the table this way via Thrift / the Hive metastore is not Hudi-aware: a SELECT query returns rows from all files, and therefore potentially duplicates, so I have to use window functions to show only the latest row versions.
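   
   For illustration, a minimal sketch of that kind of window-function query. It assumes the latest version of a record is the row with the greatest _hoodie_commit_time (ties broken by _hoodie_commit_seqno) for a given _hoodie_partition_path and _hoodie_record_key; that ordering is an assumption on my part rather than something Hudi guarantees for a table registered like this, and it does not account for deleted records:
   
   ```sql
   -- Sketch only: keep one row per record key, preferring the most recent commit.
   -- Assumption: a higher _hoodie_commit_time means a newer version of the row.
   SELECT id, reading, record_ts, `location`
   FROM (
     SELECT t.*,
            ROW_NUMBER() OVER (
              PARTITION BY _hoodie_partition_path, _hoodie_record_key
              ORDER BY _hoodie_commit_time DESC, _hoodie_commit_seqno DESC
            ) AS rn
     FROM persis.tempstream_hudi t
   ) ranked
   WHERE rn = 1;
   ```
   
   Every query against the table has to be wrapped like this, since a plain SELECT still reads every file version.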

