Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/18 01:34:22 UTC
[GitHub] [hudi] jiangbiao910 opened a new issue, #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
jiangbiao910 opened a new issue, #6127:
URL: https://github.com/apache/hudi/issues/6127
Hudi recently released version 0.11.1; we pulled it from GitHub and modified it for our Hadoop environment, CDH 6.3.2.
After upgrading to 0.11.1, I ran the following Spark SQL:
```sql
create table if not exists zone_test.hudi_spark_table0718_mor_0111
(
  id string,
  brand_id int,
  name string,
  model_id int,
  model_name string,
  etl_update_time string,
  dt string,
  hh string
) using hudi
options (
  type = 'mor',                           -- merge-on-read table
  primaryKey = 'brand_id,model_id',       -- record key; every field must exist in the schema
  preCombineField = 'etl_update_time',    -- on duplicate keys, the larger value wins
  hoodie.cleaner.commits.retained = '2',
  hoodie.compact.inline = 'true'
)
partitioned by (dt, hh)
;

insert into zone_test.hudi_spark_table0718_mor_0111 partition (dt, hh)
select id,
       brand_id,
       name,
       model_id,
       model_name,
       CAST(current_timestamp AS string) as etl_update_time,
       '20220718' as dt,   -- dynamic partition values go last, in partition order
       '10' as hh
from zone_test.test_vehicle_status_2_hi
;
```
Now I can see **3 tables** in Hive:
![image](https://user-images.githubusercontent.com/23710717/179433550-5bea5272-da73-4081-985e-1d7ab25857ff.png)
But querying **zone_test.hudi_spark_table0718_mor_0111** returns **zero rows**, while **hudi_spark_table0718_mor_0111_ro** and **hudi_spark_table0718_mor_0111_rt** each have **2 records**. Is this a bug?
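A minimal sketch of the row-count comparison described above, written as a single Hive query; it assumes the `zone_test` table names created by the statements in this report and that the `_ro`/`_rt` tables are readable from Hive:

```sql
-- Compare row counts of the Spark SQL base table and the two
-- Hive-synced views of the same merge-on-read table.
SELECT 'base' AS tbl, COUNT(*) AS cnt FROM zone_test.hudi_spark_table0718_mor_0111
UNION ALL
SELECT '_ro' AS tbl, COUNT(*) AS cnt FROM zone_test.hudi_spark_table0718_mor_0111_ro
UNION ALL
SELECT '_rt' AS tbl, COUNT(*) AS cnt FROM zone_test.hudi_spark_table0718_mor_0111_rt;
```

In the situation reported here, the `base` row would show 0 and the `_ro` and `_rt` rows would show 2 each.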
However, when I run this SQL:
```sql
-- sync the read-optimized view under the base table name (no _ro suffix)
set hoodie.datasource.hive_sync.skip_ro_suffix=true;

create table if not exists zone_test.hudi_spark_table0718_mor_0111_skip
(
  id string,
  brand_id int,
  name string,
  model_id int,
  model_name string,
  etl_update_time string,
  dt string,
  hh string
) using hudi
options (
  type = 'mor',                           -- merge-on-read table
  primaryKey = 'brand_id,model_id',       -- record key; every field must exist in the schema
  preCombineField = 'etl_update_time',    -- on duplicate keys, the larger value wins
  hoodie.cleaner.commits.retained = '2',
  hoodie.compact.inline = 'true'
)
partitioned by (dt, hh)
;

insert into zone_test.hudi_spark_table0718_mor_0111_skip partition (dt, hh)
select id,
       brand_id,
       name,
       model_id,
       model_name,
       CAST(current_timestamp AS string) as etl_update_time,
       '20220715' as dt,   -- dynamic partition values go last, in partition order
       '10' as hh
from zone_test.test_vehicle_status_2_hi
;
```
![image](https://user-images.githubusercontent.com/23710717/179433798-f1b8fc93-f23b-4c4f-ac0e-f8d5d3169c66.png)
Now I can see 2 tables in Hive, and both **hudi_spark_table0718_mor_0111_skip** and **hudi_spark_table0718_mor_0111_skip_rt** have **2 records**.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] jiangbiao910 commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
jiangbiao910 commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186878898
> Only the _ro and _rt tables sync partition info to the metastore. When I use Presto to query the raw table, I cannot query the data, but querying _ro or _rt succeeds. This depends on the query engine's implementation. When you `set hoodie.datasource.hive_sync.skip_ro_suffix=true`, the raw table is synced as the _ro table.
So, based on my tests, I think this is the expected behavior. Thank you very much!
[GitHub] [hudi] KnightChess commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186739851
Only the _ro and _rt tables sync partition info to the metastore. When I use Presto to query the raw table, I cannot query the data, but querying _ro or _rt succeeds. This depends on the query engine's implementation. When you `set hoodie.datasource.hive_sync.skip_ro_suffix=true`, the raw table is synced as the _ro table.
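Following this explanation, a sketch of reading the data through the synced views instead of the base table; the partition values `dt = '20220718'` and `hh = '10'` are assumed from the first insert in this issue:

```sql
-- _ro: read-optimized view, reads only the compacted base files
SELECT id, brand_id, model_id, etl_update_time
FROM zone_test.hudi_spark_table0718_mor_0111_ro
WHERE dt = '20220718' AND hh = '10';

-- _rt: snapshot view, merges base files with newer log files
SELECT id, brand_id, model_id, etl_update_time
FROM zone_test.hudi_spark_table0718_mor_0111_rt
WHERE dt = '20220718' AND hh = '10';
```

Use `_rt` when the latest writes must be visible, and `_ro` when reading only compacted data is acceptable.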
[GitHub] [hudi] xushiyan commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1188593205
@KnightChess @fengjian428 thanks!
[GitHub] [hudi] xushiyan commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186696783
cc @fengjian428
[GitHub] [hudi] jiangbiao910 commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
jiangbiao910 commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186877890
> @jiangbiao910 Which query engine do you use to query the data? Make sure `hudi_spark_table0718_mor_0111` has no partition info in the Hive metastore.
I use Hive and Trino to query the data.
`show PARTITIONS hudi_spark_table0718_mor_0111` returns 0 partitions.
[GitHub] [hudi] KnightChess commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186737362
@jiangbiao910 Which query engine do you use to query the data? Make sure `hudi_spark_table0718_mor_0111` has no partition info in the Hive metastore.
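A sketch of that check, assuming it is run in Hive (for example through beeline) so that the results reflect what the metastore holds; the table names are the ones from this issue:

```sql
-- Partitions registered in the metastore for each table.
-- Per this thread, the base table is expected to have none,
-- while hive sync registers partitions for the _ro and _rt tables.
SHOW PARTITIONS zone_test.hudi_spark_table0718_mor_0111;
SHOW PARTITIONS zone_test.hudi_spark_table0718_mor_0111_ro;
SHOW PARTITIONS zone_test.hudi_spark_table0718_mor_0111_rt;
```

Running the same statement from Spark SQL with the Hudi extensions may list partitions from the table's storage rather than from the metastore, so Hive is the clearer place to check.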
[GitHub] [hudi] fengjian428 commented on issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6127:
URL: https://github.com/apache/hudi/issues/6127#issuecomment-1186775480
Yeah, KnightChess is right; I think this answers the question.
> Only the _ro and _rt tables sync partition info to the metastore. When I use Presto to query the raw table, I cannot query the data, but querying _ro or _rt succeeds. This depends on the query engine's implementation. When you `set hoodie.datasource.hive_sync.skip_ro_suffix=true`, the raw table is synced as the _ro table.
[GitHub] [hudi] xushiyan closed issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
Posted by GitBox <gi...@apache.org>.
xushiyan closed issue #6127: [SUPPORT] Upgrading to 0.11.1 resulting use sparksql and Sync Hive
URL: https://github.com/apache/hudi/issues/6127