Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/17 14:25:02 UTC
[GitHub] [hudi] xiaoshao opened a new issue, #6969: [SUPPORT] does hudi do the same in MOR and COW table?
xiaoshao opened a new issue, #6969:
URL: https://github.com/apache/hudi/issues/6969
**To Reproduce**
Steps to reproduce the behavior:
1. spark version: spark-3.2.2-bin-hadoop3.2
2. hive version apache-hive-3.1.3-bin
3. OS: mac
4. start spark sql as `./spark-sql --jars ../../hudi-spark3-bundle_2.12-0.11.1.jar \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'`
5. create table `CREATE TABLE mor1(
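uuid string, -- either provide a uuid column or declare the primary key with PRIMARY KEY(field) NOT ENFORCED; otherwise an error is raised
name string,
age INT,
ts , -- ts is a required field (introduced earlier); it determines which version of a record is newer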
`partition` string
) using hudi
PARTITIONED BY (`partition`)
options (
primaryKey='uuid',
'path' = 'hdfs://localhost:9000/spark_hudi/hudi/mor1'
);`
6. add data `INSERT INTO mor1 VALUES
('id1','Danny',23, '1970-01-01 00:00:01','par1'),
('id2','Stephen',33, '1970-01-01 00:00:02','par1'),
('id3','Julian',53, '1970-01-01 00:00:03','par2'),
('id4','Fabian',31, '1970-01-01 00:00:04','par2'),
('id5','Sophia',18, '1970-01-01 00:00:05','par3'),
('id6','Emma',20, '1970-01-01 00:00:06','par3'),
('id7','Bob',44, '1970-01-01 00:00:07','par4'),
('id8','Han',56, '1970-01-01 00:00:08','par4');`
7. update data `INSERT INTO mor1 VALUES
('id1','Danny',27, '1970-01-01 00:00:01','par1');`
I found that Hudi generates several Parquet files in the partition `par1`, rather than one Parquet file and one log file.
**Environment Description**
* Hudi version : 0.11.1
* Spark version : spark-3.2.2
* Hive version : apache-hive-3.1.3-bin
* Hadoop version : hadoop-3.3.3
* Storage : hdfs
* Running on Docker? NO
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1286521661
> When using Spark, the default behavior is to write new insert records into Parquet files and update records into the delta log. If you want new insert records written to a log, you should use the Bucket Index (above 0.12) or the HBase index.
Got it. I will try creating a Hudi table with a bucket index on Hudi 0.12. Thanks for your help.
[GitHub] [hudi] fengjian428 commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1281933281
When using Spark, the default behavior is to write new insert records into Parquet files and update records into the delta log. If you want new insert records written to a log, you should use the Bucket Index (above 0.12) or the HBase index.
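The routing described above can be pictured with a small toy model. This is only an illustrative sketch, not Hudi code: `route_records` is a hypothetical helper, and a plain set of known keys stands in for the index, which in Hudi works per partition and file group.

```python
def route_records(existing_keys, batch, key_field="uuid"):
    """Split a batch into (inserts, updates) by looking up each record key.

    Toy stand-in for index tagging: keys already known to the table are
    treated as updates (destined for log files in the description above),
    unknown keys as inserts (destined for new Parquet base files).
    """
    inserts, updates = [], []
    for record in batch:
        if record[key_field] in existing_keys:
            updates.append(record)
        else:
            inserts.append(record)
    return inserts, updates


if __name__ == "__main__":
    known = set()  # hypothetical in-memory "index" of record keys

    # First write: both keys are new, so both records count as inserts.
    first = [
        {"uuid": "id1", "name": "Danny", "age": 23},
        {"uuid": "id2", "name": "Stephen", "age": 33},
    ]
    inserts, updates = route_records(known, first)
    known.update(r["uuid"] for r in inserts)
    print("first write :", len(inserts), "inserts,", len(updates), "updates")

    # Second write: id1 is already known, so it counts as an update.
    second = [{"uuid": "id1", "name": "Danny", "age": 27}]
    inserts, updates = route_records(known, second)
    print("second write:", len(inserts), "inserts,", len(updates), "updates")
```

Under this model the second `INSERT INTO` from the issue would be tagged as an update to `id1`, which is the case the comment says lands in the delta log.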
[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1280951476
here is the file list for the partition `par1`
```
hdfs dfs -ls -R /spark_hudi/hudi/mor1/partition=par1/
2022-10-17 22:27:40,594 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 1 shaozengwei supergroup 96 2022-10-17 21:37 /spark_hudi/hudi/mor1/partition=par1/.hoodie_partition_metadata
-rw-r--r-- 1 shaozengwei supergroup 435420 2022-10-17 21:41 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-123-179_20221017214153248.parquet
-rw-r--r-- 1 shaozengwei supergroup 435468 2022-10-17 21:42 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-152-217_20221017214224834.parquet
-rw-r--r-- 1 shaozengwei supergroup 435490 2022-10-17 21:37 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-74-106_20221017213731273.parquet
-rw-r--r-- 1 shaozengwei supergroup 435414 2022-10-17 21:38 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-97-144_20221017213839012.parquet
```
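To read the listing above more easily, the file names can be split into their parts. The sketch below assumes Hudi's base-file naming layout `<fileId>_<writeToken>_<instantTime>.parquet`; the helper names `parse_base_file` and `group_by_file_id` are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Base file names copied from the listing above.
FILES = [
    "2b219924-8881-4feb-93cf-7faecf24adde-0_0-123-179_20221017214153248.parquet",
    "2b219924-8881-4feb-93cf-7faecf24adde-0_0-152-217_20221017214224834.parquet",
    "2b219924-8881-4feb-93cf-7faecf24adde-0_0-74-106_20221017213731273.parquet",
    "2b219924-8881-4feb-93cf-7faecf24adde-0_0-97-144_20221017213839012.parquet",
]

# Assumed layout: <fileId>_<writeToken>_<instantTime>.parquet
BASE_FILE = re.compile(
    r"^(?P<file_id>.+)_(?P<write_token>[^_]+)_(?P<instant>\d+)\.parquet$"
)


def parse_base_file(name):
    """Return (file_id, write_token, instant_time) for one base file name."""
    match = BASE_FILE.match(name)
    if match is None:
        raise ValueError(f"not a recognised base file name: {name}")
    return match.group("file_id"), match.group("write_token"), match.group("instant")


def group_by_file_id(names):
    """Map each file ID to the sorted instant times of its base files."""
    groups = defaultdict(list)
    for name in names:
        file_id, _, instant = parse_base_file(name)
        groups[file_id].append(instant)
    return {file_id: sorted(instants) for file_id, instants in groups.items()}


if __name__ == "__main__":
    for file_id, instants in group_by_file_id(FILES).items():
        print(file_id)
        for instant in instants:
            print("  commit instant:", instant)
```

Read this way, all four Parquet files share a single file ID and differ only in write token and instant time, so they look like four successive versions of one file group, one per commit, rather than four unrelated files.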
[GitHub] [hudi] nsivabalan commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1283159916
Yes, new inserts are routed to Parquet files. Updates in subsequent commits are written to log files, so this behavior is expected.
[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1286520086
@nsivabalan But the new Parquet files include all the records. Is that expected?
[GitHub] [hudi] nsivabalan closed issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
URL: https://github.com/apache/hudi/issues/6969
[GitHub] [hudi] nsivabalan commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1287478187
thanks!