Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/17 14:25:02 UTC

[GitHub] [hudi] xiaoshao opened a new issue, #6969: [SUPPORT] does hudi do the same in MOR and COW table?

xiaoshao opened a new issue, #6969:
URL: https://github.com/apache/hudi/issues/6969

   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  spark version:  spark-3.2.2-bin-hadoop3.2
   2.  hive version:  apache-hive-3.1.3-bin
   3.  OS: mac
   4.  start spark sql as  `./spark-sql --jars ../../hudi-spark3-bundle_2.12-0.11.1.jar \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'`
   
   5. create table `CREATE TABLE mor1(
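     uuid string, -- either provide a uuid field or declare the primary key with PRIMARY KEY(field) NOT ENFORCED; otherwise an error is raised
     name string,
     age INT,
     ts , -- ts is a required field (introduced earlier); it decides which version of a record is newer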
     `partition` string
   ) using hudi
   PARTITIONED BY (`partition`)
   options (
      primaryKey='uuid',
     'path' = 'hdfs://localhost:9000/spark_hudi/hudi/mor1'
   );`
   
   6.  add data `INSERT INTO mor1 VALUES
     ('id1','Danny',23, '1970-01-01 00:00:01','par1'),
     ('id2','Stephen',33, '1970-01-01 00:00:02','par1'),
     ('id3','Julian',53, '1970-01-01 00:00:03','par2'),
     ('id4','Fabian',31, '1970-01-01 00:00:04','par2'),
     ('id5','Sophia',18, '1970-01-01 00:00:05','par3'),
     ('id6','Emma',20, '1970-01-01 00:00:06','par3'),
     ('id7','Bob',44, '1970-01-01 00:00:07','par4'),
     ('id8','Han',56, '1970-01-01 00:00:08','par4');`
   
   7. update data `INSERT INTO mor1 VALUES
     ('id1','Danny',27, '1970-01-01 00:00:01','par1');`
   
   
   I found that hudi generates several parquet files in the partition `par1`, not one parquet file and one log file.
   
   **Environment Description**
   
   * Hudi version :  0.11.1
   
   * Spark version : spark-3.2.2
   
   * Hive version : apache-hive-3.1.3-bin
   
   * Hadoop version :  hadoop-3.3.3
   
   * Storage : HDFS
   
   * Running on Docker? NO
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1286521661

   > When using Spark, the default behavior is to write new insert records into parquet files and to write update records into the delta log. If you want new insert records written to a log file, use the Bucket Index (above 0.12) or the HBase index.
   
   Got it. I will try creating a Hudi table with the bucket index on Hudi 0.12. Thanks for your help.
   
   
   You mean 




[GitHub] [hudi] fengjian428 commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1281933281

   When using Spark, the default behavior is to write new insert records into parquet files and to write update records into the delta log. If you want new insert records written to a log file, use the Bucket Index (above 0.12) or the HBase index.
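
   As a rough illustration of the suggestion above, the following Spark SQL sketch declares a merge-on-read table that uses the bucket index. It assumes a Hudi release that supports the bucket index as noted above. The table name `mor_bucket`, the `ts` column type, and the bucket count are hypothetical choices made for this example; they do not come from the original report.

   ```sql
   -- Hypothetical table; names, the ts type, and the bucket count are assumptions.
   CREATE TABLE mor_bucket (
     uuid string,
     name string,
     age int,
     ts string,                                  -- assumed type; the report omits it
     `partition` string
   ) USING hudi
   PARTITIONED BY (`partition`)
   TBLPROPERTIES (
     type = 'mor',                               -- request a merge-on-read table explicitly
     primaryKey = 'uuid',                        -- record key field
     preCombineField = 'ts',                     -- field used to pick the newer record
     'hoodie.index.type' = 'BUCKET',             -- bucket index, per the suggestion above
     'hoodie.bucket.index.num.buckets' = '4'     -- assumed bucket count per partition
   );
   ```

   Listing the partition directory after an insert and a later update of the same key is one way to check whether log files appear next to the base parquet files with these settings.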




[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1280951476

   here is the file list for the partition `par1`
   ```
   hdfs dfs -ls -R /spark_hudi/hudi/mor1/partition=par1/
   2022-10-17 22:27:40,594 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   -rw-r--r--   1 shaozengwei supergroup         96 2022-10-17 21:37 /spark_hudi/hudi/mor1/partition=par1/.hoodie_partition_metadata
   -rw-r--r--   1 shaozengwei supergroup     435420 2022-10-17 21:41 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-123-179_20221017214153248.parquet
   -rw-r--r--   1 shaozengwei supergroup     435468 2022-10-17 21:42 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-152-217_20221017214224834.parquet
   -rw-r--r--   1 shaozengwei supergroup     435490 2022-10-17 21:37 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-74-106_20221017213731273.parquet
   -rw-r--r--   1 shaozengwei supergroup     435414 2022-10-17 21:38 /spark_hudi/hudi/mor1/partition=par1/2b219924-8881-4feb-93cf-7faecf24adde-0_0-97-144_20221017213839012.parquet
   ```




[GitHub] [hudi] nsivabalan commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1283159916

   Yes, new inserts are routed to parquet files. Updates in subsequent commits are written to log files. So this behavior is expected.
   




[GitHub] [hudi] xiaoshao commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
xiaoshao commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1286520086

   @nsivabalan but the new parquet files include all the records. Is that expected?




[GitHub] [hudi] nsivabalan closed issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?
URL: https://github.com/apache/hudi/issues/6969




[GitHub] [hudi] nsivabalan commented on issue #6969: [SUPPORT] does hudi do the same in MOR and COW table?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6969:
URL: https://github.com/apache/hudi/issues/6969#issuecomment-1287478187

   thanks! 

