Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/06 07:00:44 UTC

[GitHub] [hudi] maheshguptags opened a new issue, #7613: Not able to Delete record

maheshguptags opened a new issue, #7613:
URL: https://github.com/apache/hudi/issues/7613

   **Not able to delete, using Spark, records that were written by a Flink Hudi job**
   
   I have been trying to use PySpark to delete records from a Hudi table that was written by a Flink Hudi job. When I run the job with `config1`, the delete job creates `deltacommit` entries on the timeline but does not delete the records.
   
   With `config2`, by contrast, it creates a `rollback` and then a `deltacommit` in the `.hoodie` folder, and writes an empty parquet file in the partition bucket. I want to understand why `config2` triggers the `rollback` and creates the empty parquet file.
   
   `config1`
   
   hudi_options_write = {
       'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.recordkey.field': 'list_id,customer_id,client_id',
       'hoodie.table.name': tableName,
       'hoodie.datasource.write.partitionpath.field': 'client_id',
       'hoodie.datasource.write.operation':'delete',
       'hoodie.datasource.write.precombine.field': 'created_date'
   }
   
   `config2`
   
   hudi_options_write = {
       'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.recordkey.field': 'list_id,customer_id,client_id',
       'hoodie.table.name': tableName,
       'hoodie.datasource.write.partitionpath.field': 'client_id',
       'hoodie.datasource.write.operation':'upsert',
       'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
       'hoodie.datasource.write.precombine.field': 'created_date'
   }
   
   I want to know why the delete operation is not working properly here, whereas with `config1` I am able to delete records written by a `spark hudi` job.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write some data to a Hudi table using a `Flink hudi` job
   2. Try to read it using PySpark
   3. Apply filter and try to delete the record using `config1` and `config2`
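
The steps above can be sketched roughly as follows. This is only a minimal illustration of the `config1` delete path, not a confirmed fix: the helper names `build_delete_options` and `delete_matching_records`, and the parameters `base_path` and `condition`, are hypothetical and do not appear in the report. It assumes a `spark` session with the Hudi bundle on the classpath and a table already written by the Flink job.

```python
from typing import Dict


def build_delete_options(table_name: str) -> Dict[str, str]:
    """Return the `config1` write options from the report as a dict."""
    return {
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": "list_id,customer_id,client_id",
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.partitionpath.field": "client_id",
        "hoodie.datasource.write.operation": "delete",
        "hoodie.datasource.write.precombine.field": "created_date",
    }


def delete_matching_records(spark, base_path: str, table_name: str, condition: str) -> None:
    """Read the table, filter the rows to remove, and write them back as deletes.

    `spark` is an existing SparkSession; `condition` is a SQL filter string.
    """
    # Step 2: read the table that the Flink job wrote.
    table_df = spark.read.format("hudi").load(base_path)

    # Step 3: apply the filter that selects the records to delete.
    to_delete = table_df.where(condition)

    # Precaution in this sketch: drop Hudi metadata columns before writing back.
    meta_columns = [c for c in to_delete.columns if c.startswith("_hoodie_")]
    to_delete = to_delete.drop(*meta_columns)

    # Issue the delete by appending with the `delete` write operation.
    (
        to_delete.write.format("hudi")
        .options(**build_delete_options(table_name))
        .mode("append")
        .save(base_path)
    )
```

A call might look like `delete_matching_records(spark, "s3://bucket/path/to/table", tableName, "client_id = 'c1'")`, where the path and filter are placeholders.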
   
   **Expected behavior**
   
   I want to use `spark` to delete records that were written by the Flink job.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : Spark 3.3.0
   
   * Hive version : Hive 3.1.3
   
   * Hadoop version : Hadoop 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Image**
   
   ![image](https://user-images.githubusercontent.com/115445723/210947376-6f39de32-2d8b-4ff7-9a42-2eeeb287b6e6.png)
   
   **Stacktrace**
   [stacktrace_rollback_delete.log](https://github.com/apache/hudi/files/10357976/stacktrace_rollback_delete.log)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] maheshguptags commented on issue #7613: Not able to Delete record

Posted by "maheshguptags (via GitHub)" <gi...@apache.org>.
maheshguptags commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1408021672

   @danny0405 can you please post the status?




[GitHub] [hudi] maheshguptags commented on issue #7613: Not able to Delete record

Posted by GitBox <gi...@apache.org>.
maheshguptags commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1373220640

   CC: @Po Hong @nsivabalan @bhasudha @codope 




[GitHub] [hudi] maheshguptags commented on issue #7613: [SUPPORT] Not able to Delete record

Posted by "maheshguptags (via GitHub)" <gi...@apache.org>.
maheshguptags commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1409804936

   Hi @danny0405,
   I tried both operations, upsert and delete, and the result is the same: the records are not deleted.
   If you have working code for the cross-engine case (insert with Flink, delete with Spark) that deletes records from a Hudi table, please share it with us.
   
   -Mahesh




[GitHub] [hudi] danny0405 commented on issue #7613: [SUPPORT] Not able to Delete record

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1531080313

   Did you stop the Flink job before executing the Spark delete job?




[GitHub] [hudi] ad1happy2go commented on issue #7613: [SUPPORT] Not able to Delete record

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1570359418

   @maheshguptags Gentle ping.




[GitHub] [hudi] danny0405 commented on issue #7613: Not able to Delete record

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1409794955

   I noticed that there is a write operation named `DELETE`, so just set `hoodie.datasource.write.operation` to that value and try again.

