Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/04 17:51:04 UTC

[GitHub] [hudi] t0il3ts0ap opened a new issue #2535: [SUPPORT] _hoodie_is_deleted not working with custom transformer

t0il3ts0ap opened a new issue #2535:
URL: https://github.com/apache/hudi/issues/2535


   I have a `__deleted` column in my dataset, which I rename to `_hoodie_is_deleted` using a transformer. The renamed column shows up in the metastore and in the S3 dataset.
   However, the row is only soft-deleted, whereas I expect a hard delete: the row should not show up in any query.
   
   Attaching code for reference:
   ```
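   import org.apache.hudi.common.config.TypedProperties;
   import org.apache.hudi.utilities.transform.Transformer;
   import org.apache.spark.api.java.JavaSparkContext;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   public class CustomTransformer implements Transformer {
   
       @Override
       public Dataset<Row> apply(JavaSparkContext javaSparkContext, SparkSession sparkSession,
           Dataset<Row> dataset, TypedProperties typedProperties) {
   
           // Rename the source delete flag to Hudi's delete marker; the rename keeps the column's original type.
           return dataset
               .withColumnRenamed("__deleted", "_hoodie_is_deleted")
               .drop("__op", "__source_ts_ms");
       }
   }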
   ```
   
   ```
   scala> val df = spark.read.format("org.apache.hudi").load("s3://***************/delta-streamer-test/tables/accounts-data/default")
   df: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 11 more fields]
   
   scala> df.show()
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id| username|   password|              email|      created_on|      last_login|       __lsn|_hoodie_is_deleted|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
   |     20210204174033|  20210204174033_0_6|                 1|               default|848f7f69-be2e-498...|  1|some user|new  pass 3|someemail@email.com|1612193554103104|1612460406978955|614115973352|             false|
   |     20210204173646|  20210204173646_0_2|                 8|               default|848f7f69-be2e-498...|  8|         |           |                   |               0|            null|614054424744|              true|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] t0il3ts0ap closed issue #2535: [SUPPORT] _hoodie_is_deleted not working with custom transformer

Posted by GitBox <gi...@apache.org>.
t0il3ts0ap closed issue #2535:
URL: https://github.com/apache/hudi/issues/2535


   





[GitHub] [hudi] t0il3ts0ap commented on issue #2535: [SUPPORT] _hoodie_is_deleted not working with custom transformer

Posted by GitBox <gi...@apache.org>.
t0il3ts0ap commented on issue #2535:
URL: https://github.com/apache/hudi/issues/2535#issuecomment-773945636


   `_hoodie_is_deleted` was a string. It has to be a boolean for the delete to work.
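
To make the point above concrete, here is a minimal, dependency-free sketch. It assumes the delete marker is honored only when its value is an actual boolean `true`, so a string `"true"` left behind by a plain rename is ignored. `DeleteMarkerSketch`, `isDeleteMarker`, `withBooleanMarker` and the map-based record are hypothetical stand-ins for illustration, not Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free stand-in for a record and a delete-marker check.
// It is NOT Hudi code; it only illustrates why the column type matters.
class DeleteMarkerSketch {
    static final String MARKER = "_hoodie_is_deleted";

    // Assumption: the marker counts only when the value is a real Boolean true.
    static boolean isDeleteMarker(Map<String, Object> record) {
        Object value = record.get(MARKER);
        return value instanceof Boolean && (Boolean) value;
    }

    // Returns a copy in which a string marker is converted to a Boolean.
    static Map<String, Object> withBooleanMarker(Map<String, Object> record) {
        Map<String, Object> copy = new HashMap<>(record);
        Object value = copy.get(MARKER);
        if (value instanceof String) {
            copy.put(MARKER, Boolean.parseBoolean((String) value));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 8);
        record.put(MARKER, "true"); // a string, as a plain column rename would leave it

        System.out.println(isDeleteMarker(record));
        System.out.println(isDeleteMarker(withBooleanMarker(record)));
    }
}
```

In a Spark transformer, one way to get the same effect is to cast the renamed column, for example `.withColumn("_hoodie_is_deleted", col("_hoodie_is_deleted").cast("boolean"))` with `col` imported from `org.apache.spark.sql.functions`. The thread does not say how the reporter actually changed the type, so treat this as one possible approach.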
   





[GitHub] [hudi] vinothchandar commented on issue #2535: [SUPPORT] _hoodie_is_deleted not working with custom transformer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2535:
URL: https://github.com/apache/hudi/issues/2535#issuecomment-776042684


   Maybe there is some room to improve the docs here?
   
   http://hudi.apache.org/docs/writing_data.html#deletes
   could say 
   
   `add a boolean column named _hoodie_is_deleted to DataSet. `

