Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/31 16:11:53 UTC

[GitHub] [hudi] yabha-isomap opened a new issue, #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

yabha-isomap opened a new issue, #7100:
URL: https://github.com/apache/hudi/issues/7100

   1. I am trying to use Apache Hudi with Flink SQL by following [Hudi's Flink guide](https://hudi.apache.org/docs/flink-quick-start-guide).
   2. The basics work, but now I need to provide a custom implementation of HoodieRecordPayload, as suggested in [this FAQ](https://hudi.apache.org/docs/faq#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage).
   3. When I pass the config shown in the following listing, it doesn't work: my custom class (MyHudiPoc.Poc) is not picked up and the default behaviour continues.
   
   ```sql
   
   CREATE TABLE t1(
     uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
     name VARCHAR(10),
     age INT,
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
     'connector' = 'hudi',
     'path' = '/tmp/hudi',
     'hoodie.compaction.payload.class' = 'MyHudiPoc.Poc', -- My custom class
     'hoodie.datasource.write.payload.class' = 'MyHudiPoc.Poc',  -- My custom class
     'write.payload.class' = 'MyHudiPoc.Poc',  -- My custom class
     'table.type' = 'MERGE_ON_READ'
   );
   
   INSERT INTO t1 VALUES
     ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
     ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
     ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
     ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
     ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
     ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
     ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
     ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
   
   
   INSERT INTO t1 VALUES
     ('id1','Danny1',27,TIMESTAMP '1970-01-01 00:00:01','par1');
   
   ```
   
   1. I also tried passing it through `/etc/hudi/conf/hudi-default.conf`:
   
   ```yaml
   ---
   "hoodie.compaction.payload.class": MyHudiPoc.Poc
   "hoodie.datasource.write.payload.class": MyHudiPoc.Poc
   "write.payload.class": MyHudiPoc.Poc
   ```
   
   I am also passing my custom jar when starting the Flink SQL client.
   
   ```bash
   /bin/sql-client.sh embedded \
       -j ../jars/hudi-flink1.15-bundle-0.12.1.jar \
       -j ./plugins/flink-s3-fs-hadoop-1.15.1.jar \
       -j ./plugins/parquet-hive-bundle-1.8.1.jar \
       -j ./plugins/flink-sql-connector-kafka-1.15.1.jar \
       -j my-hudi-poc-1.0-SNAPSHOT.jar \
       shell
   ```
   
    1. I am able to pass my custom class in the Spark example, but not in Flink.
    1. I tried both COW and MOR table types.
   
   Any idea what I am doing wrong?
   
   See listing in the question.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] complone commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
complone commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1298209216

   Hello. Of the configuration items you use, `hoodie.compaction.payload.class` is only mapped to the Spark configuration, in class HoodieWriterUtils.
   
   If you want to set the payload class, I think you should use `write.payload.class`, which is the option read in StreamerUtil#getPayloadConfig.
   
   ![image](https://user-images.githubusercontent.com/20021404/199191180-df27a9de-6ef3-4b0e-b211-1ee45b649b32.png)
   ![image](https://user-images.githubusercontent.com/20021404/199191381-e8379497-84f6-4555-8d57-7f11d0ea232c.png)
   




[GitHub] [hudi] nsivabalan commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1299564461

   @yuzhaojing: can you assist here, please?




[GitHub] [hudi] complone commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
complone commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1298209281

   I also found that there is duplicate code here. I will open a PR to deal with it.




[GitHub] [hudi] complone commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
complone commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1298242721

   Maybe you should use `payload.class`?
   
   ![image](https://user-images.githubusercontent.com/20021404/199199474-cd463deb-f215-463f-aba8-49bd431a777d.png)
   




[GitHub] [hudi] danny0405 commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1306522726

   Did you try `payload.class` instead of `write.payload.class`? We changed the option key recently.
   `write.payload.class` is now kept as a fallback option key, but the fallback only works in Flink 1.15.x.
   
   Feel free to re-open the issue if you still have a problem here.




[GitHub] [hudi] lucienoz commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by "lucienoz (via GitHub)" <gi...@apache.org>.
lucienoz commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1661941802

   How do I set `payload.class` for a Spark SQL COW table?




[GitHub] [hudi] yabha-isomap commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
yabha-isomap commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1298495326

   Thanks @complone. I tried that as well, but no luck.
   
   ```sql
   CREATE TABLE t1(
     uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
     name VARCHAR(10),
     age INT,
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
     'connector' = 'hudi',
     'path' = '/tmp/hudi',
     'hoodie.compaction.payload.class' = 'gsHudiPoc.Poc', -- My custom class
     'write.payload.class' = 'gsHudiPoc.Poc',  -- My custom class
     'payload.class' = 'gsHudiPoc.Poc',  -- My custom class
     'hoodie.datasource.write.payload.class' = 'gsHudiPoc.Poc',  -- My custom class
     'table.type' = 'COPY_ON_WRITE'
   );
   ```
   
   Let me look into the code of FlinkOptions.java.




[GitHub] [hudi] danny0405 closed issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 closed issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql
URL: https://github.com/apache/hudi/issues/7100




[GitHub] [hudi] yabha-isomap commented on issue #7100: [SUPPORT] Custom HoodieRecordPayload for use in flink sql

Posted by GitBox <gi...@apache.org>.
yabha-isomap commented on issue #7100:
URL: https://github.com/apache/hudi/issues/7100#issuecomment-1337013456

   Thanks. I was able to get it to work with the DataStream API.
   One tip for anyone facing this issue: put a debugging message in the constructor of the class (not in one of its methods) to verify whether your class is being picked up.
   In my case, the class was being picked up, but the method was not being called because of an issue in my code.
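   
   A minimal sketch of that tip, using only the JDK. It does not use the Hudi API: `PayloadLoadCheck`, the nested `Poc` class, its `merge` method and `loadByName` are hypothetical names for illustration, and the reflective load by class name is an assumption about how a framework instantiates a configured class. A real payload class would implement HoodieRecordPayload instead.
   
   ```java
   import java.lang.reflect.Constructor;
   import java.util.concurrent.atomic.AtomicInteger;
   
   public class PayloadLoadCheck {
   
       /** Hypothetical stand-in for a custom payload class such as MyHudiPoc.Poc. */
       public static class Poc {
           // Counts constructor calls so the load can also be checked programmatically.
           public static final AtomicInteger CONSTRUCTED = new AtomicInteger();
   
           private final String value;
   
           public Poc(String value) {
               // The debug message lives in the constructor: it fires as soon as the
               // class is instantiated, even if no merge method is ever invoked.
               System.err.println("[debug] Poc constructed with value=" + value);
               CONSTRUCTED.incrementAndGet();
               this.value = value;
           }
   
           // A message placed only here stays silent if this method is never called.
           public String merge(String stored) {
               return value + "|" + stored;
           }
       }
   
       /** Instantiates a class from its configured name (assumed framework behaviour). */
       public static Object loadByName(String className, String arg) {
           try {
               Class<?> clazz = Class.forName(className);
               Constructor<?> ctor = clazz.getConstructor(String.class);
               return ctor.newInstance(arg);
           } catch (ReflectiveOperationException e) {
               throw new IllegalStateException("Could not load " + className, e);
           }
       }
   
       public static void main(String[] args) {
           Object payload = loadByName(Poc.class.getName(), "id1");
           System.out.println("loaded: " + payload.getClass().getName());
           System.out.println("constructor calls: " + Poc.CONSTRUCTED.get());
       }
   }
   ```
   
   If the constructor message shows up but the message in the method never does, the class name in the config is fine and the problem is in how (or whether) the method gets called.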

