Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/10 10:14:15 UTC

[GitHub] [hudi] y0908105023 opened a new pull request, #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

y0908105023 opened a new pull request, #6650:
URL: https://github.com/apache/hudi/pull/6650

   …ins :
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
yihua merged PR #6650:
URL: https://github.com/apache/hudi/pull/6650




[GitHub] [hudi] hudi-bot commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1242707667

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c806cc4d05575a016fa5ee15d7f91972d6e55f2b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] y0908105023 commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
y0908105023 commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1243292003

   ```java
   package org.apache.flink.hudi;

   import org.apache.flink.api.common.functions.MapFunction;
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.configuration.RestOptions;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.Table;
   import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
   import org.apache.flink.types.Row;

   public class HudiRead {

       public static void main(String[] args) throws Exception {

           // Local web UI on port 8082 so the read job can run alongside the write job (port 8081).
           Configuration config = new Configuration();
           config.setInteger(RestOptions.PORT, 8082);
           config.setString("metrics.system-resource", "true");
           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config);
           EnvironmentSettings settings = EnvironmentSettings.newInstance()
                   .inStreamingMode()
                   .build();
           StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

           // Streaming read of the MERGE_ON_READ table in changelog mode.
           String hudiSql = "CREATE TABLE IF NOT EXISTS hudi (\n" +
                   "  `aa` STRING, " +
                   "  `bb` BIGINT," +
                   "  `cc` DECIMAL(10,2)," +
                   "  `dd` DATE," +
                   "  `ff` TIMESTAMP(6), " +
                   "  PRIMARY KEY (bb) NOT ENFORCED\n" +
                   ")\n" +
                   "  PARTITIONED BY (`dd`)\n" +
                   "with (\n" +
                   "  'connector' = 'hudi',\n" +
                   "  'hive_sync.support_timestamp' = 'true',\n" +
                   "  'path' = 'hdfs://yangshuo7.local:9000/user/hudi//hjh8171/mor_fq_82572',\n" +
                   "  'read.streaming.enabled' = 'true',\n" +
                   "  'table.type' = 'MERGE_ON_READ',\n" +
                   "  'write.tasks' = '1',\n" +
                   "  'changelog.enable' = 'true',\n" +
                   "  'write.operation' = 'upsert',\n" +
                   "  'compaction.async.enabled' = 'false',\n" +
                   "  'hive_sync.enable' = 'true',\n" +
                   "  'hive_sync.mode' = 'hms',\n" +
                   "  'hive_sync.db' = 'hjh8171',\n" +
                   "  'hive_sync.table' = 'mor_fq_82572',\n" +
                   "  'hive_sync.metastore.uris' = 'thrift://localhost:9083'\n" +
                   ")";

           tEnv.executeSql(hudiSql);
           Table t = tEnv.from("hudi");

           // Convert the table into a changelog stream and print every row.
           tEnv.toChangelogStream(t).map(new MapFunction<Row, Row>() {
               @Override
               public Row map(Row value) throws Exception {
                   return value;
               }
           }).print();

           env.execute("hudi job");
       }
   }
   ```




[GitHub] [hudi] y0908105023 commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
y0908105023 commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1243290760

   ```java
    package org.apache.flink.hudi;
   
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.configuration.RestOptions;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
   
   public class HudiWrite {
   
       public static void main(String[] args) {
   
           Configuration config = new Configuration();
           config.setInteger(RestOptions.PORT, 8081);
           config.setString("metrics.system-resource", "true");
           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config);
           EnvironmentSettings settings = EnvironmentSettings.newInstance()
                   .inStreamingMode()
                   .build();
           env.setParallelism(1);
           StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);
           env.enableCheckpointing(10000);
   
           String hudiSql = "CREATE TABLE IF NOT EXISTS hudi (\n" +
                   "  `aa` STRING,`bb` BIGINT,`cc` DECIMAL(10,2),`dd` DATE,`ff` TIMESTAMP(6),PRIMARY KEY (aa,bb) NOT ENFORCED\n" +
                   ")\n" +
                   "  PARTITIONED BY (`dd`)\n" +
                   "with (\n" +
                   "  \n" +
                   "  'connector' = 'hudi',\n" +
                   "  'path' = 'hdfs://yangshuo7.local:9000/user/hudi//hjh8171/mor_fq_82572',\n" +
                   "  'read.streaming.enabled' = 'true',\n" +
                   "  'table.type' = 'MERGE_ON_READ',\n" +
                   "  'write.tasks' = '1',\n" +
                   "  'changelog.enable' = 'true',\n" +
                   "  'write.operation' = 'upsert',\n" +
                   "  'compaction.async.enabled' = 'true',\n" +
                   "  'hive_sync.enable' = 'true',\n" +
                   "  'hive_sync.mode' = 'hms',\n" +
                   "  'hive_sync.db' = 'hjh8171',\n" +
                   "  'hive_sync.table' = 'mor_fq_82572',\n" +
                   "  'hive_sync.metastore.uris' = 'thrift://localhost:9083'\n" +
                   ")";
   
           String mysqlCDCSql = "CREATE TABLE IF NOT EXISTS mysql (\n" +
                   "  `aa` STRING,`bb` BIGINT,`cc` DECIMAL(10,2),`dd` DATE,`ff` TIMESTAMP(3),PRIMARY KEY (aa,bb) NOT ENFORCED\n" +
                   ")\n" +
                   "  PARTITIONED BY (`aa`,`dd`)\n" +
                   "with (\n" +
                   " 'connector' = 'mysql-cdc', " +
                   " 'scan.startup.mode' = 'latest-offset', " +
                   " 'hostname' = '127.0.0.1', " +
                   " 'port' = '3306', " +
                   " 'username' = 'root', " +
                   " 'password' = '12345678', " +
                   " 'database-name' = 'flink', " +
                   " 'table-name' = 'all_schema3' " +
                   ")";
   
   
           tEnv.executeSql(mysqlCDCSql);
           tEnv.executeSql(hudiSql);
           tEnv.executeSql("insert into hudi select aa, bb, cc, dd, ff from mysql");
       }
   }
   ```




[GitHub] [hudi] hudi-bot commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1242718237

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c806cc4d05575a016fa5ee15d7f91972d6e55f2b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1250126129

   @y0908105023 Could you update the PR description with the details as well?  It's great that you created a Jira ticket to explain the bug.  The PR should also be self-explanatory with the description.




[GitHub] [hudi] yihua commented on pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1250126169

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1242695301

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c806cc4d05575a016fa5ee15d7f91972d6e55f2b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6650:
URL: https://github.com/apache/hudi/pull/6650#discussion_r967684336


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##########
@@ -74,7 +74,7 @@ public static String getPartitionPathFromGenericRecord(GenericRecord genericReco
   public static String[] extractRecordKeys(String recordKey) {
     String[] fieldKV = recordKey.split(",");
     return Arrays.stream(fieldKV).map(kv -> {
-      final String[] kvArray = kv.split(":");
+      final String[] kvArray = kv.split(":", 2);

Review Comment:
   I assume this is needed when the value contains a colon (:).  Could you add a unit test for that?
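
   For illustration, a minimal standalone sketch of the failure mode the two-argument split guards against, mirroring the split logic in the diff above (the sample key string below is hypothetical):

   ```java
   import java.util.Arrays;

   public class SplitLimitSketch {
       public static void main(String[] args) {
           // A composite record key rendered as "field:value" pairs, where the
           // timestamp value itself contains colons.
           String recordKey = "id:abc,ts:2022-09-10 10:14:15";

           for (String kv : recordKey.split(",")) {
               String[] noLimit = kv.split(":");     // value is cut at its first colon
               String[] limit2 = kv.split(":", 2);   // value is kept intact
               System.out.println(kv + " -> old=" + Arrays.toString(noLimit)
                       + ", new=" + Arrays.toString(limit2));
           }
           // Prints:
           // id:abc -> old=[id, abc], new=[id, abc]
           // ts:2022-09-10 10:14:15 -> old=[ts, 2022-09-10 10, 14, 15], new=[ts, 2022-09-10 10:14:15]
       }
   }
   ```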





[GitHub] [hudi] y0908105023 commented on pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
y0908105023 commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1243288837

   Tested as follows:

   ```java
   package org.apache.flink.hudi;

   import org.apache.flink.api.common.functions.MapFunction;
   import org.apache.flink.api.common.typeinfo.TypeInformation;
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.configuration.RestOptions;
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.Schema;
   import org.apache.flink.table.api.Table;
   import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
   import org.apache.flink.table.catalog.Column;
   import org.apache.flink.table.types.DataType;
   import org.apache.flink.types.Row;

   import java.sql.Date;
   import java.sql.Time;
   import java.sql.Timestamp;
   import java.util.List;

   public class HudiRead {

       public static void main(String[] args) throws Exception {

           // Local web UI on port 8082 so the read job can run alongside the write job (port 8081).
           Configuration config = new Configuration();
           config.setInteger(RestOptions.PORT, 8082);
           config.setString("metrics.system-resource", "true");
           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config);
           EnvironmentSettings settings = EnvironmentSettings.newInstance()
                   .inStreamingMode()
                   .build();
           StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

           String hudiSql = "CREATE TABLE IF NOT EXISTS hudi (\n" +
                   "  `aa` STRING, " +
                   "  `bb` BIGINT," +
                   "  `cc` DECIMAL(10,2)," +
                   "  `dd` DATE," +
                   "  `ff` TIMESTAMP(6), " +
   //                "  `gg` as bb*10, " +
   //                "  WATERMARK FOR ff AS ff - INTERVAL '5' SECOND," +
                   "  PRIMARY KEY (bb) NOT ENFORCED\n" +
                   ")\n" +
                   "  PARTITIONED BY (`dd`)\n" +
                   "with (\n" +
                   "  'connector' = 'hudi',\n" +
                   "  'hive_sync.support_timestamp' = 'true',\n" +
                   "  'path' = 'hdfs://yangshuo7.local:9000/user/hudi//hjh8171/mor_fq_82572',\n" +
                   "  'read.streaming.enabled' = 'true',\n" +
                   "  'table.type' = 'MERGE_ON_READ',\n" +
                   "  'write.tasks' = '1',\n" +
                   "  'changelog.enable' = 'true',\n" +
                   "  'write.operation' = 'upsert',\n" +
                   "  'compaction.async.enabled' = 'false',\n" +
                   "  'hive_sync.enable' = 'true',\n" +
                   "  'hive_sync.mode' = 'hms',\n" +
                   "  'hive_sync.db' = 'hjh8171',\n" +
                   "  'hive_sync.table' = 'mor_fq_82572',\n" +
                   "  'hive_sync.metastore.uris' = 'thrift://localhost:9083'\n" +
                   ")";

           tEnv.executeSql(hudiSql);
           Table t = tEnv.from("hudi");

           // Read the table as a DataStream and print each row.
           DataStream<Row> dataStream1 = tEnv.toDataStream(t);
           dataStream1.map(new MapFunction<Row, Row>() {
               @Override
               public Row map(Row value) throws Exception {
                   return value;
               }
           }).print("");

           // Changelog read with an explicit schema that bridges the temporal
           // columns to java.sql types; inspect the produced row type.
           Schema schema = getSchema(t);
           DataStream<Row> dataStream = tEnv.toChangelogStream(t, schema);
           TypeInformation<Row> typeInformation = dataStream.getType();

           tEnv.toChangelogStream(t).map(new MapFunction<Row, Row>() {
               @Override
               public Row map(Row value) throws Exception {
                   return value;
               }
           }).print();

           env.execute("hudi job");
       }

       private static Schema getSchema(Table t) {
           Schema.Builder builder = Schema.newBuilder();

           List<Column> columns = t.getResolvedSchema().getColumns();
           for (Column c : columns) {
               DataType dt = c.getDataType();
               switch (dt.getLogicalType().getTypeRoot()) {
                   case TIMESTAMP_WITHOUT_TIME_ZONE:
                       builder.column(c.getName(), dt.bridgedTo(Timestamp.class));
                       break;
                   case TIME_WITHOUT_TIME_ZONE:
                       builder.column(c.getName(), dt.bridgedTo(Time.class));
                       break;
                   case DATE:
                       builder.column(c.getName(), dt.bridgedTo(Date.class));
                       break;
                   default:
                       builder.column(c.getName(), dt);
               }
           }

           return builder.build();
       }
   }
   ```




[GitHub] [hudi] hudi-bot commented on pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1250149874

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465",
       "triggerID" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "DELETED",
       "url" : "TBD",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 28ec2e93378de1f613936a4236208b9975259fdf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1250133548

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "",
       "status" : "CANCELED",
       "url" : "TBD",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465",
       "triggerID" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 28ec2e93378de1f613936a4236208b9975259fdf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11465) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6650: [HUDI-4828] Fix the extraction of record keys which may be cut out

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6650:
URL: https://github.com/apache/hudi/pull/6650#issuecomment-1250132876

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11290",
       "triggerID" : "c806cc4d05575a016fa5ee15d7f91972d6e55f2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "",
       "status" : "CANCELED",
       "url" : "TBD",
       "triggerID" : "1250126169",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "28ec2e93378de1f613936a4236208b9975259fdf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 28ec2e93378de1f613936a4236208b9975259fdf UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6650: [minor] HUDI-4828, fix the extraction for record keys may be cut out

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6650:
URL: https://github.com/apache/hudi/pull/6650#discussion_r973618423


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##########
@@ -74,7 +74,7 @@ public static String getPartitionPathFromGenericRecord(GenericRecord genericReco
   public static String[] extractRecordKeys(String recordKey) {
     String[] fieldKV = recordKey.split(",");
     return Arrays.stream(fieldKV).map(kv -> {
-      final String[] kvArray = kv.split(":");
+      final String[] kvArray = kv.split(":", 2);

Review Comment:
   Yes, I was referring to the same case and was asking for a unit test in the repo. I added a unit test to this PR and will land it once CI passes.
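
   For reference, a rough sketch of what such a unit test could look like, assuming `extractRecordKeys` returns the value part of each `field:value` pair (the class and method names here are illustrative):

   ```java
   import static org.junit.jupiter.api.Assertions.assertArrayEquals;

   import org.apache.hudi.keygen.KeyGenUtils;
   import org.junit.jupiter.api.Test;

   public class TestExtractRecordKeysWithColon {

     @Test
     public void valueContainingColonIsNotTruncated() {
       // Composite key whose second value contains colons, e.g. a timestamp.
       String recordKey = "id:abc,ts:2022-09-10 10:14:15";
       // Assumption: extractRecordKeys returns the value portion of each pair.
       assertArrayEquals(new String[] {"abc", "2022-09-10 10:14:15"},
           KeyGenUtils.extractRecordKeys(recordKey));
     }
   }
   ```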






[GitHub] [hudi] y0908105023 commented on a diff in pull request #6650: [minor] following HUDI-4828, fix the extraction for record keys conta…

Posted by GitBox <gi...@apache.org>.
y0908105023 commented on code in PR #6650:
URL: https://github.com/apache/hudi/pull/6650#discussion_r968036201


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##########
@@ -74,7 +74,7 @@ public static String getPartitionPathFromGenericRecord(GenericRecord genericReco
   public static String[] extractRecordKeys(String recordKey) {
     String[] fieldKV = recordKey.split(",");
     return Arrays.stream(fieldKV).map(kv -> {
-      final String[] kvArray = kv.split(":");
+      final String[] kvArray = kv.split(":", 2);

Review Comment:
   <img width="720" alt="image" src="https://user-images.githubusercontent.com/8789291/189589596-5a5dc805-4c01-4890-acf7-9a75e3eadb51.png">
   


