You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/09 13:07:38 UTC

[GitHub] [hudi] codope commented on a diff in pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

codope commented on code in PR #6049:
URL: https://github.com/apache/hudi/pull/6049#discussion_r917266599


##########
hudi-common/src/test/java/org/apache/hudi/common/testutils/FileSystemTestUtils.java:
##########
@@ -44,7 +44,7 @@ public class FileSystemTestUtils {
   public static final String FORWARD_SLASH = "/";
   public static final String FILE_SCHEME = "file";
   public static final String COLON = ":";
-  public static final Random RANDOM = new Random();
+  public static final Random RANDOM = new Random(0xDEED);

Review Comment:
   +1



##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java:
##########
@@ -126,14 +125,17 @@ public void write(InternalRow record) throws IOException {
       if (populateMetaFields) { // usual path where meta fields are pre populated in prep step.
         partitionPath = String.valueOf(record.getUTF8String(HoodieRecord.PARTITION_PATH_META_FIELD_POS));
       } else { // if meta columns are disabled.
+        // TODO(HUDI-3993) remove duplication, unify with HoodieDatasetBulkInsertHelper
         if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen
           partitionPath = "";
         } else if (simpleKeyGen) { // SimpleKeyGen
-          Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
-          partitionPath = parititionPathValue != null ? parititionPathValue.toString() : PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH;
-          if (writeConfig.isHiveStylePartitioningEnabled()) {
-            partitionPath = (keyGeneratorOpt.get()).getPartitionPathFields().get(0) + "=" + partitionPath;
-          }
+          Object partitionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
+          String partitionPathField = keyGeneratorOpt.get().getPartitionPathFields().get(0);
+          boolean shouldURLEncodePartitionPath = writeConfig.shouldURLEncodePartitionPath();
+          boolean hiveStylePartitioningEnabled = writeConfig.isHiveStylePartitioningEnabled();
+
+          partitionPath = KeyGenUtils.handlePartitionPathDecoration(partitionPathField,
+              partitionPathValue == null ? null : partitionPathValue.toString(), shouldURLEncodePartitionPath, hiveStylePartitioningEnabled);

Review Comment:
   Consider passing `Option.of(partitionPathValue)` instead of `null`



##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java:
##########
@@ -97,8 +96,31 @@ public static Dataset<Row> prepareHoodieDatasetForBulkInsert(SQLContext sqlConte
       // simple fields for both record key and partition path: can directly use withColumn
       String partitionPathField = keyGeneratorClass.equals(SimpleKeyGenerator.class.getName()) ? partitionPathFields :
           partitionPathFields.substring(partitionPathFields.indexOf(":") + 1);
-      rowDatasetWithRecordKeysAndPartitionPath = rows.withColumn(HoodieRecord.RECORD_KEY_METADATA_FIELD, functions.col(recordKeyFields).cast(DataTypes.StringType))
-          .withColumn(HoodieRecord.PARTITION_PATH_METADATA_FIELD, functions.col(partitionPathField).cast(DataTypes.StringType));
+
+      // TODO(HUDI-3993) cleanup duplication
+      String tableName = properties.getString(HoodieWriteConfig.TBL_NAME.key());
+      String partitionPathDecorationUDFName = PARTITION_PATH_UDF_FN + tableName;
+
+      boolean shouldURLEncodePartitionPath = config.shouldURLEncodePartitionPath();
+      boolean isHiveStylePartitioned = config.isHiveStylePartitioningEnabled();
+
+      if (shouldURLEncodePartitionPath || isHiveStylePartitioned) {
+        sqlContext.udf().register(
+            partitionPathDecorationUDFName,
+            (UDF1<String, String>) partitionPathValue ->

Review Comment:
   So, I assume this UDF registration would be gone after HUDI-3993?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org