Posted to commits@hudi.apache.org by "Yuwei Xiao (Jira)" <ji...@apache.org> on 2022/07/10 07:37:00 UTC

[jira] [Commented] (HUDI-4318) IndexOutOfBoundException when recordKey has List values for Bucket index table

    [ https://issues.apache.org/jira/browse/HUDI-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564649#comment-17564649 ] 

Yuwei Xiao commented on HUDI-4318:
----------------------------------

I could not reproduce the exception. My test code:

{code:java}
val schema = StructType(Array(
  StructField("uuid", StringType),
  StructField("ts", LongType),
  StructField("partitionpath", StringType),
  StructField("array_field", DataTypes.createArrayType(StringType))
))

val data = Seq(Row("id1", 1L, "2020/01/01", List("a", "b", "c")))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

df.write.format("org.apache.hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(RECORDKEY_FIELD.key, "uuid")
  .option(PARTITIONPATH_FIELD.key, "partitionpath")
  .option(INDEX_TYPE.key(), IndexType.BUCKET.name())
  .option(BUCKET_INDEX_ENGINE_TYPE.key(), BucketIndexEngineType.SIMPLE.name())
  .option(BUCKET_INDEX_NUM_BUCKETS.key(), "4")
  .option(TBL_NAME.key, tableName)
  .mode(Overwrite)
  .save(tablePath)
{code}

> IndexOutOfBoundException when recordKey has List values for Bucket index table
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-4318
>                 URL: https://issues.apache.org/jira/browse/HUDI-4318
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.1
>            Reporter: Harsha Teja Kanna
>            Assignee: Yuwei Xiao
>            Priority: Minor
>
> Currently, the Bucket index is supported only if the record key has columns with simple values.
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java#L71]
> Example record for which this breaks:
> column1:value1,column2:value2,column3:[value1,value2]
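
The failure described in the quoted issue can be illustrated with a minimal sketch. It assumes the record key string is parsed by splitting first on "," and then on ":"; that parsing strategy is an assumption made for illustration, and the class and method names below (RecordKeyParseSketch, parseNaively, failsToParse) are hypothetical, not Hudi APIs. Under that assumption, a list value such as [value1,value2] is cut at its inner comma, leaving a fragment with no ":" separator, so reading the value part of the pair goes out of bounds.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not Hudi's BucketIdentifier.
class RecordKeyParseSketch {

    // Assumed parsing strategy: split "field:value,field:value" on ',' then on ':'.
    static Map<String, String> parseNaively(String recordKey) {
        return Arrays.stream(recordKey.split(","))
                .map(pair -> pair.split(":"))
                // p[1] is out of bounds when a fragment contains no ':'
                .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    }

    // Returns true when the naive parse throws ArrayIndexOutOfBoundsException.
    static boolean failsToParse(String recordKey) {
        try {
            parseNaively(recordKey);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String simpleKey = "column1:value1,column2:value2";
        // Example record key taken from the issue description.
        String listKey = "column1:value1,column2:value2,column3:[value1,value2]";

        System.out.println("simple key fails: " + failsToParse(simpleKey));
        System.out.println("list key fails: " + failsToParse(listKey));
    }
}
```

In this sketch the key with only simple values parses cleanly, while the key from the issue description splits into the fragments "column3:[value1" and "value2]", and the second fragment triggers the exception. The test code in the comment above uses the simple string column "uuid" as the record key, so it would not exercise this path.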



--
This message was sent by Atlassian Jira
(v8.20.10#820010)