Posted to commits@hudi.apache.org by "Yuwei Xiao (Jira)" <ji...@apache.org> on 2022/07/10 07:37:00 UTC
[jira] [Commented] (HUDI-4318) IndexOutOfBoundException when recordKey has List values for Bucket index table
[ https://issues.apache.org/jira/browse/HUDI-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564649#comment-17564649 ]
Yuwei Xiao commented on HUDI-4318:
----------------------------------
I could not reproduce the exception. My test code:
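{code:java}
// Imports are inferred from the identifiers used below (Hudi master at the time).
// Assumes the test harness provides `spark`, `tableName`, and `tablePath`.
import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, PRECOMBINE_FIELD, RECORDKEY_FIELD}
import org.apache.hudi.QuickstartUtils.getQuickstartWriteConfigs
import org.apache.hudi.config.HoodieIndexConfig.{BUCKET_INDEX_ENGINE_TYPE, BUCKET_INDEX_NUM_BUCKETS, INDEX_TYPE}
import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME
import org.apache.hudi.index.HoodieIndex.{BucketIndexEngineType, IndexType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.SaveMode.Overwrite
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("uuid", StringType),
  StructField("ts", LongType),
  StructField("partitionpath", StringType),
  StructField("array_field", DataTypes.createArrayType(StringType))
))
// One row with a list-valued column (array_field); the record key is uuid.
val data = Seq(Row("id1", 1L, "2020/01/01", List("a", "b", "c")))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.write.format("org.apache.hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(RECORDKEY_FIELD.key, "uuid")
  .option(PARTITIONPATH_FIELD.key, "partitionpath")
  .option(INDEX_TYPE.key(), IndexType.BUCKET.name())
  .option(BUCKET_INDEX_ENGINE_TYPE.key(), BucketIndexEngineType.SIMPLE.name())
  .option(BUCKET_INDEX_NUM_BUCKETS.key(), "4")
  .option(TBL_NAME.key, tableName)
  .mode(Overwrite)
  .save(tablePath)
{code}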
> IndexOutOfBoundException when recordKey has List values for Bucket index table
> ------------------------------------------------------------------------------
>
> Key: HUDI-4318
> URL: https://issues.apache.org/jira/browse/HUDI-4318
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Affects Versions: 0.11.1
> Reporter: Harsha Teja Kanna
> Assignee: Yuwei Xiao
> Priority: Minor
>
> Currently, the Bucket index is supported only if the record key has columns with simple values.
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java#L71]
> Example record for which this breaks:
> column1:value1,column2:value2,column3:[value1,value2]
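The failure mode described in the issue can be illustrated with a minimal sketch. It assumes the record key string is parsed by splitting first on commas and then on colons; that is an assumption about how the linked code behaves, not a copy of it. The class and method names (`RecordKeyParseSketch`, `parseNaive`) are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is NOT Hudi's actual BucketIdentifier code.
public class RecordKeyParseSketch {

    // Naively parse "field:value,field:value" by splitting on ',' and then ':'.
    public static Map<String, String> parseNaive(String recordKey) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : recordKey.split(",")) {
            String[] kv = part.split(":");
            // kv[1] does not exist when the part contains no ':' (e.g. "value2]").
            pairs.put(kv[0], kv[1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Simple values parse as expected.
        System.out.println(parseNaive("column1:value1,column2:value2"));

        // A list value contains a comma, so "value2]" becomes its own part
        // and the lookup of kv[1] goes out of bounds.
        try {
            parseNaive("column1:value1,column2:value2,column3:[value1,value2]");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Failed on list value: " + e);
        }
    }
}
```

Under this assumption, the comma inside the list value is indistinguishable from the comma that separates key fields, which would explain why only record keys with simple values parse cleanly.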
--
This message was sent by Atlassian Jira
(v8.20.10#820010)