Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/05/26 00:23:00 UTC

[jira] [Created] (HUDI-4156) AsyncIndexer fails for column stats partition

sivabalan narayanan created HUDI-4156:
-----------------------------------------

             Summary: AsyncIndexer fails for column stats partition 
                 Key: HUDI-4156
                 URL: https://issues.apache.org/jira/browse/HUDI-4156
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


Tried to build column stats for a Hudi table with the async indexer and ran into the exception below.

 

Configs I had set:
{code:java}
hoodie.metadata.enable=true
hoodie.metadata.index.async=true
hoodie.metadata.index.column.stats.enable=true
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
{code}
Command:
{code:java}
./bin/spark-submit \
  --class org.apache.hudi.utilities.HoodieIndexer \
  /home/hadoop/hudi-utilities-bundle_2.12-0.12.0-SNAPSHOT.jar \
  --props file:///home/hadoop/indexer.properties \
  --mode scheduleandexecute \
  --base-path TBL_PATH \
  --table-name call_center \
  --index-types COLUMN_STATS \
  --parallelism 1 \
  --spark-memory 10g
{code}
 

 
{code:java}
2022-05-26 00:14:27,936 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
2022-05-26 00:14:27,937 INFO client.BaseHoodieClient: Stopping Timeline service !!
2022-05-26 00:14:27,937 INFO embedded.EmbeddedTimelineService: Closing Timeline server
2022-05-26 00:14:27,937 INFO service.TimelineService: Closing Timeline Service
2022-05-26 00:14:27,937 INFO javalin.Javalin: Stopping Javalin ...
2022-05-26 00:14:27,945 INFO javalin.Javalin: Javalin has stopped
2022-05-26 00:14:27,945 INFO service.TimelineService: Closed Timeline Service
2022-05-26 00:14:27,945 INFO embedded.EmbeddedTimelineService: Closed Timeline server
2022-05-26 00:14:27,945 INFO transaction.TransactionManager: Transaction manager closed
2022-05-26 00:14:27,946 ERROR utilities.UtilHelpers: Indexer failed
java.lang.IllegalArgumentException: Invalid number of file groups for partition:column_stats, found=2, required=1
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.prepRecords(HoodieBackedTableMetadataWriter.java:968)
	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:132)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initialCommit(HoodieBackedTableMetadataWriter.java:1087)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.buildMetadataPartitions(HoodieBackedTableMetadataWriter.java:858)
	at org.apache.hudi.table.action.index.RunIndexActionExecutor.execute(RunIndexActionExecutor.java:140)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.index(HoodieSparkCopyOnWriteTable.java:291)
	at org.apache.hudi.client.BaseHoodieWriteClient.index(BaseHoodieWriteClient.java:1027)
	at org.apache.hudi.utilities.HoodieIndexer.scheduleAndRunIndexing(HoodieIndexer.java:278)
	at org.apache.hudi.utilities.HoodieIndexer.lambda$start$1(HoodieIndexer.java:198)
	at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:541)
	at org.apache.hudi.utilities.HoodieIndexer.start(HoodieIndexer.java:185)
	at org.apache.hudi.utilities.HoodieIndexer.main(HoodieIndexer.java:154)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2022-05-26 00:14:27,947 ERROR utilities.HoodieIndexer: Indexing with basePath: s3a://sagars-testlake/TPC-DS/1TB/hudi_hand_tuned_may20_1/call_center, tableName: call_center, runningMode: scheduleandexecute failed
2022-05-26 00:14:27,954 INFO server.AbstractConnector: Stopped Spark@450794b4{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2022-05-26 00:14:27,954 INFO ui.SparkUI: Stopped Spark web UI at http://ip-172-31-39-68.us-east-2.compute.internal:8090
2022-05-26 00:14:27,964 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
{code}
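
For anyone reading the trace: the failure is an argument check thrown from prepRecords via ValidationUtils.checkArgument. Below is a minimal, standalone sketch of a check with that shape. It is not Hudi's code: the class FileGroupCountCheck and the method requireFileGroupCount are hypothetical names used only for illustration, and the sketch assumes the check is a plain equality between the found and required counts, which the message suggests but this report does not confirm.

```java
public class FileGroupCountCheck {

    // Hypothetical stand-in for a precondition helper: throws when the condition is false.
    static void checkArgument(boolean expression, String errorMessage) {
        if (!expression) {
            throw new IllegalArgumentException(errorMessage);
        }
    }

    // Hypothetical check (assumption): the number of file groups found for a
    // metadata partition must equal the number required.
    static void requireFileGroupCount(String partition, int found, int required) {
        checkArgument(found == required, String.format(
            "Invalid number of file groups for partition:%s, found=%d, required=%d",
            partition, found, required));
    }

    public static void main(String[] args) {
        // Matching counts pass silently.
        requireFileGroupCount("column_stats", 1, 1);

        // Mismatched counts, as in the trace above (found=2, required=1), throw.
        try {
            requireFileGroupCount("column_stats", 2, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the indexer saw two file groups in the column_stats partition where it expected one. Why two were present is not established by this report.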
 

 


