Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/19 06:20:56 UTC

[GitHub] [hudi] codejoyan opened a new issue #3499: [SUPPORT] Inline Clustering fails with Hudi

codejoyan opened a new issue #3499:
URL: https://github.com/apache/hudi/issues/3499


   **Environment**
   Hudi Version - 0.7
   Spark - 2.4.7
   DFS - Google Cloud storage
   
   **Inline Clustering Enabled**
   To lower ingestion latency without compromising query performance, I have deployed code with inline clustering enabled for subsequent incremental runs. When I run the code with clustering enabled, it fails with the error below.
   
   I have no idea why the number of partitions is -10. Please help me debug this.
   **Stacktrace of the error**
   ```
   21/08/19 06:08:34 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210818063404__commit__COMPLETED], [20210818064709__commit__COMPLETED], [20210818071622__commit__COMPLETED], [20210818072722__commit__COMPLETED], [20210818073610__commit__COMPLETED], [20210818074601__commit__COMPLETED], [20210818080912__commit__COMPLETED], [20210818083622__commit__COMPLETED], [20210819054628__rollback__COMPLETED], [20210819060506__commit__COMPLETED], [==>20210819060829__replacecommit__INFLIGHT]]
   21/08/19 06:08:34 ERROR util.GlobalVar$: java.lang.IllegalArgumentException: requirement failed: Number of partitions cannot be negative but found -10.
   21/08/19 06:08:34 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.Exception: Query execution failed in transformAndLoadBaseTable
   java.lang.Exception: Query execution failed in transformAndLoadBaseTable
   	at com.walmart.finwb.salesbaseload.SalesLoadBaseTables$.transformAndLoadBaseTable(SalesLoadBaseTables.scala:323)
   	at com.walmart.finwb.salesbaseload.SalesLoadBaseTables$.main(SalesLoadBaseTables.scala:116)
   	at com.walmart.finwb.salesbaseload.SalesLoadBaseTables.main(SalesLoadBaseTables.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:686)
   Caused by: java.lang.IllegalArgumentException: requirement failed: Number of partitions cannot be negative but found -10.
   	at scala.Predef$.require(Predef.scala:224)
   	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:155)
   	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
   	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
   	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
   	at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:645)
   	at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:646)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.RDD.sortBy(RDD.scala:643)
   	at org.apache.spark.api.java.JavaRDD.sortBy(JavaRDD.scala:206)
   	at org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:41)
   	at org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:34)
   	at org.apache.hudi.table.action.commit.SparkBulkInsertHelper.bulkInsert(SparkBulkInsertHelper.java:103)
   	at org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:74)
   	at org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:50)
   	at org.apache.hudi.table.action.cluster.SparkExecuteClusteringCommitActionExecutor.lambda$runClusteringForGroupAsync$3(SparkExecuteClusteringCommitActionExecutor.java:121)
   	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
   	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
   	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
   21/08/19 06:08:34 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.Exception: Query execution failed in transformAndLoadBaseTable
   ```
   Let me know if you need any further information.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] zhangyue19921010 commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

zhangyue19921010 commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-902645799


   Hi @codejoyan, there was a bug in the 0.7.0 release in the clustering plan step, which uses `int totalSizeSoFar = 0;` to accumulate the size of the current data. As you can see, the type of `totalSizeSoFar` is `int` while `getWriteConfig().getClusteringMaxBytesInGroup()` is `long`, so the sum can overflow, much like `Integer.MAX_VALUE + 1 = -2147483648`.
   
   This bug has already been fixed in https://github.com/apache/hudi/pull/2502 
   
   Maybe you can upgrade Hudi to 0.8.0 and give it a try :)
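
To make the overflow described above concrete, here is a minimal, self-contained sketch. It is not Hudi's actual code: the class name, the method names, and the sample file sizes are made up for illustration, and the ceiling division is only an assumption about how a group count could be derived from a total size.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; none of these names come from Hudi.
public class ClusteringSizeOverflowDemo {

    // Accumulates sizes into an int, like the `int totalSizeSoFar` quoted above.
    static int sumAsInt(List<Long> fileSizes) {
        int total = 0;
        for (long size : fileSizes) {
            // Compound assignment silently narrows the long result back to int,
            // so the value wraps around once it exceeds Integer.MAX_VALUE.
            total += size;
        }
        return total;
    }

    // Same accumulation with a long, which does not wrap for these sizes.
    static long sumAsLong(List<Long> fileSizes) {
        long total = 0L;
        for (long size : fileSizes) {
            total += size;
        }
        return total;
    }

    // Assumed way of turning a total size into a number of output groups.
    static int numOutputGroups(long groupSize, long targetFileSize) {
        return (int) Math.ceil(groupSize / (double) targetFileSize);
    }

    public static void main(String[] args) {
        // 30 files of 100,000,000 bytes each: 3,000,000,000 bytes in total.
        List<Long> sizes = Collections.nCopies(30, 100_000_000L);
        long targetFileSize = 1024L * 1024 * 1024; // 1 GiB, an arbitrary choice

        int wrapped = sumAsInt(sizes);
        long exact = sumAsLong(sizes);

        System.out.println("int total:  " + wrapped
                + " -> groups: " + numOutputGroups(wrapped, targetFileSize));
        System.out.println("long total: " + exact
                + " -> groups: " + numOutputGroups(exact, targetFileSize));
    }
}
```

With these sample values the `int` total wraps to -1294967296 and yields -1 groups, while the `long` total is 3000000000 and yields 3 groups. A negative count of this kind, if it were passed on as the sort parallelism, would fail the same "Number of partitions cannot be negative" check that appears in the stack trace.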


[GitHub] [hudi] codejoyan edited a comment on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

codejoyan edited a comment on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-906690813


   Hi @zhangyue19921010 
   I am unable to find these info logs in the logs for the commit that performs the clustering. 
   For example, the clustering run created 2 commit metadata files (1 commit and 1 replacecommit). If I search those logs, I do not find the above info logs.  
   
   **application_1629887089729_0829**
   20210826192935.commit
   20210826193107.replacecommit
   
   yarn logs -applicationId application_1629887089729_0829 -> does not contain the info logs.
   
   Would the info logs be in previous runs, where clustering did not take place? Let me know if you need any more info.


[GitHub] [hudi] codope commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

codope commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-907142318


   @codejoyan These logs were not present in Hudi 0.7.0; they were added in 0.8.0. I am assuming the info logs are not suppressed. Are you checking the logs while running 0.8.0?


[GitHub] [hudi] zhangyue19921010 commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

zhangyue19921010 commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-904237706


   Hi @codejoyan, could you please find all the info logs, including 
   `LOG.info("Adding one clustering group " + totalSizeSoFar + " max bytes: "
               + getWriteConfig().getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);` 
   and 
   `LOG.info("Adding final clustering group " + totalSizeSoFar + " max bytes: "
             + getWriteConfig().getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);`
    
   They are useful for debugging when running Hudi 0.8.0.


[GitHub] [hudi] nsivabalan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-1002291692


   @codejoyan: Do you have any updates for us on this? If the issue is resolved, feel free to close the GitHub issue. 


[GitHub] [hudi] nsivabalan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-1004320635


   Thanks. Do reach out to us if you hit any issues; we would be happy to assist.


[GitHub] [hudi] nsivabalan closed issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

nsivabalan closed issue #3499:
URL: https://github.com/apache/hudi/issues/3499


   


[GitHub] [hudi] nsivabalan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-993954461


   @codejoyan: May I know the latest updates on this? We would like to get to the bottom of it. 


[GitHub] [hudi] codejoyan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

codejoyan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-903920701


   Hi @zhangyue19921010, I tried with 0.8.0 and ran into the exact same issue. It looks like this is a bug; I will try to dig deeper.
   Please let me know if there is any workaround, as I want to test ingestion and read performance with clustering on.
   
   Caused by: java.lang.IllegalArgumentException: requirement failed: Number of partitions cannot be negative but found -5.
   	at scala.Predef$.require(Predef.scala:224)
   	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:155)
   	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
   	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
   	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
   	at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:645)
   	at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:646)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.RDD.sortBy(RDD.scala:643)
   	at org.apache.spark.api.java.JavaRDD.sortBy(JavaRDD.scala:206)
   	at org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:41)
   	at org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:34)
   	at org.apache.hudi.table.action.commit.SparkBulkInsertHelper.bulkInsert(SparkBulkInsertHelper.java:103)
   	at org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:74)
   	at org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:50)
   	at org.apache.hudi.table.action.cluster.SparkExecuteClusteringCommitActionExecutor.lambda$runClusteringForGroupAsync$3(SparkExecuteClusteringCommitActionExecutor.java:121)
   	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
   	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)


[GitHub] [hudi] codejoyan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

codejoyan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-1003898337


   Hi @nsivabalan, I have yet to retest this issue. We can close it for now.
   I will test with both 0.9.0 and 0.10.0 and open a new GitHub issue if it persists.


[GitHub] [hudi] codejoyan commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

codejoyan commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-907794637


   Yes, I am checking the logs while running 0.8.0. Let me know what additional information you need to debug this.


[GitHub] [hudi] nsivabalan edited a comment on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

nsivabalan edited a comment on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-930269121


   @codejoyan: what's the key generator you are using? Can you confirm that you are setting those params (key gen, record key, partition path) along with these clustering configs as well? 
   Because, from the stack trace, this is what I see: 
   Since you don't set any value for "hoodie.clustering.plan.strategy.sort.columns", we resort to using the built-in BulkInsertSort mode. By default, we do global sorting here based on the (partition path, record key) pair. 
   
   We can try out one thing here: set the sort mode to none and see what happens. 
    "hoodie.bulkinsert.sort.mode" = "NONE"
   
   Alternatively, can you give it a shot by setting "hoodie.clustering.plan.strategy.sort.columns" to the record key field, maybe? We can also see how that pans out. 
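
The two experiments suggested above could be expressed as writer options along these lines. This is only a sketch under stated assumptions: the class and method names are hypothetical, `hoodie.clustering.inline` is assumed to be the switch already in use for inline clustering, and `record_key_col` is a placeholder for the table's real record key field. The other two keys are the ones quoted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the option keys are taken from Hudi.
public class ClusteringWorkaroundOptions {

    // Experiment 1: turn off sorting in the bulk-insert path.
    static Map<String, String> noSortOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.clustering.inline", "true");    // assumed to be set already
        opts.put("hoodie.bulkinsert.sort.mode", "NONE"); // quoted in the comment above
        return opts;
    }

    // Experiment 2: name the clustering sort column explicitly.
    static Map<String, String> explicitSortColumnOptions(String recordKeyField) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.clustering.inline", "true");    // assumed to be set already
        opts.put("hoodie.clustering.plan.strategy.sort.columns", recordKeyField);
        return opts;
    }

    public static void main(String[] args) {
        // "record_key_col" is a placeholder, not a field from this thread.
        System.out.println(noSortOptions());
        System.out.println(explicitSortColumnOptions("record_key_col"));
    }
}
```

Either map would be merged into the options already passed to the Hudi write (key generator, record key, partition path), trying one experiment at a time so the effect of each setting can be told apart.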
   


[GitHub] [hudi] zhangyue19921010 commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

zhangyue19921010 commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-904263730


   Also, please share the log lines produced by `LOG.info("Starting clustering for a group, parallelism:" + numOutputGroups + " commit:" + instantTime);`
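
One way to pull the requested lines out of a large log dump is a small filter such as the one below. It is a sketch, not part of Hudi or YARN: the class name is made up, and the markers are simply the fixed text prefixes of the three `LOG.info` statements quoted in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: prints only the clustering-related info lines from stdin.
public class ClusteringLogFilter {

    // Fixed prefixes of the LOG.info messages quoted in this thread.
    static final List<String> MARKERS = Arrays.asList(
            "Adding one clustering group",
            "Adding final clustering group",
            "Starting clustering for a group");

    // True if the line contains any of the marker texts.
    static boolean isClusteringLine(String line) {
        return MARKERS.stream().anyMatch(line::contains);
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            in.lines()
              .filter(ClusteringLogFilter::isClusteringLine)
              .forEach(System.out::println);
        }
    }
}
```

The output of `yarn logs -applicationId <application id>` can be piped into this program. If nothing is printed, the statements were either not logged at INFO level or not reached in that run.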


[GitHub] [hudi] vinothchandar commented on issue #3499: [SUPPORT] Inline Clustering fails with Hudi

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on issue #3499:
URL: https://github.com/apache/hudi/issues/3499#issuecomment-926275260


   @satishkotha is anything jumping out at you here?

