Posted to commits@hudi.apache.org by "waitingF (via GitHub)" <gi...@apache.org> on 2023/03/29 07:34:39 UTC

[GitHub] [hudi] waitingF opened a new pull request, #8313: [SUPPORT] split source of kafka partition by count

waitingF opened a new pull request, #8313:
URL: https://github.com/apache/hudi/pull/8313

   ### Change Logs
   
   For the Kafka source, the default parallelism when pulling data from Kafka is the number of Kafka partitions.
   This causes problems in two cases:
   1. When pulling a large amount of data from Kafka (e.g. maxEvents=100000000) from a topic with too few partitions, the pull takes too long.
   2. When data is heavily skewed across Kafka partitions, the pull is blocked by the slowest partition.
   
   To solve these cases, I added a parameter `hoodie.deltastreamer.kafka.per.batch.maxEvents` that caps the number of events in one Kafka batch. The default, Long.MAX_VALUE, leaves this feature turned off.
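   
   The idea of splitting a partition's offset range by event count can be sketched as follows. This is a minimal illustration, not the code in this PR: `OffsetRange` and `split_by_max_events` are hypothetical names, Python is used only for brevity, and the sketch assumes each partition's pending offsets form a half-open interval that may be cut into consecutive chunks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical half-open interval [from_offset, until_offset) of one Kafka partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        # Number of offsets covered by this range.
        return self.until_offset - self.from_offset


def split_by_max_events(ranges: List[OffsetRange],
                        max_events_per_batch: int) -> List[OffsetRange]:
    """Cut every range into consecutive chunks of at most max_events_per_batch offsets.

    Ranges that already fit (including empty ones) are passed through unchanged,
    so a very large limit leaves the input as it was.
    """
    if max_events_per_batch <= 0:
        raise ValueError("max_events_per_batch must be positive")
    result: List[OffsetRange] = []
    for r in ranges:
        if r.count() <= max_events_per_batch:
            result.append(r)
            continue
        start = r.from_offset
        while start < r.until_offset:
            end = min(start + max_events_per_batch, r.until_offset)
            result.append(OffsetRange(r.topic, r.partition, start, end))
            start = end
    return result
```

   With a skewed input such as 1000 pending offsets on partition 0 and 100 on partition 1, a limit of 300 yields four chunks for partition 0 and one for partition 1, so the work on the large partition can be spread over more tasks. Passing a limit of `2**63 - 1` (the value of Long.MAX_VALUE) returns the ranges unchanged, which mirrors the "feature off" default described above.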
   
   ### Impact
   
   _No impact with the default setting._
   
   ### Risk level (write none, low, medium or high below)
   
   _none._
   
   ### Documentation Update
   
   - `hoodie.deltastreamer.kafka.per.batch.maxEvents`: controls the maximum number of events in one Kafka batch when pulling data from the Kafka source
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8313: [SUPPORT] split source of kafka partition by count

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8313:
URL: https://github.com/apache/hudi/pull/8313#issuecomment-1488152516

   ## CI report:
   
   * 6a7ecf54571757b8aa6a869435d9f642ec4649eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15970) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #8313: [SUPPORT] split source of kafka partition by count

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8313:
URL: https://github.com/apache/hudi/pull/8313#issuecomment-1488097432

   ## CI report:
   
   * 6a7ecf54571757b8aa6a869435d9f642ec4649eb UNKNOWN
   



[GitHub] [hudi] waitingF closed pull request #8313: [SUPPORT] split source of kafka partition by count

Posted by "waitingF (via GitHub)" <gi...@apache.org>.
waitingF closed pull request #8313: [SUPPORT] split source of kafka partition by count
URL: https://github.com/apache/hudi/pull/8313



[GitHub] [hudi] hudi-bot commented on pull request #8313: [SUPPORT] split source of kafka partition by count

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8313:
URL: https://github.com/apache/hudi/pull/8313#issuecomment-1488420625

   ## CI report:
   
   * 6a7ecf54571757b8aa6a869435d9f642ec4649eb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15970) 
   

