Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/14 16:30:40 UTC

[GitHub] [incubator-pinot] lgo opened a new issue #6146: Low maximum limit for batch jobSpec pushParallelism

lgo opened a new issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146


   Batch jobs that push many segments for a table often run into ZooKeeper conflicts when updating the idealState. The resulting contention on updates slows everything down. To work around it, we had to reduce pushParallelism in the job spec to ~5 so that pushes would conflict less frequently. There was another issue y'all resolved which helped ensure progress is still made when conflicts happen.
   
   Being able to upload segments faster would drastically reduce the burden of operating Pinot (backfilling, large segment uploads, or adjusting existing data).
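For context on why parallel pushes collide: ZooKeeper attaches a version number to each znode, and a `setData` call that supplies a stale expected version is rejected (BadVersion), so the writer has to re-read and retry. The toy model below is a minimal sketch of that optimistic read-modify-write loop, assuming the idealState update behaves like such a versioned compare-and-set. `VersionedRecord`, `VersionConflict` and `add_segment` are hypothetical names, and the in-memory dict only stands in for the table's IdealState znode. This is not Pinot's or Helix's actual code.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write carries a stale version (like ZooKeeper's BadVersion)."""


class VersionedRecord:
    """Hypothetical in-memory stand-in for a versioned IdealState znode."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}      # segment name -> state
        self._version = 0

    def read(self):
        # Return a private copy plus the version it was read at.
        with self._lock:
            return dict(self._data), self._version

    def write(self, data, expected_version):
        # Compare-and-set: only succeed if nobody wrote since our read.
        with self._lock:
            if expected_version != self._version:
                raise VersionConflict()
            self._data = data
            self._version += 1


def add_segment(record, segment, max_attempts=100):
    """Read-modify-write with retry; returns how many conflicts this call hit."""
    conflicts = 0
    for _ in range(max_attempts):
        data, version = record.read()
        data[segment] = "ONLINE"
        try:
            record.write(data, version)
            return conflicts
        except VersionConflict:
            conflicts += 1   # someone else won the race; re-read and retry
    raise RuntimeError(f"gave up on {segment} after {max_attempts} attempts")
```

Every conflict means some other writer succeeded in between, so the work is never lost, only repeated. The more writers race on the same record, the more reads and writes are wasted, which matches the slowdown described above.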


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] joey-stripe edited a comment on issue #6146: Low maximum limit for batch jobSpec pushParallelism

Posted by GitBox <gi...@apache.org>.
joey-stripe edited a comment on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708570008


   Here is the stripped-down jobSpec we are using, for reference:
   ```yaml
   executionFrameworkSpec:
     name: spark
     segmentMetadataPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner
     segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
     segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner
   jobType: SegmentCreationAndMetadataPush
   overwriteOutput: true
   pinotFSSpecs:
   - scheme: s3
     className: org.apache.pinot.plugin.filesystem.S3PinotFS
     configs:
       region: ...
   recordReaderSpec:
     dataFormat: ...
     className: ...
   tableSpec:
     tableName: ...
   pinotClusterSpecs:
   - controllerURI: ...
   segmentNameGeneratorSpec:
     type: normalizedDate
     configs:
       segment.name.prefix: ...
   pushJobSpec:
     segmentUriPrefix: ...
     segmentUriSuffix: ''
     pushParallelism: 5
     pushAttempts: 5
     pushRetryIntervalMillis: 3000
   ```
   
   And a few relevant lines from the Pinot server conf:
   ```conf
   pinot.server.instance.enable.split.commit=true
   ```
   
   As well as from the controller conf:
   ```conf
   controller.enable.split.commit=true
   ```
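The `pushJobSpec` above combines three knobs: `pushParallelism` (how many pushes run at once), `pushAttempts` and `pushRetryIntervalMillis` (how often, and how far apart, a failed push is retried). The sketch below is a conceptual model of how those knobs interact, assuming a fixed delay between attempts and a thread pool bounded by the parallelism setting. `push_all`, `push_with_retries` and the caller-supplied `push_one` callable are hypothetical and do not reproduce Pinot's Spark push runner.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def push_with_retries(push_one, segment_uri, push_attempts, retry_interval_ms):
    """Try one segment push up to push_attempts times, sleeping between failures."""
    for attempt in range(1, push_attempts + 1):
        try:
            return push_one(segment_uri)
        except Exception:
            if attempt == push_attempts:
                raise   # out of attempts: surface the last error
            time.sleep(retry_interval_ms / 1000.0)   # fixed delay, no backoff


def push_all(push_one, segment_uris, push_parallelism=5, push_attempts=5,
             push_retry_interval_millis=3000):
    """Push every segment with at most push_parallelism pushes in flight."""
    with ThreadPoolExecutor(max_workers=push_parallelism) as pool:
        futures = [
            pool.submit(push_with_retries, push_one, uri, push_attempts,
                        push_retry_interval_millis)
            for uri in segment_uris
        ]
        # Results come back in submission order; any exhausted retry re-raises here.
        return [f.result() for f in futures]
```

In this model, raising `push_parallelism` increases the number of concurrent idealState writers, while the fixed 3000 ms interval means each conflict costs a full 3 seconds before the retry.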




[GitHub] [incubator-pinot] mcvsubbu commented on issue #6146: Low maximum limit for batch jobSpec pushParallelism

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708561221


   Not sure what you mean by rethink.
   A metadata-only or URI push is a cheaper operation, so there is less likelihood of contention.
   We could also make the backoff use smaller increments?
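The comment does not say which backoff scheme is meant. One common way to get "smaller increments" is exponential backoff that starts from a small base delay, is capped, and is randomized ("full jitter") so that racing writers do not retry in lockstep. The sketch below contrasts that with a fixed interval like the jobSpec's `pushRetryIntervalMillis: 3000`; the function names and the `base_ms`/`cap_ms` parameters are hypothetical, not existing Pinot settings.

```python
import random


def fixed_delays(attempts, interval_ms):
    """Delays before each retry when every retry waits the same interval."""
    return [interval_ms] * (attempts - 1)


def exponential_delays(attempts, base_ms, cap_ms, rng=None):
    """Full-jitter exponential backoff: retry k waits uniform(0, min(cap, base * 2**k))."""
    if rng is None:
        rng = random.Random()
    delays = []
    for retry in range(attempts - 1):
        ceiling = min(cap_ms, base_ms * (2 ** retry))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

With 5 attempts, the fixed scheme waits 3000 ms before each of the 4 retries, while `exponential_delays(5, 100, 3000)` waits at most 100, 200, 400 and 800 ms. A quick conflict is retried almost immediately, and the jitter spreads out writers that collided at the same moment.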




[GitHub] [incubator-pinot] fx19880617 edited a comment on issue #6146: Low maximum limit for batch jobSpec pushParallelism

Posted by GitBox <gi...@apache.org>.
fx19880617 edited a comment on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708558434


   I suspect this is because we upload segments to all the controller hosts, and the idealState update requests coming from all of those controllers cause the slowness and the update conflicts.
   
   This issue didn't show up before because the controller had to download the segment tar and untar it to read the metadata before doing the update, which was a costly operation on the controller. With the segment metadata-only push mode, we may need to rethink this.
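A back-of-the-envelope way to see why cheaper pushes can mean more conflicts: if the expensive tar download no longer staggers the requests, more writers reach the idealState at the same moment. The sketch below is a simplified worst-case model, assuming that k writers all read the same version, exactly one wins each round, and the rest conflict and retry. `conflicts_per_batch` is a hypothetical helper, not something measured on a Pinot cluster.

```python
def conflicts_per_batch(writers: int) -> int:
    """Wasted (conflicting) writes when `writers` updaters race in lockstep.

    Each round, every pending writer reads the current version and tries to
    write; one succeeds and all the others get a version conflict and retry.
    """
    conflicts = 0
    pending = writers
    while pending > 0:
        conflicts += pending - 1   # everyone except the winner collides
        pending -= 1               # the winner is done
    return conflicts
```

The total is k(k-1)/2, so it grows quadratically: 5 racing writers waste 10 writes, while 20 waste 190. That is consistent with having to drop pushParallelism to ~5.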
   
   cc: @siddharthteotia @mayankshriv @snleee 
   
   





[GitHub] [incubator-pinot] fx19880617 commented on issue #6146: Low maximum limit for batch jobSpec pushParallelism

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-713304807


   An improvement for this: https://github.com/apache/incubator-pinot/pull/6165
   This will limit the IdealState update parallelism to at most the number of Pinot controllers.
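One way such a bound can be achieved is for each controller to serialize its own IdealState updates. No matter how many pushes arrive, at most one update per controller is then in flight. The sketch below illustrates that idea only, assuming a per-controller lock. The PR's actual mechanism may differ, and `Controller`, `ConcurrencyTracker` and `update_ideal_state` are hypothetical names.

```python
import threading
import time


class ConcurrencyTracker:
    """Records the peak number of simultaneous IdealState writers."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def exit(self):
        with self._lock:
            self.current -= 1


class Controller:
    """Hypothetical controller that allows one in-flight IdealState update."""

    def __init__(self, name, tracker):
        self.name = name
        self.applied = []
        self._update_lock = threading.Lock()   # serializes this controller's updates
        self._tracker = tracker

    def update_ideal_state(self, segment):
        with self._update_lock:
            self._tracker.enter()
            try:
                time.sleep(0.001)   # stand-in for the versioned read-modify-write
                self.applied.append(segment)
            finally:
                self._tracker.exit()
```

Even with 30 concurrent pushes spread over 3 controllers, the peak number of simultaneous writers stays at 3 or below, so the worst-case conflict count depends on the controller count rather than on pushParallelism.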




