Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/14 16:30:40 UTC
[GitHub] [incubator-pinot] lgo opened a new issue #6146: Low maximum limit for batch jobSpec pushParallelism
lgo opened a new issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146
Batch jobs that process lots of segments for a table often run into Zookeeper conflicts when updating idealState. This causes contention on updates, slowing everything down. To work around that, the pushParallelism on the job spec had to be reduced to ~5 so that conflicts would happen less frequently. There was another issue y'all resolved which helped ensure progress happened on conflicts.
Being able to upload segments faster will drastically reduce the burden of operating Pinot (backfilling, large segment uploads, or adjusting existing data).
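For illustration, here is a toy model of why higher parallelism wastes work, assuming the conflicts come from version-checked (compare-and-set) writes to a single shared record. `VersionedStore` and `simulate` are hypothetical names, not Pinot or Helix APIs, and the lockstep batching is a worst case rather than a measurement of a real cluster:

```python
class VersionedStore:
    """Hypothetical stand-in for one shared, versioned record."""

    def __init__(self):
        self.version = 0
        self.segments = []

    def read(self):
        # Return the current version together with a copy of the data.
        return self.version, list(self.segments)

    def compare_and_set(self, expected_version, new_segments):
        # Reject the write if someone else updated the record since the read.
        if expected_version != self.version:
            return False
        self.segments = new_segments
        self.version += 1
        return True


def simulate(num_segments, parallelism):
    """Count write attempts and conflicts when writers race in lockstep."""
    store = VersionedStore()
    pending = [f"seg_{i}" for i in range(num_segments)]
    attempts = conflicts = 0
    while pending:
        batch = pending[:parallelism]
        # Worst case: every writer in the batch reads the same snapshot.
        snapshots = [(seg, *store.read()) for seg in batch]
        for seg, version, segments in snapshots:
            attempts += 1
            if store.compare_and_set(version, segments + [seg]):
                pending.remove(seg)
            else:
                conflicts += 1  # stale version, this writer must retry
    return attempts, conflicts
```

In this model `simulate(20, 1)` makes 20 attempts with no conflicts, while `simulate(20, 5)` makes 90 attempts, 70 of which are rejected. Only one writer per round can win, so extra parallelism adds retries rather than throughput.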
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] joey-stripe edited a comment on issue #6146: Low maximum limit for batch jobSpec pushParallelism
Posted by GitBox <gi...@apache.org>.
joey-stripe edited a comment on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708570008
Here is the stripped-down jobSpec we are using, for reference:
```yaml
executionFrameworkSpec:
  name: spark
  segmentMetadataPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner
jobType: SegmentCreationAndMetadataPush
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: ...
recordReaderSpec:
  dataFormat: ...
  className: ...
tableSpec:
  tableName: ...
pinotClusterSpecs:
  - controllerURI: ...
segmentNameGeneratorSpec:
  type: normalizedDate
  configs:
    segment.name.prefix: ...
pushJobSpec:
  segmentUriPrefix: ...
  segmentUriSuffix: ''
  pushParallelism: 5
  pushAttempts: 5
  pushRetryIntervalMillis: 3000
```
And a few relevant lines from the Pinot server conf:
```conf
pinot.server.instance.enable.split.commit=true
```
As well as the controller conf.
```conf
controller.enable.split.commit=true
```
[GitHub] [incubator-pinot] mcvsubbu commented on issue #6146: Low maximum limit for batch jobSpec pushParallelism
Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708561221
Not sure what you mean by rethink.
A metadata-only or URI push is a cheaper operation, so there is less likelihood of contention.
Could we also make the backoff use smaller increments?
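A minimal sketch of what "smaller increments" could look like, assuming the retry in question is a loop around a conflicting update. The thread does not say which retry loop is meant, so `UpdateConflict`, `retry_with_backoff`, and all the default values here are hypothetical and not taken from Pinot:

```python
import random
import time


class UpdateConflict(Exception):
    """Hypothetical error raised when an update loses a version race."""


def retry_with_backoff(operation, max_attempts=5, base_delay_ms=50,
                       max_delay_ms=3000, sleep=time.sleep, rng=random.random):
    """Retry `operation` on conflict, waiting a jittered, growing delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except UpdateConflict:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the conflict to the caller
            # The ceiling doubles on each attempt: 50 ms, 100 ms, 200 ms, ...
            ceiling_ms = min(max_delay_ms, base_delay_ms * 2 ** attempt)
            # Random jitter keeps competing writers from retrying in lockstep.
            sleep(rng() * ceiling_ms / 1000.0)
```

Starting from a small base delay lets a writer that lost a single race retry almost immediately, while the doubling ceiling still spreads writers out if conflicts persist.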
[GitHub] [incubator-pinot] fx19880617 edited a comment on issue #6146: Low maximum limit for batch jobSpec pushParallelism
Posted by GitBox <gi...@apache.org>.
fx19880617 edited a comment on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-708558434
I feel this is because we upload segments to all the controller hosts, so idealState update requests come from all controllers at once, which causes the slowness and update conflicts.
This issue wasn't there before because the controller needed to download the segment tar and untar the metadata before doing the update, which made each upload a costly operation on the controller. With segment-metadata-only push mode, we may need to rethink this.
cc: @siddharthteotia @mayankshriv @snleee
[GitHub] [incubator-pinot] fx19880617 commented on issue #6146: Low maximum limit for batch jobSpec pushParallelism
Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6146:
URL: https://github.com/apache/incubator-pinot/issues/6146#issuecomment-713304807
An improvement for this: https://github.com/apache/incubator-pinot/pull/6165
This will limit the idealState update parallelism to at most the number of Pinot controllers.
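One way to get that bound, sketched under the assumption that each controller serializes its own updates behind a lock. This is not necessarily how the PR implements it; `Controller`, `InFlightTracker`, and `push_all` are hypothetical names used only for this illustration:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightTracker:
    """Records the peak number of updates running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


class Controller:
    """Hypothetical controller that allows one update at a time."""

    def __init__(self, tracker):
        self._update_lock = threading.Lock()
        self._tracker = tracker
        self.segments = []

    def add_segment(self, segment):
        # The per-controller lock serializes updates issued by this controller.
        with self._update_lock, self._tracker:
            time.sleep(0.001)  # stand-in for the shared-state write
            self.segments.append(segment)


def push_all(segments, controllers, push_parallelism):
    """Spread uploads round-robin over controllers from a client thread pool."""
    with ThreadPoolExecutor(max_workers=push_parallelism) as pool:
        futures = [
            pool.submit(controllers[i % len(controllers)].add_segment, seg)
            for i, seg in enumerate(segments)
        ]
        for future in futures:
            future.result()  # re-raise any worker exception
```

With 3 controllers and `push_parallelism=16`, `tracker.peak` never exceeds 3, however many client threads are pushing.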