Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/10/21 04:11:39 UTC
Apache Pinot Daily Email Digest (2020-10-20)
### _#general_
**@yupeng:** Please check out our recent blog on how we operate Pinot at Uber
scale. We are glad to share our learnings with the community.
**@brijoobopanna:** @brijoobopanna has joined the channel
**@babak:** @babak has joined the channel
**@bharadwaj.r07:** @bharadwaj.r07 has joined the channel
### _#random_
**@brijoobopanna:** @brijoobopanna has joined the channel
**@babak:** @babak has joined the channel
**@bharadwaj.r07:** @bharadwaj.r07 has joined the channel
### _#troubleshooting_
**@tanmay.movva:** Hello, I am trying to set up S3 as the segment store for
Pinot, which is deployed on Kubernetes. Unfortunately it is a cross-account
bucket and we have to pass a bucket ACL as well. I couldn’t find any way to
pass an ACL policy in the docs. Can anyone please help me with this?
**@fx19880617:** You can try to set it in controller/server config like:
```pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.storage.factory.s3.accessKey=AKIARC**********
pinot.server.storage.factory.s3.secretKey=aaaaaaaaaaaa```
**@fx19880617:** similar in controller:
```pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.storage.factory.s3.accessKey=AKIARC**********
pinot.controller.storage.factory.s3.secretKey=aaaaaaaaaaaa
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher```
**@tanmay.movva:** I did get that part. But we have to provide an ACL policy
for the S3 bucket so that Pinot is able to write to that bucket. I am looking
for something similar to `druid.storage.disableAcl` in Druid. Ref - Its
implementation can be found here -
**@tanmay.movva:** I have already set the required configs for s3. Thanks for
your quick reply @fx19880617!
**@tanmay.movva:** But what I need is to tell Pinot to set
`bucket-owner-full-control` as the ACL.
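(For context: at the time of this thread there is no config-level switch for this in Pinot. A hypothetical key, analogous to Druid’s `druid.storage.disableAcl`, might look like the sketch below — the key name is illustrative, not a real Pinot config at this point.)

```
# hypothetical key — illustrative only, not an existing Pinot config
pinot.controller.storage.factory.s3.disableAcl=false
# intended effect: when ACLs are enabled, every segment upload would carry
# the bucket-owner-full-control canned ACL
```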
**@fx19880617:** let me take a look at that
**@g.kishore:**
**@g.kishore:** we might have to change this code to set up the s3clientbuilder
**@g.kishore:** @pradeepgv42 what do you think?
**@pradeepgv42:** @tanmay.movva It seems S3PinotFS needs to be updated;
currently that option is missing. Something similar to what you pointed out
here should work, I believe, for all the PutObjectRequests
**@fx19880617:** can we try to expose those options transparently?
**@pradeepgv42:** code is missing too
**@g.kishore:** I think Xiang is suggesting that if there is a way to pass all
the properties from pinot.controller.segment.fetcher.s3.** transparently to
the s3clientbuilder, it will solve the problem of having to change the code
every time a new property needs to be set in S3Client
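That “pass properties transparently” idea boils down to collecting every key under a prefix and forwarding it verbatim. A pure-Python sketch of the pattern (the property names are taken from the thread; nothing here is actual Pinot code):

```python
def subconfig(props, prefix):
    """Collect every key under a prefix so it can be forwarded verbatim
    to a client builder, with no per-property code changes."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}

props = {
    "pinot.controller.storage.factory.s3.region": "us-west-2",
    "pinot.controller.storage.factory.s3.acl": "bucket-owner-full-control",
}
s3_opts = subconfig(props, "pinot.controller.storage.factory.s3.")
# s3_opts -> {"region": "us-west-2", "acl": "bucket-owner-full-control"}
```

The catch, as the next message points out, is that some options apply per-request rather than at client-builder time, so properties alone don’t cover everything.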
**@pradeepgv42:** Seems like this ACL property needs to be set for each
upload/copy (PutObjectRequest & CopyObjectRequest) of any file on S3, so I'm
not sure we can achieve that with just properties, without a code change.
**@g.kishore:** I see
**@pradeepgv42:** The code change should be simple: wherever there is a
CopyObjectRequest or PutObjectRequest, and the config is turned on, set the
ACLs
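The change described above — conditionally attaching the canned ACL to every put/copy request — follows a simple pattern. A pure-Python sketch (the request dict and the `disableAcl` flag are illustrative stand-ins for the AWS SDK request builders used in S3PinotFS):

```python
CANNED_ACL = "bucket-owner-full-control"

def build_put_request(bucket, key, config):
    """Build the argument set for an S3 put; attach the canned ACL only
    when the (hypothetical) config flag enables ACLs."""
    req = {"Bucket": bucket, "Key": key}
    if not config.get("disableAcl", True):  # ACLs off by default in this sketch
        req["ACL"] = CANNED_ACL
    return req

build_put_request("my-bucket", "seg/0", {"disableAcl": False})
# -> {"Bucket": "my-bucket", "Key": "seg/0", "ACL": "bucket-owner-full-control"}
```

The same conditional would wrap each CopyObjectRequest construction site as well.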
**@fx19880617:** got it. Created an issue: , could you fill in more info
there?
**@pradeepgv42:** done
**@venkatesan.v:** @venkatesan.v has joined the channel
### _#metadata-push-api_
**@fx19880617:** Hi, wanna bring this up: from the code perspective, it seems
that for a new segment add, we don’t have any synchronization on the
idealstate updater
**@mayanks:** What do you mean?
**@fx19880617:** so if a user tries to upload with high parallelism then very
likely the idealstate update will fail
**@fx19880617:** say user pushes 20k segments with 100 threads in parallel
**@fx19880617:**
**@mayanks:** Isn't there a retry policy?
**@fx19880617:** currently they can only achieve about 4 as push parallelism
**@fx19880617:** this is considering retry
**@fx19880617:** otherwise parallelism =2 may cause the issue
**@mayanks:** I mean there was a retry policy in updating zk
**@fx19880617:** yes
**@mayanks:** could you describe the race condition?
**@fx19880617:** this is already considered
**@fx19880617:** it’s not a race condition, it’s just many threads trying to
update zk
**@fx19880617:** so the version gets bumped and the request does not succeed
**@mayanks:** So they cannot finish in the specified number of retries?
**@fx19880617:** then it needs to retry again
**@fx19880617:** yes
**@mayanks:** what is the max number of retries
**@fx19880617:** ```private static final RetryPolicy DEFAULT_RETRY_POLICY =
    RetryPolicies.exponentialBackoffRetryPolicy(5, 1000L, 2.0f);```
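For a sense of scale, here is a deterministic sketch of that policy’s wait schedule (5 attempts, 1000 ms initial delay, 2.0 scale factor; the real Pinot policy may randomize/jitter the delays, so this is an approximation):

```python
def backoff_delays(max_attempts, initial_ms, factor):
    """Waits between attempts for plain exponential backoff (no jitter)."""
    delays, d = [], initial_ms
    for _ in range(max_attempts - 1):  # n attempts -> n-1 waits between them
        delays.append(d)
        d = int(d * factor)
    return delays

backoff_delays(5, 1000, 2.0)  # -> [1000, 2000, 4000, 8000], ~15 s of waiting total
```

So a thread that keeps losing the idealstate version race exhausts its retries in roughly 15 seconds of backoff.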
**@mayanks:** If we synchronized zk metadata update and IS update, will it
help?
**@fx19880617:** 5 times
**@mayanks:** it should at least reduce the retry
**@mayanks:** actually, one question
**@mayanks:** why do we need parallelism for metadata push?
**@fx19880617:** It will help in single server level but still in the case of
pushing to all the controllers, we will still see the race
**@mayanks:** it should be fast anyway, right?
**@fx19880617:** for pushing 20k segments to bootstrap data, each segment
upload took 4 seconds
**@mayanks:** 20k segments? What is the segment size?
**@fx19880617:** 200mb
**@mayanks:** ok
**@fx19880617:** it’s metadata push, so segment size doesn’t matter
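Back-of-the-envelope math for why parallelism matters here, using the numbers from the thread (20k segments, ~4 s per metadata upload, effective push parallelism of about 4):

```python
segments, secs_per_upload, parallelism = 20_000, 4, 4
hours = segments * secs_per_upload / parallelism / 3600
# -> roughly 5.6 hours just to register metadata for a bootstrap push
```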
**@mayanks:** yeah, so either we make IS + ZK update sequential (in which
case no reason for parallel push), or we increase num retries
**@fx19880617:** just looking for a way to speed this up
**@fx19880617:** right
**@mayanks:** increasing num retries will put lot of load on ZK
**@fx19880617:** I think at the single controller level, we should put a
table-level sync
**@mayanks:** if 20k segments
**@fx19880617:** yes
**@fx19880617:** 3 controllers means 3 parallel updates on table idealstates
**@fx19880617:** I think it’s anyway much better than current implementation
**@mayanks:** I think that's how we use it at lnkd
**@mayanks:** we have a VIP with 3 controllers
**@mayanks:** so we put parallelism as 3 for big use cases
**@fx19880617:** yeah, I think for segment data push, there is enough leeway
for idealstate updates