Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/10/21 04:11:39 UTC

Apache Pinot Daily Email Digest (2020-10-20)

### _#general_

  
**@yupeng:** Please check out our recent blog on how we operate Pinot at Uber
scale. We are glad to share our learnings with the community.  
**@brijoobopanna:** @brijoobopanna has joined the channel  
 **@babak:** @babak has joined the channel  
 **@bharadwaj.r07:** @bharadwaj.r07 has joined the channel  

###  _#random_

  
 **@brijoobopanna:** @brijoobopanna has joined the channel  
 **@babak:** @babak has joined the channel  
 **@bharadwaj.r07:** @bharadwaj.r07 has joined the channel  

###  _#troubleshooting_

  
 **@tanmay.movva:** Hello, I am trying to set up S3 as the segment store for
Pinot, which is deployed on Kubernetes. Unfortunately it is a cross-account
bucket and we have to pass a bucket ACL as well. I couldn’t find any way to
pass an ACL policy in the docs. Can anyone please help me with this?  
**@fx19880617:** You can try to set it in controller/server config like:
```pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.storage.factory.s3.accessKey=AKIARC**********
pinot.server.storage.factory.s3.secretKey=aaaaaaaaaaaa```  
**@fx19880617:** similar in controller:
```pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.storage.factory.s3.accessKey=AKIARC**********
pinot.controller.storage.factory.s3.secretKey=aaaaaaaaaaaa
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher```  
**@tanmay.movva:** I did get that part. But we have to provide an ACL policy
for the S3 bucket so that Pinot is able to write to that bucket. I am looking
for something similar to `druid.storage.disableAcl` in Druid. Ref -  Its
implementation can be found here -  
**@tanmay.movva:** I have already set the required configs for s3. Thanks for
your quick reply @fx19880617!  
**@tanmay.movva:** But what I need is to tell Pinot to set
`bucket-owner-full-control` as the ACL.  
**@fx19880617:** let me take a look at that  
**@g.kishore:**  
**@g.kishore:** we might have to change this code to set up the
S3ClientBuilder  
**@g.kishore:** @pradeepgv42 what do you think?  
**@pradeepgv42:** @tanmay.movva I think S3PinotFS needs to be updated; it
seems that option is currently missing. Something similar to what you pointed
out here should work, I believe, for all the PutObjectRequests  
**@fx19880617:** can we try to expose those options transparently?  
**@pradeepgv42:** code is missing too  
**@g.kishore:** I think Xiang is suggesting that if there is a way to pass
all the properties from pinot.controller.segment.fetcher.s3.** transparently
to the S3ClientBuilder, it would solve the problem of having to change the
code every time a new property needs to be set on the S3Client  
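A minimal sketch of that pass-through idea, assuming a helper that collects
everything under the S3 config prefix into a map for the client builder (the
class and method names here are illustrative, not Pinot’s actual API):
```import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class S3ConfigPassThrough {
  private static final String PREFIX = "pinot.controller.segment.fetcher.s3.";

  // Collect every property under the S3 prefix, so a new S3 option only
  // needs a config entry rather than a code change.
  static Map<String, String> extractS3Props(Properties pinotConfig) {
    Map<String, String> s3Props = new HashMap<>();
    for (String key : pinotConfig.stringPropertyNames()) {
      if (key.startsWith(PREFIX)) {
        s3Props.put(key.substring(PREFIX.length()), pinotConfig.getProperty(key));
      }
    }
    return s3Props;
  }
}```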
**@pradeepgv42:** It seems this ACL property needs to be set for each
upload/copy (PutObjectRequest & CopyObjectRequest) of any file on S3, so I’m
not sure we can achieve that with just properties and no code change.  
**@g.kishore:** I see  
**@pradeepgv42:** The code change should be simple: wherever there is a
CopyObjectRequest or PutObjectRequest, and when the config is turned on, set
the ACLs  
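A rough sketch of that change under AWS SDK v2, which S3PinotFS builds on;
the boolean config flag is hypothetical, and CopyObjectRequest exposes the
same `.acl(...)` builder method, so the copy path can be handled the same way:
```import java.nio.file.Path;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ObjectCannedACL;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3AclUpload {
  // Sketch only: when the (hypothetical) ACL flag is enabled, attach
  // bucket-owner-full-control so the bucket-owning account can access
  // objects written from another account -- the cross-account case above.
  static void upload(S3Client s3, String bucket, String key, Path file,
      boolean aclEnabled) {
    PutObjectRequest.Builder builder =
        PutObjectRequest.builder().bucket(bucket).key(key);
    if (aclEnabled) {
      builder.acl(ObjectCannedACL.BUCKET_OWNER_FULL_CONTROL);
    }
    s3.putObject(builder.build(), file);
  }
}```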
**@fx19880617:** got it. Created an issue:  . Could you fill in more info
there?  
**@pradeepgv42:** done  
 **@venkatesan.v:** @venkatesan.v has joined the channel  

###  _#metadata-push-api_

  
 **@fx19880617:** Hi, I wanna bring this up: from the code perspective, it
seems that for new segment adds, we don’t have any synchronization on the
idealstates updater  
 **@mayanks:** What do you mean?  
 **@fx19880617:** so if a user tries to upload with high parallelism, then
very likely the idealstate update will fail  
 **@fx19880617:** say a user pushes 20k segments with 100 threads in parallel  
 **@fx19880617:**  
 **@mayanks:** Isn't there a retry policy?  
 **@fx19880617:** currently they can only achieve a push parallelism of about 4  
 **@fx19880617:** this is considering retry  
 **@fx19880617:** otherwise parallelism = 2 may cause the issue  
 **@mayanks:** I mean there was a retry policy in updating zk  
 **@fx19880617:** yes  
 **@mayanks:** could you describe the race condition?  
 **@fx19880617:** this is already considered  
 **@fx19880617:** it’s not a race condition; it’s just that many threads are
trying to update zk  
 **@fx19880617:** so the version gets bumped and the request does not succeed  
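For context, this failure mode comes from ZooKeeper’s versioned writes
(Helix wraps them, but the idea is the same); a sketch with the raw
ZooKeeper client, where `addSegment` is a hypothetical merge helper:
```import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class IdealStateCas {
  // Read-modify-write on the ideal-state znode: setData() carries the
  // version we read, so if another thread wrote in between, the versions
  // no longer match, the write fails, and the whole read-modify-write
  // has to be retried.
  static void addSegmentToIdealState(ZooKeeper zk, String path, String segment)
      throws KeeperException, InterruptedException {
    Stat stat = new Stat();
    byte[] current = zk.getData(path, false, stat);
    byte[] updated = addSegment(current, segment);
    try {
      zk.setData(path, updated, stat.getVersion());
    } catch (KeeperException.BadVersionException e) {
      // another writer bumped the version first; re-read and retry
    }
  }

  // placeholder for merging the new segment into the serialized ideal state
  static byte[] addSegment(byte[] idealState, String segment) {
    return idealState;
  }
}```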
 **@mayanks:** So they cannot finish in the specified number of retries?  
 **@fx19880617:** then it needs to retry again  
 **@fx19880617:** yes  
 **@mayanks:** what is the max number of retries?  
 **@fx19880617:** ```private static final RetryPolicy DEFAULT_RETRY_POLICY =
RetryPolicies.exponentialBackoffRetryPolicy(5, 1000L, 2.0f); ```  
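For reference, those parameters mean 5 attempts with waits growing roughly as
1s, 2s, 4s, 8s (modulo any jitter in the policy implementation), so about 15
seconds of backoff in total; a quick back-of-envelope illustration:
```public class BackoffSchedule {
  public static void main(String[] args) {
    // 5 attempts, 1000 ms initial delay, 2.0 multiplier
    long delayMs = 1000L;
    long totalMs = 0L;
    for (int attempt = 1; attempt < 5; attempt++) {
      System.out.println("wait after attempt " + attempt + ": " + delayMs + " ms");
      totalMs += delayMs;
      delayMs *= 2;
    }
    System.out.println("total backoff ~" + totalMs + " ms");  // prints 15000
  }
}```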
**@mayanks:** If we synchronize the ZK metadata update and the IS update,
will it help?  
 **@fx19880617:** 5 times  
 **@mayanks:** it should at least reduce the retries  
 **@mayanks:** actually, one question  
 **@mayanks:** why do we need parallelism for metadata push?  
 **@fx19880617:** It will help at the single-server level, but in the case of
pushing to all the controllers, we will still see the race  
 **@mayanks:** it should be fast anyways, right?  
 **@fx19880617:** for pushing 20k segments to bootstrap data, each segment
upload took 4 seconds  
 **@mayanks:** 20k segments? What is the segment size?  
 **@fx19880617:** 200mb  
 **@mayanks:** ok  
 **@fx19880617:** it’s metadata push, so segment size doesn’t matter  
 **@mayanks:** yeah, so either we make the IS + ZK update sequential (in
which case there’s no reason for parallel push), or we increase the number of
retries  
 **@fx19880617:** just looking for a way to speed this up  
 **@fx19880617:** right  
 **@mayanks:** increasing the number of retries will put a lot of load on ZK  
 **@fx19880617:** I think at the single-controller level, we should put a
table-level sync  
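A minimal sketch of such a table-level sync on one controller (class and
method names are illustrative, not Pinot’s actual code): serialize
ideal-state updates per table so concurrent uploads for the same table stop
clashing on the znode version, while different tables still proceed in
parallel.
```import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TableLevelIdealStateSync {
  private static final ConcurrentMap<String, Object> TABLE_LOCKS =
      new ConcurrentHashMap<>();

  // Serialize ideal-state updates per table on this controller; updates
  // for different tables are not blocked.
  static void updateIdealState(String tableNameWithType, Runnable update) {
    Object lock = TABLE_LOCKS.computeIfAbsent(tableNameWithType, k -> new Object());
    synchronized (lock) {
      update.run();  // the ZK read-modify-write for this table's ideal state
    }
  }
}```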
 **@mayanks:** if there are 20k segments  
 **@fx19880617:** yes  
 **@fx19880617:** 3 controllers means 3 parallel updates on table idealstates  
 **@fx19880617:** I think it’s much better than the current implementation anyway  
 **@mayanks:** I think that's how we use it at lnkd  
 **@mayanks:** we have a VIP with 3 controllers  
 **@mayanks:** so we set the parallelism to 3 for big use cases  
 **@fx19880617:** yeah, I think for segment data push, there is enough leeway
for idealstates updates  