Posted to by Pinot Slack Email Digest <> on 2022/06/04 02:49:24 UTC

Apache Pinot Daily Email Digest (2022-06-03)

### _#general_

 **@fritz.wijaya:** @fritz.wijaya has joined the channel  
 **@fritz.wijaya:** Hi Pinot community, is Pinot a good fit for a detailed
data-report export use case? The report would have some level of aggregation,
but the granularity of dimensions is still high. Does this kind of use case
still fit Pinot? Thanks  
**@g.kishore:** If that's the primary use case, then Pinot is probably not the
right solution. But if you have other use cases and exporting detailed reports
is infrequent, it should be OK  
**@fritz.wijaya:** Thanks @g.kishore for responding. Export requests may be up
to 5% of total requests. But each export could potentially cover quite a long
data period (up to 1 year) of each client's data  
**@fritz.wijaya:** Is this kind of workload still a good fit?  
**@g.kishore:** how many rows do you think it will need to export?  
**@fritz.wijaya:** A couple hundred thousand records  
**@mayanks:** If the other 95% is analytical workload, then for a few hundred
thousand records you can definitely use Pinot  
**@mayanks:** Even if the number of records in the report grows, you can still
use Pinot, but you might then want to split the query over different time ranges  
**@fritz.wijaya:** Thanks @mayanks. That is great news for me. What do you
mean by split the query by different time range?  
**@mayanks:** For example, instead of getting millions of rows across a big
time range in a single query, break the query into multiple queries over smaller
time ranges and concatenate the results on the client. It is a pattern I have
seen in production for reporting cases, but it may or may not apply to you  
**@fritz.wijaya:** I see. Thanks for the explanation. But would it help to
implement pagination when querying the data? How would the "reporting workload"
affect the "analytics workload"? Is it necessary to separate the
 **@sowmya.gowda:** Hi Team, I have a scenario where I need to load local files
from a particular folder into a Pinot offline table. Suppose new files arrive
every hour or so. How do I create segments for those files on an hourly basis?
Is there an automatic process for creating segments every hour or so?  
**@kharekartik:** Hi, yes, you can use the Minion SegmentGenerationAndPushTask  
**@sowmya.gowda:** Thanks, I'll go through it  
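A minimal sketch of the table-config sections that might drive an hourly SegmentGenerationAndPushTask, written as a Python dict and dumped to JSON. The key names follow the `ingestionConfig.batchIngestionConfig` and `task.taskTypeConfigsMap` sections as we recall them from the Pinot Minion batch-ingestion docs and should be checked against the docs for your Pinot version; the folder path, file pattern, and input format are hypothetical. With local files, the folder has to be readable from the Minion hosts, and the controller's task scheduler must be enabled for the cron `schedule` to fire.

```python
import json

# Hypothetical input folder; with LocalPinotFS it must be visible to the Minion workers.
INPUT_DIR = "file:///data/pinot/incoming/"


def minion_ingestion_config(input_dir: str, input_format: str = "csv") -> dict:
    """Build (assumed) table-config sections for hourly Minion batch ingestion."""
    return {
        "ingestionConfig": {
            "batchIngestionConfig": {
                "segmentIngestionType": "APPEND",
                "segmentIngestionFrequency": "HOURLY",
                "batchConfigMaps": [
                    {
                        "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
                        "inputDirURI": input_dir,
                        # Only pick up files with the expected extension.
                        "includeFileNamePattern": f"glob:**/*.{input_format}",
                        "inputFormat": input_format,
                    }
                ],
            }
        },
        "task": {
            "taskTypeConfigsMap": {
                "SegmentGenerationAndPushTask": {
                    # Quartz cron (sec min hour day-of-month month day-of-week):
                    # fire at minute 0 of every hour.
                    "schedule": "0 0 * * * ?",
                }
            }
        },
    }


if __name__ == "__main__":
    # Merge these sections into the OFFLINE table config before creating/updating it.
    print(json.dumps(minion_ingestion_config(INPUT_DIR), indent=2))
```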
 **@sandeep278:** @sandeep278 has joined the channel  
 **@facundo.bianco:** Hi Pinot Team, do you know if the talk about  was
recorded (and where I can find it)? It was at Trino Summit (). Thank you.  
**@mayanks:** @brianolsen87 ^^  
**@mayanks:** @elon.azoulay  
**@brianolsen87:** Getting it  
 **@abhiram.p:** @abhiram.p has joined the channel  

###  _#random_

 **@fritz.wijaya:** @fritz.wijaya has joined the channel  
 **@sandeep278:** @sandeep278 has joined the channel  
 **@abhiram.p:** @abhiram.p has joined the channel  

###  _#troubleshooting_

 **@fritz.wijaya:** @fritz.wijaya has joined the channel  
 **@sandeep278:** @sandeep278 has joined the channel  
 **@abhijeet.kushe:** I am implementing the pagination use case based on  . I
found out that pagination works without the DISTINCT clause but not with it.
Is that a limitation or a bug?  
**@mayanks:** Distinct is currently modeled as an aggregation function, not
**@abhijeet.kushe:** Thanks Mayank. Distinct does work with aggregation as well  
**@mayanks:** I meant that pagination is supported for `selection` and not
`aggregation`, and Distinct is implemented as `aggregation`, which probably
explains the reason for what you are seeing.  
**@abhijeet.kushe:** OK, is that going to be added in a future release?  
**@mayanks:** I think @atri.sharma was planning on picking it up.  
**@atri.sharma:** Pagination? Yes  
**@abhijeet.kushe:** Thanks  
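A minimal client-side sketch of offset-based pagination for a plain selection query, which, per the thread, is the case Pinot supports. The queries use `ORDER BY ... LIMIT n OFFSET m` (check the Pinot SQL docs for the exact form your version accepts), and the `order_by` column should be unique or include a tiebreaker so pages are stable. `run_query` is a hypothetical callable returning the rows for one SQL string. The guard only catches `SELECT DISTINCT`, refusing to paginate it because DISTINCT runs as an aggregation; other aggregation or GROUP BY queries are not detected.

```python
import re
from typing import Callable, Iterator, List, Sequence

Row = Sequence[object]

# Matches "SELECT DISTINCT" but not aggregation functions such as DISTINCTCOUNT(...).
_SELECT_DISTINCT = re.compile(r"\bSELECT\s+DISTINCT\b", re.IGNORECASE)


def page_query(base_sql: str, order_by: str, page_size: int, offset: int) -> str:
    """Append a deterministic ORDER BY plus LIMIT/OFFSET to a selection query."""
    if _SELECT_DISTINCT.search(base_sql):
        # DISTINCT is executed as an aggregation, so offset pagination does not apply.
        raise ValueError("offset pagination is not supported for DISTINCT queries")
    return f"{base_sql} ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"


def iter_pages(
    run_query: Callable[[str], List[Row]],
    base_sql: str,
    order_by: str,
    page_size: int,
) -> Iterator[List[Row]]:
    """Yield non-empty pages until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = run_query(page_query(base_sql, order_by, page_size, offset))
        if page:
            yield page
        if len(page) < page_size:
            return
        offset += page_size
```

Stopping on the first short page avoids an extra round trip when the total row count is not a multiple of the page size; when it is, one final empty page is fetched and discarded.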
 **@abhiram.p:** @abhiram.p has joined the channel  
 **@bagi.priyank:** Hello, I am trying to use the Trino connector and am running
into the following error while trying to query Pinot via Trino:
```
>>> import trino
>>> conn = trino.dbapi.connect(host='<redacted>', port=8443, catalog='pinot', schema='default', http_scheme='https', auth=trino.auth.BasicAuthentication("xxx", "yyyy"))
>>> cur = conn.cursor()
>>> cur.execute('SELECT * FROM mytable LIMIT 10')
<trino.client.TrinoResult object at 0x10428d160>
>>> rows = cur.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/trino/", line 558, in fetchall
    return list(self.genall())
  File "/usr/local/lib/python3.8/site-packages/trino/", line 509, in __iter__
    rows = self._query.fetch()
  File "/usr/local/lib/python3.8/site-packages/trino/", line 677, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.8/site-packages/trino/", line 440, in process
    raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server: ", query_id=20220603_211510_00025_9srer)
```
I am using the external IP of the load balancer, i.e. `service/pinot-controller-external`,
with port 9000 for `pinot.controller-urls`. If it helps, I am using the
community-provided Helm chart to stand up the Pinot infrastructure on AWS EKS.  
**@xiangfu0:** Trino has to be deployed in the same k8s cluster as Pinot  
**@bagi.priyank:** I see, so no way to use it without deploying to the same
k8s cluster?  

###  _#pinot-k8s-operator_

 **@bagi.priyank:** @bagi.priyank has left the channel  

###  _#pinot-perf-tuning_

 **@bagi.priyank:** @bagi.priyank has left the channel  

###  _#getting-started_

 **@fritz.wijaya:** @fritz.wijaya has joined the channel  
 **@sandeep278:** @sandeep278 has joined the channel  
 **@abhiram.p:** @abhiram.p has joined the channel  

###  _#segment-write-api_

 **@filipdolinski:** @filipdolinski has joined the channel  

###  _#introductions_

 **@fritz.wijaya:** @fritz.wijaya has joined the channel  
 **@visar:** Hi everyone, I'm Visar. I've been working on a public CDN for the
past year. Currently my team and I are migrating our HTTP analytics and
monitoring platform from the ELK stack to Apache Pinot. Excited to be part of
the community.  
 **@sandeep278:** @sandeep278 has joined the channel  
 **@abhiram.p:** @abhiram.p has joined the channel  