Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/09/17 02:00:20 UTC

Apache Pinot Daily Email Digest (2021-09-16)

### _#general_

  
 **@prtk.ngm:** Hello All, I am following this to configure a Spark job for
ingestion. Can we give the input staging directory as an HDFS directory? I
don't want to use the Hadoop utility jar to ingest data.  
**@ken:** I assume you can use HDFS for the staging directory, since we do
that for batch segment generation with Hadoop  
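For reference, a minimal Spark ingestion job spec with HDFS used for input, output, and staging might look like the sketch below. The paths, table name, file format, and controller URI are placeholders, not values from the thread; the `stagingDir` extra config is the staging directory the question asks about.

```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    # Staging directory on HDFS (hypothetical path)
    stagingDir: 'hdfs://namenode:8020/pinot/staging/'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://namenode:8020/data/raw/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 'hdfs://namenode:8020/pinot/segments/'
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://controller-host:9000'
```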
 **@raj.swarnim:** @raj.swarnim has joined the channel  
 **@sheetalarun.kadam2:** Hi all, I want to insert a single record into Pinot
from an application. Setting up Kafka for real-time ingestion seems too
complicated for very small-volume insert calls. Is there some other way?  
**@g.kishore:** Pinot does not support a row-level insert API as of now. You
can use the batch API.  
**@npawar:** You can try out the SegmentWriter interface. There’s no
documentation yet, but this test demonstrates how to use it. But in any case,
creating segments with a single row isn’t the best idea.  
**@sheetalarun.kadam2:** Thanks, I will check it out. But yes, the single-row
thing is what’s troubling me. I use the table for a search box, so I don’t
want to batch process.  
**@ken:** Are you trying to use Pinot for near real time search?  
**@sheetalarun.kadam2:** It’s a normal search bar. It will be a regex query on
one of the columns. The reason to use Pinot is that I have some dashboarding
needs which require fast aggregations. Having a different database like MySQL
for just one table (the search-query one) seemed like an unnecessary layer. So
I am thinking of using Pinot for the search.  
**@ken:** OK, but normally adding row-by-row (not batch) means you want near-
real time (NRT) search. As in, soon after data is available you want it to be
searchable. Otherwise you could just use batch to generate segments every day
(as an example).  
**@sheetalarun.kadam2:** Oh yes, I want it near real time. Data should be
available to all as soon as the insert is done.  
**@g.kishore:** If you are planning to use this in production and expect
strong guarantees, it’s better to use Kafka.  
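For reference, the regex search-bar query discussed above can be expressed with Pinot's built-in `REGEXP_LIKE` filter. The table and column names below are made up for illustration:

```sql
-- Regex match on a single column; REGEXP_LIKE uses
-- Java regular-expression syntax.
SELECT item_name, COUNT(*) AS cnt
FROM items
WHERE REGEXP_LIKE(item_name, '.*widget.*')
GROUP BY item_name
ORDER BY cnt DESC
LIMIT 10
```

For heavier text-search workloads, a text index with `TEXT_MATCH` may be a better fit than a full-scan regex filter.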
**@becca.silverman:** @becca.silverman has joined the channel  
 **@leb9882:** @leb9882 has joined the channel  
 **@sanipindi:** @sanipindi has joined the channel  

###  _#random_

  
 **@raj.swarnim:** @raj.swarnim has joined the channel  
 **@becca.silverman:** @becca.silverman has joined the channel  
 **@leb9882:** @leb9882 has joined the channel  
 **@sanipindi:** @sanipindi has joined the channel  

###  _#troubleshooting_

  
 **@raj.swarnim:** @raj.swarnim has joined the channel  
 **@dadelcas:** Hello, I'm trying to configure Presto to query Pinot tables.
The catalog seems fine; I can show tables in Pinot. However, when I issue a
query I get the following error: `Query <id> failed: Cannot fetch from cache
<table>` Any hints to fix this error would be appreciated.  
**@mayanks:** Are you able to run the Pinot query directly? Also, can you run
EXPLAIN on Presto?  
**@dadelcas:** The query runs in Pinot; it is a simple statement: `select *
from <table> limit 1`. EXPLAIN returns the same error.  
**@dadelcas:** I'm running Pinot 0.8.0 and Presto 0.261  
 **@becca.silverman:** @becca.silverman has joined the channel  
 **@anu110195:** Is there any way to check slow queries in Pinot?  
**@mayanks:** You can look at the broker logs for details on whether it was a
single server, multiple servers, or the broker that caused the issue. You can
also look at the response metadata to check how much work the query did (in
terms of scanning/selecting docs, etc.).  
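One lightweight way to act on that response metadata is to collect it per query and filter on it. The field names below (`timeUsedMs`, `numDocsScanned`, `numEntriesScannedInFilter`, `numServersQueried`, `numServersResponded`) are the standard stats in a Pinot broker query response; the latency threshold is an arbitrary choice for illustration:

```python
def flag_slow_queries(responses, latency_threshold_ms=1000):
    """Return (sql, stats) pairs for queries whose broker-reported latency
    exceeded the threshold, keeping the work-related counters for triage.

    `responses` is a list of (sql, metadata_dict) pairs, where metadata_dict
    is the stats section of a Pinot broker query response.
    """
    slow = []
    for sql, stats in responses:
        if stats.get("timeUsedMs", 0) > latency_threshold_ms:
            slow.append((sql, {
                "timeUsedMs": stats.get("timeUsedMs"),
                "numDocsScanned": stats.get("numDocsScanned"),
                "numEntriesScannedInFilter": stats.get("numEntriesScannedInFilter"),
                "numServersQueried": stats.get("numServersQueried"),
                "numServersResponded": stats.get("numServersResponded"),
            }))
    return slow

# Example with a fabricated response-metadata dict:
sample = [("select * from t", {"timeUsedMs": 4200, "numDocsScanned": 10})]
print(flag_slow_queries(sample))
```

A large `numEntriesScannedInFilter` relative to `numDocsScanned` usually points at a missing or unused index; `numServersResponded < numServersQueried` points at a slow or unhealthy server.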
**@leb9882:** @leb9882 has joined the channel  
 **@raj.swarnim:** Can anyone help me understand what this error means:
`Error: Could not find or load main class
org.apache.pinot.thirdeye.anomaly.ThirdEyeAnomalyApplication`? And how to fix
it? I don't have much experience in Java.  
**@mayanks:** @pyne.suvodeep ^^  
**@pyne.suvodeep:** Hi @raj.swarnim. It means that Java is unable to find that
class on the classpath. Can you share the steps through which you got to this
error?  
 **@sanipindi:** @sanipindi has joined the channel  
 **@qianbo.wang:** Hi, I have a question about `DATETIMECONVERT`. The docs
mention that it buckets the time based on the given granularity, but what is
the start of the first bucket? E.g. when running this query: ```SELECT
DATETIMECONVERT(time_col, '1:MILLISECONDS:EPOCH', '1:SECONDS:EPOCH',
'30:DAYS') as new_time_col, COUNT(id) FROM table WHERE (time_col BETWEEN
<epoch_second of 7/16> AND <epoch_second of 9/16>) GROUP BY new_time_col
ORDER BY new_time_col``` it returns 3 buckets: 7/1, 7/31 and 8/30. So I
wonder how this is being calculated?  
**@qianbo.wang:** For a bit more context, we are trying to bucket our data
into 30-day buckets to categorize its age.  
**@qianbo.wang:** Same result using ```DATETIMECONVERT(time_col,
'1:MILLISECONDS:EPOCH', '1:SECONDS:EPOCH', '720:HOURS')```  
**@jackie.jxt:** The start of the first bucket is the Unix epoch, and we use
millis since epoch to calculate the time bucket.  
**@qianbo.wang:** ah, I see. thanks!  
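The bucketing described above can be reproduced by flooring the epoch-millis value to a multiple of the bucket width, with buckets aligned to the Unix epoch. The timestamp below is illustrative, chosen to fall inside the thread's query range:

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day

def bucket_start(epoch_millis, bucket_days):
    """Floor an epoch-millis timestamp to the start of its bucket.

    Buckets are aligned to the Unix epoch (1970-01-01T00:00:00Z), which is
    how DATETIMECONVERT computes a '30:DAYS' granularity.
    """
    bucket_ms = bucket_days * DAY_MS
    return (epoch_millis // bucket_ms) * bucket_ms

# 2021-07-16T00:00:00Z falls into the 30-day bucket starting 2021-07-02 UTC
ts = int(datetime(2021, 7, 16, tzinfo=timezone.utc).timestamp() * 1000)
start = bucket_start(ts, 30)
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc).date())  # 2021-07-02
```

Note that the UTC bucket boundaries 2021-07-02, 2021-08-01, and 2021-08-31, when rendered in a US timezone, display as the evenings of 7/1, 7/31, and 8/30, which matches the buckets observed in the thread.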

###  _#thirdeye-pinot_

  
 **@raj.swarnim:** @raj.swarnim has joined the channel  

###  _#getting-started_

  
 **@zineb.raiiss:** Hello friends, I want to test the ThirdEye solution for
Pinot anomaly detection, so I followed the documentation, but failed to
connect to  
**@zineb.raiiss:** Do You have any idea?  
**@npawar:** @pyne.suvodeep  
 **@tiger:** Any tips for debugging slow queries? I was stress testing my
cluster and noticed that when I send a bunch of queries at once, the query
latency goes from ~100 ms to 4-5 seconds. The latency then stays relatively
high for a few minutes after the stress test and then returns to ~100 ms. I
also noticed that sometimes a single server takes significantly longer to
process a query, which ends up increasing the overall latency by a lot. That
one slow server also stays consistently slow for a while, so every query is
bottlenecked by that server. Thanks!  
**@kulbir.nijjer:** This might be a good start:  