Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/11/22 02:00:14 UTC

Apache Pinot Daily Email Digest (2020-11-21)

### _#general_

  
 **@saurabh.dhupar:** @saurabh.dhupar has joined the channel  
 **@pabraham.usa:** I am executing a query and getting the following error:
`select length(mydata) from mytable where length(mydata) > 500 LIMIT 10`.
Other queries are working fine. Wondering whether this is a query issue or a
cluster setup / index issue?
```
ERROR SplitRunner-0-38 com.facebook.presto.execution.executor.TaskExecutor Error processing Split 20201121_222456_00039_rmrgi.1.0.0-4 PinotSplit{connectorId=pinot, splitType=SEGMENT, columnHandle=[PinotColumnHandle{columnName=mydata, dataType=varchar, type=REGULAR}], segmentPinotQuery=Optional[SELECT mydata FROM mytable_REALTIME LIMIT 2147483647], brokerPinotQuery=Optional.empty, segments=[mytable__0__7__20201121T2220Z], segmentHost=Optional[Server_test-pinot-server-1.test-pinot-server-headless_8098]} (start = 2824864.565796, wall = 3 ms, cpu = 0 ms, wait = 0 ms, calls = 1): PINOT_UNCLASSIFIED_ERROR: Error when hitting host Server_test-pinot-server-1.sb-pinot-server-headless_8098 with pinot query "SELECT mydata FROM mytable_REALTIME LIMIT 2147483647"
```  
**@g.kishore:** is this with Presto/Pinot connector?  
**@pabraham.usa:** yes that is correct @g.kishore  
**@g.kishore:** Does it work if you run it directly on Pinot?  
**@g.kishore:** prestodb or prestosql  
**@pabraham.usa:** good thinking, just tried and it works in Pinot  
**@g.kishore:** looks like the connector is not pushing the udf down to Pinot  
**@g.kishore:** ```SELECT mydata FROM mytable_REALTIME LIMIT 2147483647```  
**@g.kishore:** it's trying to pull everything out of Pinot  
**@pabraham.usa:** that query worked once earlier, so do you think I have to
check the Pinot presto connector settings?  
**@g.kishore:** which presto are you using  
**@g.kishore:** prestodb or prestosql  
**@pabraham.usa:** prestodb, the one from the incubator-pinot github chart  
**@pabraham.usa:** The query always works without the where clause, like `select
length(mydata) from mytable limit 500`  
**@pabraham.usa:** @g.kishore, after clearing the caches for presto, the query
started to work as normal. Thanks for pointing out the issue.  
**@g.kishore:** it might be working, but it's not efficient  
**@pabraham.usa:** That's correct, there is extra latency when executing via
presto compared with querying pinot directly  
**@pabraham.usa:** I was expecting the same performance though  
**@g.kishore:** it's basically pulling all the data from Pinot into Presto and
doing the computation in Presto; you want most of the computation to run in
Pinot  
**@g.kishore:** yeah, it's a matter of telling Presto to push this function
down to Pinot  
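A toy Python model of what is described above, not the connector's actual code: without pushdown, the connector issues `SELECT mydata FROM mytable_REALTIME LIMIT 2147483647`, every value leaves Pinot, and Presto evaluates `length(mydata) > 500` itself; with pushdown, the filter and limit run where the data lives and only matching rows come back. The in-memory "segment" and both function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical stand-in for the `mydata` column of one Pinot segment.
Segment = List[str]


def query_without_pushdown(segment: Segment, limit: int) -> Tuple[List[int], int]:
    """Mimic `SELECT mydata ... LIMIT 2147483647`: every value leaves Pinot,
    then the filter, length() and LIMIT run on the Presto side.
    Returns (result rows, number of rows transferred)."""
    transferred = list(segment)
    lengths = [len(v) for v in transferred if len(v) > 500]
    return lengths[:limit], len(transferred)


def query_with_pushdown(segment: Segment, limit: int) -> Tuple[List[int], int]:
    """Evaluate length(mydata) > 500 and LIMIT inside Pinot;
    only the matching result rows are transferred."""
    result = [len(v) for v in segment if len(v) > 500][:limit]
    return result, len(result)


segment = ["x" * n for n in (10, 600, 20, 700, 30, 800)]
print(query_without_pushdown(segment, 10))  # ([600, 700, 800], 6)
print(query_with_pushdown(segment, 10))     # ([600, 700, 800], 3)
```

Both paths return the same answer; the difference is how many rows cross the network, which grows with the table size in the first case and only with the result size in the second.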
**@g.kishore:** @fx19880617 is there a way to configure this, or does it require a
code change in the connector?  
**@fx19880617:** is mydata a multi-value column?  
**@fx19880617:** I think presto has a function called array_length?  
**@pabraham.usa:** @fx19880617 mydata is a STRING with maxsize 10000  
**@pabraham.usa:** mapped as VARCHAR in presto  
**@fx19880617:** i see, I think the string length method is not pushed down
through presto  
**@fx19880617:** it requires a code change to do that  
**@fx19880617:** current approach is that presto will read all the strings
back and compute the string length  
**@fx19880617:** it’s definitely inefficient  
**@pabraham.usa:** Ohh ok, most of the other methods are available, right? So
far I have faced the issue with length only. Trying others as well  
**@pabraham.usa:** yes it is extremely slow  
**@pabraham.usa:** I have now installed the pinot driver in superset and am trying to
split the different types of queries  
**@fx19880617:** for aggregations, we push them down, but we don't push down
those string functions yet  
**@fx19880617:** split the diff types?  
**@fx19880617:** you mean register both presto and pinot tables?  
**@pabraham.usa:** For the ones that require joins with other data sources I can use
the presto driver, and for the queries that can be served by pinot alone I can use
pinot.  
**@pabraham.usa:** yes correct  
**@fx19880617:** got it  
**@pabraham.usa:** direct Pinot queries are performing well  
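For the queries that pinot can serve alone, a client can send SQL straight to a Pinot broker instead of going through presto. A minimal standard-library Python sketch, assuming the broker exposes `POST /query/sql` with a JSON body `{"sql": ...}` on the default broker port 8099; the host and the helper name `build_pinot_sql_request` are placeholders. It only builds the request; the commented lines show how it would be sent to a live broker.

```python
import json
import urllib.request


def build_pinot_sql_request(sql: str,
                            broker: str = "http://localhost:8099") -> urllib.request.Request:
    """Build a POST to the broker's SQL endpoint (assumed path: /query/sql)."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_pinot_sql_request(
    "SELECT length(mydata) FROM mytable WHERE length(mydata) > 500 LIMIT 10"
)
# Sending it requires a reachable broker:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Here the whole filter and `length()` evaluation stays inside Pinot, which matches the observation that direct queries perform well.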
**@fx19880617:** yes, that’s why we need function pushdown
:slightly_smiling_face:  
**@pabraham.usa:** Hope that will get implemented at some point soon. I also
came across SQL Passthrough  
**@fx19880617:** yes, prestosql allows you to push down whatever you wrote
directly to pinot. That is good for users who know how pinot queries work and their syntax  
**@pabraham.usa:** Thanks will try and see  
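A sketch of how such a passthrough query might be assembled, assuming the prestosql Pinot connector convention of quoting a complete Pinot query as the table name (for example `pinot.default."SELECT ..."`). The catalog `pinot`, the schema `default`, and the helper `passthrough_query` are assumptions for illustration. Double quotes inside the Pinot query are doubled, as standard SQL delimited-identifier quoting requires.

```python
def passthrough_query(pinot_sql: str,
                      catalog: str = "pinot",
                      schema: str = "default") -> str:
    """Wrap a native Pinot query as a quoted table name so presto
    hands it to Pinot unchanged (hypothetical helper)."""
    quoted = pinot_sql.replace('"', '""')  # escape quotes inside the identifier
    return f'SELECT * FROM {catalog}.{schema}."{quoted}"'


print(passthrough_query(
    "SELECT length(mydata) FROM mytable WHERE length(mydata) > 500 LIMIT 10"
))
# SELECT * FROM pinot.default."SELECT length(mydata) FROM mytable WHERE length(mydata) > 500 LIMIT 10"
```

The trade-off mentioned above applies: the inner query must be written in Pinot's own dialect, so this suits users who already know Pinot's syntax.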
 **@kmlamkin:** @kmlamkin has joined the channel  

###  _#random_

  
 **@saurabh.dhupar:** @saurabh.dhupar has joined the channel  
 **@kmlamkin:** @kmlamkin has joined the channel  

###  _#troubleshooting_

  
 **@elon.azoulay:** We had users who accidentally produced malformed data to a
kafka topic. The realtime segments were in an "offline" state, and then we saw
log messages that the segments were removed; we do not see them in deleted
segments, deep store or the servers. The users showed that non-corrupt data
from segments that had been consuming was also missing (maybe from the bad
segments?). Is there anything that would cause that behavior? Is that
expected?  
**@elon.azoulay:** Here is a snippet from the logs:  
**@elon.azoulay:**  
**@elon.azoulay:** Let me know if you need more logs  

###  _#announcements_

  
 **@kmlamkin:** @kmlamkin has joined the channel  
 **@kmlamkin:** @kmlamkin has left the channel  