Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/06/04 02:00:17 UTC

Apache Pinot Daily Email Digest (2021-06-03)

### _#general_

  
 **@lakshmanan.velusamy:** Hello Community, we are trying to add user-defined
scalar functions. Can these functions be used in the star-tree index for
pre-aggregation?  
**@g.kishore:** Scalar functions are one-to-one; aggregation functions are
many-to-one. You will have to implement ```public interface
AggregationFunction<IntermediateResult, FinalResult extends Comparable>```  
**@lakshmanan.velusamy:** Thanks @g.kishore. Does the star-tree index support
custom aggregate functions?  
**@g.kishore:** Yes, you will have to implement the interface I pasted above
and have it available in the classpath  
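To make the many-to-one contract concrete, here is a minimal, self-contained
Java sketch of the shape a custom aggregation has to provide: a per-segment
aggregate, a merge of intermediate results, and a final extraction. The class
and method names are illustrative only and are not the actual Pinot
`AggregationFunction` interface pasted above.

```
// Conceptual sketch only -- NOT the real AggregationFunction interface. It just
// illustrates the "many to one" flow: per-segment intermediate results that get
// merged across segments/servers and then reduced to one Comparable final result.
import java.util.List;

public class SumAggregationSketch {

  // Hypothetical intermediate result: a running sum kept per segment.
  static final class RunningSum {
    long value;
    RunningSum(long value) { this.value = value; }
  }

  // Many rows -> one intermediate result per segment.
  static RunningSum aggregateSegment(List<Long> columnValues) {
    long sum = 0;
    for (long v : columnValues) {
      sum += v;
    }
    return new RunningSum(sum);
  }

  // Intermediate results from different segments/servers are merged pairwise.
  static RunningSum merge(RunningSum a, RunningSum b) {
    return new RunningSum(a.value + b.value);
  }

  // The merged intermediate result is reduced to the final, Comparable answer.
  static Long extractFinalResult(RunningSum merged) {
    return merged.value;
  }

  public static void main(String[] args) {
    RunningSum segment1 = aggregateSegment(List.of(1L, 2L, 3L));
    RunningSum segment2 = aggregateSegment(List.of(10L, 20L));
    System.out.println(extractFinalResult(merge(segment1, segment2))); // prints 36
  }
}
```

The design point is that an aggregation is only composable if its intermediate
results can be merged, which is what lets partial results be pre-aggregated and
combined later; a plain scalar transform has no such merge step.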
 **@kkmagic99:** @kkmagic99 has joined the channel  
 **@jmeyer:** Hello :slightly_smiling_face: Is there a whirlwind tour of
Pinot's code base available somewhere? Some pointers on where to start?  
**@mayanks:** What would you like to do? Based on that I can point you in the
right direction in the code. You can familiarize yourself with the Pinot
architecture (if not already done)  
**@jmeyer:** To get to know & understand Pinot more. I think I've read most of
the existing documentation and I'm wondering how to get a better understanding
overall :)  
**@jmeyer:** But yeah, a sort of codebase "entry point" would be a start
:smile: (for queries, for example)  
**@npawar:** What helped me was running the quick start and then following the
query in debug mode, and also ingestion. Another good way would be to pick up
a beginner issue from GitHub. Let me know if you'd like to do that, I can point
you to some!  
**@mayanks:** Yep this ^^  
 **@savio.teles:** Hello. What happens during real-time ingestion with upsert
when two records have the same primary key and equal event times? The
documentation says: "When two records of the same primary key are ingested,
_the record with the greater event time (as defined by the time column) is
used_." But when there is a tie, what happens?  
**@g.kishore:** @yupeng @jackie.jxt will know the exact answer. As part of
partial upsert work we are doing, we will make the merging logic
pluggable/configurable  
**@yupeng:** Currently the behavior is undefined, so it's implementation-dependent;
in practice the message with the largest offset wins. However, there are
caveats: for example, when the records are sorted by some column, the order is
not deterministic  
**@mayanks:** @yupeng mind adding to docs/FAQ?  
**@savio.teles:** I have an upsert scenario that I am not getting my head
around. We have created an order table (configuration in the file below) with
the primary key being "kid" and the time column being "_order_date". We have
duplicated rows (same "kid" and "_order_date") in our table, but upsert ends up
keeping the duplicates. Depending on the filter of the query, it returns
duplicate items to me. See the screenshots below.  
**@jackie.jxt:** I don't see upsert enabled in the table config. Please follow
the instructions here to enable it:  
**@jackie.jxt:** Please make sure the Kafka stream is partitioned on the
primary key  
**@savio.teles:** Sorry @jackie.jxt. Here is the updated file! The one I had
sent was from the previous version. The error happens with the table created
with this schema below.  
**@savio.teles:**  
**@jackie.jxt:** Is the Kafka stream partitioned with the primary key?  
**@jackie.jxt:** Can you try this query: select kid, $hostName from orders2
where kid = 'ever max-100000009'  
**@yupeng:** @mayanks sure thing  
**@savio.teles:** @jackie.jxt. The results of the query are in the image
below. I don't know if the Kafka stream is partitioned by the primary key. I
will check here and get back to you. Tks  
**@jackie.jxt:** @savio.teles The kafka stream is not properly partitioned,
and the same `kid` shows up in 2 different partitions  
**@savio.teles:** Tks, @jackie.jxt.  
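Keeping the stream partitioned on the primary key usually just means producing
to Kafka with the primary key as the record key, so the default partitioner
routes every update for a given key to the same partition. Below is a minimal
sketch under that assumption; the bootstrap server, topic name, and JSON
payload are placeholders, not values taken from this thread.

```
// Minimal keyed-producer sketch: using the upsert primary key ("kid") as the
// Kafka record key keeps all records for the same key in one partition.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrdersProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      String kid = "ever max-100000009"; // the primary key value
      String order = "{\"kid\":\"" + kid + "\",\"_order_date\":1622678400000}";
      // The second argument (the record key) is what the partitioner hashes.
      producer.send(new ProducerRecord<>("orders2", kid, order));
    }
  }
}
```

Note that fixing the producer does not move records already written to other
partitions, so the table may need to be re-ingested afterwards.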
 **@cgoddard:** @cgoddard has joined the channel  

###  _#random_

  
 **@kkmagic99:** @kkmagic99 has joined the channel  
 **@cgoddard:** @cgoddard has joined the channel  

###  _#feat-presto-connector_

  
 **@keweishang:** @keweishang has joined the channel  

###  _#troubleshooting_

  
 **@elon.azoulay:** We ran into an issue where a thread was blocked, and it
caused the entire cluster to stop processing queries due to the worker (`pqw`
threads in default scheduler) threadpool on one server being blocked. Would it
help to make the worker thread pool use a cached threadpool while still
keeping the query runner threadpool (`pqr` threads in default scheduler)
fixed? Or do you recommend using one of the other query schedulers? Here is
the thread dump:  
**@mayanks:** Hmm, could you elaborate on how just one thread being blocked
caused the entire cluster to stop processing?  
**@mayanks:** Also, any insights on why it was blocked?  
**@elon.azoulay:** Yep, it was really strange but happened multiple times,
coinciding with a large # of segments scanned for a particular table. Only the
`pqw` threadpool on one server was blocked, but no queries in that tenant were
completing.  
**@elon.azoulay:** We would see messages in the logs like this:  
**@elon.azoulay:**  
**@elon.azoulay:** Thread was blocked (or deadlocked?) reading.  
**@elon.azoulay:** maybe it was deadlocked with a thread ingesting realtime
segments?  
**@mayanks:** I think the root cause might be something else, I don't see how
one thread can block the entire cluster.  
**@elon.azoulay:** Either way, each time it happened, just bouncing that one
server fixed the issue.  
**@elon.azoulay:** I checked threads on all the servers in that tenant and
only one particular server would have `pqw` threads in BLOCKED state with a
similar thread dump each time  
**@elon.azoulay:** I did see that direct memory dropped and mmapped memory
suddenly would spike each time this happened.  
**@elon.azoulay:** And no queries completed (for tables on that tenant) until
the server was restarted.  
**@mayanks:** That probably means segment completion  
**@elon.azoulay:** Any idea what would cause that?  
**@mayanks:** The direct mem vs mmap is segment completion.  
**@mayanks:** Are you saying one thread in one server is causing the entire
cluster to stop processing?  
**@mayanks:** Or are you saying all/several pqw in one server?  
**@elon.azoulay:** Only for queries hitting tables on that tenant  
**@elon.azoulay:** and only 1 server had blocked pqw threads, always with the
same thread dump  
**@mayanks:** Ok, multiple blocked pqw threads on 1 server, then?  
**@elon.azoulay:** From the thread dump I see this code:  
**@elon.azoulay:** ```public void add(int dictId, int docId) {
  if (_bitmaps.size() == dictId) {
    // Bitmap for the dictionary id does not exist, add a new bitmap into the list
    ThreadSafeMutableRoaringBitmap bitmap = new ThreadSafeMutableRoaringBitmap(docId);
    try {
      _writeLock.lock();
      _bitmaps.add(bitmap);
    } finally {
      _writeLock.unlock();
    }
  } else {
    // Bitmap for the dictionary id already exists, check and add document id into the bitmap
    _bitmaps.get(dictId).add(docId);
  }
}```  
**@elon.azoulay:** Maybe because a segment was added while the unsynchronized
code was getting the size or calling get? (i.e. the first and last lines)  
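To illustrate that hypothesis, here is a hypothetical, self-contained variant
of the method above (not Pinot's actual code or an actual fix): the
size()/get() read path is also guarded, with a re-check under the write lock,
so a concurrent append cannot interleave with an unsynchronized read. A plain
RoaringBitmap stands in for ThreadSafeMutableRoaringBitmap, so the per-bitmap
add is synchronized explicitly.

```
// Hypothetical sketch only -- shows the locking pattern being discussed, not a fix.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.roaringbitmap.RoaringBitmap;

public class LockedBitmapList {
  private final List<RoaringBitmap> _bitmaps = new ArrayList<>();
  private final ReadWriteLock _lock = new ReentrantReadWriteLock();

  public void add(int dictId, int docId) {
    _lock.readLock().lock();
    try {
      if (dictId < _bitmaps.size()) {
        RoaringBitmap bitmap = _bitmaps.get(dictId);
        // Stand-in for ThreadSafeMutableRoaringBitmap: keep per-bitmap adds atomic.
        synchronized (bitmap) {
          bitmap.add(docId);
        }
        return;
      }
    } finally {
      _lock.readLock().unlock();
    }
    _lock.writeLock().lock();
    try {
      // Re-check under the write lock: another thread may have appended the bitmap.
      if (_bitmaps.size() == dictId) {
        RoaringBitmap bitmap = new RoaringBitmap();
        bitmap.add(docId);
        _bitmaps.add(bitmap);
      } else if (dictId < _bitmaps.size()) {
        _bitmaps.get(dictId).add(docId);
      }
    } finally {
      _lock.writeLock().unlock();
    }
  }
}
```

Whether this read/write interleaving is what actually blocked the pqw threads
is exactly the open question in this thread; the sketch only shows the pattern
being discussed.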
**@elon.azoulay:** And although it was multiple pqw threads blocked on 1
server, there were still runnable pqw threads. Each time this happened it was
the same scenario: 1 server, 1-3 threads blocked with that same thread dump.  
**@mayanks:**  
**@mayanks:** Are you using text index?  
**@mayanks:** if not then nm  
**@elon.azoulay:** ok:)  
**@elon.azoulay:** we're not using text indexes on this table at all  
**@mayanks:** what release?  
**@elon.azoulay:** 0.6.0  
**@elon.azoulay:** we're testing 0.7.1, do you recommend we upgrade?  
**@mayanks:** I am curious to know what is happening here. I haven't caught
any race conditions on my radar recently.  
**@elon.azoulay:** yep, I just noticed that the size() and get() are called
unsynchronized, but it doesn't seem like that would lead to any deadlocks...  
**@elon.azoulay:** From the logs it looks like a thread was stuck reading -
also not sure why it would block other queries from completing.  
**@mayanks:** I can see how multiple blocked pqw threads can cause this. But a
single one, I am not so sure.  
**@elon.azoulay:** there were multiple pqw threads blocked (1-3 out of 8) but
there were also runnable pqw threads each time  
**@mayanks:** There's a metric for wait time in scheduler queue for the query  
**@elon.azoulay:** Nice, is it in the fcfs code?  
**@mayanks:** ```ServerQueryExecutorV1Impl```  
**@elon.azoulay:** thanks!  
**@mayanks:** Check for SCHEDULER_WAIT  
**@mayanks:** see if that spikes  
**@mayanks:** If so, then it is building a backlog  
**@mayanks:** One way to mitigate that is to set table level timeout  
**@elon.azoulay:** ok, and the other thing that happened each time was a huge
(like 75%) drop in direct memory used, and a spike in mmapped memory.  
**@elon.azoulay:** Oh nice, how do we set the timeout?  
**@mayanks:** ```QueryConfig```  
**@mayanks:** Inside of Table config  
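For reference, the table-level timeout lives in the QueryConfig section of the
table config; a minimal sketch, assuming the `query` / `timeoutMs` field names
from the table config reference (value illustrative; verify against your
release, 0.6.0 here):

```
"query": {
  "timeoutMs": 30000
}
```

As noted just below, this is mitigation only and may not preempt an already
blocked worker thread.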
**@mayanks:** Although, since in Java you cannot force-interrupt a thread, I
am not sure if that will preempt this blocked thread.  
**@mayanks:** Perhaps it does  
**@mayanks:** But again, this is mitigation.  
**@mayanks:** I'd recommend filing an issue with as much detail as possible.
We should get to the root cause.  
**@elon.azoulay:** Sure, I saved a bunch of logs and metrics...  
**@mayanks:** Seems like some race condition might be causing some threads to
block, building a backlog (please check the scheduler wait metric)  
**@elon.azoulay:** Would changing pqw to a cached threadpool be dangerous?  
**@elon.azoulay:** Yep, I'm looking for it now, thanks!  
**@mayanks:** Need to think about the repercussions of that. I'd rather get to
the bottom of the issue instead of just trying something.  
**@elon.azoulay:** yep, makes sense  
**@elon.azoulay:** thanks, I'll look into the metrics  
**@mayanks:** Please also include info on whether segment commit was happening
and whether it failed, any special characteristics of the table or Kafka topic
where this happens, etc.  
**@mayanks:** This will help get some clues.  
**@elon.azoulay:** Sure, I'll file an issue, thanks for the advice!  
**@mayanks:** thanks for reporting.  
**@elon.azoulay:** So I do see spikes in the scheduler wait metric:  
**@elon.azoulay:**  
**@mayanks:** Do they correspond to the time when the issue happened? If so,
try setting a table-level timeout as a fallback, but we should still debug the
issue  
**@elon.azoulay:** yep  
**@elon.azoulay:** also another note: using the trino connector I could query
the servers directly with no blocks or delays  
**@elon.azoulay:** same table...  
**@elon.azoulay:** it was only broker queries which were blocked - not sure
why restarting 1 server would fix the issue each time.  
**@mayanks:** Hmm, broker to server connection is async now (and has been for
a long time), so not sure why that would happen  
**@mayanks:** Oh, I think it could be because the Trino-to-server connection
uses a different thread pool (not shared with pqw)?  
**@mayanks:** Also tagging @jackie.jxt  
**@jackie.jxt:** If it is always the same server that is blocking, maybe check
hardware failure?  
**@jackie.jxt:** E.g. block on IO because one segment is unreadable  
**@jackie.jxt:** (Just random guess)  
**@ken:** Hmm, I had something similar happen when I did a `DISTINCT` query on
a column with very high cardinality. A broker somehow got “wedged”, and would
no longer process queries - we had to bounce it. IIRC the guess at that time
was some issue with the connection pool getting hung, when the response from a
server was too big.  
**@mayanks:** Yeah, so the end result is queries piling up and timing out, and
it may be mitigated by choosing a more appropriate table-level query timeout.
In your case it was an expensive query, but here I am more concerned about the
possible deadlock  
**@elon.azoulay:** Yeah, it happened today again - sorry for the delay. I
saved thread dumps (all similar). Today it was multiple servers, all pqw pool.  
**@elon.azoulay:** And the scheduler wait metric showed pileups again. It
seemed like it wasn't one large query; many small ones were blocked...  
 **@elon.azoulay:**  
 **@kkmagic99:** @kkmagic99 has joined the channel  
 **@patidar.rahul8392:** It's showing this message but the data is not
reflected in the table.  
 **@patidar.rahul8392:** This is the configuration file  
 **@patidar.rahul8392:** Hi everyone, I am loading data from an HDFS location
into a Pinot hybrid table. I have pushed data for 5 days and executed this
command 5 times, once for each day's file:  
hadoop jar \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
-jobSpecFile /home/rtrs/hybrid/config/final/executionFrameworkSpec.yaml  
At the end, when I do `select * from tablename_OFFLINE`, I am only able to see
the latest data, i.e. the 5th day's data. This is the timestamp column value in
my data: "current_ts":"2021-05-30T23:34:31.624000". These are the details from
the schema file for the timestamp column:  
"dateTimeFieldSpecs": [ { "name": "current_ts", "dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSSSS",
"granularity": "1:MILLISECONDS" }  
These are the details from the offline_config.json file: "tableType":
"OFFLINE", "segmentsConfig": { "timeColumnName": "current_ts", "replication":
"1", "replicasPerPartition": "1",  
Looks like some timestamp issue. Kindly suggest what I need to change here.  
**@jackie.jxt:** Can you please check how many segments are there in this
table?  
**@patidar.rahul8392:** @jackie.jxt give me 2 min, I am trying to reload the
files. Will share the details within 5 min  
**@patidar.rahul8392:** It's creating only one segment @jackie.jxt  
**@patidar.rahul8392:** Segment status is showing as Good and the server state
is Online for this  
**@jackie.jxt:** Do you mean there is only one segment in the table? What is
the segment name? I suspect the job is generating the segment with the same
name every day and keeps replacing it  
**@patidar.rahul8392:** The segment name is exactly the same as my table name  
**@patidar.rahul8392:** Which I have mentioned as `segment.name.prefix` in my
conf file  
**@patidar.rahul8392:**  
**@patidar.rahul8392:** This is the segment name @jackie.jxt  
**@jackie.jxt:** In your table config, do you specify the `pushType` as
`REFRESH`?  
**@patidar.rahul8392:** Yes, you were right @jackie.jxt, it's replacing the
segments. I tried a different segment name for the next file, so now there are
2 segments and data is there for 2 days  
**@patidar.rahul8392:**  
**@patidar.rahul8392:** Where do I need to specify this @jackie.jxt? I am not
using this property  
**@patidar.rahul8392:** In the offline table config also I am not using this
property  
**@jackie.jxt:** What is your `ingestionConfig`?  
**@jackie.jxt:** You can have it like: ```"ingestionConfig": {
  ...,
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY"
  }
}```  
**@patidar.rahul8392:** Ok, so I only need to add this change in
offline-table-config.json, right? Not in realtime-config.json?  
**@patidar.rahul8392:**  
**@jackie.jxt:** Yes, no need to add it to the realtime-table-config  
**@ken:** What is in your `executionFrameworkSpec.yaml` file? You typically
need a section like: ```segmentNameGeneratorSpec:
  # type: Current supported types are 'simple' and 'normalizedDate'.
  type: normalizedDate
  configs:
    segment.name.prefix: 'ads_batch'``` So that segment names vary by date. If
you would wind up with multiple results that have the same name (by date) you
can add `exclude.sequence.id: false` to the `configs` section.  
 **@patidar.rahul8392:** @npawar @fx19880617 @ken @mayanks @elon.azoulay  
 **@cgoddard:** @cgoddard has joined the channel  

###  _#pinot-k8s-operator_

  
 **@vananth22:** @vananth22 has joined the channel  

###  _#pinot-dev_

  
 **@vananth22:** @vananth22 has joined the channel  
 **@vananth22:** :wave: I'm curious to know if there has been any recent
conversation around supporting window functions in Pinot :bow:  
**@g.kishore:** Yes, we plan to add something more powerful than window
functions: match_recognize  

### _#getting-started_

  
 **@keweishang:** @keweishang has joined the channel  
 **@hongtaozhang:** @hongtaozhang has joined the channel  

###  _#complex-type-support_

  
 **@vananth22:** @vananth22 has joined the channel  