Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/14 02:00:17 UTC

Apache Pinot Daily Email Digest (2021-01-13)

### _#general_

  
 **@james.zhao619:** @james.zhao619 has joined the channel  
 **@amitchopra:** Quick question - when i modify a table config to add an inverted index
and then do a reload, i can see that the segment has the inverted index added and
that it has an effect on queries. But the reload does not work for the star-tree
index? Is this expected?  
 **@g.kishore:** it does work on star-tree as well, I think it needs a flag to
turn on auto-reload for star-tree (@jackie.jxt, can you please confirm)  
 **@jackie.jxt:** Yes, please add `enableDynamicStarTreeCreation` to
your index config, see  for more details  
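For context, a rough sketch of where these settings live in the table config, assuming the usual `tableIndexConfig` layout (the column names here are just placeholders):
```
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["ruleId"],
    "enableDynamicStarTreeCreation": true,
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["ruleId", "processHash"],
        "functionColumnPairs": ["COUNT__*"],
        "maxLeafRecords": 10000
      }
    ]
  }
}
```
With `enableDynamicStarTreeCreation` set, a segment reload should be able to build the star-tree defined in `starTreeIndexConfigs`, as discussed above.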
**@amitchopra:** Thanks @jackie.jxt. Will try it out  
**@amitchopra:** BTW, when i removed the inverted index from table config and
reloaded all segments, it did not remove the inverted index from the actual
segment. FYI - @jackie.jxt @npawar  
**@jackie.jxt:** That is expected. We don’t support removing indexes, but it
should not be loaded  
**@amitchopra:** ok, got it  
 **@npawar:** should this be made default true @jackie.jxt ?  
 **@npawar:** not very intuitive that we enable a special flag just for star-
tree?  
 **@jackie.jxt:** Agree. We have it disabled by default to protect the
servers, but I think most users want it enabled  
 **@amitchopra:** BTW - on another topic related to reload, i saw that there
is a proposal to show the status of reload - . Just wanted to say that i think
that will be super useful. Right now i did not find a way to know whether the
segments were reloaded or not. Might want to have some sort of desired vs
realized state API for same.  
 **@parvezht:** @parvezht has joined the channel  
 **@npawar:** @amitchopra we have the reload status API already. It works only
for offline tables so far. You can check it out in the cluster manager on the
table details page. @darshants.darshan1 is working on adding API support
for realtime tables.  
**@amitchopra:** @npawar Is this only available on the API? Or also on the UI?
Also can you share the API  
**@amitchopra:** On the UI i see reload status button under cluster manager.
Does that basically show the status after all segments have been updated?  
**@npawar:** it will show whether an index is present or not for a particular
column. It is agnostic to when the reload happened. But it is a good indicator
- say you didn't have an inverted index, then the reload status would’ve shown X.
And then if you applied the inverted index and pressed Reload, the Reload Status should
show :white_check_mark:  
**@npawar:** it only looks at segment metadata  
**@amitchopra:** Got it, thanks.  
 **@amitchopra:** Few questions on sorted index:
1. I was trying to create a sorted index on a STRING column, but it was not working. Then i tried it on an INT column and it worked. Is the sorted index only supported on INT (or LONG) types?
2. I see isSorted = true in the metadata.properties file for the event time as well as the metric column, though i did not enable a sorted index for those. What does this imply? I remember it was mentioned that only one column can be used as the sorted index.
3. Related to the above, if most queries will have time in the where clause, should we add the sorted index on the time field? Or is it more beneficial to add it on a field (often used to filter) other than the time field?  
**@mayanks:** ```1. Sorted index is independent of data type. Is this for realtime or offline? In offline, the data is expected to be sorted before pushing to Pinot.
2. For a column with a single value, the data is sorted by definition. Check the cardinality of these columns.
3. It is more beneficial to add the sorted index on a column that is often used in filters and that has low selectivity```  
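As a reference, the sorted column is typically declared in the table config roughly like this (a sketch with a placeholder column name; for offline tables the input data itself also has to be pre-sorted on that column, as noted above):
```
{
  "tableIndexConfig": {
    "sortedColumn": ["ruleId"]
  }
}
```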
**@amitchopra:** Thanks @mayanks. For (1), i am using offline. I missed the
sorting requirement. Now i sorted it before the segment creation and i can see
isSorted true for the string data type column as well  
**@mayanks:** :+1:  
**@amitchopra:** not sure i understood point (2). All other dimensional
columns are single valued, but i see they are set to isSorted = false. It is only
the column i added to the table config, plus the time and metric fields, that show
isSorted as true  
**@mayanks:** If the metadata says it is sorted, then the column is sorted.
However, note that if all rows had a value of metric = 1, then by default that
is sorted, right?  
**@amitchopra:** ah, i see. Makes sense for the metric field. But what about the
time column?  
**@amitchopra:** the input to segment creation had time sorted, so i believe
that’s what it is referring to then  
**@mayanks:** yes  
**@mayanks:** what is your time unit?  
**@amitchopra:** seconds  
**@mayanks:** I think if you sort by some other dimension, then time won't be
sorted  
**@amitchopra:** got it.  
**@amitchopra:** Last question on (3). By low selectivity, do you mean the
cardinality, i.e. the number of distinct values for that column?  
**@mayanks:** selectivity - how many rows does that predicate select. low
selectivity - more rows eliminated - better performance.  
**@amitchopra:** got it. Thanks again :pray:  
**@mayanks:** :+1:  
 **@yupeng:** hey, the new cluster management UI is very convenient and
powerful (e.g. delete table)… is there a plan to add access control to it?  
**@mayanks:** Yep - @moradi.sajjad ^^  
**@moradi.sajjad:** Today I sent out a PR for authenticating calls on write
endpoints - the ones which modify the state of the system:  
**@moradi.sajjad:**  
 **@moonbow:** @moonbow has joined the channel  

###  _#random_

  
 **@james.zhao619:** @james.zhao619 has joined the channel  
 **@parvezht:** @parvezht has joined the channel  
 **@moonbow:** @moonbow has joined the channel  

###  _#troubleshooting_

  
 **@d.sashikanth:** @d.sashikanth has joined the channel  
 **@james.zhao619:** @james.zhao619 has joined the channel  
 **@james.zhao619:** @g.kishore Hi Kishore, in recent days I have been trying to install a Pinot
cluster according to the online documentation, but I have some questions about my
installation process. First, I don't understand the role of the Kafka component
in the whole Pinot cluster. Second, when I use the StartServer command to
start a Pinot server, how do I know if the server is realtime or offline? Could
you give me some guidance, thank you.  
**@contact:** A server isn't realtime or offline, it's the segments that are
either realtime or offline  
**@james.zhao619:** I see. Thank you for your reply.  
**@james.zhao619:** @contact I don't know Kafka's role in the whole Pinot
architecture. I couldn't find an illustration in the online documentation.  
**@contact:** Kafka is optional, that's only if you want to use realtime
segments  
**@g.kishore:** +1 to what @contact said.  
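For context, Kafka only comes into play for realtime tables: a REALTIME table config points at a Kafka topic through `streamConfigs`, while an OFFLINE table has no such section. A rough sketch, with placeholder table, topic, and broker values (exact keys and class names may vary by version):
```
{
  "tableName": "metrics",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "metrics-events",
      "stream.kafka.broker.list": "localhost:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  }
}
```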
 **@yash.agarwal:** When uploading segments to the controller using Segment Uri
Push, it puts the segments in path
```fake_job_for_testing_time_based_OFFLINE/fake_job_for_testing_time_based_OFFLINE_18597_18597_0```
whereas when trying to delete the segments, it calls exists on path
```fake_job_for_testing_time_based/fake_job_for_testing_time_based_OFFLINE_18597_18597_0```
which logs ```Failed to find local segment file for segment```  
**@yash.agarwal:** I am on version 0.6.0  
**@g.kishore:** Looks like a bug, or maybe you are passing the wrong table during
upload?  
**@yash.agarwal:** ```fake_job_for_testing_time_based```  
**@yash.agarwal:** I use this table during upload. in tableSpec  
**@yash.agarwal:** I see the data getting populated correctly in the table.  
**@g.kishore:** @fx19880617 ^^  
**@fx19880617:** what does the download URI look like in the segment metadata?  
**@yash.agarwal:** ``````  
**@fx19880617:** do you have more logs or a stacktrace for the error? which
scheme are you using? http, gs, or s3?  
**@yash.agarwal:** I have created my own filesystem.  
**@yash.agarwal:**  
**@yash.agarwal:** What is the correct path for storing offline segments ?
`fake_job_for_testing_time_based` or `fake_job_for_testing_time_based_OFFLINE`  
**@yash.agarwal:** @fx19880617  Would something like this be needed or am I
missing something ?  
**@fx19880617:** hmmm  
**@fx19880617:** for your push job, did you set push uri prefix and suffix?  
**@fx19880617:** I think it should be `fake_job_for_testing_time_based`  
**@fx19880617:** i need to check why it puts _OFFLINE for the table name  
 **@parvezht:** @parvezht has joined the channel  
 **@moonbow:** @moonbow has joined the channel  

###  _#getting-started_

  
 **@amitchopra:** Hi, trying to understand the effect of a column's cardinality
on query response time. I did an experiment with a single segment (of
3999999 rows), with three variants of cardinality columns:
1. low (100 unique values)
2. medium (10000 unique values)
3. high (almost all unique, 3912203)
The kind of query i am testing with is basic group-by queries, with no filters. For example: select processHash, count(processHash) as aggreg from metrics group by processHash order by aggreg desc limit 10
With the above query applied to the low, medium and high cardinality columns, i see a
difference in query response time, though *numEntriesScannedInFilter* (0) and
*numEntriesScannedPostFilter* (3999999) do not change across the queries.
The times i see are:
1. low cardinality column - 59 ms
2. medium - 107 ms
3. high - 432 ms
Questions:
1. Given there are no filters, why is there this difference in query performance?
2. Is there anything that can be done to improve performance on the high cardinality column? I did try with the star-tree index, which helped response time for the low cardinality column but not for the medium and high cardinality columns (it actually became worse). Also tried with an inverted index - no effect. Though these will be more useful during filter operations, i suppose.  
**@g.kishore:** dictionary encoding vs non dictionary  
 **@g.kishore:** @jackie.jxt can explain it in more detail  
 **@amitchopra:** should i try with  in that case?  
 **@jackie.jxt:** @amitchopra The difference is mainly caused by storing and
transferring a different number of groups. For the high cardinality column, a raw index
might benefit from better locality and give better performance  
 **@jackie.jxt:** Is the data type the same for all 3 columns?  
 **@amitchopra:** @jackie.jxt the type is the same (string), but the values are
different. For the low cardinality column:
`column.ruleId.bitsPerElement = 7`
`column.ruleId.lengthOfEachEntry = 8`
Whereas for the high cardinality column:
`column.processHash.bitsPerElement = 22`
`column.processHash.lengthOfEachEntry = 8`  
 **@jackie.jxt:** I see. I think you can try raw value forward index for the
high cardinality column  
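For reference, the raw (no-dictionary) forward index is usually enabled per column in the table config; a minimal sketch, assuming the standard `noDictionaryColumns` field and using the column from the example above:
```
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["processHash"]
  }
}
```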
 **@amitchopra:** ok, thanks. Let me try and see what difference it makes  
**@amitchopra:** I tried with no dictionary for medium and high cardinality
columns. Though the performance did not improve. It actually became worse
:disappointed:. FYI - @g.kishore @jackie.jxt  
**@jackie.jxt:** Then the bottleneck for this query is storing and processing
all the groups instead of scanning the values. It is more efficient to store
dictionary ids compared to storing the actual values  
**@amitchopra:** got it, thanks  
**@g.kishore:** will be good to put these numbers in an issue, there might be
some bottlenecks we are not aware of  
**@amitchopra:** sure, i can open an issue :+1:  
 **@amitchopra:** I had one more question in regards to the above - for the medium
cardinality column (10000 unique values), i see *numGroupsLimitReached as
true.* As per the documentation, the default number of groups should be 100k, so why
is it set to true? Also, given i have a limit of 10 on the query, shouldn't the
engine handle this appropriately?  
 **@jackie.jxt:** Do you have it configured explicitly? The config key is
`pinot.server.query.executor.num.groups.limit`  
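For reference, if one did want to set it explicitly, this key would go in the server config file; a sketch using the 100k value mentioned above as the documented default:
```
pinot.server.query.executor.num.groups.limit=100000
```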
**@amitchopra:** @jackie.jxt - I believe you mean in pinot-server.conf? no i
haven’t set it.  
**@jackie.jxt:** Hmm.. That is unexpected  
**@jackie.jxt:** Do you run the query in PQL mode or SQL mode?  
**@amitchopra:** SQL mode  
**@jackie.jxt:** I just checked the code and we don't set it in SQL mode..  
**@jackie.jxt:** Could you please file a github issue and put the details?  
**@amitchopra:** sure. Let me file an issue. Just so that i understand, you
mean the 100k limit is not set? But what is the default limit in SQL mode
today then?  
**@jackie.jxt:** We don't put the `numGroupsLimitReached` in SQL mode. I don't
know how it shows up in the response  
**@amitchopra:** got it  
**@amitchopra:** As for the second part of the question: given there was a
limit of 10 on the query, shouldn't this be handled by the engine (even if it
was a column with more than 100k distinct values)?  
**@amitchopra:** @jackie.jxt - created issue  
**@jackie.jxt:** If there are over 100k (by default) distinct groups within a
single segment, we will only store the first 100k groups to prevent servers
running out of memory for extremely expensive queries  
**@amitchopra:** @jackie.jxt @g.kishore - What i am trying to say is that
given the query has limit of 10, can’t the engine only keep top 10 groups in
memory while doing the calculation. Just curious if this could be an
enhancement? Or is there a reason why this cannot be done?  
**@g.kishore:** it will be wrong to keep only 10 groups  
**@g.kishore:** results will be incorrect  
**@jackie.jxt:** In order to aggregate the values for the groups, we need to
keep all groups, and sort the aggregated values in the end to get the 10 final
groups  
**@g.kishore:** @jackie.jxt we don't keep 100k groups by default, right? I think
it's more like min(limit * some scaling factor, 100k)  
**@jackie.jxt:** Within the segment, we keep all the groups. Then we aggregate
on these groups, sort them, and then keep `max(limit * 5, 5000)` groups  
**@jackie.jxt:** Without sorting, we have to keep all the groups to get the
correct result  
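(To make the trim size concrete for the query above with limit 10: max(10 * 5, 5000) = 5000 groups are kept after the sort, and the final top 10 is selected from those.)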
**@amitchopra:** I thought a little on this and realized the issue with the
proposal i was making. Thanks for the clarification :+1:  