Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/06/19 02:00:19 UTC

Apache Pinot Daily Email Digest (2021-06-18)

### _#general_

  
 **@gqian3:** Hi, is there any documentation on how to configure the Java client to
authenticate with Pinot when TLS is enabled? Does it support both 1-way and 2-way
authentication?  
**@mayanks:**  
**@gqian3:** Thanks, but I mean how do we specify the Java client keystore and
trust store information when using a Java client?  
 **@neilteng233:** Hey, we are using Presto on top of Pinot, and we want to
build a star-tree index on the table. The aggregation function is
DistinctCountHLL, and I will also use approx_distinct in PrestoDB, which is
also backed by HLL. Will Presto respect this star-tree index in
Pinot?  
**@mayanks:** From presto code I see it might be supported:
```
private String handleApproxDistinct(CallExpression aggregation,
        Map<VariableReferenceExpression, Selection> inputSelections) {
    List<RowExpression> inputs = aggregation.getArguments();
    if (inputs.isEmpty() || inputs.size() > 2) {
        throw new PinotException(PINOT_UNSUPPORTED_EXPRESSION, Optional.empty(),
                "Cannot handle approx_distinct function " + aggregation);
    }
    Selection selection = inputSelections.get(getVariableReference(inputs.get(0)));
    if (inputs.size() == 1) {
        return format("DISTINCTCOUNTHLL(%s)", selection);
    }
    RowExpression standardErrorInput = inputs.get(1);
    String standardErrorString;
    if (standardErrorInput instanceof ConstantExpression) {
        standardErrorString = getLiteralAsString((ConstantExpression) standardErrorInput);
    }
```  
**@mayanks:** @xiangfu0 to also confirm.  
**@mayanks:** In the meantime, @neilteng233, could you `explain` the query on
the Presto side? It might show the query being sent to Pinot, where you can verify
whether it sent DistinctCountHLL to Pinot  
**@xiangfu0:** Try to use explain to see the query plan  
**@xiangfu0:** We rewrite the aggregation and filter parts into a Pinot
query and push it down  
**@mayanks:** I took the code snippet from  
**@mayanks:** But yeah, running explain on the query should show the actual Pinot SQL  
**@xiangfu0:** Yes, approx_distinct will be converted to distinctCounthll  
**@neilteng233:** Thank you, guys! Sorry for the late reply. My company's VPN
does not whitelist Slack.  
**@neilteng233:** I will look into the presto explain.  
**@neilteng233:** It does convert Presto's approx_distinct to Pinot's
DistinctCountHLL. Thanks.  
 **@neilteng233:** Hey, can anyone recommend other materials related to the
"Raw value forward index"? I am having a really difficult time understanding
the Raw value forward index example.  
**@mayanks:** What are you looking for? It just stores raw data chunk
compressed, as opposed to dictionary encoding  
**@neilteng233:** Where does the chunk size come from? And the "chunkoffset =
docId % chunkSize" is hard to understand in the example. If the chunk is
compressed, how does that differ from the on-disk compression in any
column-oriented DB? If the purpose is to improve large sequential scans, do you
mean a scan on this column without any where clause? If there is a where clause, I
think we still need to check each value.  
**@neilteng233:** The example I am referring to:  
**@mayanks:** The math (modulo etc) is on uncompressed chunk. Compression is
for on-disk index.  
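The chunk arithmetic mentioned here can be sketched as follows. This is a minimal illustration that assumes a fixed number of documents per chunk; `RawChunkLookup`, `docsPerChunk`, and the value 1000 are hypothetical names and numbers chosen for the example, not Pinot's actual classes or defaults.

```java
public class RawChunkLookup {
    // Which chunk holds the value for this docId (integer division).
    public static int chunkId(int docId, int docsPerChunk) {
        return docId / docsPerChunk;
    }

    // Position of the value inside that chunk after it has been decompressed.
    public static int chunkOffset(int docId, int docsPerChunk) {
        return docId % docsPerChunk;
    }

    public static void main(String[] args) {
        int docsPerChunk = 1000; // assumed chunk size, for illustration only
        int docId = 2500;
        System.out.println("chunk " + chunkId(docId, docsPerChunk)
                + ", offset " + chunkOffset(docId, docsPerChunk));
    }
}
```

Under these assumptions a reader decompresses one chunk and then indexes into it, so consecutive docIds usually land in the same chunk.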
**@mayanks:** Say you wanted to read docId 1 to 1000. In case of dictionary,
the dict encoding may scatter these 1000 values all over the disk (in the
worst case requiring 1000 disk seeks). In case of raw index, there is no
dictionary, and all 1000 values would be contiguous on disk (minimizing disk
seeks)  
**@mayanks:** Typically, you want to use this for high cardinality string
columns, where dictionary encoding does not provide much compression.  
**@neilteng233:** I think I missed a point here -- the indexed column is
always sorted.  
**@mayanks:** No, only the sorted column is sorted. And dictionaries are
sorted.  
**@neilteng233:** OK. What are those pointers from colA to colB trying to say?  
**@mayanks:** So consider this: ```Your use case has queries mostly for a
primary column (eg where customerId = xxx). If you sort on customerId, then
you will always pick contiguous docIds for a given query. Now consider you
have a high cardinality string column that you project in the query. With
dictionary, the fwd index will have dictionary ids, that may point to
different disk blocks. Without dictionary for this high cardinality column,
the contiguous docIds will correspond to contiguous disk blocks.```  
**@mayanks:** Hopefully that makes sense?  
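The point about sorting on customerId can be illustrated with a small sketch: when a column is sorted by docId order, all rows matching one value form a single contiguous docId range that two binary searches can locate. `SortedColumnRange` and its methods are hypothetical names for this example and the column is a plain in-memory array, not Pinot's sorted index.

```java
public class SortedColumnRange {
    // Index of the first element >= target (a.length if there is none).
    public static int lowerBound(long[] a, long target) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the first element > target (a.length if there is none).
    public static int upperBound(long[] a, long target) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Returns {firstDocId, lastDocId} for the value, or null if it is absent.
    // Assumes sortedColumn[docId] is in ascending order.
    public static int[] docIdRange(long[] sortedColumn, long value) {
        int first = lowerBound(sortedColumn, value);
        int end = upperBound(sortedColumn, value);
        return first == end ? null : new int[] {first, end - 1};
    }
}
```

Every other column can then be read for that same docId range, which is where a raw (no-dictionary) forward index keeps the reads contiguous.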
**@neilteng233:** OK, I think I understand it. I have a question about "sort
on customerId": do we mean all the columns are sorted in the same order as
customerId? How do we configure all the records to be sorted by one
column on disk?  
**@mayanks:** Yes, that is implicit. A docId represents a row in the table and
has to match across columns, nothing special needs to be done for that  
**@neilteng233:** Is docId a theoretical auto-incremental UUID in Pinot or a
primary key we actually specify? But I don't see that Pinot has a concept of a primary
key.  
**@neilteng233:** because "A docId represents a row in the table and has to
match across columns", I think for a column-oriented DB, every column is
sorted with this docId and compressed in default. That is the way data lay out
in the disk.  
**@mayanks:** docId is just a contiguous integer (0, 1, 2, 3...) in the scope
of a Pinot segment  
**@mayanks:** `I think for a column-oriented DB, every column is sorted with
this docId and compressed in default. That is the way data lay out in the
disk.`  
**@mayanks:** Hmm, then how do you identify a row across columns? If you sort
each column independently you will lose which value in colA corresponds to
which value in colB. I am not sure what other column-oriented DBs do, but
Pinot does not do this  
**@neilteng233:** By sorted, I just mean laid out in the same order as the docId.  
**@neilteng233:** I think we mean the same thing.  
**@mayanks:** Yes, seems so  
**@neilteng233:** wait, "dictionaries are sorted", do you mean the docID is
sorted according to the indexed column?  
**@mayanks:** dictionary is separate from docId  
**@neilteng233:** I am sorry, it is not.  
**@neilteng233:** OK, back to the raw value forward index, if the data in disk
are already in the order of docId, what is the meaning of it?  
**@mayanks:** From docId you get dictionaryId  
**@mayanks:** The actual data for that dictionary id can be anywhere on disk  
**@mayanks:**  
**@mayanks:** Look at the forward index section to understand docId -> dictId
-> rawData  
**@neilteng233:** thanks, I understand the Dictionary-encoded forward index.  
**@neilteng233:** Just I am not sure why we need to specify the "raw value
forward index" because the data in the disk is already in that way.  
**@mayanks:** It is not  
**@mayanks:** The dictionary of a column is generated by "sorting values of
the column". dictId = 0 is the first sorted value and so on.  
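The docId -> dictId -> value lookup and the sorted dictionary described above can be sketched as below. This is a simplified in-memory illustration; `DictionaryForwardIndex` and its fields are hypothetical names, and real segments store these structures on disk in a different layout.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class DictionaryForwardIndex {
    public final String[] dictionary; // distinct values, sorted; array index = dictId
    public final int[] forwardIndex;  // forwardIndex[docId] = dictId

    public DictionaryForwardIndex(String[] columnValues) {
        // Sorting the distinct values assigns dictId 0 to the smallest value.
        dictionary = new TreeSet<>(Arrays.asList(columnValues)).toArray(new String[0]);
        forwardIndex = new int[columnValues.length];
        for (int docId = 0; docId < columnValues.length; docId++) {
            forwardIndex[docId] = Arrays.binarySearch(dictionary, columnValues[docId]);
        }
    }

    // Two hops: docId -> dictId -> value.
    public String valueAt(int docId) {
        return dictionary[forwardIndex[docId]];
    }
}
```

Neighbouring docIds can map to dictIds that are far apart in the dictionary, which is the scattering a raw value forward index avoids by storing the values themselves in docId order.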
**@neilteng233:** yes.  
**@neilteng233:** Do you mean the data in the same column are not sitting next
to each other in the disk in some cases?  
 **@thiagopsnfg:** @thiagopsnfg has joined the channel  
 **@ravinder2021.kr:** @ravinder2021.kr has joined the channel  

###  _#random_

  
 **@thiagopsnfg:** @thiagopsnfg has joined the channel  
 **@ravinder2021.kr:** @ravinder2021.kr has joined the channel  

###  _#troubleshooting_

  
 **@laxman:** Hi all, can someone please point me to some detailed
documentation on metric aggregation in Pinot? The documentation I found on this is
very limited. I’m looking for the following information:  
• Do REALTIME tables support aggregation/rollup during ingestion?  
• What aggregation types are supported (max, min, sum, and any more)?  
• Are there any known limitations in using aggregations in REALTIME & OFFLINE tables?  
• Any general best practices and gotchas with aggregations/rollups?  
**@mayanks:**  
**@mayanks:** Let me know which parts need more information and I'll update
the docs. Or you can help update the docs by joining #pinot-docsrus  
**@laxman:** Thanks @mayanks for the pointers. Going through this
documentation.  
 **@jai.patel856:** Good morning (in Seattle) folks. I wanted some help
troubleshooting a Pinot (0.6.0) upsert table. For context:  
1. This table was deployed to our staging environment and production environment, with the exact same schema and tablespec. It works fine in staging streaming junk data. Not so much in production on real data.  
2. Retention time is 10 days.  
3. After periods of idleness, we are seeing cases where the production instance returns no data. Try again 10 minutes later and everything is fine.  
4. Querying for the age of the newest record, it’s about 2 minutes old in production, which seems right.  
5. Some observations I noticed:  
   a. Our time column (processed_at) is not the same as our sorted column index (created_at_seconds)  
   b. We are on Pinot 0.6.0 (old bug?)  
   c. We have only two upsert tables like this providing different views of the data on the cluster.  
   d. The cluster is resourced for “testing.”  

Does Pinot evict idle tables out of memory? Could it be slow to
reload it because of the index? Is it the resources? Is there a known bug I’m
hitting? cc: @elon.azoulay @xiangfu0 @npawar  
**@jai.patel856:** FYI: @chundong.wang @lakshmanan.velusamy  
**@jai.patel856:** I’ve reproduced this behavior twice. Yesterday upon
creation of the tables. And today, having left them idle for the last 14
hours.  
**@xiangfu0:** For the idle table, are your queries timing out?  
**@jai.patel856:** no error, just no results in the query ui  
**@jai.patel856:** ran a size() op through the swagger and got a bunch of ‘-1’
on the segments and such  
**@xiangfu0:** How long did you wait for the query response?  
**@xiangfu0:** @jackie.jxt might have some more insights  
**@elon.azoulay:** Could this be due to direct memory oom? You can find out by
looking at the server logs  
**@xiangfu0:** Left idle should be fine  
**@xiangfu0:** My feeling is server got restarted as well  
**@jai.patel856:** before it was about 10 seconds before i would get no
results, eventually it took a little less time and returned results, then
results became fast.  
**@jai.patel856:** getting 0 results again now  
**@xiangfu0:** 10 sec is some internal default timeout  
**@jai.patel856:** and then results again…  
**@jackie.jxt:** How long have you been running this table? Has any segment passed
the 10-day retention?  
**@jai.patel856:** looks from the logs there was a server restart about 4
minutes ago  
**@jai.patel856:** the table is a day old  
**@jackie.jxt:** If you have only one replica, then server restart will cause
data loss  
**@jai.patel856:** The oldest data from the stream is around 10 days old.  
**@jai.patel856:**
```
"segmentsConfig": {
  "schemaName": "enriched_station_orders_v1_14_rt_upsert_v2_0",
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "10",
  "timeColumnName": "processed_at",
  "timeType": "MILLISECONDS",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "segmentPushFrequency": "daily",
  "segmentPushType": "APPEND",
  "replicasPerPartition": "3"
},
```  
**@jai.patel856:** Replicas looks like 3  
**@elon.azoulay:** When did it last occur?  
**@elon.azoulay:** was this on the staging or production cluster?  
**@jai.patel856:** prod  
**@jackie.jxt:** Is the server restarted normally or just killed somehow?  
**@jai.patel856:** 10:15am @elon.azoulay on server 0  
**@jai.patel856:** I didn’t request the restart if that’s what you’re asking.
I’m not seeing anything in the kubernetes log prior to the restart.. Let me
check the other servers.  
**@elon.azoulay:** You can check the logs for the servers in kibana, I'm
seeing this:  
**@elon.azoulay:** ```java.lang.RuntimeException: Inconsistent data read.
Index data file
/var/pinot/server/data/index/enriched_customer_orders_v1_14_rt_upsert_v2_0_REALTIME/enriched_customer_orders_v1_14_rt_upsert_v2_0__8__4__20210618T0857Z/v3/columns.psf
is possibly corrupted```  
**@elon.azoulay:** Today at 10:18am  
**@elon.azoulay:** You can ignore the "Cannot find classloader for class"
errors - that happens when the server starts, and will be fixed in an upcoming
PR.  
**@jai.patel856:** Found the error on server-2  
**@elon.azoulay:** data read error?  
**@jackie.jxt:** This error is logged when the magic marker validation failed,
which means the data file is corrupted somehow  
**@jackie.jxt:** Probably because some hard failure during segment creation  
**@jackie.jxt:** Restarting the server should try to download a new copy from
the deep storage  
**@jai.patel856:** Is this an area where stability fixes were made in 0.7.1?  
**@jackie.jxt:** AFAIK no. This error should be able to auto-recover though  
**@jackie.jxt:** Can you please provide the query stats for the empty
response?  
**@jai.patel856:** how do I get those?  
**@jai.patel856:** Also, right now our sorted column index is not on the same
column as our time column. Will this cause performance degradation for the
queries on the upserted data?  
**@elon.azoulay:** Would have to test that as well - depends on the queries  
**@jai.patel856:** just a normal select *  
**@jai.patel856:** @jackie.jxt We’re intermittently getting the error: [ {
"message": "ServerTableMissing:\nFailed to find table:
enriched_station_orders_v1_14_rt_upsert_v2_1_REALTIME", "errorCode": 230 } ]  
**@jackie.jxt:** If you are using the query console, you can show the JSON
response which should have the query stats inside  
**@jackie.jxt:** The `ServerTableMissing` is not normal. Does it happen when
the server is restarted unintentionally?  
**@jai.patel856:** @jackie.jxt how do I show the json?  
**@jai.patel856:** nvm, i see it  
**@jai.patel856:** We are seeing this error, but not sure if it’s related:
```
@timestamp: Jun 18, 2021 @ 13:52:47.195 -07:00
_id: w3HlIHoB6R61qWfdxh39
_index: logging-production-us-central1:.k8s-container-logs-001288
_score: -
_type: _doc
kubernetes.cluster_name: data-cluster
kubernetes.cluster_region: us-central1
kubernetes.container_name: server : pinot
kubernetes.namespace_name: pinot-dev
kubernetes.pod_name: pinot-upsert-server-zonal-2
payload.text: Terminating due to java.lang.OutOfMemoryError: Java heap space
```  
**@jai.patel856:** I’m much more curious about this because it seems to happen
with regularity.  
**@xiangfu0:** ```Terminating due to java.lang.OutOfMemoryError: Java heap
space```  
**@xiangfu0:** it’s oom  
 **@aaron:** Any suggestion for speeding up a query that uses REGEX_LIKE to
filter on a dimension? I see string operations being super slow. Even if I
rewrite my regex as `SUBSTR(foo, ..., ...) = bar` I still see the query taking
more than 10 seconds  
**@mayanks:** Have you tried text index?  
**@mayanks:**  
**@aaron:** Does it play nice with star tree index?  
**@aaron:** (To be more precise, I want my query to be accelerated by the use
of the star tree index, and I also want to quickly filter by regex for one of
the dimensions)  
**@mayanks:** Should work  
**@aaron:** Ok, that's really neat  
**@aaron:** How can a text index accelerate a regex to be faster than table
scan?  
**@mayanks:** It leverages lucene index internally  
**@aaron:** Cool  
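One way to picture why an index can beat a table scan for regex filters: an inverted index keeps each distinct term once along with the docIds that contain it, so the pattern is tested per distinct term rather than per row. The sketch below is a toy illustration of that idea only; `TermIndexRegex` is a hypothetical class and this is not how Lucene or Pinot's text index is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class TermIndexRegex {
    // term -> sorted docIds containing that term (a toy posting list).
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    public void add(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Tests the regex once per distinct term instead of once per row,
    // then returns the union of the matching terms' docIds in ascending order.
    public List<Integer> matchingDocIds(String regex) {
        Pattern pattern = Pattern.compile(regex);
        TreeSet<Integer> result = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Integer>> entry : postings.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                result.addAll(entry.getValue());
            }
        }
        return new ArrayList<>(result);
    }
}
```

The saving in this sketch grows with the ratio of rows to distinct terms; a column where nearly every value is unique gains little from it.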
 **@thiagopsnfg:** @thiagopsnfg has joined the channel  
 **@ravinder2021.kr:** @ravinder2021.kr has joined the channel  

###  _#getting-started_

  
 **@aaronlevin:** @aaronlevin has joined the channel  