Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/19 02:00:16 UTC

Apache Pinot Daily Email Digest (2021-10-18)

### _#general_

  
 **@surajkmth29:** Hi Team, I was looking at the ID_SET and IN_SUBQUERY
provisions in Pinot for handling subqueries, referring to the video below:  Here I
have a few questions: 1. Is ID_SET only supported for integer values? 2.
Is there support for alphanumeric strings? Any pointers would be helpful.  
**@g.kishore:** @jackie.jxt ^^  
**@jackie.jxt:** @surajkmth29 ID_SET supports all data types. For non-integer
types (types other than INT and LONG), it stores the values in a bloom filter  
**@jackie.jxt:** The `expectedInsertions` and `fpp` are configurable for the
bloom filter to tune the accuracy. You may read more here:  
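To give a feel for how `expectedInsertions` and `fpp` trade memory for accuracy, here is a minimal Python sketch of the textbook Bloom filter sizing formulas. It assumes a standard Bloom filter; the helper name `bloom_filter_size` is hypothetical and this is not Pinot's own code, so the exact sizes Pinot allocates may differ.

```python
import math
from typing import Tuple


def bloom_filter_size(expected_insertions: int, fpp: float) -> Tuple[int, int]:
    """Return (num_bits, num_hash_functions) for a textbook Bloom filter.

    expected_insertions: number of distinct values expected to be added.
    fpp: target false-positive probability, strictly between 0 and 1.
    """
    if expected_insertions <= 0:
        raise ValueError("expected_insertions must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")

    ln2 = math.log(2)
    # Optimal bit count: m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-expected_insertions * math.log(fpp) / (ln2 ** 2))
    # Optimal hash function count: k = (m / n) * ln 2, at least one
    num_hashes = max(1, round(num_bits / expected_insertions * ln2))
    return num_bits, num_hashes


# Illustrative values only: 1,000 distinct strings at a 1% false-positive rate.
bits, hashes = bloom_filter_size(1000, 0.01)
print(f"bits={bits} hashes={hashes} bits_per_value={bits / 1000:.2f}")
```

Under these formulas, lowering `fpp` or raising `expectedInsertions` both increase the filter size, which is the accuracy versus memory trade-off mentioned above.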
**@msoni6226:** Hi Team, Is there any document available where I can get the
definition of counters/metrics exposed from Pinot for Prometheus?  
**@adireddijagadesh:** @msoni6226 Please refer to this document:  
**@vibhor.jain:** Hi Team, As part of handling duplicates in our hybrid table,
we thought of using "mergeType": "dedup" for moving data from the realtime to
the offline table. The problem we are facing is that one of our columns stores
an encrypted value, and even for duplicate rows this value changes every time.
Is there a way to perform "dedup" on a subset of columns when moving data to
the offline table via minion?  
**@mayanks:** Won’t that cause data loss due to incorrect dedup?  
**@vibhor.jain:** Hi @mayanks, by a subset of columns I mean only the
primary key columns. Currently, with the "mergeType": "dedup" config, it compares
the entire row. Is there any option to restrict it to the primary key
columns somehow?  
**@mayanks:** There isn't one right now, afaik. But I am still unclear. Let's
say you have two rows with same primary key values, but different on other
dimensions, which ones do you expect the dedup to drop?  
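The difference being discussed can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pinot's minion code: the function names, the sample rows, and the column names (`user_id`, `event`, `token`) are all hypothetical. It shows why whole-row dedup keeps rows whose encrypted column differs, and why key-based dedup has to pick some rule for which row survives.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def dedup_whole_row(rows: List[Row]) -> List[Row]:
    """Drop a row only if every column matches a row already seen."""
    seen = set()
    kept = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(row)
    return kept


def dedup_by_key(rows: List[Row], key_columns: Sequence[str]) -> List[Row]:
    """Keep one row per key. Keeping the first row seen is an arbitrary policy."""
    kept: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        kept.setdefault(key, row)
    return list(kept.values())


# Hypothetical data: "token" stands in for the encrypted column that
# changes on every write, even for logically duplicate rows.
rows = [
    {"user_id": 1, "event": "login", "token": "a9f3"},
    {"user_id": 1, "event": "login", "token": "7c21"},
    {"user_id": 2, "event": "login", "token": "55be"},
]

print(len(dedup_whole_row(rows)))                     # all rows differ in "token"
print(len(dedup_by_key(rows, ["user_id", "event"])))  # one row per key
```

The `setdefault` call is where the open question lives: any key-based dedup has to decide which of the conflicting `token` values to retain.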
 **@valentin.richer:** @valentin.richer has joined the channel  
 **@kchavda:** Hi All, Any advice/suggestions on how to handle null values in
a date column whose valid values can be the same as the default `1970-01-01` in
Pinot (ex: date of birth)? In my realtime table schema I have the date defined
as below under dateTimeFieldSpecs:
```
{
  "name": "date_of_birth",
  "dataType": "TIMESTAMP",
  "format": "1:DAYS:TIMESTAMP",
  "granularity": "1:DAYS"
}
```
**@mayanks:** `date_of_birth` is not a time column, right, but a regular
dimension?  
**@kchavda:** Right. I have a created_at date which I'm using as the primary
time column in the table segment config.  
**@kchavda:** I'm formatting the field to show as date when querying the data.  

###  _#random_

  
 **@valentin.richer:** @valentin.richer has joined the channel  

###  _#troubleshooting_

  
 **@valentin.richer:** @valentin.richer has joined the channel  
 **@vibhor.jain:** Hi Team, As part of handling duplicates in our hybrid
table, we thought of using "mergeType": "dedup" for moving data from the realtime
to the offline table. The problem we are facing is that one of our columns stores
an encrypted value, and even for duplicate rows this value changes every time.
Since "dedup" works on the entire row, it's not removing the duplicates. Is there a
way to perform "dedup" on a subset of columns when moving data to the offline table
via minion?  
**@jackie.jxt:** @vibhor.jain Currently that is not supported. If the value
keeps changing, we won't know which value to keep during the `dedup`. Is it
possible to model the use case as `rollup`, where we can merge different values
into one?  
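As a rough illustration of the `rollup` idea, the Python sketch below groups rows by their dimension columns and merges the remaining columns with an explicit per-column aggregation. It is a conceptual sketch only: the `rollup` helper, the column names, and the choice of `max` for the changing `token` column are assumptions made for illustration, and the thread does not say which column types or aggregations Pinot's rollup accepts.

```python
import operator
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def rollup(
    rows: List[Row],
    dimension_columns: Sequence[str],
    aggregators: Dict[str, Callable[[object, object], object]],
) -> List[Row]:
    """Merge rows that share the same dimension values.

    Each non-dimension column is combined pairwise with its aggregator,
    so the merged value is well defined even when the inputs differ.
    """
    groups: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[column] for column in dimension_columns)
        if key not in groups:
            merged = {column: row[column] for column in dimension_columns}
            merged.update({column: row[column] for column in aggregators})
            groups[key] = merged
        else:
            merged = groups[key]
            for column, aggregate in aggregators.items():
                merged[column] = aggregate(merged[column], row[column])
    return list(groups.values())


# Hypothetical rows: "amount" is summed, and the changing "token" is resolved
# deterministically by taking the maximum value.
events = [
    {"user_id": 1, "day": "2021-10-18", "amount": 2, "token": "a9f3"},
    {"user_id": 1, "day": "2021-10-18", "amount": 3, "token": "7c21"},
    {"user_id": 2, "day": "2021-10-18", "amount": 1, "token": "55be"},
]

merged_events = rollup(events, ["user_id", "day"], {"amount": operator.add, "token": max})
for event in merged_events:
    print(event)
```

Unlike dedup, a rollup states how conflicting values are combined, which removes the ambiguity about which value to keep. Note that summing a metric across true duplicates double counts it, so the aggregation has to match the meaning of each column.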

###  _#thirdeye-pinot_

  
 **@hardik.chheda:** @hardik.chheda has joined the channel  