Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/21 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-10-20)

### _#general_

  
 **@jieshe:** @jieshe has joined the channel  
 **@alihaydar.atil:** @alihaydar.atil has joined the channel  
 **@alihaydar.atil:** Hey, does the H3 index only apply to the ST_Distance
function? If so, any suggestions on the fastest way to query points that lie
inside a polygon? I have a table with latitude and longitude columns  
**@mayanks:** @yupeng ^^  
**@yupeng:** right now yes. there is a PR to add this support  
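For context, a hedged sketch of how an H3 index is declared in the table config for ST_Distance queries, per the Pinot geospatial docs of this era (the column name and resolution here are hypothetical; the point column would typically be derived from the latitude/longitude columns with an ingestion transform such as toSphericalGeography(stPoint(lon, lat))):

```
{
  "fieldConfigList": [
    {
      "name": "location_st_point",
      "encodingType": "RAW",
      "indexType": "H3",
      "properties": {
        "resolutions": "5"
      }
    }
  ],
  "tableIndexConfig": {
    "noDictionaryColumns": ["location_st_point"]
  }
}
```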
**@kautsshukla:** Hi All, I have a schema field defined as "name": "properties",
"dataType": "JSON", and I'm consuming messages from Kafka. In the table the
value comes out as NULL, whereas in the Kafka topic the data looks as expected:
{"type":"event","ip":"127.0.0.1","created_at":1634102442620,"properties":{"city":"abc","clinic":"","symptomId":"","treatmentId":""}}.
Any idea why this is happening?  
**@mayanks:** Will need to see the table config and schema (at least the JSON
specific part). Also are you planning to query this column? If so, perhaps use
JSON indexing?  
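For reference, a minimal sketch of what enabling a JSON index on the column from the question could look like in the table config (assuming the column is named properties):

```
{
  "tableIndexConfig": {
    "jsonIndexColumns": ["properties"]
  }
}
```

Queries against the indexed column can then filter with the JSON_MATCH predicate.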
**@npawar:** Not sure the JSON data type works with the JSON index. You're
better off declaring a properties_str column as STRING and adding an ingestion
config on it with transformFunction: jsonFormat(properties)  
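A minimal sketch of that suggestion in the table's ingestionConfig (jsonFormat is a built-in Pinot transform function; properties_str would additionally be declared as a STRING dimension in the schema):

```
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "properties_str",
        "transformFunction": "jsonFormat(properties)"
      }
    ]
  }
}
```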
 **@vaibhav.gupta:** @vaibhav.gupta has joined the channel  
 **@awadesh22kumar:** @awadesh22kumar has joined the channel  
 **@falexvr:** hey guys, good afternoon, I'd like to know if any of you have
had to write a client to query Pinot from Scala? Is there a library for
connecting to Pinot from Scala clients?  
**@g.kishore:** I don't think there is a client in Scala.. won't the Java one work?  
**@falexvr:** Yeah, it does work, but I was curious to see if there was any
sort of library acting as a wrapper for scala  
**@g.kishore:** we are not aware of one in scala  
**@falexvr:** Thanks  
 **@jain.arpit6:** @jain.arpit6 has joined the channel  
 **@jain.arpit6:** Hi, I have created a realtime table on a 0.8.0 Pinot cluster.
Data is getting into Pinot, but I see this log message for one segment: "Stopping
consumption due to row limit nRows=100000 numRowsIndexed=10000
numRowsConsumed=100000". I also checked the debug endpoint in Swagger, and it
shows the result below for the segment  
**@g.kishore:** that's a valid log statement; it gets printed before flushing
the segment to disk. After that, there should be a new consuming segment that
will start consuming messages again.  
**@jain.arpit6:** Where is it picking the value 100000 from?  
**@g.kishore:** from table config  
**@g.kishore:** Sorry, I did not see the debug output  
**@g.kishore:** looks like the segment is not getting built  
**@g.kishore:** any exception in the log?  
**@jain.arpit6:** Also, I spotted an error when it tries to build the segment
after that log message, and I can see the same error in the debug endpoint.
So it looks like something is wrong with our data  
**@jain.arpit6:** We have not specified that value 100000 in config  
**@g.kishore:** right.. can you paste the error here  
**@g.kishore:** I think that's the default  
**@mayanks:** Yeah, please paste the error. The 100k value seems to indicate
the initial value of segment auto sizing.  
**@jain.arpit6:** So it is trying to flush the segment after reading 100k
records, which is the default value for some property. I am specifying values
(size/time) in the config for flushing, but they don't seem to be getting
picked up  
**@ssubrama:** @jain.arpit6 this has the configs :  
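For reference, a hedged sketch of the flush-related keys in the table config's streamConfigs section (key names per the Pinot realtime docs of this era; they have shifted across versions, and the values here are illustrative). Setting the row threshold to 0 enables segment auto-sizing toward the target segment size, which is consistent with the 100k initial value discussed above:

```
{
  "streamConfigs": {
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.time": "6h",
    "realtime.segment.flush.threshold.segment.size": "200M"
  }
}
```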
**@jain.arpit6:** The exception in the log while creating a segment is because
of a datetime field. I have declared a datetime STRING field with
format(1:milliseconds:simple_date_format:YYYY-MM-DD'T'HH:MM:SS.SSSZ). The
exception says: Could not parse "2021-10-09T18:42:54.985Z": value 42 for
monthOfYear must be in the range [1,12]. According to the format I defined, 42
is the seconds, but it is being taken as the month.  
**@g.kishore:** can you please file an issue?  
**@jain.arpit6:** I got past the above issue. The reason was that the format
specifiers are case sensitive, so I had to put M for month and m for minutes.  
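To illustrate the fix: SIMPLE_DATE_FORMAT patterns use case-sensitive Joda-style letters (yyyy = year, MM = month, dd = day of month, HH = hour, mm = minute, ss = second). A sketch of the corrected dateTimeFieldSpec, using the column name that appears later in the thread:

```
{
  "dateTimeFieldSpecs": [
    {
      "name": "InsertedTime",
      "dataType": "STRING",
      "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```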
**@jain.arpit6:** However, now I get another error for the same column while
building the dictionary at segment creation time. The log says "created
dictionary for String column: InsertedTime with cardinality: 149, max length in
bytes: 24, range: 2021-10-09T18:42:54.985Z to null", and then an error later
with IllegalArgumentException: Invalid format: "null"  
**@jain.arpit6:** To my understanding, it scans all the values of the given
column to build a range, and in this case it gets both a null and a valid
value; null is obviously not a valid format for the given field, so it fails  
**@jain.arpit6:** I gave a default value (1800-01-01T00:00:00.000Z) in the
schema for the given field, but I still get the same error  
**@jain.arpit6:** How should I fix this?  
 **@jain.arpit6:** @mayanks  
 **@jain.arpit6:** As suggested, please find the output of the debug
endpoint  
 **@mapshen:** Hi, if Pinot expects a field to be numeric but receives a
string value, how does Pinot handle it?  
**@g.kishore:** it will try to parse it, and if it fails, it will use the
default value for that data type. The default value can be overridden in the
schema  
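A minimal sketch of overriding the default in the schema (the field name and value here are hypothetical; defaultNullValue is the standard schema-level override):

```
{
  "metricFieldSpecs": [
    {
      "name": "amount",
      "dataType": "DOUBLE",
      "defaultNullValue": 0
    }
  ]
}
```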
**@mapshen:** Thanks @g.kishore, would you mind pointing me to the code?  
**@g.kishore:** see CompositeTransformer  
**@mapshen:** Previously we had a case where an incorrect value type for the
datetime field halted the whole ingestion. Do you have special handling for
this field?  
**@g.kishore:** I think so, yes. Retention depends on the primary time column,
so we avoid setting a default value for it and fail fast instead  
**@mapshen:** Took a look at the code, and it seems it's DataTypeTransformer
that does the job, which in turn relies on PinotDataType. According to
PinotDataType, exceptions can be thrown if a conversion is not possible. Would
you mind pointing me to the place that handles this exception and uses the
default value instead of stopping the whole ingestion?  
 **@tyler773:** @tyler773 has joined the channel  

###  _#random_

  
 **@jieshe:** @jieshe has joined the channel  
 **@alihaydar.atil:** @alihaydar.atil has joined the channel  
 **@vaibhav.gupta:** @vaibhav.gupta has joined the channel  
 **@awadesh22kumar:** @awadesh22kumar has joined the channel  
 **@jain.arpit6:** @jain.arpit6 has joined the channel  
 **@tyler773:** @tyler773 has joined the channel  

###  _#troubleshooting_

  
 **@jieshe:** @jieshe has joined the channel  
 **@alihaydar.atil:** @alihaydar.atil has joined the channel  
 **@lrhadoop143:** Hi, can we remove old data (more than one week) from a Pinot
table? If yes, how?  
**@mayanks:** You can set the retention in the table config to 7 days; that
should do it  
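A minimal sketch of those retention settings in the table config's segmentsConfig section; the controller's retention manager then periodically deletes segments older than the configured window:

```
{
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7"
  }
}
```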
**@lrhadoop143:** Thank you @Mayank  
 **@msoni6226:** Hi Team, we are trying to understand the Pinot metrics
exposed to Prometheus. Looking into the segment error metric
*"pinot_controller_segmentsInErrorState_Value"*, it is described as the "Number
of segments in error state". However, we see that some of our segments are in a
bad state, yet this is not reflected in the Prometheus graph; the count
shows 0  
**@mayanks:** If you are referring to the `BAD` status in the console, check the
external view to ensure that is actually the case. I think someone else reported
an issue where the console reports `BAD` for consuming segments. I had requested
that a GH issue be opened, so there might already be one  
 **@vaibhav.gupta:** @vaibhav.gupta has joined the channel  
 **@awadesh22kumar:** @awadesh22kumar has joined the channel  
 **@jain.arpit6:** @jain.arpit6 has joined the channel  
 **@tyler773:** @tyler773 has joined the channel  
 **@saadkhan:** Hi team, from the auth settings I was able to enable user
credentials following the instructions, but queries via the console are not
going through; they fail with a READ error. As per the instructions, the broker
and controller have the same admin username:pwd  
**@xiangfu0:** are you running on the latest code?  
**@xiangfu0:** there was a fix for this after the 0.8.0 release  
**@saadkhan:** @xiangfu0 Well, I'm using the 0.8.0 release. For upgrading, if
Pinot is deployed in distributed mode, would there be an issue during the
upgrade due to version mismatch?  
**@xiangfu0:** Pinot maintains backward compatibility as long as you follow
the order controller -> broker -> server / minion  
**@saadkhan:** Cool, thanks. I will follow this pattern:
```
controller 0.9.0 -> broker 0.8.0 -> server / minion 0.8.0
controller 0.9.0 -> broker 0.9.0 -> server / minion 0.8.0
controller 0.9.0 -> broker 0.9.0 -> server / minion 0.9.0
```  
**@xiangfu0:** yes  

###  _#pinot-dev_

  
 **@lrhadoop143:** Hi, can we remove old data (more than one week) from a Pinot
table? If yes, how?  
 **@atri.sharma:** @lrhadoop143 Please use the general or troubleshooting
channel for such questions  
 **@lrhadoop143:** Ok @atri.sharma  
 **@dadelcas:** Are there any in-progress discussions around adding big-number
data types, e.g. BigDecimal and BigInteger?  
 **@g.kishore:** we added BigDecimal earlier this year  
 **@dadelcas:** Is that available in 0.8.0? I can't find it in the
documentation  
 **@dadelcas:** I mean having a column of type Decimal(30, 4), for example.
I've been skimming the source code, and FieldType only defines the types as per
the docs. I've seen the PR that introduces `bytesToBigDecimal()` but nothing to
support BigDecimal as a data type  
 **@g.kishore:** @jackie.jxt ^^  
 **@jackie.jxt:** @dadelcas `BigDecimal` is not supported as a standard data
type yet. Currently in order to use `BigDecimal` you need to store them as
`BYTES`. You may use `BigDecimalUtils.serialize()` to get the serialized bytes  
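In schema terms, such a column would be declared as BYTES and populated with the serialized form (a sketch; the column name is hypothetical). At query time, the bytesToBigDecimal scalar function mentioned earlier in this thread can convert the stored bytes back:

```
{
  "dimensionFieldSpecs": [
    {
      "name": "amount_bytes",
      "dataType": "BYTES"
    }
  ]
}
```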
 **@dadelcas:** Yup, so my question, as per the previous comment, is whether
there are any discussions about adding these data types at the moment, or
whether that isn't on the roadmap yet  
 **@g.kishore:** I think we have the primitives needed to support it as a
standard data type, so I don't see a reason not to do it.  
 **@g.kishore:** please file a github issue  
 **@dadelcas:** so there is an issue already, raised by @xiangfu0  
**@g.kishore:** Date is done.  
 **@g.kishore:** @jackie.jxt can you update the issue  
 **@dadelcas:** if BigInteger can be added to the list, that'd be great!  
 **@g.kishore:** yes, it's the same concept under the hood  
 **@g.kishore:** Dan, do you need this because of querying through Presto/Trino?  
 **@dadelcas:** that's part of it; the main issue is actually that I have to
deal with large, high-precision amounts, and I would rather avoid doing
conversions during calculations  
 **@dadelcas:** I think in Trino a string can easily be converted to a decimal
if in Pinot I use toBigDecimal on the column  
 **@g.kishore:** Got it  

### _#getting-started_

  
 **@chad:** @chad has joined the channel  