Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/03/17 02:00:25 UTC

Apache Pinot Daily Email Digest (2021-03-16)

### _#general_

  
 **@deepakcse2k5:** @deepakcse2k5 has joined the channel  
 **@deepakcse2k5:** Is an update query possible using Pinot?  
**@mayanks:** If you mean SQL `update` statement, then no. What's your use
case?  
**@deepakcse2k5:** We are using an update statement, basically to populate a new column based on another column in an offline table.  
**@g.kishore:** you can use the derived column feature for that  
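For reference, the derived-column approach mentioned above is typically configured as an ingestion transform in the table config; a minimal sketch (the column names and transform function here are hypothetical, not from this thread):

```json
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "newColumn",
      "transformFunction": "toUpperCase(existingColumn)"
    }
  ]
}
```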
**@vibhor.jain:** Hi All, what is the general approach preferred for retrofitting old data? I see that MS Teams uses Pinot. Now, if I send a message via Teams and later update it, how can such a use case be handled in Pinot? Suggestions welcome.  
**@g.kishore:** You can use upsert feature  
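As a sketch of the upsert feature mentioned here (it applies to realtime tables; `messageId` is a hypothetical column name): the schema declares a primary key,

```json
"primaryKeyColumns": ["messageId"]
```

and the realtime table config enables upsert:

```json
"upsertConfig": { "mode": "FULL" }
```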
**@ravi.maddi:** *Is it correct?* I have a column containing a list of integers (`"madIds": [1111, 2222, 3444]`). In the schema config file I am writing it like this; please correct and confirm: ```{ "name": "madIds", "datatype": "INT", "delimiter":",", "singleValueField":false }```  
**@fx19880617:** I think this is ok; you don’t need to set the delimiter in the schema. It should be how the record reader parses the data.  
**@fx19880617:** What’s your data format? If it’s JSON, then the parser should parse it into an array already.  
**@ravi.maddi:** ok, got it, thanks  
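For reference, a corrected version of the multi-value field spec above might look like this (note the camelCase `dataType` key; the delimiter is unnecessary when the input is already a JSON array). A sketch, not verified against this user's setup:

```json
{
  "name": "madIds",
  "dataType": "INT",
  "singleValueField": false
}
```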
 **@zzh243402448:** @zzh243402448 has joined the channel  
 **@ravi.maddi:** @All - how do I write the schema for a *date column*? I have a column with a date: `"startDate": "2021-01-04 00:00:00"`. Need help
:slightly_smiling_face:  
**@fx19880617:** Can you try something like :  
**@fx19880617:** ```"dateTimeFieldSpecs": [
  {
    "name": "startDate",
    "dataType": "STRING",
    "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  }
]```  
**@fx19880617:** FYI :  
**@ravi.maddi:** Thanks. I have two fields, startDate and endDate, so I have to write this block twice with different names. Am I right?  
**@ravi.maddi:** I have three date columns, so I wrote it like this:
```"dateTimeFieldSpecs": [
  { "name": "_source.startDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" },
  { "name": "_source.lastUpdate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "name": "_source.sDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
]```
Can you please correct it? I am getting the error ```{"code":400,"error":"Cannot find valid fieldSpec for timeColumn: timestamp from the table config: eventflow_REALTIME, in the schema: eventflowstats"}```  
**@ravi.maddi:** Hi Xiang Fu, can you take a look?  
 **@ravi.maddi:** @All - I added a table using the *addTable* Pinot command, but after I changed the schema, how do I update the table that was already added? *How do I update and delete a table here?*  
**@fx19880617:** You can update schema using schema api  
**@fx19880617:** Try out controller swagger UI  
**@fx19880617:** It also generates the corresponding requests  
**@ravi.maddi:** What happens if I run the same addTable command with the latest schema file? Any idea?  
 **@vibhor.jain:** @vibhor.jain has joined the channel  
 **@vibhor.jain:** Hi All, what is the general approach preferred for retrofitting old data in Pinot? I see that MS Teams uses Pinot. Now, if I send a message via Teams and later update it, how can such a use case be handled in Pinot, where no update is supported? Suggestions welcome.  
**@ganesh.github:** @vibhor.jain Can you have a look at this?  
**@ravi.maddi:** @All - I am getting an error while starting zookeeper with pinot admin:
```zookeeper state changed (SyncConnected) Waiting for keeper state SyncConnected
Terminate ZkClient event thread. Session: 0x10003506d770000 closed
Start zookeeper at localhost:2181 in thread main
EventThread shut down for session: 0x10003506d770000
Expiring session 0x10002b33f080005, timeout of 30000ms exceeded
Expiring session 0x10002b33f080006, timeout of 30000ms exceeded
Expiring session 0x10002b33f080007, timeout of 30000ms exceeded
Expiring session 0x10002b33f080004, timeout of 30000ms exceeded
Expiring session 0x10002b33f080008, timeout of 30000ms exceeded
Expiring session 0x10002b33f080002, timeout of 30000ms exceeded
Expiring session 0x10002b33f08000b, timeout of 60000ms exceeded```
Any solutions? Need help.  
 **@ravi.maddi:** One doubt: how can I tell which version of Kafka is used by my local Pinot?  
 **@chad.preisler:** @chad.preisler has joined the channel  
 **@ravi.maddi:** *Hi All,* I have three date columns, so I wrote it like this:
```"dateTimeFieldSpecs": [
  { "name": "_source.startDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" },
  { "name": "_source.lastUpdate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "name": "_source.sDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
]```
Can you please correct it? I am getting the error ```{"code":400,"error":"Cannot find valid fieldSpec for timeColumn: timestamp from the table config: eventflow_REALTIME, in the schema: eventflowstats"}``` Need your help
:slightly_smiling_face:  
**@g.kishore:** Let’s use <#C011C9JHN7R|troubleshooting> for these questions.  
**@ravi.maddi:** Sure thanks  
**@g.kishore:** The error message has the info: the time column (timestamp) specified in the table config does not exist in the schema  
**@g.kishore:** Change the time column in the table config to point to one of these names in the schema  
**@ravi.maddi:** I don't have any column named 'timestamp'.  
 **@g.kishore:** Pinot meetup talk happening now if interested  
**@karinwolok1:** :wave: Welcome all the new Pinot :wine_glass: community
members! How did you find out about Pinot? What are you working on?
@chad.preisler @vibhor.jain @zzh243402448 @deepakcse2k5 @harshvardhan.surolia
@nirav.shah @slatermegank @timebertt @orajason @satish @terodeakshay
@abprakash2003 @prshnt.1314 @hussain @shilpa.kumar1222 @thejas.nair
@akashkumar @mohamedsultan.ms304 @tamilselvansk23 @nurcahyopujo
@prachiprakash80 @matteo.santero @ravi.maddi @morzaria @suresh.k.kode
@rrepaka123 @jainendra1607tarun @santosh.rudra @xulinnankai @manish.bhoge
@ratchetmdt @james.wes.taylor @contactvivekjain @carlosmanzueta
@dileepkumarv.allam  
 **@rkitay:** @rkitay has joined the channel  
 **@rkitay:** Hi, what data types does `Pinot` support out-of-the-box? I’m
guessing `String`, numerics (integers and floating points), `date` and
`boolean` - are there any others supported? For example - `ip-address`?  
**@g.kishore:** No, IP address is not supported.  
**@ken:**  
**@ken:** Note no specific date type  
**@ken:** There is a somewhat hidden Boolean type, but it gets mapped to a
string internally, I believe.  
**@rkitay:** So if I need to keep IP addresses and support a filter like
```IP inCidr(212.36.0.0/24)```, what are my options? Do I keep the data in raw
`byte[]` format and implement a `UDF` that performs this filter? Can such
a query be sub-second?  
**@g.kishore:** that's right, but we just use 1 bit to represent it on disk  
**@g.kishore:** @rkitay we can extend range query to support things like that  
**@g.kishore:** can you please file an issue for IP indexing? It's a cool use case  
**@rkitay:** @g.kishore, :slightly_smiling_face: Sure - though I haven’t
decided yet if we can invest time in checking `Pinot` at this time.  
**@ken:** @rkitay I haven’t worked with IP addresses in SQL, but worst case I assume you could store them as a 32-bit int and do range queries?  
**@g.kishore:** @rkitay thats totally fine. someone else might ask for it
later or a contributor might want to pick it up.  
**@rkitay:** @ken, for IPv4 - yes. But for IPv6 I need 16 bytes. So either I
use a composite field with two `long`s or a `byte []`  
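The store-as-int idea discussed above can be sketched outside Pinot: convert a CIDR block to an inclusive integer range once, then express the CIDR filter as an ordinary range predicate on the integer column (the table and column names here are hypothetical):

```python
import ipaddress

def cidr_to_range(cidr: str) -> tuple[int, int]:
    """Convert a CIDR block to an inclusive [low, high] integer range."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)

lo, hi = cidr_to_range("212.36.0.0/24")
# The CIDR membership test then becomes a plain SQL range filter:
query = f"SELECT * FROM mytable WHERE ip_as_int BETWEEN {lo} AND {hi}"
```

For IPv4 this fits in a single int column; as noted below, IPv6 needs 16 bytes, so the same trick would require two longs or a `byte[]` plus a comparable encoding.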
 **@rkitay:** Is there any limitation on the size of a single record written
into `Pinot`? Our average records are about 6KB when stored in `AVRO` , but
can reach up to ~50K in edge cases  
**@mayanks:** Pinot is columnar. Is the size due to a wide schema or to columns that have large data?  
**@mayanks:** If the former, no issues. If the latter, what’s the data type of those columns?  
**@rkitay:** A combination - we have about 90 fields, some are numeric, others
are short strings - the rest are potentially large strings (e.g. HTTP
Request/Response Headers) - that can reach several KB for a single field.
Also, we keep nested records within each record - and each outer record can
contain several nested records - which also increases the size of a single
column  
**@rkitay:** Also - for some of these fields, we do not need indexing (e.g.
Request Headers) - I just need to be able to find them based on other
dimensions  
**@g.kishore:** yes, you can apply snappy compression on such columns  
**@g.kishore:** indexing a column is optional in Pinot  
**@mayanks:** For Strings, there is a default max length (IIRC 512), but it can be overridden:  
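The override mentioned above is set per column in the schema; a sketch (the field name is hypothetical, and the exact `maxLength` value should be sized to your data):

```json
{
  "name": "requestHeaders",
  "dataType": "STRING",
  "maxLength": 65535
}
```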
**@xd:** @xd has joined the channel  
 **@jamesmills:** @jamesmills has joined the channel  
 **@simon.paradis:** @simon.paradis has joined the channel  

###  _#random_

  
 **@deepakcse2k5:** @deepakcse2k5 has joined the channel  
 **@deepakcse2k5:** Is an update query possible using Pinot?  
**@fx19880617:** you can do Pinot upsert with a realtime-only table:  
**@deepakcse2k5:** is it possible for an offline table?  
 **@zzh243402448:** @zzh243402448 has joined the channel  
 **@vibhor.jain:** @vibhor.jain has joined the channel  
 **@deepakcse2k5:** can we make a ‘date’-related column part of the primary key in Pinot?  
 **@chad.preisler:** @chad.preisler has joined the channel  
 **@rkitay:** @rkitay has joined the channel  
 **@xd:** @xd has joined the channel  
 **@jamesmills:** @jamesmills has joined the channel  
 **@simon.paradis:** @simon.paradis has joined the channel  

###  _#feat-text-search_

  
 **@akashkumar:** @akashkumar has joined the channel  

###  _#feat-presto-connector_

  
 **@akashkumar:** @akashkumar has joined the channel  
 **@hussain:** @hussain has joined the channel  

###  _#troubleshooting_

  
 **@deepakcse2k5:** @deepakcse2k5 has joined the channel  
 **@jungmwiner:** To run ThirdEye locally, I followed the manual below. I used the master branch, and building ThirdEye was successful. 1) An error occurs when executing `./run-frontend.sh`: ``` Error: Could not find or load main class org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication ``` I tested the same thing in several environments, and the same problem occurred. 2) The same problem occurs when executing the `./run-backend.sh` script; however, the cause seems to be different: there is no `org.apache.pinot.thirdeye.anomaly.ThirdEyeAnomalyApplication` class in the jar file. **Tell me how to fix it, and I'll send you a PR.**  
**@fx19880617:** can you ask this in the thirdeye slack?  
**@jungmwiner:** @fx19880617 thank you^^  
 **@deepakcse2k5:** Is an update query possible using Pinot?  
 **@zzh243402448:** @zzh243402448 has joined the channel  
 **@vibhor.jain:** @vibhor.jain has joined the channel  
 **@deepakcse2k5:** can we make a ‘date’-related column part of the primary key in Pinot?  
 **@chad.preisler:** @chad.preisler has joined the channel  
 **@ravi.maddi:** *Hi All,* I have three date columns, so I wrote it like this:
```"dateTimeFieldSpecs": [
  { "name": "_source.startDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" },
  { "name": "_source.lastUpdate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "name": "_source.sDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
]```
Can you please correct it? I am getting the error ```{"code":400,"error":"Cannot find valid fieldSpec for timeColumn: timestamp from the table config: eventflow_REALTIME, in the schema: eventflowstats"}``` Need your help
:slightly_smiling_face:  
**@npawar:** can you share the table config?  
**@ravi.maddi:** Table Config:
```{
  "tableName": "eventflow",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "eventflowstats",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "event_count-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": { "customConfigs": {} }
}```
And Schema File:
```{
  "schemaName": "eventflowstats",
  "eventflow": [
    { "name": "_index", "dataType": "INT" },
    { "name": "_type", "dataType": "STRING" },
    { "name": "id", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "_source.startDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" },
    { "name": "_source.lastUpdate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
    { "name": "_source.sDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
  ]
}```  
**@npawar:** In your table config, you've configured "timeColumnName" :
"timestamp"  
**@npawar:** You need to change that to one of the dateTime columns from your
schema  
**@npawar:** Also, in your schema, you have the dimensions under "eventFlow"
instead of "dimensionFieldSpecs"  
**@ravi.maddi:** ok, got it. So I have to remove the remaining two and add them as normal fields, am I right?  
**@npawar:** you can keep all 3 as dateTimeFieldSpecs  
**@npawar:** but select one of them as the primary time column, and enter that
in the tableConfig  
**@ravi.maddi:** Is it right: ```"timeColumnName": "_source.startDate, _source.lastUpdate, _source.sDate"```  
**@ravi.maddi:** Is it correct:
```"dateTimeFieldSpecs": [
  { "name": "_source.startDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" },
  { "name": "_source.lastUpdate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "name": "_source.sDate", "dataType": "STRING", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
]```  
**@npawar:** No, you have to put just one column in the tableConfig.
```"timeColumnName": "_source.sDate"```  
**@npawar:** schema is correct  
**@ravi.maddi:** got it  
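Putting npawar's two corrections together, the relevant fragments would look roughly like this (a sketch assembled from the configs posted above, not a verified config). In the table config:

```json
"segmentsConfig": {
  "timeColumnName": "_source.sDate",
  "timeType": "MILLISECONDS",
  "schemaName": "eventflowstats",
  "replicasPerPartition": "1"
}
```

and in the schema, the dimensions move from the invalid `"eventflow"` key to `dimensionFieldSpecs`:

```json
"dimensionFieldSpecs": [
  { "name": "_index", "dataType": "INT" },
  { "name": "_type", "dataType": "STRING" },
  { "name": "id", "dataType": "INT" }
]
```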
**@ravi.maddi:** After these changes, I am getting this error: ```Sending request: to controller: localhost, version: Unknown Got Exception to upload Pinot Schema: aschema``` I think the Pinot server went down. Any idea?  
**@ravi.maddi:** @npawar -- can you check?  
 **@ravi.maddi:** *Hi All,* I am getting this error when trying to addTable with pinot-admin: ```Sending request: to controller: localhost, version: Unknown Got Exception to upload Pinot Schema: aschema``` Need help
:slightly_smiling_face:  
 **@rkitay:** @rkitay has joined the channel  
 **@xd:** @xd has joined the channel  
 **@jiatao:** Hi, `Pinot Quickstart on JDK 15-ea` is failing for my PR (which only changes log messages). It seems like the test is running Java 16 instead of 15: `JAVA_HOME_16.0.0_x64=/opt/hostedtoolcache/jdk/16.0.0/x64`. Any idea how to fix this? The PR test link for reference:  
**@fx19880617:** Rerunning the test doesn’t help?  
**@jiatao:** I changed one line and rebased the PR, which triggered the test again, but it's still running JDK 16.  
**@fx19880617:** I saw it’s failing for other PRs as well  
**@fx19880617:**  
**@fx19880617:** guess the issue is on the GitHub Actions side  
**@jiatao:** I see. Thanks.  
**@jiatao:** FYI: ^^ @jlli  
**@jlli:** sorry I have no context on this issue in apache pinot repo though..  
**@jlli:** One thing worth trying is to use the Java 15 GA version instead of EA:  
**@jlli:** Build passes. @fx19880617 @jiatao we should be good to go with those changes now :point_up_2:  
**@fx19880617:** :thumbsup:  
**@jiatao:** @jlli Thanks!  
 **@jamesmills:** @jamesmills has joined the channel  
 **@simon.paradis:** @simon.paradis has joined the channel  

###  _#pinot-dev_

  
 **@akashkumar:** @akashkumar has joined the channel  
 **@zzh243402448:** @zzh243402448 has joined the channel  

###  _#pinot-docs_

  
 **@zzh243402448:** @zzh243402448 has joined the channel  

###  _#segment-write-api_

  
 **@npawar:** Set up a meeting for 11am. Please move it around if that time doesn't work @yupeng @chinmay.cerebro  
 **@yupeng:** sg  

###  _#metrics-plugin-impl_

  
 **@fx19880617:** @fx19880617 has joined the channel  
 **@xd:** @xd has joined the channel  
 **@jlli:** @jlli has joined the channel  
 **@fx19880617:**  
 **@fx19880617:** we can move the metrics plugin discussion here  
 **@fx19880617:** I think Xiaoman has some question about the listener
implementation  
 **@xd:** I think the major problem here is that the old plugins that implemented `MetricsRegistryRegistrationListener` make the server start process hang, without any clue in our logs to trace it  
 **@xd:** I agree that the interface change is a good direction in design, but I am a bit concerned that other Pinot users who did the same thing will have trouble debugging  
 **@xd:** It took me quite a few hours of digging until I figured it out  
 **@xd:** Even `jstack` does not help  
 **@xd:** I don't have a better solution to this, though. Maybe proper communication is the only way  
 **@xd:** Originally I thought it was another interface change, but here we are dealing with dependencies too, so it is hard to find a good solution  
 **@xd:** After reimplementing my plugin, the Pinot server now starts properly with my metrics plugin  
 **@jlli:** Hey Xiaoman, thanks for reaching out! I understand your concern about removing the methods in `PinotMetricsRegistryListener`. However, `PinotMetricsRegistryListener` is just a wrapper: the wrapper’s methods won’t be registered to the actual registry; instead, it’s the actual Yammer listener whose methods will be invoked. That’s why I think you want to add a method like `void onMetricsRegistryRegistered(MetricsRegistry metricsRegistry);`. But that would make the repo unclean, because we would still have to pull the actual Yammer dependencies into Pinot’s code. One thing I’d suggest is to initialize an actual Yammer listener and pass it as the param to the constructor.  
 **@jlli:** this is the sample code on how to handle a listener in LinkedIn (not open source); hope that will give you some idea:
```@Override
public void onMetricsRegistryRegistered(final PinotMetricsRegistry metricsRegistry) {
  MetricsRegistryListener metricsRegistryListener = new MetricsRegistryListener() {
    @Override
    public void onMetricAdded(MetricName metricName, Metric metric) {
      // do sth
    }

    @Override
    public void onMetricRemoved(MetricName metricName) {
      // do sth
    }
  };
  metricsRegistry.addListener(new YammerMetricsRegistryListener(metricsRegistryListener));
}```  
 **@xd:** Thanks. I got mine working after recompiling. Mostly it is because plugins are loaded at runtime by reflection, and plugins built against the old Pinot got loaded without any check.  
 **@jlli:** I see. Glad that was resolved. Usually the build will fail at compile time, and then we know to make changes to address the new code  

###  _#flink-pinot-connector_

  
 **@fx19880617:** @fx19880617 has joined the channel  
 **@npawar:** @npawar has joined the channel  
 **@yupeng:** @yupeng has joined the channel  
 **@chinmay.cerebro:** @chinmay.cerebro has joined the channel  
 **@fx19880617:** this is the design doc:  
**@chinmay.cerebro:** :thumbsup:  
 **@chinmay.cerebro:** Looks like this doc needs a lot of changes  
 **@yupeng:** Sure, I can update the doc to reflect the latest discussions