Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/03/13 02:00:24 UTC

Apache Pinot Daily Email Digest (2021-03-12)

### _#general_

  
 **@anumukhe:** Hi, we will be installing a Pinot cluster in AWS on top of EKS. We know that in AWS, EKS provides multi (three) Availability Zone (AZ) based HA within a specific region. So I would like to understand whether an EKS-based Pinot cluster will be fault tolerant and HA within the region by default in case of an AZ failure. I know that the Pinot server has segment replicas and replica-groups, which provide HA within the cluster in case of a server failure. But what will happen if the controller has an issue in the cluster (on EKS), or multiple servers have been corrupted, or the cluster (on EKS) goes down as a whole? Considering that the servers will use EBS as the data-serving file system (with EBS multi-AZ replication/sync turned on), will EKS by default bring up an alternative node, such as a Controller or Server (or even a Broker)? Net-net, can we expect 100% service availability for Pinot on EKS in any region, or do we need to set up another Pinot cluster on EKS in another AZ, i.e. a minimum of two Pinot clusters (on EKS) in two AZs within a region? Please suggest  
**@g.kishore:** Yes, Pinot is HA within a region.. as long as you make sure
the replication factor is >1 and you have >1 controller and >1 broker  
**@g.kishore:** yes, K8s will bring up another server/controller/broker on
failure and everything will be handled seamlessly  
**@anumukhe:** Thanks a lot  
 **@matteo.santero:** @matteo.santero has joined the channel  
 **@ravi.maddi:** @ravi.maddi has joined the channel  
 **@ravi.maddi:** Hi Guys,  
 **@ravi.maddi:** I have a question: is it possible to use nested JSON data in a Pinot table? Avro supports nested entities (JSON) by using the record type in the Avro schema. Does the Pinot table configuration, like Avro, support nested JSON entities (e.g. an Account JSON that contains an embedded Address JSON)?  
**@g.kishore:** Yes, we recently added support for indexing nested json fields  
 **@ravi.maddi:** I have gone through the Pinot documentation and saw that Pinot supports Avro, but I am not able to find any samples or sample code for it. Can you help by pointing me to some code that uses Pinot and Avro together?  
**@g.kishore:** If you ingest Avro into Pinot, complex nested fields will be automatically converted to JSON  
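
For anyone looking for a concrete starting point, here is a minimal sketch (not from the thread) of querying a nested JSON field through the pinotdb driver that appears later in this digest. The table and column names are hypothetical; `JSON_EXTRACT_SCALAR` is Pinot's transform function for pulling scalar values out of a JSON column.

```python
# Hypothetical sketch: reading a nested field ("address.city") out of a JSON
# column with Pinot's JSON_EXTRACT_SCALAR transform function, via the pinotdb
# DB-API driver. Table and column names are made up for illustration.
from pinotdb import connect

conn = connect(host="localhost", port=8000, path="/query/sql", scheme="http")
curs = conn.cursor()
curs.execute(
    """
    SELECT JSON_EXTRACT_SCALAR(account_json, '$.address.city', 'STRING') AS city,
           COUNT(*) AS cnt
    FROM account_table
    GROUP BY JSON_EXTRACT_SCALAR(account_json, '$.address.city', 'STRING')
    """
)
for row in curs:
    print(row)
```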
**@1705ayush:** Hi all, I am writing this to explain the *loop of problems* we are facing while working on an *architecture* made of *Superset* (v1.0.1), *Pinot* (latest docker image) and *Presto* (starburstdata/presto:350-e.3 docker image). Working around a problem in one framework causes a problem in another. I do not know which community can help me solve this, hence posting it in both. So far, we have successfully pushed 1 million records into a Pinot table and would like to build charts over it in Superset.

*Problem # 1* We connected Superset to Pinot successfully and were able to build SQL Lab queries, only to find out that Superset does not support exploring SQL Lab virtual data as a chart when the connected database is Apache Pinot (the "Explore" button is disabled). Please let me know if this can be solved or whether we interpreted it incorrectly, as that would solve the whole problem at once. To work around it, we learned that a Superset-Presto connection would enable this Explore button, and we had Presto in our plan anyway. So, we implemented Presto on top of Pinot.

*Problem # 2* We found that Presto cannot aggregate Pinot tables with more than 50k rows, throwing the error `Segment query returned '50001' rows per split, maximum allowed is '50000' rows. with query "SELECT * FROM pinot_table LIMIT 50001"`. Presto cannot even run something like: ```presto:default> select count(*) from pinot.default.pinot_table;``` Even if we increase the 50k limit in Presto's pinot.properties (`pinot.max-rows-per-split-for-segment-queries`) to 1 million, the Presto server crashes, stating that heap memory was exceeded. To work around it, we learned that we can make Pinot do the aggregations and feed the aggregated result to Presto, which in turn feeds Superset for visualizing the charts, by writing the aggregation logic inside a Presto sub-query like: ```presto:default> select * from pinot.default."select count(*) from pinot_table"``` This returns the expected result.

*Problem # 3* We found that, although we can make Pinot do the aggregations, we cannot use Pinot's supported transformation functions inside the Presto sub-query. The query ```select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10``` works fine in Pinot, but when embedded in Presto as a sub-query it does not work: ```presto:default> select * from pinot.default."select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10"; Query failed: Column datetrunc('day',epoch_ms_col,'milliseconds') not found in table default.select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10``` I do not know if we are doing something wrong while querying/implementing, or if we have missed some useful config setting that would solve our problem. The SQL Lab query we want to run against Pinot, and eventually use to build a chart, is: ```SELECT day_of_week(epoch_ms_col), count(*) from pinot_table group by day_of_week(epoch_ms_col)``` Any help is really appreciated!!!  
**@g.kishore:** Thanks Ayush. Looking into it. Can you file a GitHub issue? This is super detailed and can be helpful for other developers  
**@fx19880617:** For #3, @elon.azoulay do you know if it’s supported?  
**@elon.azoulay:** Hey, I wanted to try some things out; I might have a PR to fix that if I can't find a workaround  
**@elon.azoulay:** Already have one open that I can add to:  
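
As a small illustration of the pass-through pattern from Problem #2 above: the query string is the one from the thread, while the host, user, and client library (the presto-python-client package) are assumptions, not part of the original discussion.

```python
# Hypothetical sketch: running the Pinot pass-through ("dynamic table") query
# from Problem #2 through the presto-python-client package, so the aggregation
# is executed by Pinot and only the small result flows through Presto.
import prestodb

conn = prestodb.dbapi.connect(
    host="localhost",   # Presto coordinator (assumption)
    port=8080,
    user="superset",    # any user name your Presto setup accepts
    catalog="pinot",
    schema="default",
)
cur = conn.cursor()
# Double quotes around the inner query make it a Pinot pass-through query.
cur.execute('SELECT * FROM pinot.default."select count(*) from pinot_table"')
print(cur.fetchall())
```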

### _#random_

  
 **@matteo.santero:** @matteo.santero has joined the channel  
 **@ravi.maddi:** @ravi.maddi has joined the channel  

###  _#troubleshooting_

  
 **@humengyuk18:** Hi team, when connecting to Pinot using the Python pinotdb driver, how should I route to a different tenant's broker? I have configured different tables with different tenants; should I use a different connection string for each tenant?  
**@fx19880617:**  
**@fx19880617:** ```conn = connect(host="localhost", port=8000,
path="/query/sql", scheme="http")```  
**@fx19880617:** You can set the tenant's broker as `host`  
**@humengyuk18:** So I should set up a different connection for each tenant?  
**@fx19880617:** You can also try `sqlalchemy.engine`; an example is in the same Python file  
**@fx19880617:** yes  
**@humengyuk18:** Got it, thanks  
**@fx19880617:** basically one connection per tenant  
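
To make "one connection per tenant" concrete, a minimal sketch using the pinotdb call shown above; the broker hostnames, ports, and table name are hypothetical, and the commented sqlalchemy URL follows the format used by the pinotdb dialect (verify it against your pinotdb version).

```python
# Sketch of "one connection per tenant": each tenant's tables are served by its
# own broker, so keep one pinotdb connection per tenant broker. Hostnames,
# ports and table names below are placeholders.
from pinotdb import connect

tenant_brokers = {
    "tenantA": ("broker-tenant-a.example.com", 8099),
    "tenantB": ("broker-tenant-b.example.com", 8099),
}

connections = {
    tenant: connect(host=host, port=port, path="/query/sql", scheme="http")
    for tenant, (host, port) in tenant_brokers.items()
}

curs = connections["tenantA"].cursor()
curs.execute("SELECT COUNT(*) FROM tenant_a_table")
print(curs.fetchone())

# The sqlalchemy.engine route mentioned above would look roughly like:
# from sqlalchemy import create_engine
# engine = create_engine(
#     "pinot://broker-tenant-a.example.com:8099/query/sql?controller=http://controller:9000/"
# )
```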
 **@matteo.santero:** @matteo.santero has joined the channel  
 **@matteo.santero:** Hello here, I am new to Pinot and Presto, so maybe my question has an obvious explanation I am not seeing. I've a table split into OFFLINE and REALTIME, with a defined key and time column. From the Pinot interface I am getting 1 record (the OFFLINE one), while from Presto via the DBeaver client I am getting 2 records (one from each table). Is there something I am missing in the configuration for this? ```upsertConfig mode true in REALTIME pinot 0.6.0 presto 0.247 dbeaver 21.0.0.202103021012``` Thank you very much in advance  
**@g.kishore:** have you turned on upsert?  
**@matteo.santero:** I supposed that `upsertConfig mode true` means yes, but probably that is not enough  
**@matteo.santero:** What is strange is that the Pinot web interface seems to be using the upsert setting while Presto does not  
**@g.kishore:** I don't think upsert can support hybrid tables as of today.. @yupeng ^^  
**@matteo.santero:** I don't know if it is something I forgot in the link between the 2 systems  
**@yupeng:** yes, upsert is only for realtime tables now. There is an ongoing PR to address upsert tables with longer retention  
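
For reference, a hedged sketch of where the upsert switch lives: it is the `upsertConfig` block of the REALTIME table config (mode `FULL`), and per the discussion above it has no effect on the OFFLINE half of a hybrid table. The names, controller URL, and omitted stream settings below are placeholders.

```python
# Hedged sketch (not from the thread): the upsertConfig block sits in the
# REALTIME table config; this posts such a config to the controller REST API.
# Table/column names and the controller URL are placeholders, and the Kafka
# streamConfigs that a real REALTIME table needs are omitted for brevity.
import requests

realtime_table_config = {
    "tableName": "myTable_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
        "timeColumnName": "event_time",
        "schemaName": "myTable",
        "replicasPerPartition": "1",
    },
    "tableIndexConfig": {"loadMode": "MMAP"},      # streamConfigs omitted
    "routing": {"instanceSelectorType": "strictReplicaGroup"},
    "upsertConfig": {"mode": "FULL"},              # <- the upsert switch
    "tenants": {},
    "metadata": {},
}

resp = requests.post("http://localhost:9000/tables", json=realtime_table_config)
print(resp.status_code, resp.text)
```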
**@falexvr:** Guys, we have an error in our controller instances; we get this message, which we think might be preventing the Kafka consumer from working properly: ```Got unexpected instance state map: {Server_mls-pinot-server-1.mls-pinot-server-headless.production.svc.cluster.local_8098=ONLINE, Server_mls-pinot-server-2.mls-pinot-server-headless.production.svc.cluster.local_8098=ONLINE} for segment: dpt_video_event_captured_v2__0__22__20210306T1745Z``` Would you please tell me what can cause this issue?  
 **@falexvr:** This is what follows that log entry  
 **@falexvr:** Ah... It seems I'm in some kind of weird scenario not currently
being handled by the validation manager  
 **@falexvr:** All my segments are ONLINE, all in DONE state, the last segment
doesn't have any CONSUMING replicas  
 **@falexvr:** So, what can we do to fix this?  
 **@ravi.maddi:** @ravi.maddi has joined the channel  
 **@joshhighley:** when ingesting large amounts of streaming data, is there
any mechanism for capturing/notifying about data errors other than looking
through the server log files? We're ingesting millions of records as fast as
Pinot can ingest them, so looking through logs isn't really practical.  
**@dlavoie:** Please share your use case on this issue:  
**@joshhighley:** that seems to be at a table level -- connectivity issues or
configuration issues. Is row-level monitoring applicable to that issue?  
**@dlavoie:** It’s worth sharing, as this brings another perspective to the general “ingestion” observability topic.  
**@dlavoie:** If one row breaks ingestion, it’s equally important to bubble up the details as it is for a table misconfiguration.  
**@dlavoie:** I think both issues fit in the “table ingestion status” story  
**@g.kishore:** there is a metric that you can monitor for errors during
ingestion  
**@joshhighley:** I see an error count in ServerMeter, but it's just a counter. Per the instructions here:  I exposed the metrics, but the counter is still 0 despite errors in the logs  

###  _#pinot-dev_

  
 **@npawar:** Quickstart isn’t working for me in Intellij after pulling recent changes. Anyone else seeing this?
```
2021/03/11 20:28:36.874 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 10.357 since launch
java.lang.IllegalStateException: Failed to initialize PinotMetricsFactory. Please check if any pinot-metrics related jar is actually added to the classpath.
    at com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[guava-20.0.jar:?]
    at org.apache.pinot.common.metrics.PinotMetricUtils.initializePinotMetricsFactory(PinotMetricUtils.java:85) ~[classes/:?]
    at org.apache.pinot.common.metrics.PinotMetricUtils.init(PinotMetricUtils.java:57) ~[classes/:?]
    at org.apache.pinot.controller.ControllerStarter.initControllerMetrics(ControllerStarter.java:483) ~[classes/:?]
    at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:279) ~[classes/:?]
    at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[classes/:?]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[classes/:?]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[classes/:?]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [classes/:?]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [classes/:?]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [classes/:?]
    at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [classes/:?]
    at org.apache.pinot.tools.admin.command.QuickstartRunner.startControllers(QuickstartRunner.java:124) [classes/:?]
    at org.apache.pinot.tools.admin.command.QuickstartRunner.startAll(QuickstartRunner.java:170) [classes/:?]
    at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:169) [classes/:?]
    at org.apache.pinot.tools.admin.command.QuickStartCommand.execute(QuickStartCommand.java:78) [classes/:?]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [classes/:?]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [classes/:?]
```
 **@npawar:** @jlli could this be related to the metrics registry changes?  
 **@jlli:** How do you run the `StartServiceManagerCommand`?  
 **@npawar:** i’m running PinotAdministrator with `QuickStart -type OFFLINE`
arguments  
**@jlli:** I see. It seems the actual metric module is missing. Let me create
a hotfix in pinot-tools module  
**@npawar:** thanks!  
**@g.kishore:** Don’t we have a QuickStart test yet?  
**@npawar:** this only happens in the IDE I think, because you don't add any plugins to the classpath  
**@jlli:** Opened a PR for that and tested it in my IDE:  
**@jlli:** I’ve validated all the scripts in pinot-distribution and they work
fine. So adding the missing one to pinot-tools would be sufficient  
**@fx19880617:** the script works as it has all the dependencies from the plugin directory  
 **@anumukhe:** @anumukhe has joined the channel  
 **@anumukhe:** Hi, we will be installing a Pinot cluster in AWS on top of EKS. We know that in AWS, EKS provides multi (three) Availability Zone (AZ) based HA within a specific region. So I would like to understand whether an EKS-based Pinot cluster will be fault tolerant and HA within the region by default in case of an AZ failure. I know that the Pinot server has segment replicas and replica-groups, which provide HA within the cluster in case of a server failure. But what will happen if the controller has an issue in the cluster (on EKS), or multiple servers have been corrupted, or the cluster (on EKS) goes down as a whole? Considering that the servers will use EBS as the data-serving file system (with EBS multi-AZ replication/sync turned on), will EKS by default bring up an alternative node, such as a Controller or Server (or even a Broker)? Net-net, can we expect 100% service availability for Pinot on EKS in any region, or do we need to set up another Pinot cluster on EKS in another AZ, i.e. a minimum of two Pinot clusters (on EKS) in two AZs within a region? Please suggest (edited)  

###  _#segment-write-api_

  
 **@npawar:** @yupeng thread safety would be each implementation's decision, right? Do we need something on the interface? It is acceptable for someone to write a non-thread-safe impl and use it the way they want in their application  
 **@yupeng:** that's my question :slightly_smiling_face:  
 **@yupeng:** do you recommend thread safety for all implementations?  
 **@npawar:** okay, then no :slightly_smiling_face:  
 **@yupeng:** I wonder, in the connector, what implementation it'll use  
 **@yupeng:** basically, I am thinking of how to implement this async flush  
 **@yupeng:** in the wrapper, or let the connector do it  
 **@npawar:** thinking.. async flush would mean we create a new buffer and start collecting while the flush completes. If the flush encounters errors, then what happens? How will the connector go back and resubmit for that segment? And what if there's some permanent exception in flush? We'll keep creating new buffers  
 **@yupeng:** yes  
 **@yupeng:** I think there can be two buffers: staging (collecting records) and ready (to flush)  
 **@yupeng:** if there are issues with `ready`, then staging cannot be moved to ready  
 **@yupeng:** and it eventually becomes blocking  
 **@yupeng:** the purpose is to have a simple pipeline that does not block record collection, assuming segment creation/upload will take some time  
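
A rough illustration of the staging/ready idea (hypothetical code, not the actual segment-writer or connector API): records keep landing in `staging`, `flush()` swaps the batch into a single `ready` slot, a background worker builds/uploads it, and a stuck upload soon backs up the slot so that `flush()` blocks, which is the back-pressure behavior described above.

```python
# Hypothetical sketch of the two-buffer (staging/ready) pipeline discussed
# here; only the control flow, not the real SegmentWriter API. Collection is
# never blocked by an in-flight upload, but at most one batch can wait in the
# "ready" slot, so a stuck upload soon makes flush() block.
import queue
import threading
import time


class TwoBufferWriter:
    def __init__(self, upload_fn):
        self._staging = []                     # currently collecting
        self._ready = queue.Queue(maxsize=1)   # single "ready to flush" slot
        self._upload_fn = upload_fn            # e.g. build segment + upload
        threading.Thread(target=self._drain, daemon=True).start()

    def collect(self, record):
        self._staging.append(record)

    def flush(self):
        # Hand the collected batch to the uploader; blocks if a previous batch
        # is still waiting in the ready slot.
        batch, self._staging = self._staging, []
        self._ready.put(batch)

    def _drain(self):
        while True:
            batch = self._ready.get()
            while True:
                try:
                    self._upload_fn(batch)
                    break
                except Exception as exc:
                    # Keep retrying; a real connector might instead fail the
                    # whole job, as discussed below for the Spark/Flink sinks.
                    print(f"upload failed, retrying batch of {len(batch)}: {exc}")
                    time.sleep(1.0)
            self._ready.task_done()
```

With a single `ready` slot this degenerates to roughly the one-buffer case when uploads are slow, which is essentially the trade-off being debated in this thread.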
 **@npawar:** `if there are issues with ready, then staging cannot be moved to
ready` - what about the data that was in `ready`? how will you recover that?  
 **@yupeng:** same as for `staging`?  
**@npawar:** didn't get you  
**@yupeng:** I meant to have an extra buffer for better parallelism; the error handling would be similar to a single buffer  
**@yupeng:** if there is a single buffer, how do we recover?  
**@npawar:** The Flink connector will be waiting for a success response from the segment build. If it fails, the connector knows the last point of flush and will try to write again. Say the last flush was at 100, and now it is at 250, but the flush at 250 failed. The Flink connector will write from 100 again. But what if the connector has moved on to writing new records, i.e. it called flush at 250 and kept writing, and is now at 330 when it receives the response that the flush at 250 was a failure? Then what does the Flink connector do for records 100-250?  
**@yupeng:** it’ll wait/block there?  
**@yupeng:** btw, I think that, like how the Spark connector behaves today, an error during the segment load will fail the entire job  
**@yupeng:** I think similar behavior will be there in the Flink connector  
**@npawar:** Alright, if you think the Flink connector can handle that, then I have no concerns about doing the 2-buffer design  
**@npawar:** Do you mind getting on a call with me next week and explaining it from the Flink side some more? I've never worked with Flink, so I'm probably not seeing what you are  
**@yupeng:** sure thing  
 **@yupeng:** it’s just one buffer vs two buffers  
 **@yupeng:** and it’s internal to connector users  
 **@yupeng:** users only know there is some issue that it cannot upload  
 **@yupeng:** but staging/ready is internal  
 **@chinmay.cerebro:** how will this work with checkpointing?  
 **@chinmay.cerebro:** the segments have to be aligned with checkpoint barriers, is it not?  
 **@npawar:** The segment writer doesn't have to worry about it right?  
 **@chinmay.cerebro:** no , but the connector will  
 **@chinmay.cerebro:** I mean the flush call will be of relevance  
 **@chinmay.cerebro:** The sink connector has to acknowledge a snapshot or checkpoint to the job manager - that's the point when everything up to that checkpoint is deemed consistent (and any failure will resume from the last checkpoint).  
 **@chinmay.cerebro:** so the sink connector needs to know whether something
has been persisted to the sink or not in a reliable manner  
 **@chinmay.cerebro:** Think of how this happens in a stateful Flink operator
(which is using something like RocksDB) - the operator is doing local changes
to RocksDB . In the case of a checkpoint barrier - it will block any further
input processing and checkpoint the rocksdb state before moving on  
 **@chinmay.cerebro:** which means the RocksDB snapshot is a blocking
operation  
 **@chinmay.cerebro:** I feel the sink connector might have to follow a similar paradigm  
 **@chinmay.cerebro:** @yupeng thoughts ?  
 **@npawar:** Hmm thanks for the context and details  
 **@npawar:** This is kinda why I think that the connector should not proceed
unless it gets acknowledgement that push is done  
 **@chinmay.cerebro:** yes  
 **@chinmay.cerebro:** but its been a while for me for Flink - maybe there are
newer things that I don't know about  
 **@chinmay.cerebro:** lets wait for Yupeng's comments  
 **@npawar:** Okay. Let's have a call on Tuesday? We've been going back and
forth on this async discussion  
 **@chinmay.cerebro:** I also think - the first version shouldn't try to over
optimize  
 **@chinmay.cerebro:** keep it simple  
 **@chinmay.cerebro:** sounds good @ Tuesday  
 **@yupeng:** yes, in the current proposal, checkpointing is not supported for the Pinot sink  
 **@yupeng:** so if the connector fails, the job has to be relaunched  
 **@yupeng:** as I explained in the connector design doc  
 **@yupeng:** however, it's still possible to support checkpointing in the future  
 **@yupeng:** that checkpoint shall remember the last successful segment name in the folder  
 **@yupeng:** and discard other segments whose sequence numbers are higher  

###  _#releases_

  
 **@tanmay.movva:** @tanmay.movva has joined the channel  

###  _#debug_upsert_

  
 **@g.kishore:** @g.kishore has joined the channel  
 **@yupeng:** @yupeng has joined the channel  
 **@matteo.santero:** @matteo.santero has joined the channel  
 **@jackie.jxt:** @jackie.jxt has joined the channel  
 **@g.kishore:** @yupeng @jackie.jxt  
 **@matteo.santero:** ```upsertConfig mode true in REALTIME pinot 0.6.0 presto
0.247 dbeaver 21.0.0.202103021012```  
 **@josefarf:** @josefarf has joined the channel  
 **@jackie.jxt:** @matteo.santero For the same primary key, there should be only one server queried  
 **@jackie.jxt:** Please check if the Kafka stream is properly partitioned  
 **@jackie.jxt:** Sorry, didn't notice in the other thread that this is a hybrid table. As Yupeng mentioned, upsert does not work on hybrid tables  
 **@g.kishore:** Right.. do we have an ETA on allowing batch updates to upsert-enabled tables?  
 **@g.kishore:** We should flag this error somewhere  