Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/03/26 02:00:24 UTC

Apache Pinot Daily Email Digest (2022-03-25)

### _#general_

  
 **@abhineshada:** @abhineshada has joined the channel  
 **@piyush.chauhan:** Is there a tool that helps us visualise the schemas of
Pinot OLAP tables (across all tables)? Since joins are not supported, we
need data redundancy (the same field in multiple tables), and I want to
see all fields and their relationships (redundant fields) across all tables. I
am looking for something similar to the ER diagrams that Postgres lets us make.  
**@mayanks:** Seems like a helpful tool to have. I am not aware of any for
Pinot schemas, though.  
 **@octchristmas:** Hi, team. In my tests, I had to write the Groovy syntax
in the ingestion transformation `filterFunction` differently for the
realtime table and the offline table. I don't know whether this is a bug, a
wrong use case, or expected behaviour, but I am reporting it because it seems
confusing to users.  
• The `score` field's data type is float.  
• `"filterFunction": "Groovy({score >= 4 && score < 6}, score)"`: REALTIME working, OFFLINE not working (casting exception, creationAndPush ingestion job).  
• `"filterFunction": "Groovy({(score as float) >= 4 && (score as float) < 6}, score)"`: OFFLINE working.  
**@mayanks:** Do the offline and realtime schema match for `score`?  
**@mayanks:** If yes, perhaps file a GH issue with as much detail as possible.  
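
For anyone reproducing the report above, here is a minimal sketch of how the two filter expressions would sit in a table config. It assumes the filter lives under `ingestionConfig.filterConfig.filterFunction`; the helper `build_filter_config` is a hypothetical name used only for illustration and is not part of Pinot.

```python
import json


def build_filter_config(expression: str, *columns: str) -> dict:
    """Build a table-config fragment holding a Groovy filterFunction.

    Hypothetical helper: it only assembles the string
    "Groovy({<expression>}, <col1>, <col2>, ...)" from the thread.
    """
    args = ", ".join(columns)
    return {
        "ingestionConfig": {
            "filterConfig": {
                "filterFunction": f"Groovy({{{expression}}}, {args})"
            }
        }
    }


# Variant reported as working on REALTIME but failing on OFFLINE.
implicit = build_filter_config("score >= 4 && score < 6", "score")

# Variant with explicit casts, reported as working on OFFLINE.
explicit = build_filter_config(
    "(score as float) >= 4 && (score as float) < 6", "score"
)

print(json.dumps(explicit, indent=2))
```

Keeping both variants side by side like this may make it easier to attach a minimal reproduction to a GitHub issue.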
 **@francois:** Hi amazing Pinot team. If I read the docs correctly, there is a
mechanism for GDPR compliance. Do you have any example to share around here?
For example, I have a customer_Id I want to purge. How can I do that? And what
happens to the current consuming segment? Thanks for your help :wink:  
**@mayanks:** There is a `SegmentPurger` minion job that you may want to
customize to query what records to purge.  
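
The idea behind a purge job can be sketched independently of Pinot: rewrite the rows of a segment while dropping every record that matches a predicate. The snippet below is plain Python and does not use any Pinot API; `purge_records`, `ids_to_purge`, and the sample rows are hypothetical names and data chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

Record = dict


def purge_records(
    records: Iterable[Record],
    should_purge: Callable[[Record], bool],
) -> Iterator[Record]:
    """Yield only the records that the predicate does not mark for purging."""
    for record in records:
        if not should_purge(record):
            yield record


# Hypothetical set of customers whose data must be removed.
ids_to_purge = {"c-42"}

# Hypothetical rows standing in for the contents of one segment.
segment_rows = [
    {"customer_Id": "c-1", "amount": 10},
    {"customer_Id": "c-42", "amount": 99},
    {"customer_Id": "c-7", "amount": 5},
]

kept = list(
    purge_records(segment_rows, lambda r: r["customer_Id"] in ids_to_purge)
)
print(kept)
```

In a real deployment the predicate is what you would customize, for example by looking up the IDs to purge from an external source.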
 **@prashant.pandey:** Hi Pinot folks :slightly_smiling_face: I am getting
some segments of my table as unavailable (error code 305): ```[ { "message":
"7 segments...unavailable", "errorCode": 305 } ]``` I am okay to let go of
these segments to query my table. How can I do that? Or anything I can do to
bring these back up? I tried reloading but it doesn’t help. Also, this is a
REALTIME table.  
**@prashant.pandey:** Controller logs: ```Unable to find a next state for
resource: span_event_view_1_REALTIME partition:
span_event_view_1__7__10175__20220321T0925Z from stateModelDefinitionclass
org.apache.helix.model.StateModelDefinition from:ERROR to:ONLINE```  
**@mayanks:** What does the idealstate assignment say? Is it assigned to a
server that is no longer in the cluster? If so, try a rebalance.  
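
The check suggested above can be sketched as a small function. It assumes the ideal state has already been fetched (for example from the controller's `/tables/{tableName}/idealstate` endpoint) as a mapping of segment name to `{server: state}`; the server names and the function name `segments_on_missing_servers` are hypothetical.

```python
def segments_on_missing_servers(ideal_state: dict, live_servers: set) -> dict:
    """Return {segment: [servers]} for assignments to servers not in the cluster."""
    missing = {}
    for segment, assignment in ideal_state.items():
        gone = sorted(s for s in assignment if s not in live_servers)
        if gone:
            missing[segment] = gone
    return missing


# Hypothetical ideal state; the segment name is taken from the log above.
ideal_state = {
    "span_event_view_1__7__10175__20220321T0925Z": {
        "Server_pinot-server-3_8098": "ONLINE",
    },
    "span_event_view_1__0__10170__20220321T0900Z": {
        "Server_pinot-server-0_8098": "ONLINE",
    },
}

# Hypothetical set of servers currently in the cluster.
live_servers = {"Server_pinot-server-0_8098", "Server_pinot-server-1_8098"}

print(segments_on_missing_servers(ideal_state, live_servers))
```

If the result is non-empty, a rebalance is the step the thread suggests trying next.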

###  _#random_

  
 **@abhineshada:** @abhineshada has joined the channel  

###  _#troubleshooting_

  
 **@abhineshada:** @abhineshada has joined the channel  
 **@luisfernandez:** Does anyone know why a server that has been marked
as Dead, had its tags updated, and then been through a rebalance would still
pop up in the `IdealState` in ZooKeeper?  
**@luisfernandez:** IdealState<>ExternalView  
**@mayanks:** Is this a huge table with a large number of segments (that would
take time to rebalance)? If so, this could be a race condition that
@jackie.jxt recently fixed.  
**@jackie.jxt:** Still in IS or EV?  
**@jackie.jxt:** You might need to run in `downtime` mode because the server
is already dead, and EV can never match IS  
**@jackie.jxt:** If you are already running in `downtime` mode, but find the
server showing up again in the IS after a while, then that might be the race condition
fixed in . You may restart the controller and rebalance again with `downtime`
to recover  
**@luisfernandez:** Hey, yes, I noticed someone having this issue and they
linked that PR. We have downgraded Pinot to 0.9.3, so I basically had to
restart the controllers; once that happened, things started to work again.  
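
A rebalance in `downtime` mode, as discussed above, is typically triggered through the controller's rebalance endpoint. The sketch below only builds the request URL; the controller address and table name are hypothetical, and the query parameters (`type`, `downtime`, `dryRun`) should be checked against the Pinot version in use.

```python
from urllib.parse import urlencode


def rebalance_url(
    controller: str,
    table: str,
    table_type: str = "REALTIME",
    downtime: bool = True,
    dry_run: bool = True,
) -> str:
    """Build the URL for a POST to the controller's table rebalance endpoint."""
    query = urlencode(
        {
            "type": table_type,
            "downtime": str(downtime).lower(),
            "dryRun": str(dry_run).lower(),
        }
    )
    return f"{controller}/tables/{table}/rebalance?{query}"


# Hypothetical controller address and table name.
url = rebalance_url("http://localhost:9000", "myTable")
print(url)
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch, so the planned assignment can be inspected before the rebalance is actually run.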
 **@diogo.baeder:** Hi folks! Now that we're using Pinot with realtime tables
in production, I'm also doing some experiments with offline tables for
something else I'm developing. However, one thing I'd like to do is to be able
to partition the data according to the values in some of the dimension
columns. I'll follow in a thread:  
**@diogo.baeder:** For example, suppose I have a table with a number of
columns, where two of them are "country" and "state". Suppose, for example,
that I have the following possible combinations:  
• Country: US, State: NJ  
• Country: BR, State: SP  
• Country: BR, State: BA  
**@diogo.baeder:** In the case above, I'd like to have 3 partitions, one for
each combination of country + state, but without knowing beforehand that I'll
end up having 3 partitions - because I want to have more partitions in the
future if more combinations are necessary.  
**@diogo.baeder:** Does Pinot support that? If yes, then is there a doc I can
follow to learn how to do that? Thanks! :slightly_smiling_face:  
**@diogo.baeder:** Just to explain why I need this: in this system I'm
developing/experimenting with, when we query for a certain subset of data, it
will necessarily be under the same combination of those columns. Therefore,
creating these partitions would be a good move for us, because none of our
queries would cross partition boundaries, which would reduce the number of
documents to query. We'll also use inverted indexes because they fit
well for us, but even then, partitioning will further improve performance by a
great deal for us.  
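
The grouping described in this thread can be illustrated on the data-preparation side, before any segments are built. The sketch below is plain Python, not a Pinot feature: it groups input rows by the (country, state) combination without knowing the combinations in advance. The function name `partition_by_columns` and the sample rows are hypothetical.

```python
from collections import defaultdict


def partition_by_columns(rows, columns=("country", "state")):
    """Group rows by the values of the given columns, one group per combination."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in columns)
        partitions[key].append(row)
    return dict(partitions)


# Hypothetical input rows using the combinations from the thread.
rows = [
    {"country": "US", "state": "NJ", "value": 1},
    {"country": "BR", "state": "SP", "value": 2},
    {"country": "BR", "state": "BA", "value": 3},
    {"country": "BR", "state": "SP", "value": 4},
]

partitions = partition_by_columns(rows)
for key, group in partitions.items():
    print(key, len(group))
```

A new combination appearing in the input simply produces a new group, which matches the requirement of not fixing the number of partitions up front.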

###  _#community_

  
 **@piyush.chauhan:** @piyush.chauhan has joined the channel  

###  _#presto-pinot-connector_

  
 **@aaron.weiss:** @aaron.weiss has joined the channel  

###  _#getting-started_

  
 **@abhineshada:** @abhineshada has joined the channel  
 **@piyush.chauhan:** @piyush.chauhan has joined the channel  