Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/02/23 02:00:24 UTC

Apache Pinot Daily Email Digest (2022-02-22)

### _#general_

  
 **@jiajunbernoulli:** @jiajunbernoulli has joined the channel  
 **@wcxzjtz:** hello everyone, wondering if we have `st_setsrid` like function
in crdb to change the spatial reference system?  
**@g.kishore:** don't think it's available. I know the H3 library that's used supports it. so, it might be a simple udf to support it cc @yupeng something as simple as adding this to ScalarFunctions.java or overloading the existing functions to take srid as an additional parameter
```
@ScalarFunction
public static byte[] setSRID(byte[] bytes, int srid) {
  Geometry geometry = GeometrySerializer.deserialize(bytes);
  geometry.setSRID(srid);
  return GeometrySerializer.serialize(geometry);
}
```
**@wcxzjtz:** gotcha. is our default spatial reference system id `4326` ?  
**@yupeng:** Yes, it's 4326  
**@yupeng:** Adding this func is easy, the hard part is that the serialization
does not store it today  
**@yupeng:** Serialization today uses 1 bit to differentiate geography vs
geometry, but not the general srid to save storage.  
**@yupeng:** It's possible to build an extension to this, though  
**@wcxzjtz:** got it. thanks.  
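To make Yupeng's point about the serialization concrete: today a single header bit distinguishes geography from geometry, and the SRID itself is not persisted, so supporting arbitrary SRIDs means extending the format. The toy sketch below (not Pinot's actual wire format; class and method names are illustrative) shows the kind of extension involved, e.g. a 4-byte SRID prefix per value:

```java
import java.nio.ByteBuffer;

// Toy illustration only: Pinot's real serialization uses one header bit for
// geography vs geometry and does NOT store the SRID. Persisting it would cost
// extra storage per value, e.g. the 4-byte prefix sketched here.
public class SridEncodingSketch {

    // Prepend a 4-byte SRID to a serialized geometry payload.
    public static byte[] encode(int srid, byte[] geometryBytes) {
        return ByteBuffer.allocate(4 + geometryBytes.length)
                .putInt(srid)
                .put(geometryBytes)
                .array();
    }

    // Read the SRID back from the first 4 bytes.
    public static int decodeSrid(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getInt();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        byte[] encoded = encode(4326, payload); // 4326 is Pinot's default SRID
        System.out.println(decodeSrid(encoded)); // prints 4326
    }
}
```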
**@prashant.pandey:** Hello team, I am trying to run the Realtime Provisioner for one of my tables with the following config:
```
RealtimeProvisioningHelper -tableConfigFile /Users/prashant.pandey/table_config.json -numPartitions 4 -pushFrequency null -numHosts 12 -numHours 2 -sampleCompletedSegmentDir /Users/prashant.pandey/segment_dir -ingestionRate 4750 -maxUsableHostMemory 10G -retentionHours 24
```
The segment is around 426M in size. But this returns the following:
```
Note:
* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See
2022/02/22 11:41:31.825 INFO [RealtimeProvisioningHelperCommand] [main] Memory used per host (Active/Mapped)
numHosts --> 12 |
numHours 2 --------> NA |
2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] Optimal segment size
numHosts --> 12 |
numHours 2 --------> NA |
2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] Consuming memory
numHosts --> 12 |
numHours 2 --------> NA |
2022/02/22 11:41:31.827 INFO [RealtimeProvisioningHelperCommand] [main] Total number of segments queried per host (for all partitions)
numHosts --> 12 |
numHours 2 --------> NA |
Class transformation time: 0.271994872s for 4134 classes or 6.579459893565553E-5s per class
```
Why am I getting `NA`s? Is the config incorrect?  
**@prashant.pandey:** We found out why this happened. The problem was that our retention period is 7 days, but we move segments to OFFLINE servers within 3h. I was configuring the retention to be 7 days, due to which `if (activeMemoryPerHostBytes <= _maxUsableHostMemory)` in `MemoryEstimator.java` was evaluating to `false`.  
**@mark.needham:** so did you have to update your table config to get this
working?  
**@ssubrama:** 1\. @prashant.pandey this means that you don't have enough
active memory to host all your mem requirements for 24h. You can run the
command with higher memory and see what it reports. It will give you a report
of mapped vs raw memory as well (which means data is pulled from disk by OS
whenever needed). If you are ok with that, then you may be fine with the
existing memory/numHosts. Otherwise, you need to increase something. Just to
get an idea, you can always run the command with higher memory and more number
of hosts (you can give multiple values) and see where you stand.  
**@ssubrama:** @mark.needham not sure why table config needs to change?  
**@mark.needham:** Ah I dunno, was just asking what Prashant had changed to
get it to work.  
**@prashant.pandey:** @mark.needham Yes, we actually had to reduce retention from 7 days to 3h in our table config. This was done because segments are stored on realtime servers only for some time, and are then moved to OFFLINE servers. The program actually uses what's in the supplied config over what's supplied in the program args. So this 24h was actually moot and not used - it was using the full 7 days as the retention period, as was present in the config, @ssubrama. I think we can document this special case, and also that the retention period specified in the config takes precedence over the one supplied in prog. args.  
**@moradi.sajjad:** @prashant.pandey that's not the case. Look at the code in
RTProvHelper Command where it uses the value:  If _retentionHours is provided
as a command argument, it ignores the table config retention.  
**@moradi.sajjad:** And subbu is right. When you get NA, it means the memory
is not enough. So use a large number for maxUsableHostMemory parameter so
you'll see how much memory you'll need  
 **@alihaydar.atil:** Hello everyone :slightly_smiling_face:  I was wondering
if there is any update on this issue? Is there any work done on it or are you
planning on implementing this feature in the near future? Wish everybody a
great day!  
 **@kishorenaidu712:** @kishorenaidu712 has joined the channel  
 **@ryantle1028:** @ryantle1028 has joined the channel  
**@kishorenaidu712:** Hi, is there any approach to view the contents stored in a segment?  
 **@karinwolok1:** Meetup tomorrow!! Feel free to share with friends who you
think would benefit . :slightly_smiling_face:  
**@karinwolok1:** Welcome :wave: to all the new Apache Pinot :wine_glass:
community members! Please tell us who you are and what brought you here!
:smiley: @kishorenaidu712 @manish.jaiswal @jiajunbernoulli @naga.b
@jatink.5251 @spboora @aliakbari76318 @juan @praveen82 @surya.patnaik1
@imptrik @apte.kaivalya @jma @nouru @achyuthaputha @vvydier @karsumit94
@drew.flintosh @jt @pawel.wasowicz @rautelachetan @bvencill  
 **@makhli:** @makhli has joined the channel  
**@tiger:** Hi, just wondering how replicas work for realtime tables in terms of choosing which replicas to query? From what I can see, it appears that the broker randomly chooses which replica to use when querying.  
**@tiger:** Also, when a replica goes down and returns, does pinot wait for it
to recover and catch up before querying it again?  
**@g.kishore:** yes, it randomly selects the replica. As of today, when a replica goes down and returns, the broker does not wait for it to recover and catch up before querying. We thought of adding this but we did not because in practice, the catch-up is very fast and hardly noticeable.. we have seen speeds of 100k events/sec during catch-up..  
**@tiger:** thanks!  
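The routing behaviour Kishore describes can be sketched in a few lines: pick uniformly at random among the replicas currently marked online, with no special handling for a replica that has just rejoined and is still catching up. This is a standalone illustration (class and method names are mine, not Pinot's broker code):

```java
import java.util.List;
import java.util.Random;

// Sketch of the routing behaviour described above: a replica is chosen
// uniformly at random per query. Illustrative only, not Pinot's actual
// broker routing implementation.
public class ReplicaSelectionSketch {

    private final Random random;

    public ReplicaSelectionSketch(long seed) {
        this.random = new Random(seed);
    }

    // Choose one server out of the replicas currently marked online.
    public String pickReplica(List<String> onlineReplicas) {
        return onlineReplicas.get(random.nextInt(onlineReplicas.size()));
    }

    public static void main(String[] args) {
        ReplicaSelectionSketch router = new ReplicaSelectionSketch(42);
        List<String> replicas = List.of("server-1", "server-2", "server-3");
        for (int i = 0; i < 5; i++) {
            System.out.println(router.pickReplica(replicas));
        }
    }
}
```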

###  _#random_

  
 **@jiajunbernoulli:** @jiajunbernoulli has joined the channel  
 **@kishorenaidu712:** @kishorenaidu712 has joined the channel  
 **@ryantle1028:** @ryantle1028 has joined the channel  
 **@makhli:** @makhli has joined the channel  

###  _#troubleshooting_

  
 **@jiajunbernoulli:** @jiajunbernoulli has joined the channel  
**@yeongjukang:** Hello team, I wanted to drop a server instance from a cluster to shrink the cluster size. So I executed the command below after the helm chart update, but met an error.
```
curl -XDELETE localhost:9000/instances/Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098
{"_code":409,"_error":"Failed to drop instance Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098 - Instance Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098 exists in ideal state for user2_REALTIME"}
```
• What will happen if I update zk's idealstate of all tables related to server-2 to server-1? (table status became healthy again)
• Will there also be an automatic copy based on other segments to maintain the desired replica count?  
**@g.kishore:** this should help  
**@yeongjukang:** @g.kishore Thanks a lot! I tried that one through GUI after
server deletion but met this one. ```Caused by: java.net.UnknownHostException:
pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local```  
**@yeongjukang:** Additionally, before that, I didn't read the log at the time, but there was always a message that segments are balanced.  
**@mayanks:** I think the sequence is to first untag the server, then
rebalance, and then remove the server.  
**@yeongjukang:** @mayanks Thanks for the reply. I am aware of untag now. Does the rebalance do the same thing internally as what I did?  
**@yeongjukang:** It just works now so that's why I am asking  
**@mayanks:** Rebalance does not untag.  
**@deemish2:** Hello Team, I am running a query via the Pinot UI using a where clause on some column value == false. It gives results even when we use a where clause and filter the value with 0.  
**@richard892:** hi I've noticed this myself. @sanket can you take a look at
this please? cc @mayanks  
 **@kishorenaidu712:** @kishorenaidu712 has joined the channel  
 **@kishorenaidu712:** Hey, I recently started using pinot and facing an issue
with ingesting data with JSON column. I have marked the column as JSON data
type in schema and have used JSON index for the column as well. But when I
query the data, I get null value for the entire JSON column. Where did I go
wrong?  
**@mark.needham:** Hey - not sure, you'll have to give a bit more information. e.g.
• How are you importing the data?
• What do your table config/schema look like?  
**@kishorenaidu712:** I am importing the data through batch ingestion from
standalone machine.  
**@mark.needham:** ok cool. So you said you get rows returned but they're
empty? Can you share the ingestion job spec + a sample of the CSV file that
you're ingesting?  
**@kishorenaidu712:**  
**@kishorenaidu712:** Yes the values returned are null, when I try querying
the data.  
**@mark.needham:** Thanks. Pinot tries to map the column name in the schema to a field name in each JSON document. So you would need to create a key called `sample` to have this work. If you update your JSON file to read like this:
```
{"sample": {"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}}
{"sample": {"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}}
{"sample": {"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}}
{"sample": {"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}}
{"sample": {"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}}
{"sample": {"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}}
{"sample": {"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}}
```
**@mark.needham:** and then run the ingestion job again  
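The fix Mark describes — wrapping each newline-delimited JSON record under a top-level key matching the schema column name — is a plain string transformation. A minimal sketch (the class name is mine; in practice any script or tool that rewrites the file works):

```java
// Illustrates the mapping above: Pinot matches the schema column name
// ("sample") to a top-level key in each JSON document, so each line must be
// wrapped under that key. Illustrative helper, not part of Pinot.
public class JsonWrapSketch {

    // Wrap one newline-delimited JSON record under the given column name.
    public static String wrap(String columnName, String jsonLine) {
        return "{\"" + columnName + "\": " + jsonLine + "}";
    }

    public static void main(String[] args) {
        String line = "{\"name\":{\"first\":\"daffy\",\"last\":\"duck\"},\"score\":101}";
        System.out.println(wrap("sample", line));
        // {"sample": {"name":{"first":"daffy","last":"duck"},"score":101}}
    }
}
```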
 **@ryantle1028:** @ryantle1028 has joined the channel  
**@apte.kaivalya:** Hey :wave: I am deploying pinot using helm charts on a k8s cluster. I have done it several times before but am seeing this issue for the first time. Any ideas?
```
Cluster manager: Broker_email-analytics-pinot-broker-1.email-analytics-pinot-broker.email-pinot.svc.test01.k8s.run_8099 disconnected
Failed to start Pinot Broker
org.apache.helix.HelixException: Cluster structure is not set up for cluster: email-analytics
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1124) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:701) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:738) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.broker.broker.helix.BaseBrokerStarter.start(BaseBrokerStarter.java:209) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.tools.service.PinotServiceManager.startBroker(PinotServiceManager.java:143) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:92) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:276) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:302) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:276) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f]
Failed to start a Pinot [BROKER] at 0.691 since launch
org.apache.helix.HelixException: Cluster structure is not set up for cluster: email-analytics
    [same stack trace as above]
Shutting down Pinot Service Manager with all running Pinot instances...
Shutting down Pinot Service Manager admin application...
Deregistering service status handler
```
**@apte.kaivalya:** the only thing changed is I am using a new Zk cluster  
**@mark.needham:**  
**@mark.needham:** I think you need to configure helm to auto restart
server/broker/controller on error  
**@mark.needham:** in this post I show how to do it for docker  
**@apte.kaivalya:** thanks I will look at that.  
**@apte.kaivalya:** I think helm already has retries.. because I have seen
pods going from failing to running state.  
**@mark.needham:** and still showing this error each time?  
**@mark.needham:** or you see the error message and it's actually working?  
**@apte.kaivalya:** one of the brokers started successfully other one keeps
failing  
**@apte.kaivalya:** same with controllers  
**@mark.needham:** with that error?  
**@apte.kaivalya:** yeah.. but it keeps trying  
**@apte.kaivalya:** ideally once a cluster structure is set up on Zk it should work, right?  
**@mark.needham:** yeh  
**@apte.kaivalya:** ok looks like the error has gone away.  
**@mark.needham:** oh ok  
**@mark.needham:** That error should be a race condition that only happens the
first time that a cluster is formed. I have tried it loads of times to check
that assumption and it seems to be true. But let us know if you see it happen
again.  
**@apte.kaivalya:** thank you. yes I will notify :eyes:  
**@xiangfu0:** if it's a new zk, make sure you have the controller started before the brokers/servers?  
**@xiangfu0:** For the first time, controller will construct all the paths  
**@apte.kaivalya:** Hmm ok, let me check if I can control the startup order  
**@makhli:** @makhli has joined the channel  
**@luisfernandez:** hey friends, I asked this some time ago. In my company we are trying to move to pinot from another data source, and we are trying to validate that whatever we are storing in pinot is equal to what we have in our separate data source. How can you do this kind of validation with pinot? Last time I was suggested to treat the underlying topic that our table consumes from as the source of truth - does this still hold true? So you would compare the contents of that topic vs BigQuery? thanks for your help!  
**@g.kishore:** use time based queries and give enough buffer for all sources
to catch up..  
**@g.kishore:** for e.g. compare
```
select count(*) from T where time between t1 and t2
select sum(metric) from T where time between t1 and t2
select distinctCount(dim) from T where time between t1 and t2
```
run these on both BigQuery and Pinot  
**@luisfernandez:** thank you!  
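Kishore's approach — run the same windowed aggregates against both stores, then compare — can be sketched as below. The maps stand in for real query results; in practice they would come from the Pinot broker query API and the BigQuery client (names here are illustrative, not either product's API):

```java
import java.util.Map;

// Sketch of the validation approach above: compute count, sum, and
// distinct-count for the same [t1, t2] window on both sources (with enough
// buffer for both to catch up), then compare the aggregates.
public class SourceComparisonSketch {

    public static boolean matches(Map<String, Long> pinotResults, Map<String, Long> otherResults) {
        return pinotResults.equals(otherResults);
    }

    public static void main(String[] args) {
        // Stand-ins for query results over the same time window.
        Map<String, Long> pinot = Map.of("count", 1000L, "sum_metric", 52300L, "distinct_dim", 87L);
        Map<String, Long> bigQuery = Map.of("count", 1000L, "sum_metric", 52300L, "distinct_dim", 87L);
        System.out.println(matches(pinot, bigQuery)); // true
    }
}
```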
\--------------------------------------------------------------------- To
unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org For additional commands,
e-mail: dev-help@pinot.apache.org