Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/05/01 02:00:17 UTC

Apache Pinot Daily Email Digest (2021-04-30)

### _#general_

  
 **@ntchervenski:** @ntchervenski has joined the channel  
 **@digvijaybhk:** @digvijaybhk has joined the channel  
 **@srinivas96alluri:** @srinivas96alluri has joined the channel  
 **@jonathan1841:** @jonathan1841 has joined the channel  

###  _#random_

  
 **@ntchervenski:** @ntchervenski has joined the channel  
 **@digvijaybhk:** @digvijaybhk has joined the channel  
 **@srinivas96alluri:** @srinivas96alluri has joined the channel  
 **@kevdesigned:** @kevdesigned has left the channel  
 **@jonathan1841:** @jonathan1841 has joined the channel  

###  _#troubleshooting_

  
 **@ntchervenski:** @ntchervenski has joined the channel  
 **@digvijaybhk:** @digvijaybhk has joined the channel  
 **@srinivas96alluri:** @srinivas96alluri has joined the channel  
 **@pedro.cls93:** Hello, I'm encountering a problem where Pinot is not
consuming kafka events for a realtime table after defining it. What are some
quick places to look at, to understand what might be causing this?  
**@mayanks:** ```1. Check external view / ideal state to see if the table was created or not.
2. If the table was created, is the segment in the external view in ERROR state? Then look at the server log.
3. If the table was not created, look at the controller log for what happened when you issued the create-table command.```  
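A minimal sketch of that checklist against the controller REST API, assuming a controller reachable at localhost:9000 and the table name from this thread (both are assumptions; adjust for your cluster):

```python
import requests

CONTROLLER = "http://localhost:9000"  # assumption: default controller address
TABLE = "ComputedView"                # the table discussed in this thread

# Ideal state: the segment states Helix *wants*; external view: what servers report.
ideal = requests.get(f"{CONTROLLER}/tables/{TABLE}/idealstate").json()
external = requests.get(f"{CONTROLLER}/tables/{TABLE}/externalview").json()

# Flag any segment whose actual state disagrees with the ideal (e.g. ERROR).
for segment, replicas in (external.get("REALTIME") or {}).items():
    for server, state in replicas.items():
        if state == "ERROR":
            print(f"{segment} on {server} is in ERROR state -- check the server log")
```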
**@pedro.cls93:** This Kafka topic has 13M+ events partitioned over 16
partitions. I can consume these events using Kafka tools. Memory-wise and
CPU-wise, the Pinot components seem stable.  
**@pedro.cls93:** Ideal_state reports "customized" mode: ```{
  "id": "ComputedView_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "BUCKET_SIZE": "0",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "ComputedView_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "0",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "ComputedView__0__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__10__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__11__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__12__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__13__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__14__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__15__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__1__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__2__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__3__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__4__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__5__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__6__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__7__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__8__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__9__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    }
  },
  "listFields": {}
}```  
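For reading dumps like this: each key in mapFields is a low-level-consumer (LLC) segment name, which encodes the table, Kafka partition, sequence number, and creation time. A small sketch of splitting one apart (the field meanings follow Pinot's LLC segment naming convention):

```python
# Sketch: decode an LLC realtime segment name of the form
# table__partition__sequence__creationTime
name = "ComputedView__10__0__20210429T1647Z"
table, partition, sequence, created = name.split("__")
print(table)      # ComputedView
print(partition)  # 10 -> Kafka partition id (16 partitions => __0 .. __15)
print(sequence)   # 0  -> first segment of this partition; none committed yet
print(created)    # 20210429T1647Z (UTC creation time)
```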
**@mayanks:** This is good. Check external view  
**@pedro.cls93:** That is the external view from the ZooKeeper Browser, unless
I'm meant to search elsewhere?  
**@mayanks:** You mentioned it was Ideal state  
**@mayanks:** Is above that you pasted Ideal State or External View?  
**@pedro.cls93:** I meant the 'IDEAL_STATE_MODE' property from the
externalView found in the Zookeeper Browser is set to "customized". My
apologies for the confusion.  
**@mayanks:** Ok, external view says that segments are in consuming state  
**@mayanks:** Which implies servers must be consuming  
**@mayanks:** What's the issue you are seeing?  
**@pedro.cls93:** They have not changed from CONSUMING in 24h, and yet counting
the number of records in this table always returns 0.  
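For reference, that zero-count check can be scripted against the broker's SQL endpoint (a sketch; assumes a broker at localhost:8099, the default port):

```python
import requests

BROKER = "http://localhost:8099"  # assumption: default broker address

resp = requests.post(f"{BROKER}/query/sql",
                     json={"sql": "SELECT COUNT(*) FROM ComputedView"})
# In the broken state described above this prints [[0]] even though
# all segments show as CONSUMING.
print(resp.json()["resultTable"]["rows"])
```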
**@mayanks:** check any server to see if it has logs stating it is consuming  
**@pedro.cls93:** Controller reports not finding segments: ```2021/04/30 17:01:44.228 WARN [SegmentDeletionManager] [grizzly-http-server-0] Failed to find local segment file for segment file:/var/pinot/controller/data/ComputedView/ComputedView__0__0__20210429T1713Z```  
**@mayanks:** What is controller data dir?  
**@pedro.cls93:** Is there a way to recompute/recreate segments? I think
someone deleted the PVCs associated with the K8s deployment.  
**@mayanks:** I think if you restart all servers they should simply start
consuming from the beginning (since nothing was saved)  
**@pedro.cls93:** Is there a Pinot API to restart the servers? Or simply
deleting the k8s resources?  
**@mayanks:** Probably no API. Not sure what deleting would do. Does k8s not
have an option to restart?  
**@pedro.cls93:** Only for Deployment resources, which my Pinot installation
does not use.  
**@mayanks:** If you delete and recreate, will they get the same name? If not,
I'm not sure what the behavior would be. In that case simply nuking everything
(deleting the table first) and restarting might be cleaner/safer.  
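For the nuke-and-recreate path, a hedged sketch using the controller REST API (the saved table-config file name is hypothetical; the schema stays in ZK and does not need to be re-posted):

```python
import requests

CONTROLLER = "http://localhost:9000"  # assumption: default controller address

# Drop the realtime table: removes the table config, ideal state, and segments.
requests.delete(f"{CONTROLLER}/tables/ComputedView", params={"type": "realtime"})

# Recreate it from a saved copy of the table config; consumption then restarts
# from the offset configured in streamConfigs (e.g. smallest/largest).
with open("ComputedView_realtime_table_config.json") as f:  # hypothetical file
    requests.post(f"{CONTROLLER}/tables", data=f.read(),
                  headers={"Content-Type": "application/json"})
```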
**@pedro.cls93:** Names are consistent yes, I will try that on pods first  
**@mayanks:** ok  
**@pedro.cls93:** Are segments stored in the same place as table & schema
definitions?  
**@mayanks:** No  
**@mayanks:** Table/Schema is stored in ZK  
**@mayanks:** Segments are backed up in deep-store (that you configure on
controller)  
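For context, the deep-store lives in the controller config; a hedged example of what an S3-backed setup can look like (property names as in the Pinot S3 filesystem docs; the bucket, region, and paths are placeholders):

```
controller.data.dir=s3://your-bucket/pinot-segments
controller.local.temp.dir=/tmp/pinot-controller-tmp
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```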
**@pedro.cls93:** Restarting the server pod did not work. I think the issue is
that deleting pods does not delete their state, which lives in Persistent
Volume Claims.  
**@pedro.cls93:** What about segment metadata? Is that in zookeeper?  
**@mayanks:** Whatever you can browse via ZK is stored there (including
SegmentZKMetadata). There's also segment metadata in the segment file itself  
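The ZK-side metadata can also be inspected without a ZK browser; a sketch against the controller's segment-metadata endpoint (assuming localhost:9000; the exact path may vary slightly across Pinot versions):

```python
import requests

CONTROLLER = "http://localhost:9000"  # assumption: default controller address
TABLE = "ComputedView"
SEGMENT = "ComputedView__0__0__20210429T1647Z"

# SegmentZKMetadata surfaced over REST -- for a committed segment this is
# where the deep-store download URL shows up.
meta = requests.get(f"{CONTROLLER}/segments/{TABLE}/{SEGMENT}/metadata").json()
print(meta)
```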
**@pedro.cls93:** I think I need to delete segment metadata, to force Pinot to
re-read from kafka.  
**@mayanks:** Oh yea  
**@pedro.cls93:** So this would be stored in ZK right?  
**@mayanks:** Then easier to delete and recreate the table  
**@mayanks:** It will be much cleaner that way I think  
**@pedro.cls93:** Understood, I'm trying to see if there is an alternative in
case something like this happens in Prod and deleting + recreating the table
is not an option.  
**@mayanks:** In prod, you should configure it in a way that restarting servers
is possible. That will fix this  
**@mayanks:** And you will also not lose prior segments that are already
committed  
**@pedro.cls93:** Is there documentation on how to configure server restart
in K8s?  
**@mayanks:** @fx19880617 ^^  
**@pedro.cls93:** Is there a way, from ZK, to know the on-disk path of each
segment of a table?  
**@pedro.cls93:** And is there a way to edit that information if need be?  
**@mayanks:** There are two copies of segments: 1) a backup in the deep-store,
and 2) a local copy on the serving nodes  
**@mayanks:** ZK has the download URL of segments (in the segment metadata),
which points to the deep-store  
**@pedro.cls93:** Download URLs are "null" for me, since I don't have a deep-
store configured yet.  
**@pedro.cls93:** What about the local copies?  
**@mayanks:** it should not be null  
**@mayanks:** Oh it is null because no segment was committed  
**@mayanks:** Servers store segments on local disk (whatever data dir you
specified) for serving  
**@mayanks:** Also, if you didn't specify a deep-store, it defaults to the
dataDir on the controller, and the download URL will be a controller URL  
**@fx19880617:** why do you want to do this in k8s: `Is there documentation on
how to configure server restart in K8s?`  
**@mayanks:** @fx19880617 Let me summarize:  
**@mayanks:** ```1. The cluster went into a state where all segments are showing as CONSUMING in EV.
2. However, someone deleted the PVC, so no segments.
3. Typically, fixing the PVC + restarting the servers should have fixed the problem - as in, the servers would start consuming again.```  
**@mayanks:** My recommendation was to delete/recreate the table. But Pedro is
asking: if this happens in prod, what are the ways to fix it?  
**@fx19880617:** you can try to manually delete the pod  
**@fx19880617:** then it will be recreated  
**@fx19880617:** and why delete the PVC?  
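A sketch of that delete-the-pod restart with the Kubernetes Python client (pod name and namespace are read off the instance IDs above; `kubectl delete pod pinot-server-0 -n dc-pinot` is the CLI equivalent):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside the cluster
v1 = client.CoreV1Api()

# A StatefulSet pod comes back with the same name and re-binds the same PVC,
# so deleting it is effectively a restart. Losing the PVC, not deleting the
# pod, is what destroyed the local segments in this thread.
v1.delete_namespaced_pod(name="pinot-server-0", namespace="dc-pinot")
```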
 **@kevdesigned:** @kevdesigned has left the channel  

###  _#minion-improvements_

  
 **@laxman:** Just want to give you guys an update on where I am with this:
• Made changes in our application to support and create HYBRID tables
• Configured the RealtimeToOfflineSegmentsTask for *8 tables*
• Tasks are scheduled to trigger every hour in the controller config
• Deployed this to two clusters (test env)
• Running 2 minion instances per cluster (Xms: 2G, Xmx: 3G)  
 **@laxman:** Facing two problems as of now:
• This conversion job seems to have issues processing map-type fields with null values
• Memory consumption seems too high, sometimes going heap OOM. Yet to rebase and try the patch provided by @npawar; will do it over the weekend and early next week  
 **@laxman:** Stacktrace for first problem (NPE while processing map type
fields with null values) ```2021/04/30 19:43:10.609 ERROR [TaskFactoryRegistry] [TaskStateModelFactory-task_thread-7] Caught exception while executing task: Task_RealtimeToOfflineSegmentsTask_1619811769649_0
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of array in field request_params__KEYS of record
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.core.segment.processing.framework.SegmentReducer.flushRecords(SegmentReducer.java:119) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.core.segment.processing.framework.SegmentReducer.reduce(SegmentReducer.java:103) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.core.segment.processing.framework.SegmentProcessorFramework.processSegments(SegmentProcessorFramework.java:158) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.minion.executor.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:205) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.minion.executor.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:116) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.minion.executor.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:51) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:81) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.NullPointerException: null of array in field request_params__KEYS of record
	at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:93) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:87) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	... 14 more
Caused by: java.lang.NullPointerException```
Corresponding schema snippet ```{
  "name": "request_params__KEYS",
  "dataType": "STRING",
  "singleValueField": false,
  "defaultNullValue": ""
},
{
  "name": "request_params__VALUES",
  "dataType": "STRING",
  "singleValueField": false,
  "defaultNullValue": ""
},```  
 **@laxman:** @jackie.jxt / @npawar / @fx19880617: Let me know if you have any
pointers for this problem. Any known issues around map types?  
 **@npawar:** will try this out and let you know  
 **@laxman:** Thanks @npawar. Walked through the code. I guess the issue is in
converting __KEYS and __VALUES to a map object in Avro, maybe somewhere around
the following code (we are on Pinot 0.7.1):
org.apache.pinot.core.segment.processing.framework.SegmentMapper#map
--> org.apache.pinot.core.segment.processing.utils.SegmentProcessorUtils#convertGenericRowToAvroRecord  
 **@laxman:** Let me know if you need any help to reproduce this  
 **@npawar:** Yes, would like some help. Do you think you can add a test case
to reproduce this in the SegmentProcessorFrameworkTest file?  
 **@jackie.jxt:** When we read values for these 2 fields, they should be MV
STRING type  
 **@jackie.jxt:** If we can reproduce it in a test, then it will be much
easier for us to debug  

###  _#json-query-support_

  
 **@amrish.k.lal:** @amrish.k.lal has joined the channel  
 **@g.kishore:** @g.kishore has joined the channel  
 **@steotia:** @steotia has joined the channel  
 **@amrish.k.lal:** Hi @g.kishore @steotia, creating a channel for JSON-related
discussion. Please feel free to add others if needed.  
 **@amrish.k.lal:** @g.kishore in continuation of what we discussed yesterday,
this is the existing query: `select jsoncolumn,json_extract_scalar(jsoncolumn,
'$.person.companies[:5].name', 'STRING') from jsontable where id = 106` which
produces the results: `{"person":{"name":"daffy
duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}},
["n1","n2"]` What we would like to do is to rewrite the existing query to:
`select jsoncolumn.person.companies[:5].name from jsontable where id = 106`  
 **@amrish.k.lal:** Also, for JSON storage support, we had briefly looked at
the BSON format (MongoDB), JSONB (a derivative of BSON used in Postgres), and
the format used by the Oracle JSON database. In either case, we were looking
at a format that would help minimize parsing of JSON strings into JSON objects
before query evaluation, as is being done in json_extract_scalar.  
 **@amrish.k.lal:** I believe by "unnesting" you are referring to the fact
that `["n1","n2"]` could be separate rows in Pinot?  