Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/03/04 02:00:15 UTC

Apache Pinot Daily Email Digest (2021-03-03)

### _#general_

  
 **@joshhighley:** Ingesting JSON data into a realtime table. A field in the
JSON is a JSON string with leading spaces but is always numeric data
otherwise: ```{ "account":" 123", .....}``` If my realtime table defines the
account column as DOUBLE, then the record loads with no issue -- the spaces
appear to be ignored. However, if I define the column as INT then the record
does not load. More troublesome, I can't find any error messages in any of the
logs -- I would expect some kind of error message?  
**@mayanks:** Thanks for reporting @joshhighley, let me take a look at the
code, and will get back to you  
**@mayanks:** @joshhighley I did a small experiment, Double.parseDouble can
parse " 123", but Integer.parseInt throws NumberFormatException. I suspect
that the exception is being swallowed. Either way, seems like a bug.  
**@mayanks:** For a temporary work-around, is it possible for you to strip the
leading spaces? And also file an issue, so we can fix this.  
**@joshhighley:** No, unfortunately, it's not practical for us to parse the
record prior, modify the value, then write it out again.  
**@joshhighley:** using a Double column type will probably be our workaround  
**@joshhighley:** I submitted issue #6634  
**@mayanks:** Thanks for submitting the issue.  
**@g.kishore:** @mayanks so Java trims the string for double parse but not for
integer parse?  
**@mayanks:** Yes  
**@g.kishore:** That’s bizarre. Should be a simple fix, but it might add perf
overhead for values that are already trimmed. Maybe try trim only on
exception?  
**@mayanks:** This was a standalone unit test that I did, I'll take a look at
where in the code we do the type conversion a little later.  
**@mayanks:** ```Double.parseDouble(" 123"); -> 123.0```  
**@mayanks:** ```Integer.parseInt(" 123"); -> NumberFormatException```  
**@joshhighley:** BTW, if I remove the leading spaces from the String, then it
will convert successfully to int. I tried using a data transformation to do
this, but transformations aren't allowed to write a column back to the same column.  
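
The "trim only on exception" idea from this thread could look roughly like the sketch below. It is a minimal illustration, not Pinot's actual type-conversion code: the class `LenientIntParser` and its method are hypothetical names introduced here. The fast path calls `Integer.parseInt` directly, so already-trimmed input pays no extra cost, and the retry with `trim()` only runs after a `NumberFormatException`.

```java
/** Hypothetical helper for illustration; not part of Pinot. */
final class LenientIntParser {
    private LenientIntParser() {}

    /** Parses an int, retrying once with surrounding whitespace removed. */
    static int parseInt(String value) {
        try {
            // Fast path: already-trimmed input is parsed without extra work.
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            if (value == null) {
                // Nothing to trim; surface the original failure.
                throw e;
            }
            // Slow path: strip leading/trailing whitespace and retry.
            // Non-numeric input still fails with NumberFormatException.
            return Integer.parseInt(value.trim());
        }
    }
}
```

With this helper, `LenientIntParser.parseInt(" 123")` returns `123`, matching what `Double.parseDouble(" 123")` already tolerates, while input such as `"abc"` still throws.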
 **@nachiket.kate:** @nachiket.kate has joined the channel  
 **@m.e.driscoll:** @m.e.driscoll has joined the channel  
 **@lloyd.branch:** @lloyd.branch has joined the channel  
 **@csanderson.data:** @csanderson.data has joined the channel  
 **@miliang:** @miliang has joined the channel  
 **@joshhighley:** When streaming data via Kafka to a realtime table, does it
have to be 1 record per message or is there a way to put multiple records in a
single message?  

###  _#random_

  
 **@nachiket.kate:** @nachiket.kate has joined the channel  
 **@m.e.driscoll:** @m.e.driscoll has joined the channel  
 **@lloyd.branch:** @lloyd.branch has joined the channel  
 **@csanderson.data:** @csanderson.data has joined the channel  
 **@miliang:** @miliang has joined the channel  

###  _#feat-presto-connector_

  
 **@dutta.kinshuk:** @dutta.kinshuk has joined the channel  

###  _#troubleshooting_

  
 **@nachiket.kate:** @nachiket.kate has joined the channel  
 **@elon.azoulay:** Hi, we have an issue where the pinot servers are in a
crash loop, they cannot start up. The servers are spewing tons of messages
like : ```[HelixTaskExecutor] [ZkClient-EventThread-23-pinot-us-central1-zookeeper:2181]
SessionId does NOT match. expected sessionId:
300000c69e5009a, tgtSessionId in message: 300000c69e50099, messageId:
9d191304-00cc-4138-bb57-7997a960fab0```  
**@elon.azoulay:** When I look in the errors section of the zookeeper browser
I see: ```"id":
"300000c69e50084__enriched_customer_orders_jp_upsert_realtime_streaming_v1_REALTIME",
"simpleFields": {}, "mapFields": { "HELIX_ERROR 20210303-100525.000929
STATE_TRANSITION 7f8da719-5667-4d33-adb9-76a8010c9c56": { "AdditionalInfo":
"Exception while executing a state transition task
enriched_customer_orders_jp_upsert_realtime_streaming_v1__7__330__20210224T2322Zjava.lang.reflect.InvocationTargetException\n\tat
jdk.internal.reflect.GeneratedMethodAccessor452.invoke(Unknown Source)\n\tat
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat
org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404)\n\tat
org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331)\n\tat
org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)\n\tat
org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)\n\tat
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
java.base/java.lang.Thread.run(Thread.java:834)\nCaused by:
java.util.NoSuchElementException: 'segment.total.docs' doesn't map to an
existing object\n\tat
org.apache.commons.configuration.AbstractConfiguration.getInt(AbstractConfiguration.java:816)\n\tat
org.apache.pinot.core.segment.index.metadata.SegmentMetadataImpl.<init>(SegmentMetadataImpl.java:128)\n\tat
org.apache.pinot.core.segment.index.loader.SegmentPreProcessor.<init>(SegmentPreProcessor.java:71)\n\tat
org.apache.pinot.core.indexsegment.immutable.ImmutableSegmentLoader.load(ImmutableSegmentLoader.java:98)\n\tat
org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:283)\n\tat
org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:133)\n\tat
org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164)\n\t...
11 more\n", "Class": "class
org.apache.helix.messaging.handling.HelixStateTransitionHandler", "MSG_ID":
"8237ad10-da30-4ad9-8b80-930b437d48fa", "Message state": "READ" },
"HELIX_ERROR 20210303-100532.000104 STATE_TRANSITION
24244f32-ca45-463b-9c15-586d5a667669": { "AdditionalInfo": "Message execution
failed. msgId: 8237ad10-da30-4ad9-8b80-930b437d48fa, errorMsg:
java.lang.reflect.InvocationTargetException", "Class": "class
org.apache.helix.messaging.handling.HelixStateTransitionHandler", "MSG_ID":
"8237ad10-da30-4ad9-8b80-930b437d48fa", "Message state": "READ" }```  
**@jackie.jxt:** Based on the error message, it seems the segment
`enriched_customer_orders_jp_upsert_realtime_streaming_v1__7__330__20210224T2322Z`
is corrupted.  
**@jackie.jxt:** Does this happen to only one server or all servers?  
**@elon.azoulay:** only the tenants where it exists.  
**@jackie.jxt:** If you have time, we can have a quick zoom chat to debug the
issue  
**@elon.azoulay:** wow, I owe you one:) Sure whenever you have some time.  
 **@m.e.driscoll:** @m.e.driscoll has joined the channel  
 **@lloyd.branch:** @lloyd.branch has joined the channel  
 **@csanderson.data:** @csanderson.data has joined the channel  
 **@miliang:** @miliang has joined the channel  
 **@miliang:** Hey, it seems there is a bug in the most recent Pinot code. This
kind of query throws an exception:  
 **@miliang:** But it previously worked well:  
 **@fx19880617:** I think the in clause should use single quote  
**@miliang:** ```SELECT jsonExtractScalar(mapDim2json, '$.non-existing-key',
'INT') FROM FeatureTest1 WHERE bytesDimSV1 = 'deed0507' AND
jsonExtractKey(mapDim2json, '$.*') in ('$[non-existing-key]')```  
**@miliang:** or ```SELECT jsonExtractScalar(mapDim2json, '$.non-existing-key',
'INT') FROM FeatureTest1 WHERE bytesDimSV1 = 'deed0507' AND
jsonExtractKey(mapDim2json, '$.*') in ('$[\'non-existing-key\']')```  
**@fx19880617:** right  
**@fx19880617:** Pinot uses single quotes for literals and double quotes for
identifiers  
 **@fx19880617:** the previous version of Pinot didn’t check that, so it
would always return empty results  
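
For client code that assembles queries as strings, the quoting rule above can be captured in a small helper. The sketch below is an illustration only: `SqlLiterals` is a hypothetical class, and it escapes an embedded single quote by doubling it, which is the standard SQL convention and an assumption here. The queries in this thread use a backslash escape instead, so check which form the Pinot version in use accepts.

```java
/** Hypothetical helper for illustration; not part of any Pinot client API. */
final class SqlLiterals {
    private SqlLiterals() {}

    /**
     * Wraps a value in single quotes so it is read as a string literal.
     * Assumption: embedded single quotes are escaped by doubling them,
     * following standard SQL rather than anything confirmed in this thread.
     */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }
}
```

For example, `"WHERE bytesDimSV1 = " + SqlLiterals.quote("deed0507")` yields a predicate with a single-quoted literal rather than a double-quoted identifier.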

###  _#onboarding_

  
 **@nachiket.kate:** @nachiket.kate has joined the channel  

###  _#aggregators_

  
 **@nachiket.kate:** @nachiket.kate has joined the channel  

###  _#pinot-dev_

  
 **@dutta.kinshuk:** @dutta.kinshuk has joined the channel  

###  _#pinot-docs_

  
 **@dutta.kinshuk:** @dutta.kinshuk has joined the channel  

###  _#pinot-perf-tuning_

  
 **@nachiket.kate:** @nachiket.kate has joined the channel  

###  _#feat-partial-upsert_

  
 **@yupeng:** @npawar @jackie.jxt @tingchen could you review this doc again
and approve it at the top  ?  
 **@npawar:** @npawar has joined the channel  
 **@yupeng:** also, I hope we all agree on the merger interface  
 **@jackie.jxt:** The merger interface looks good. How we handle the merge of
each column is an implementation detail  
 **@jackie.jxt:** The interface should be row based  
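
To make "row based" concrete, here is one way such an interface might be shaped. This is a hypothetical sketch, not the interface from the design doc under review: `RowMerger`, `OverwriteNonNullMerger`, and the use of `Map<String, Object>` as the row type are all assumptions made for illustration. The interface sees whole rows, while the per-column policy stays inside the implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical row-based merger: takes whole rows, returns the merged row. */
interface RowMerger {
    Map<String, Object> merge(Map<String, Object> previousRow, Map<String, Object> newRow);
}

/**
 * Example implementation (an assumption, not from the design doc):
 * a non-null value in the new row overwrites the previous value,
 * and a null value keeps whatever the previous row had.
 */
final class OverwriteNonNullMerger implements RowMerger {
    @Override
    public Map<String, Object> merge(Map<String, Object> previousRow, Map<String, Object> newRow) {
        // Start from a copy so neither input row is mutated.
        Map<String, Object> merged = new HashMap<>(previousRow);
        for (Map.Entry<String, Object> entry : newRow.entrySet()) {
            if (entry.getValue() != null) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }
}
```

A different column-level policy (for example, incrementing a counter) would be another `RowMerger` implementation, leaving the row-based signature unchanged.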