Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/10/15 02:00:17 UTC

Apache Pinot Daily Email Digest (2020-10-14)

### _#general_

  
 **@krishna:** @krishna has joined the channel  
 **@murat.migdisoglu:** @murat.migdisoglu has joined the channel  
 **@murat.migdisoglu:** hello dear pinot community, I'm evaluating Pinot in
our POC and comparing it to Druid. One thing that I couldn't find in the docs is
the REST API to trigger a batch ingestion. Is the CLI the only way of submitting
a batch ingestion job?  
 **@ssubrama:** @murat.migdisoglu on the controller, there is a `POST` api:
`:whateverport/segments`  
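A minimal sketch of what that upload can look like (not from the thread); the host, port, file path, and multipart field name below are placeholders and may differ by Pinot version:

```
# push one pre-built segment tarball to the controller's segment upload endpoint
curl -X POST \
  -F "segmentFile=@/path/to/mytable_segment_0.tar.gz" \
  "http://localhost:9000/segments"
```

The segment tarball itself still has to be built beforehand (for example by the segment creation step of the batch ingestion job); the endpoint only handles the push.  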
 **@murat.migdisoglu:** I have another issue now. During realtime
ingestion, after publishing the first segment with 50K rows, Pinot does not
ingest any more data. Maybe it is not creating the new segment, I'm not sure.
It's an append-type table (`"segmentPushType": "APPEND"`) with
`"segmentPushFrequency": "HOURLY"`. Where might the issue be? I can't see any
exception in any log file  
**@snlee:** @murat.migdisoglu Can you avoid mentioning `here` next
time? There are more than 600 people in this general channel :wink:  
**@npawar:** did you check the server logs? most likely the server is
not able to complete the consuming segment  
**@fx19880617:** also can we move the discussion to channel
<#C011C9JHN7R|troubleshooting> ?  

###  _#random_

  
 **@krishna:** @krishna has joined the channel  
 **@murat.migdisoglu:** @murat.migdisoglu has joined the channel  

###  _#troubleshooting_

  
 **@murat.migdisoglu:** @murat.migdisoglu has joined the channel  
 **@murat.migdisoglu:** I'm following up on my thread related to real-time
ingestion here..  
**@npawar:** can you share your table config and schema?  
**@murat.migdisoglu:**  
**@npawar:** the issue might be the schema name. the table config has
`"schemaName": "revenue",` whereas the schema has `"schemaName":
"revenue_test_murat",`  
**@npawar:** Does your pinot-server log show absolutely no
warning/error/exception message?  
**@npawar:** and which version of Pinot are you on? The newer versions should
have blocked creating a table config with a missing schema  
**@murat.migdisoglu:** I'll re-verify your point about the schema mismatch  
**@murat.migdisoglu:** we're running 0.5.0  
**@murat.migdisoglu:** ok I verified the schema @npawar, the table's schema is
revenue_test_murat  
**@murat.migdisoglu:** I'm tailing the server log  
**@murat.migdisoglu:** and it doesn't print anything  
**@npawar:** did you delete and recreate table config after fixing the schema
name?  
**@murat.migdisoglu:** I did and I can retry.. But if that was the issue, why
would it ingest only the first 50000 rows? `"segment.flush.threshold.size":
"50000"`  
**@npawar:** because it's not able to complete the segment. the consumer
consumed 50k rows based on this config and is not able to move forward because
segment creation is failing  
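For reference, this is roughly where that threshold sits in a realtime table config. Only `"segment.flush.threshold.size": "50000"` is quoted from the thread; the surrounding keys are an illustrative sketch of a typical Kafka low-level-consumer setup, not the actual config being debugged:

```
"tableIndexConfig": {
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.topic.name": "revenue_events",
    "segment.flush.threshold.size": "50000"
  }
}
```

Once 50k rows are consumed, the consuming segment is sealed and built; if that build throws (as in the stack trace below), the consumer never commits and stops pulling new rows, which matches the "exactly 50000 rows, then nothing" symptom.  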
**@murat.migdisoglu:** batch process works by the way  
**@murat.migdisoglu:** but it's another table (offline) afaik  
**@npawar:** can you share the whole controller and server log  
**@murat.migdisoglu:**
```
Could not build segment
java.lang.NullPointerException: null
    at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.addColumnMetadataInfo(SegmentColumnarIndexCreator.java:535) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:489) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:399) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:223) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:127) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:742) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:693) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:604) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
    at java.lang.Thread.run(Thread.java:832) [?:?]
```
**@murat.migdisoglu:** I've found an error  
**@murat.migdisoglu:** but after that, no matter how much data comes into Kafka,
it does not generate any error  
**@npawar:** like i said before, the consumption is going to stop, if it
cannot create the segment.  
**@npawar:** this seems related:  
**@npawar:** i don't know if this was included in 0.5.0. Checking  
**@npawar:** To unblock, you could try without `aggregateMetrics` , or build
from source  
**@npawar:** yup, that fix is not part of 0.5.0. Could you build from source?  
**@murat.migdisoglu:** You're right. It worked without aggregation.  
**@murat.migdisoglu:** For the sake of the POC I'll stop here. But I'm surprised to
hit a bug in such a fundamental feature :(  
**@murat.migdisoglu:** Thx a lot for your help  
**@npawar:** i'm also surprised :slightly_smiling_face: this feature is being
used in some places i believe. So my hunch is that it is the combination of
`aggregateMetrics: true` + `columnMinMaxValueGeneratorMode: ALL`. I have a
feeling it may work fine if you remove columnMinMaxValueGeneratorMode. And fwiw,
it has been fixed on master and will be available in the next release  
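The combination in question, as it would sit in the table config; both settings are quoted from the thread, and their placement under `tableIndexConfig` is my assumption about the usual layout:

```
"tableIndexConfig": {
  "aggregateMetrics": true,
  "columnMinMaxValueGeneratorMode": "ALL"
}
```

The thread confirms that consumption recovers on 0.5.0 once `aggregateMetrics` is dropped; removing the min/max value generator mode instead is only a hunch here.  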
**@npawar:** @mayanks this flag is used at LinkedIn right? How does it work
in spite of this:  ? Is there something specific in this table config that
might be triggering this?  
**@mayanks:** IIRC, this was a bug that got introduced and we hit the same
problem at LinkedIn. I believe that #5862 fixed the issue  
**@mayanks:** @npawar ^^  

###  _#docs_

  
 **@krishna:** @krishna has joined the channel  

###  _#jdbc-connector_

  
 **@krishna:** @krishna has joined the channel  

###  _#lp-pinot-poc_

  
 **@andrew:** hm, i don’t see that happening  
 **@andrew:** it’s all showing as REALTIME  
 **@andrew:** i thought it was configured to move completed segments to
offline  
 **@andrew:** i also see that a lot of segments are in a “bad” state  
 **@andrew:**  
 **@andrew:** if i click one of the bad segments, there’s no further
information explaining why  
 **@andrew:**  
 **@g.kishore:** it's a minor UI bug - it shows BAD when it's getting converted
from CONSUMING to ONLINE  
 **@g.kishore:**  
 **@fx19880617:** If all the segments are still on realtime nodes, then there
is some issue there  
 **@fx19880617:** from my benchmark, once segments are persisted, they will be
moved to offline servers  
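For context, one common way to get completed realtime segments onto a separate set of servers is a tag override in the realtime table config; a sketch, with the tenant and tag names purely illustrative:

```
"tenants": {
  "broker": "DefaultTenant",
  "server": "realtimeTenant",
  "tagOverrideConfig": {
    "realtimeConsuming": "realtimeTenant_REALTIME",
    "realtimeCompleted": "offlineTenant_OFFLINE"
  }
}
```

With something like this, consuming segments stay on the realtime-tagged servers and completed segments are relocated to the servers carrying the `realtimeCompleted` tag.  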
 **@fx19880617:** do you have the controller log, or we can do a zoom call to look
at this  

###  _#roadmap_

  
 **@krishna:** @krishna has joined the channel  

###  _#metadata-push-api_

  
 **@fx19880617:**  
 **@fx19880617:** @steotia this is the user trying with metadata push  
 **@fx19880617:** they are trying to push 2k or 20k segments at once  
 **@fx19880617:** it's still taking about 1+ hour for them  
 **@fx19880617:** the data volume is at the couple-TB level  
 **@mayanks:** @fx19880617 what is the deep store being used in this case?  
 **@fx19880617:** s3  
 **@fx19880617:** so we let controller skip the download part  
 **@mayanks:** Yeah, we want to skip that as well  
 **@fx19880617:** but the idealstate updates coming from all controllers will
drag down the upload speed  
 **@fx19880617:** thinking that if we set parallelism to 100, then 100 threads
will be trying to update the idealstate at once  
**@mayanks:** I think we may need to ensure there are no race conditions  
 **@mayanks:** The issue we have is that not every deepstore has an overwrite
blob api  
 **@fx19880617:** that's ok, right?  
 **@fx19880617:** if the deepstore cannot overwrite  
 **@fx19880617:** then you can make a daily directory and put segments with the
same segment names  
 **@fx19880617:** then push  
 **@fx19880617:** let the refresh code path update the download path as well  
 **@fx19880617:** since it's already updating crc/segment refresh time etc  
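An illustrative layout of that scheme (paths entirely hypothetical): each day's push writes the same segment names under a new dated prefix, and the refresh code path then points the table's download URIs at the new location.

```
s3://my-bucket/pinot-segments/revenue/2020-10-13/revenue_0.tar.gz
s3://my-bucket/pinot-segments/revenue/2020-10-14/revenue_0.tar.gz   # same segment name, new daily prefix
```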
 **@mayanks:** Yeah, i think that is fine. The thing we are debating is how to
clean up the old segments  
 **@mayanks:** Ideally, if the upload path does the cleanup then it would be best  
 **@fx19880617:** then you can have a cleanup job in hadoop to do that  
 **@steotia:** Does S3 support overwrite?  
 **@fx19880617:** yes  
 **@fx19880617:** the overwrite is: the user deletes the directory, then pushes  
 **@fx19880617:**  
 **@fx19880617:** s3 will just do the overwrite  
 **@fx19880617:** seems it also has versioning supported  
 **@fx19880617:** if your blob store has that support and you can somehow
inherit the version in download uri, then it will be fantastic  