Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/12/15 02:00:21 UTC

Apache Pinot Daily Email Digest (2021-12-14)

### _#general_

  
 **@emailvidhi01:** @emailvidhi01 has joined the channel  
 **@rishi.shukla:** @rishi.shukla has joined the channel  
 **@karinwolok1:** :partying_face: :partying_face: :partying_face: Apache
Pinot :wine_glass: has officially hit 1 MILLION :tada: downloads!
:partying_face: :partying_face: :partying_face:  
 **@weixiang.sun:** We are working on offline segment ingestion. Currently we
are using TarPush, but its problem is that the controller needs to get
involved in the data path by downloading the segment. Just curious, how does
metadata push keep the controller out of the data path?  
**@ken:** With metadata push, you give the controller the URI of where the
segment is located. This is used to update Zookeeper state, and (if needed)
will trigger a download by the server processes. Which is why, when doing
metadata push, you need to have your “deep store” location for segments be a
shared file system (S3, HDFS, etc) that all the servers can access.  
**@weixiang.sun:** @ken Thanks  
**@weixiang.sun:** @elon.azoulay ^^  
**@elon.azoulay:** Yep, but in order to get the metadata,
SegmentPushUtils.sendSegmentUriAndMetadata downloads the segment, extracts the
metadata, and uploads only the metadata.  
**@elon.azoulay:** From the code:
```
/**
 * This method takes a map of segment downloadURI to corresponding tar file path, and push those segments in metadata mode.
 * The steps are:
 * 1. Download segment from tar file path;
 * 2. Untar segment metadata and creation meta files from the tar file to a segment metadata directory;
 * 3. Tar this segment metadata directory into a tar file
 * 4. Generate a POST request with segmentDownloadURI in header to push tar file to Pinot controller.
 *
 * @param spec is the segment generation job spec
 * @param fileSystem is the PinotFs used to copy segment tar file
 * @param segmentUriToTarPathMap contains the map of segment DownloadURI to segment tar file path
 * @throws Exception
 */
```
**@elon.azoulay:** At least in Pinot 0.8.0 ^^^. Did it change in a newer
version of Pinot?  
**@weixiang.sun:** Is SegmentPushUtils.sendSegmentUriAndMetadata called
outside the controller?  
**@elon.azoulay:** Looks like it's called from the ingestion jobs  
**@weixiang.sun:** segment download here is not happening in the controller.  
**@elon.azoulay:** Only place I see it called is from ingestion jobs  
**@elon.azoulay:** But the upload segment call happens on the controller:
PinotSegmentUploadDownloadRestletResource  
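
Step 4 of the Javadoc quoted in this thread (a POST to the controller that carries the segment's download URI in a header) can be sketched roughly as follows. This is a minimal illustration, not Pinot's actual client code: the class `MetadataPushSketch`, the header names `DOWNLOAD_URI` and `UPLOAD_TYPE`, the `/v2/segments` path, and the example URIs are all assumptions to be checked against your Pinot version, and the body encoding used by the real client is not reproduced here. The request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch of step 4: a POST that carries only metadata, with the segment location in a header. */
final class MetadataPushSketch {
    // Assumed header names and endpoint path; verify them against your Pinot version.
    static final String DOWNLOAD_URI_HEADER = "DOWNLOAD_URI";
    static final String UPLOAD_TYPE_HEADER = "UPLOAD_TYPE";
    static final String SEGMENTS_PATH = "/v2/segments";

    /** Builds (but does not send) the request; the body stands in for the metadata tar bytes. */
    static HttpRequest buildMetadataPush(String controllerBaseUrl,
                                         String segmentDownloadUri,
                                         byte[] metadataTar) {
        return HttpRequest.newBuilder(URI.create(controllerBaseUrl + SEGMENTS_PATH))
                // The controller learns where the full segment lives from this header,
                // so the segment data itself never passes through the controller.
                .header(DOWNLOAD_URI_HEADER, segmentDownloadUri)
                .header(UPLOAD_TYPE_HEADER, "METADATA")
                .POST(HttpRequest.BodyPublishers.ofByteArray(metadataTar))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildMetadataPush(
                "http://localhost:9000",                                  // hypothetical controller
                "s3://example-bucket/segments/myTable/segment_0.tar.gz",  // hypothetical deep-store URI
                "placeholder metadata bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

The point of the sketch is the split discussed in the thread: the ingestion job reads the segment tar and extracts its metadata, while the controller receives only the small metadata payload plus a pointer to the deep store.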
 **@karinwolok1:** :mega: *Kafka Summit London is looking for speakers!*
:mega: Interested in speaking? You have until Dec 20 to submit, so send in
your talk now!!! :partying_face:  
**@npawar:** Thanks for the reminder @karinwolok1! This :point_up: is a really
great opportunity folks :slightly_smiling_face: Many of you are doing really
cool things with Pinot and Kafka, and this is the best platform to share your
story about using these 2 systems together. Plus it’s in person at London this
time! (though KS virtual experience is also extremely fun!)  
 **@chris.jayakumar:** Hello folks, what are the recommended system specs for
each of the services required for a Pinot cluster? Is there a formula to
calculate this based on the size of the data?  
**@mayanks:** Hello, depending on your data size and workload, you can go
anywhere from 4-32 cores, 4-64GB for the serving nodes.  
**@chris.jayakumar:** Is that per service, like controller, server, etc.? Or do
you mean overall?  
**@mayanks:** That was for pinot-server. For the controller, the determining
parameter is the total number of tables and segments across all tables;
typically 4-8 cores will do the job. For the broker: 4-16 cores, 4GB to 64GB
depending on your workload.  
**@chris.jayakumar:** cool thanks for your help Mayank  

###  _#random_

  
 **@emailvidhi01:** @emailvidhi01 has joined the channel  
 **@rishi.shukla:** @rishi.shukla has joined the channel  

###  _#feat-presto-connector_

  
 **@lrhadoop143:** @lrhadoop143 has joined the channel  

###  _#troubleshooting_

  
 **@lrhadoop143:** Hi team, facing issues while accessing a Pinot table in
Presto. Getting: Query 20211214_044329_00009_tcxbr failed:
java.net.SocketTimeoutException: Connect Timeout  
 **@emailvidhi01:** @emailvidhi01 has joined the channel  
 **@rishi.shukla:** @rishi.shukla has joined the channel  
 **@lrhadoop143:** Hi team, error while running join queries on Pinot data
from Presto. ERROR: Query 20211214_102018_00035_4f5rn failed: null value in
entry: Server_172.19.0.5_7000=null  
**@lrhadoop143:** Log:
```
java.lang.NullPointerException: null value in entry: Server_172.19.0.5_7000=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
    at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
    at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
    at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:280)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:245)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:424)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:307)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:728)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:300)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_267_SNAPSHOT_ac0dc73____20211214_100300_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
**@mayanks:** What versions of Pinot/Presto are you using?  
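
The trace in this thread fails inside Guava's `ImmutableMap.copyOf`, which refuses maps that contain a null value. A minimal sketch of the same failure mode follows. It uses the JDK's `Map.copyOf` as a stand-in for Guava so the example needs no dependencies (that substitution is an assumption made for illustration), and the class name `NullValueCopyDemo` is hypothetical. Reading the null as "no response recorded for that server" is a guess; the log itself does not confirm why the value is null.

```java
import java.util.HashMap;
import java.util.Map;

/** Shows that copying a map with a null value into an immutable map fails fast. */
final class NullValueCopyDemo {
    /** Returns true if Map.copyOf rejects the given map with a NullPointerException. */
    static boolean copyRejects(Map<String, String> source) {
        try {
            Map.copyOf(source); // immutable copies do not permit null keys or values
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> responses = new HashMap<>();
        // Key copied from the trace; the null value mirrors "Server_172.19.0.5_7000=null".
        responses.put("Server_172.19.0.5_7000", null);
        System.out.println("copy rejected: " + copyRejects(responses));
    }
}
```

A mutable `HashMap` accepts the null entry without complaint, so the error only surfaces at the point where the map is copied into an immutable one, which matches where the trace originates.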

###  _#pinot-dev_

  
 **@ilirsuloti:** @ilirsuloti has joined the channel  

###  _#presto-pinot-connector_

  
 **@lrhadoop143:** Hi team, facing issues while accessing a Pinot table in
Presto. Getting: Query 20211214_044329_00009_tcxbr failed:
java.net.SocketTimeoutException: Connect Timeout  
 **@xiangfu0:** Have you tried with the newest Docker image?  
 **@xiangfu0:** We saw issues with some group-by queries.  
 **@xiangfu0:** if you can share more logs on your presto coordinator/worker,
that will be very useful for debugging  
 **@lrhadoop143:** Hi Xiang, now I can connect to Presto and run simple
queries, but joins and subqueries fail with this error:
```
java.lang.NullPointerException: null value in entry: Server_172.19.0.5_7000=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
    at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
    at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
    at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:280)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:245)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:424)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:307)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:728)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:300)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_267_SNAPSHOT_ac0dc73____20211214_100300_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
**@xiangfu0:** Can you explain the query?  
 **@xiangfu0:** Are Presto and Pinot deployed in the same cluster?  
 **@xiangfu0:** On the Pinot side, can you try adding the following to your Pinot
server configs:
```
pinot.server.instance.currentDataTableVersion=2
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```

### _#pinot-perf-tuning_

  
 **@rohitdev.kulshrestha:** @rohitdev.kulshrestha has joined the channel  

###  _#getting-started_

  
 **@ilirsuloti:** @ilirsuloti has joined the channel  
 **@weixiang.sun:** @weixiang.sun has joined the channel  