Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/05/26 03:05:54 UTC

Apache Pinot Daily Email Digest (2022-05-25)

### _#general_

  
 **@dangngoctan2012:** @dangngoctan2012 has joined the channel  
 **@priya.shivakumar:** @priya.shivakumar has joined the channel  
 **@arnaud.zdziobeck:** @arnaud.zdziobeck has joined the channel  
 **@ralph.debusmann967:** One basic question about data preparation (for
ingestion into Pinot) - how do you combine e.g. multiple Kafka topics into one
table in Pinot so that you can query them as one - without having JOINs? Is
there any way to do it without heavy upfront stream processing using e.g.
Kafka Streams/ksqlDB/Flink/Materialize etc.?  
**@g.kishore:** Do you want to simply union the two streams or perform some
kind of join across the two streams?  
**@ralph.debusmann967:** A union would be a good start (basically putting a
bunch of Kafka topics into one table in Pinot), of course some kind of join
would be even better. I'd just like to avoid having to use stream processing
for this and just pull the data from various Kafka topics into Pinot and go
from there :slightly_smiling_face:  
**@ralph.debusmann967:** How is this done in LinkedIn for example?  
**@g.kishore:** It’s a samza job that does join and writes back to kafka  
**@ralph.debusmann967:** Thanks! And what if I don't want to add a stream
processing component to my architecture - what options would you recommend?  
**@g.kishore:** Depends on is it a join or simple union of two topics  
**@ralph.debusmann967:** So in our case we have e.g. one topic of daily
aggregated Twitter sentiments and one topic of daily aggregated copper prices
(simplified example). It could be that one of the time series has a different
starting point compared to the other - e.g. the Twitter sentiments would start
in 2015 and the copper prices in 1990. Would it be possible to bring the data
starting from 2015 together into one Pinot table with the union operation?  
**@g.kishore:** You can write a plug-in that is a composite consumer across
multiple topics  
**@ralph.debusmann967:** Cool thanks - I'll try that :grinning:  
**@g.kishore:** happy to help if you can share the PR or a GitHub link.  
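For illustration, a minimal sketch of the union idea using the plain Kafka
consumer API (topic names and settings are made up; a real composite consumer
would implement Pinot's stream-ingestion SPI rather than run standalone):

```
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class UnionConsumerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "union-demo");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      // A single consumer can subscribe to several topics; records from both
      // arrive interleaved, which is exactly the "union" semantics discussed above.
      consumer.subscribe(Arrays.asList("twitter-sentiments-daily", "copper-prices-daily"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          // In a composite Pinot consumer, each record would be decoded and
          // mapped onto the shared table schema here.
          System.out.printf("%s -> %s%n", record.topic(), record.value());
        }
      }
    }
  }
}
```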
 **@ysuo:** Hi team, I have a question about using Pinot JDBC to connect to a
Pinot controller deployed in K8s. ```DriverManager.registerDriver(new PinotDriver());
// Connection conn = DriverManager.getConnection(DB_URL); // queries DefaultTenant if no tenant is specified here
Properties info = new Properties();
info.putIfAbsent("tenant", "TestBroker");
Connection conn = DriverManager.getConnection(DB_URL, info);
Statement statement = conn.createStatement();``` but the following error is returned:
```Caused by: java.net.UnknownHostException: pinot-broker-8.pinot-broker-headless.pinot.svc.cluster.local: nodename nor servname provided, or not known
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)
	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)
	at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)
	at com.ning.http.client.NameResolver$JdkNameResolver.resolve(NameResolver.java:28)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.remoteAddress(NettyRequestSender.java:359)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.connect(NettyRequestSender.java:370)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequestWithNewChannel(NettyRequestSender.java:282)
	... 12 more``` Is there any configuration to make sure the client gets an
accessible broker URL from the controller?  
**@ysuo:** I'm locally testing connecting to a Pinot controller deployed in a
K8s cluster.  
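One common workaround for this kind of local test, offered as an assumption
rather than something from the thread: the controller hands the client the
broker's in-cluster headless-service hostname, which only resolves inside K8s,
so port-forward the broker and map that hostname locally. Namespace, pod name,
and the default broker port 8099 are assumptions here:

```
# Forward the broker port to localhost (namespace/pod names assumed):
kubectl port-forward -n pinot pod/pinot-broker-8 8099:8099

# Then map the in-cluster hostname to localhost in /etc/hosts:
# 127.0.0.1 pinot-broker-8.pinot-broker-headless.pinot.svc.cluster.local
```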
 **@teehan:** @teehan has joined the channel  
 **@matthew:** @matthew has joined the channel  
 **@tommaso.peresson:** @tommaso.peresson has joined the channel  
 **@ysuo:** I tried locally to execute a query via the broker through the
command below: `curl -H "Content-Type: application/json" -X POST -d '{"sql":"select *
from action limit 1"}' `  This Pinot cluster is deployed in a K8s cluster and we
have 4 brokers in it.  is the exposed broker gateway address. The table named
action here is assigned to a tenant named tenanta, which has only one broker.
When I run the above command, I can get the right results sometimes, but most
of the time the following error is returned:
```org.apache.pinot.client.PinotClientException: Query had processing exceptions:
[{"message":"BrokerResourceMissingError","errorCode":410}]
	at org.apache.pinot.client.Connection.execute(Connection.java:127)
	at com.bigdata.PinotJava.main(PinotJava.java:53)``` Is there some
configuration I'm missing here for this issue? Any idea how to fix it?  
**@xiangfu0:** hi, for your case, you need to create a new service, e.g.
`pinot-broker-tenanta`, with a different node selector to pick the right pinot
broker. Then you can query the exposed service or load balancer for that pinot
broker. The current k8s setup is for a pure shared tenant.  
**@ysuo:** Hi, since this command could sometimes return the right result, is
it because the table's tenant happened to be matched that time? I mean, in my
test I got the right result once and 'BrokerResourceMissingError' three times,
and this pattern repeated when I tried more times. So, is there a way to set
the tenant as a command parameter?  
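A rough sketch of the per-tenant broker Service @xiangfu0 describes above; the
label names and values are hypothetical and depend on how the tenanta broker
pods are actually labeled:

```
apiVersion: v1
kind: Service
metadata:
  name: pinot-broker-tenanta   # hypothetical name
  namespace: pinot
spec:
  selector:
    app: pinot
    component: broker
    tenant: tenanta            # assumes the tenanta broker pods carry this label
  ports:
    - name: broker
      port: 8099
      targetPort: 8099
```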
 **@cesaro.angelo:** @cesaro.angelo has joined the channel  
 **@ghita.saouir:** @ghita.saouir has joined the channel  
 **@m.ram3sh:** @m.ram3sh has joined the channel  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  
 **@rbobbala:** Hello Team, Can someone help me with the below error ```Error:
INSTALLATION FAILED: unable to build kubernetes objects from release manifest:
unable to recognize "": no matches for kind "Ingress" in version
"extensions/v1beta1"```  
**@mayanks:** @xiangfu0  
**@xiangfu0:** I think this is due to the ingress upgrade for k8s  
**@xiangfu0:** did you happen to enable the ingress here:  
**@rbobbala:** Yes  
**@rbobbala:** I tried to enable the ingress to access from the browser  
**@xiangfu0:** I think your k8s is on a higher version that doesn't support
the current ingress  
**@xiangfu0:** cc: @diana.arnos  
**@rbobbala:** what is the version this supports?  
**@xiangfu0:** current implementation is `apiVersion: extensions/v1beta1`  
**@rbobbala:** Yes, I want to know the K8s version that supports
extensions/v1beta1  
**@xiangfu0:** We should upgrade this to ```apiVersion: ```  
**@rbobbala:** Thanks for sharing  
**@rbobbala:** Can I change the ingress API version?  
**@rbobbala:** Just confused on how I can modify the ingress.yaml file just
for my deployment  
**@rbobbala:** The helm chart uses the templates from the repo, right?  
**@rbobbala:** Wondering how can I modify the templates and make use of Helm
to install  
**@rbobbala:** or the alternative way is to install a k8s cluster with an
older version that supports apiVersion: extensions/v1beta1  
**@xiangfu0:** Yes, you can modify the chart  
**@rbobbala:** okay  
**@xiangfu0:** In short the chart is just a template to be installed  
**@xiangfu0:** helm will apply values to the template  
**@rbobbala:** Got it  
**@xiangfu0:** values.yaml has the values and flags  
**@xiangfu0:** i can review the change if you would love to contribute as well  
**@rbobbala:** Hope it doesn't mess things up if I change the apiVersion to
```apiVersion: ```  
**@xiangfu0:** you need to change the rest of the file  
**@xiangfu0:** just changing the apiVersion doesn't work  
**@rbobbala:** okay  
**@xiangfu0:** you can follow  to add another section after  
**@rbobbala:** Thanks for sharing  
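For context, the migration under discussion is presumably from the
`extensions/v1beta1` Ingress (removed in Kubernetes 1.22) to
`networking.k8s.io/v1` (served since 1.19). A sketch of the shape change, with
placeholder service name and port:

```
# Old template shape (removed in Kubernetes 1.22):
# apiVersion: extensions/v1beta1
# kind: Ingress
# spec:
#   rules:
#     - http:
#         paths:
#           - path: /
#             backend:
#               serviceName: pinot-controller   # placeholder
#               servicePort: 9000

# New shape:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pinot-controller          # placeholder
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix      # pathType is mandatory in v1
            backend:
              service:
                name: pinot-controller   # placeholder
                port:
                  number: 9000
```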

###  _#random_

  
 **@dangngoctan2012:** @dangngoctan2012 has joined the channel  
 **@priya.shivakumar:** @priya.shivakumar has joined the channel  
 **@arnaud.zdziobeck:** @arnaud.zdziobeck has joined the channel  
 **@teehan:** @teehan has joined the channel  
 **@matthew:** @matthew has joined the channel  
 **@tommaso.peresson:** @tommaso.peresson has joined the channel  
 **@cesaro.angelo:** @cesaro.angelo has joined the channel  
 **@ghita.saouir:** @ghita.saouir has joined the channel  
 **@m.ram3sh:** @m.ram3sh has joined the channel  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  

###  _#feat-presto-connector_

  
 **@gaetanmorlet:** @gaetanmorlet has joined the channel  

###  _#pinot-power-bi_

  
 **@gaetanmorlet:** @gaetanmorlet has joined the channel  

###  _#troubleshooting_

  
 **@dangngoctan2012:** @dangngoctan2012 has joined the channel  
 **@priya.shivakumar:** @priya.shivakumar has joined the channel  
 **@arnaud.zdziobeck:** @arnaud.zdziobeck has joined the channel  
 **@teehan:** @teehan has joined the channel  
 **@matthew:** @matthew has joined the channel  
 **@tommaso.peresson:** @tommaso.peresson has joined the channel  
 **@lars-kristian_svenoy:** Hey team :wave: . I'm currently in the process of
writing a custom Flink job which can atomically replace the segments for a
Pinot refresh table. I've been looking into the segment replacement protocol,
and wanted to check that I understand it correctly.. More info in thread  
**@lars-kristian_svenoy:** So prior to uploading segments, I should call
startReplaceSegments. Then after that has been called, can I then start
calling uploadSegment? I guess in this case, I should be uploading segments to
some other directory/bucket (s3). Once this is all done, do I then call
endReplaceSegments? What do I do if there is a failure while uploading
segments? Anything else I should know? Thank you all  
**@g.kishore:** This is needed for batch replacement of segments in an atomic
way. If you want to just replace one segment at a time, you can just call
uploadSegment  
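A rough outline of that batch-replacement flow against the controller's
segment-lineage endpoints; endpoint and parameter names here are from memory,
so verify them against your controller's Swagger UI before relying on this:

```
# 1) Register the intended swap; the response carries a segmentLineageEntryId.
curl -X POST "http://<controller>/segments/myTable/startReplaceSegments?type=OFFLINE" \
  -H "Content-Type: application/json" \
  -d '{"segmentsFrom": ["seg_old_0", "seg_old_1"], "segmentsTo": ["seg_new_0", "seg_new_1"]}'

# 2) Upload the new segments as usual.

# 3) Commit the swap atomically:
curl -X POST "http://<controller>/segments/myTable/endReplaceSegments?type=OFFLINE&segmentLineageEntryId=<id>"

# If uploading fails between 1) and 3), revert instead of committing:
curl -X POST "http://<controller>/segments/myTable/revertReplaceSegments?type=OFFLINE&segmentLineageEntryId=<id>"
```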
 **@cesaro.angelo:** @cesaro.angelo has joined the channel  
 **@tommaso.peresson:** Hi everyone. I'm currently setting up a table that has
an MV column called `users` containing a list of `user_id`s. From what I've
tried, `distinctcounthllmv()` can't be used as an aggregation function in a
star-tree index. Has anyone ever faced a similar problem? If yes, how did you
solve it? Is it possible to calculate the raw HLL state at ingestion time and
then perform the estimation at query time? Thanks everyone for helping  
**@mayanks:** Yes, you can have an HLL column in the ingested data and still
query it using the HLL function  
**@tommaso.peresson:** do you have any documentation on how to perform this?  
**@mayanks:** @jackie.jxt  
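A minimal sketch of that approach (the `users_hll` column name is made up):
build the HLL state upstream or via an ingestion transform, store it as a
BYTES column, and let the HLL aggregation merge the stored sketches at query
time:

```
-- Schema fragment (JSON), one serialized HLL per row stored as BYTES:
--   {"metricFieldSpecs": [{"name": "users_hll", "dataType": "BYTES"}]}
--
-- Query time: DISTINCTCOUNTHLL merges the stored sketches and returns the
-- cardinality estimate, so no star-tree aggregation of the MV column is needed.
SELECT DISTINCTCOUNTHLL(users_hll) FROM myTable WHERE dt = '2022-05-25'
```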
 **@ghita.saouir:** @ghita.saouir has joined the channel  
 **@m.ram3sh:** @m.ram3sh has joined the channel  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  

###  _#pinot-dev_

  
 **@dadelcas:** Hello, I'm reading the freshness metrics design document,
which covers something we are thinking of using in one of our use cases.
However, the freshness timestamp returned by Pinot always seems to be the
Pinot indexing time. Reading through the code, it seems there isn't a row
metadata implementation for Kafka. I'd like to confirm this is the case, and
if so I'd like to contribute the code changes to get this working as per the
design document. I can't see an open issue in GitHub related to this  
 **@g.kishore:** I thought it used timestamp from row metadata if it’s
available  
**@dadelcas:** Yup, it does choose the indexing timestamp if row metadata is
not available. Doesn't seem like any of the stream ingestion plugins return
row metadata at the moment  
 **@dadelcas:** The default implementation returns null  
 **@dadelcas:** I'm going to raise an issue in GitHub and open a PR, I'll post
the link here if further discussion is needed  
 **@g.kishore:** :+1:  
 **@dadelcas:** This is the PR, I didn't write a lot of detail in it nor in
the linked GitHub issue. Apologies for the rush  
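A hypothetical sketch of the idea being discussed; the class and method names
below are assumptions, not Pinot's actual SPI, which lives in
org.apache.pinot.spi.stream.RowMetadata:

```
import org.apache.kafka.clients.consumer.ConsumerRecord;

/**
 * Hypothetical sketch: carry the Kafka record timestamp alongside each row so
 * freshness can reflect event/produce time instead of Pinot indexing time.
 * A real contribution would implement org.apache.pinot.spi.stream.RowMetadata
 * (whose method names may differ from this sketch) and return it from the
 * Kafka consumer's message batch.
 */
public class KafkaRecordMetadata {
  private final long _recordTimestampMs;

  public KafkaRecordMetadata(ConsumerRecord<?, ?> record) {
    // Kafka exposes the producer (or broker log-append) timestamp per record.
    _recordTimestampMs = record.timestamp();
  }

  public long getRecordIngestionTimeMs() {
    return _recordTimestampMs;
  }
}
```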
**@ken:** Is anyone else getting a dependency convergence failure when
building from master? Details in thread…  
**@ken:** I ran `mvn clean install -DskipTests -Pbin-dist` from the top, and
it failed when building `pinot-spark` with:
```[WARNING] Dependency convergence error for org.apache.hadoop:hadoop-yarn-api:2.6.5 paths to dependency are:
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.spark:spark-yarn_2.11:2.4.0
      +-org.apache.hadoop:hadoop-yarn-api:2.6.5
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-nodemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-server-applicationhistoryservice:2.8.3
          +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-minicluster:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-minicluster:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-yarn-api:2.10.1
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence failed with message:
Failed while enforcing releasability. See above detailed error message.```  
**@ken:** The only change to jar versions I see is from April 12th, by PJ
Fanning, where the Hadoop version was bumped to 2.10.1  
**@g.kishore:** not sure how it passed the CI  
**@ken:** Yes, exactly - so maybe my setup is borked? Now I’m wading through
dependency graphs :disappointed:  
**@g.kishore:** do you have the PR?  
**@ken:** No, I was working on a different issue, so wanted to start fresh
from current Pinot master, but that build failed  
**@ken:** I’ll dig a bit more.  
**@ken:** Looks like modifying the pom to exclude spark-yarn from spark-
testing-base is sufficient. But I'm wondering why the CI on the original PR
didn't fail with the same issue.  
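For anyone hitting the same failure, the workaround @ken describes would look
roughly like this in `pinot-spark/pom.xml` (the version and scope are taken
from the error output above and may need adjusting):

```
<dependency>
  <groupId>com.holdenkarau</groupId>
  <artifactId>spark-testing-base_2.11</artifactId>
  <version>2.4.0_0.14.0</version>
  <scope>test</scope>
  <exclusions>
    <!-- spark-yarn drags in hadoop-yarn-api:2.6.5, which conflicts with the
         2.8.3/2.10.1 versions reported by the enforcer plugin above. -->
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-yarn_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```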
 **@hareesh.lakshminaraya:** @hareesh.lakshminaraya has joined the channel  

###  _#announcements_

  
 **@gaetanmorlet:** @gaetanmorlet has joined the channel  

###  _#getting-started_

  
 **@dangngoctan2012:** @dangngoctan2012 has joined the channel  
 **@priya.shivakumar:** @priya.shivakumar has joined the channel  
 **@gunnar.enserro:** hey! my team and I are researching how to implement ML
and analytics into our pipeline! It could end up being a bottleneck... what
would be good ideas for scaling, placement, and formatting Apache Pinot for
ML tasks?  
**@mayanks:** Would like to understand the requirement a bit more. What do
these ML tasks do, and how are you planning to use Apache Pinot there?  
 **@arnaud.zdziobeck:** @arnaud.zdziobeck has joined the channel  
 **@teehan:** @teehan has joined the channel  
 **@gaetanmorlet:** @gaetanmorlet has joined the channel  
 **@matthew:** @matthew has joined the channel  
 **@tommaso.peresson:** @tommaso.peresson has joined the channel  
 **@cesaro.angelo:** @cesaro.angelo has joined the channel  
 **@ghita.saouir:** @ghita.saouir has joined the channel  
 **@m.ram3sh:** @m.ram3sh has joined the channel  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  

###  _#pinot-docsrus_

  
 **@steotia:** can I get help in approving this  ?  
 **@steotia:** I don't seem to have write access on this repo  
 **@jlli:** just did  
 **@steotia:** thank you  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  
 **@sonam.dp42:** Hi, I've just sent out a PR for an Explain Plan doc update.
The doc update is based on changes made in this PR:  Can someone take a look?
cc @steotia  

###  _#introductions_

  
 **@dangngoctan2012:** @dangngoctan2012 has joined the channel  
 **@priya.shivakumar:** @priya.shivakumar has joined the channel  
 **@arnaud.zdziobeck:** @arnaud.zdziobeck has joined the channel  
 **@teehan:** @teehan has joined the channel  
 **@gaetanmorlet:** @gaetanmorlet has joined the channel  
 **@matthew:** @matthew has joined the channel  
 **@tommaso.peresson:** @tommaso.peresson has joined the channel  
 **@cesaro.angelo:** @cesaro.angelo has joined the channel  
 **@ghita.saouir:** @ghita.saouir has joined the channel  
 **@m.ram3sh:** @m.ram3sh has joined the channel  
 **@sonam.dp42:** @sonam.dp42 has joined the channel  