Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/06/30 02:00:14 UTC

Apache Pinot Daily Email Digest (2020-06-29)

<h3><u>#general</u></h3><br><strong>@g.kishore: </strong>Since schema is optional in Pinot, we cannot validate the query. We should probably validate and throw an exception if the schema is defined
<br><strong>@tisantos: </strong>Hi all, sharing this blog post on how LinkedIn Talent Insights was built using Pinot! <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMULmwXpUy0vBvQDjipJvea-2B1E47iEO0mvL4H3-2FQjn-2FwdG3z-2BtmSy2w-2BGGds4J0B2LXDSVjjJDf5OaXzkYpfxyiWrdLnF-2BigJ8l3fRV6lvWPfGPPh_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzaSkwmSoqTfD7vNiUtjDYttpwnEGsdM3-2FtYraLdIYKYDaGjv8h3vbym9zBCiXHkWZisddaZqZEHYE-2BH1IYbl7ao9rti6KAFacS5CeSyx1ufUd6Aj-2Bu3LQJVR0YHiBhixa7dw20RNxQ-2FAhY99jobSzZqoHDhsClnD06GMDGTdlY78oTbSwFdNGzAnVZvvUTapI-3D>
<br><strong>@ayushi.shiv: </strong>@ayushi.shiv has joined the channel
<br><h3><u>#random</u></h3><br><strong>@ayushi.shiv: </strong>@ayushi.shiv has joined the channel
<br><h3><u>#troubleshooting</u></h3><br><strong>@shounakmk219: </strong>Hey all, wanted to know if the number of partitions matters in the replica group concept?
And if one of the replica groups has a missing segment, how do we make sure it downloads that segment from the controller?
<br><strong>@steotia: </strong>Are you talking about numInstancesPerPartition or numPartitions in segmentPartitionConfig?
<br><strong>@steotia: </strong>Replica groups and partitioning are two different concepts.
<br><strong>@mayanks: </strong>I think @shounakmk219 is referring to realtime topic partitions.
<br><strong>@shounakmk219: </strong>Yes, partitions on the topic
<br><strong>@mayanks: </strong>Your replica group size (number of instances in a replica) should be enough to consume the number of partitions you have as well as serve your QPS/latency
<br><strong>@shounakmk219: </strong>OK, and what about the missing segment?
<br><strong>@mayanks: </strong>Why would the segment be missing?
<br><strong>@shounakmk219: </strong>This issue came up because there was not enough space on the server PV, so it was not able to download the segment and gave up after 3 retries
<br><strong>@mayanks: </strong>Then you need more storage on the server
<br><strong>@shounakmk219: </strong>Yes, so once we increase the PV size, should the segment be manually uploaded to that server?
<br><strong>@mayanks: </strong>It should be ONLINE in the ideal state, so a server restart should fix it.
<br><strong>@shounakmk219: </strong>Oh okay, that should do. Thanks a lot Mayank and Sidd :+1:
<br><strong>@elon.azoulay: </strong>We enabled Istio in our k8s Pinot deployment and are getting ZK timeouts. Wanted to know if the system properties `zk.connection.timeout` and `helixmanager.waitForConnectedTimeout` can be set in a config, or only with `-Dzk.connection.timeout=...`
<br><strong>@pradeepgv42: </strong>Hi, I am consuming data into Pinot using two Kafka nodes and I notice that data from a lot of partitions is missing (I have 64 partitions currently).
I am setting the following Kafka-related properties; initially I had `stream.kafka.zk.broker.url` missing and added it later on.
I have the number of replicas set to `1` and no tagsOverrideConfig
```
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "&lt;TOPIC&gt;",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "&lt;node1&gt;:9092,&lt;node2&gt;:9092",
        "stream.kafka.zk.broker.url": "&lt;zk&gt;:2191",
```
<br><strong>@pradeepgv42: </strong>When I run a select count(distinct()) query on a field whose value is different for the data received from each partition,
I see that the segments-queried stat shows `"numSegmentsQueried": 14,` even though there are 64 Kafka partitions, and /segments/{tableName} shows ~68 segments
<br><strong>@pradeepgv42: </strong>Wondering if anyone knows whether the issue is in the Kafka consumer settings?
<br><strong>@jackie.jxt: </strong>Looking into it
<br><strong>@g.kishore: </strong>Can you paste the ideal state?
<br><strong>@pradeepgv42: </strong>
<br><h3><u>#pinot-dev</u></h3><br><strong>@g.kishore: </strong><https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMSfW2QiSG4bkQpnpkSL7FiK3MHb8libOHmhAW89nP5XKRe3DHeT3uJk0ZxGzj-2FZ8VQ-3D-3DwV0__vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzaSkwmSoqTfD7vNiUtjDYtnZy53fVF2o11QhGLLw3WnpuRT3G2kJe5h82vfUcOqA21Yy05CfEk2hUy7IKlXVriOEfjryyP1YCByBmUrz9WAAwXjRfXtJsA2hp2Xv9j50EKuXyz7HiEeIbDm3FpiTSTZIgzqWXPZ-2Fzfb0uxHHSuuRrgUA6CmGDGS31tS0FNLo0-3D>
<br><strong>@g.kishore: </strong>Need a quick review on this
<br><strong>@mayanks: </strong>I had left a question on the PR (I am very likely missing something obvious)
<br><strong>@g.kishore: </strong>What was the question?
<br><strong>@mayanks: </strong>Neha just replied
<br><strong>@g.kishore: </strong>ok
<br><strong>@g.kishore: </strong>Anything else needed?
<br><strong>@mayanks: </strong>No, just an e2e test, as others also mentioned.
<br><strong>@g.kishore: </strong>ok<br>