Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/08/20 02:00:07 UTC

Apache Pinot Daily Email Digest (2020-08-19)

<h3><u>#general</u></h3><br><strong>@ankit.raj.singh: </strong>Hi all,  Pinot seems to have Map data type support for column. Is query possible on it?…if possible, is there any example i can refer to?<br><strong>@cj: </strong>@cj has joined the channel<br><strong>@joey: </strong>I was having some trouble making a schema/table with transform configs, following the documentation at <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdpjVdLs6pcP-2BmJYu0RGOJgjW4yUWessZqxMl9cgpZEQvZ61b7e-2FULIdDgh3zQyBB6rFDS3rtq9FjWH9nlZ0hZ99VBNms8zq1x7TdCDGsaZg5i-2F97vZlpfEl6fip6rjl-2BUw-3DvQRv_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKznpOvlN6rnjjXxrv6HQncsTrUBV7sNMNMlck104lvPWDKKJzp4EBhVJasu6WvVgcEwgXPBcIVI9J2sJtXBTYO3Xdrveqjs7cCva2Zb5pWwRF2ZMWmJ3ydDCMrjWhuc86g-2BttX2xHbQe4GxlpzZKvWWd5nu4yGamYNHfPdW2IN3U-3D> (=&gt; thread)<br><strong>@joey: </strong>Is there any way to get a query plan from Pinot, such as indexes used to pull data, or is this something that you can only really interpret from the query response stats on the Controller's query console?<br><strong>@luu: </strong>@luu has joined the channel<br><h3><u>#random</u></h3><br><strong>@cj: </strong>@cj has joined the channel<br><strong>@luu: </strong>@luu has joined the channel<br><h3><u>#troubleshooting</u></h3><br><strong>@laxman: </strong>Folks, Have some doubts related to rebalance. Can someone please clarify these or point me to relevant documentation.
=============
• One of our Pinot servers was scaled down (Kubernetes, from 4 servers to 3).
• Even after several hours, the segments didn’t come online.
• The same happened with CONSUMING segments: Kafka partitions that were being processed by the scaled-down server are no longer being processed at all.
=============
• When does the rebalance get triggered? I already tried server/controller restarts. I also tried a rebalance from the controller UI.
• What is the right way to scale down a server?
FYI: we are on 0.3.0 + some fixes, in case it matters<br><strong>@elon.azoulay: </strong>Is there a quick way we can convert realtime segments to offline segments? Is there any benefit to doing that, since the realtime segment is created with star-tree, inverted, sorted, and text indexes?<br><h3><u>#onboarding</u></h3><br><strong>@pratiksethi.pro94: </strong>@pratiksethi.pro94 has joined the channel<br><h3><u>#transform-functions</u></h3><br><strong>@pratiksethi.pro94: </strong>@pratiksethi.pro94 has joined the channel<br><h3><u>#community</u></h3><br><strong>@pratiksethi.pro94: </strong>@pratiksethi.pro94 has joined the channel<br><h3><u>#pinot-0-5-0-release</u></h3><br><strong>@laxman: </strong>@tingchen: Will you be able to cherry-pick the following to the 0.5.0 rc branch?
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMSfW2QiSG4bkQpnpkSL7FiK3MHb8libOHmhAW89nP5XK7afLY5WyBExz0XzvjqBqCA-3D-3DIZrI_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKjlRTDT1MIsm5StTQgCMYTna5ln6-2BdwGiIjdvTK-2BnwhNEXTr6S3l16LppcdNN-2B7eOc6s-2B1sRTt-2BEpPynmkSvnL4ve8uVmzH5MkmWapUgDJMRPaUMlttTLfTGDPWBgwnxqrZX7PoPmPGjyjgEpNKr1Te172Ohk8ZLIvfoSizXJD3M-3D><br><h3><u>#multiple_streams</u></h3><br><strong>@fra.costa: </strong>@g.kishore sorry to bother you again, but I would like to ask for one more clarification around the duality between real-time and offline. As we mentioned, with a hybrid table setup, if an entry appears in both stores at query time, the offline one takes precedence.
• How is the “collision” determined? Is there some sort of identifier column in the table schema that governs that?
• When setting up real-time and offline, my understanding is that real-time has some sort of window associated with it: 
              1. What happens when that period passes? Are entries purged by the real-time datastore?
              2. If there are cases in which entries are consolidated and moved from real-time to offline, how is the same-key case handled? Are real-time entries dismissed in favor of the existing offline ones (if any)?

I apologize if some of these questions are already addressed in the documentation; I’m happy to read the relevant sections in case I missed them.
Thanks,<br><strong>@npawar: </strong>Regarding hybrid table and which data takes precedence, this might help: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdpzMyfof-2FnbcthTx3PKzMZIvTvz0ZlGzjfnWuiLO3kB-2FQ-3D-3DH8aO_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKV71Hy2Q4Pn7aIJhiBtj2cRegrX4tFr-2FAQhLCYrJxdrrPwA-2FojQcedZ3qZzH-2Fsrx2j1-2FCfpysdUDHM9qxYbqxSMq-2B4fjMf6BI4YAls9x3uEePBxC-2BP3r5Vk291rbC4-2Bk1U0O7nB6ukJr9tLp-2FsoWnL-2FzajZnhH6J1HItSIq9nkhk-3D> (time boundary section)<br><strong>@npawar: </strong>I’m not sure what you mean by “window” of a realtime table. We have a concept of retention. This can be configured for both realtime and offline tables.  If the data in the table becomes older than the retention, it is deleted<br><strong>@fra.costa: </strong>Thanks Neha, I am going to read that and reply after<br><strong>@fra.costa: </strong>That page perfectly explains the first question, it’s done on the time series dimension, there’s no key on single object involved, makes sense.

As for the retention, yes, I was referring to that; it seems to be a different concept from the behavior I was worrying about.

I guess the only question left is whether there are instances in which Pinot independently consolidates realtime segments into the offline ones. I have a vague memory of reading something about it, but I’m not 100% sure<br><strong>@fra.costa: </strong>If that is the case, I am basically trying to understand whether the offline data is replaced by the newly “consolidated” realtime segments<br><strong>@npawar: </strong>as of now, the offline data needs to be populated by you, with your own offline jobs setup.<br><strong>@npawar: </strong>but,<br><strong>@npawar: </strong>we have an ongoing project that will move segments from the realtime to the offline table - <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F0jpwxWv4fC4LAZTjvhd54Mnnp7A4BBhAtbRr8NR9LakFgJLRyTGnwArJQe4yssb406fh8XVHzSkiiNBgKkuV9jApFM_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKfQ5n1DqLN1oI6Kdf-2B-2BIFqOJ37P7YUpw18Cen-2BZjMNo1f978pXofjkt-2Bxq6x9J4Gr-2BGefv-2Bk8LAfFshDvLEiocHd3cc5YYcM7XrNNKEnRbWifqSbE6pf9JvezRzwoxMPYb1epC32X1qX7uOxpXZcMV9WWp4Wnv8ImMONCzT-2BPgo4-3D><br><strong>@npawar: </strong>though I don’t know if that’ll help in your case, since you need the accurate data merged with the realtime data<br><strong>@fra.costa: </strong>Thanks Neha, in my case it would actually hurt us<br><strong>@fra.costa: </strong>I was more watching out for that happening under the covers<br><strong>@fra.costa: </strong>so in that regard we are good, thank you very much<br>
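
The conclusion reached in the #multiple_streams thread (precedence in a hybrid table is decided on the time dimension, not by a per-row key) can be illustrated with a small sketch. This is a simplified model, not Pinot code: the `Row` type, the day-granularity time column, and the one-day offset used to compute the boundary are assumptions made for illustration, so check the time boundary section of the Pinot documentation for the exact rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    day: int      # hypothetical time column, in days since epoch
    source: str   # "offline" or "realtime", kept only to show where a row came from
    value: int


def time_boundary(offline_segment_end_days: List[int], offset_days: int = 1) -> int:
    # Assumption: the boundary is the latest offline segment end time
    # minus one time unit (one day in this sketch).
    return max(offline_segment_end_days) - offset_days


def hybrid_query(offline_rows: List[Row], realtime_rows: List[Row], boundary: int) -> List[Row]:
    # Offline rows answer for time <= boundary, realtime rows for time > boundary.
    # No row-level key is compared; overlap is resolved purely by time.
    from_offline = [r for r in offline_rows if r.day <= boundary]
    from_realtime = [r for r in realtime_rows if r.day > boundary]
    return from_offline + from_realtime


# Offline data covers days 1-3, realtime data covers days 2-5 (overlap on days 2 and 3).
offline = [Row(day=d, source="offline", value=d * 10) for d in (1, 2, 3)]
realtime = [Row(day=d, source="realtime", value=d * 10 + 1) for d in (2, 3, 4, 5)]

boundary = time_boundary([r.day for r in offline])
result = hybrid_query(offline, realtime, boundary)
```

In this model each day is answered by exactly one side, so a row that exists in both tables is never returned twice. Retention, which was also mentioned in the thread, is a separate setting and plays no part in this selection.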