Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/07/22 02:00:08 UTC

Apache Pinot Daily Email Digest (2020-07-21)

<h3><u>#general</u></h3><br><strong>@mailtobuchi: </strong>Quick question about Pinot query.
If this is the Pinot query response, does it mean that `numSegmentsProcessed` segments were memory-mapped?
```
{
    "resultTable": {
        "dataSchema": {
            "columnDataTypes": ["BYTES"],
            "columnNames": ["id"]
        },
        "rows": [
            ["6a254bd3c853e950"]
        ]
    },
    "exceptions": [],
    "numServersQueried": 4,
    "numServersResponded": 4,
    "numSegmentsQueried": 1237,
    "numSegmentsProcessed": 1229,
    "numSegmentsMatched": 4,
    "numConsumingSegmentsQueried": 8,
    "numDocsScanned": 4,
    "numEntriesScannedInFilter": 4,
    "numEntriesScannedPostFilter": 4,
    "numGroupsLimitReached": false,
    "totalDocs": 265510367,
    "timeUsedMs": 32,
    "segmentStatistics": [],
    "traceInfo": {},
    "minConsumingFreshnessTimeMs": 1595297570989
}
```
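To make the segment counters in the response above easier to compare, here is a minimal Python sketch that only does arithmetic on them. The function name `summarize_segments` and the output keys are hypothetical, chosen for illustration. The counters describe how many segments were queried, processed, and matched; they do not, by themselves, say anything about memory-mapping.

```python
import json

# Only the segment counters from the response above are reproduced here.
RESPONSE_JSON = """
{
    "numSegmentsQueried": 1237,
    "numSegmentsProcessed": 1229,
    "numSegmentsMatched": 4,
    "numConsumingSegmentsQueried": 8
}
"""


def summarize_segments(response):
    """Derive simple differences between the segment counters.

    The key names in the returned dict are illustrative, not Pinot terms.
    """
    queried = response["numSegmentsQueried"]
    processed = response["numSegmentsProcessed"]
    matched = response["numSegmentsMatched"]
    return {
        # Segments that were queried but not counted as processed.
        "not_processed": queried - processed,
        # Segments that were processed but had no matching documents.
        "processed_without_match": processed - matched,
        # Segments with at least one matching document.
        "matched": matched,
    }


response = json.loads(RESPONSE_JSON)
print(summarize_segments(response))
```

For this response, most of the processed segments contributed no matching documents, which is consistent with `numDocsScanned` being only 4.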
<br><strong>@mayanks: </strong>No, this is the number of segments the query had to process.<br><strong>@hiboss1: </strong>@hiboss1 has joined the channel<br><strong>@dlavoie: </strong>Wouldn't Pinot make an incredible datasource for Grafana?<br><strong>@pradeepgv42: </strong>@steotia Thanks a lot for enabling the TEXT_MATCH feature on dictionary encoded columns.
On a smaller table with ~25M rows, a simple regexp_like query takes 178 ms, whereas TEXT_MATCH takes ~30 ms.
This is pretty cool.<br><strong>@g.kishore: </strong>Amazing video by @kennybastani on Deploying Pinot on Kubernetes <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc92bR9g-2BkGUUQX5IM7P9-2BAhHRVzOXTS92je0dEky-2B6erluPv4yRe5Qxf8-2BPzQrpHg-3D-3DJHPT_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTybC1-2B-2FcUzX2RLE0WEiXpW8-2FoU2Y6JwxpYBGsUTMOfMOQsqm51Fyzpb3bLaYfh1TSyYryVJawHiPEsIw2FwM9lfYIO-2FfnyBg2faUY-2FGTgbUWpBy2hTlI03MzGgGB5mzTur8qzcpXO4hA0qczzwAyzOolfDvN60lMyZoH2Uda0Bgz2qIA-2BXBWWA94G1dOwatLvc-3D><br><strong>@rahulvinaykumar.chhap: </strong>Thanks for sharing :slightly_smiling_face:<br><strong>@sanjay: </strong>@sanjay has joined the channel<br><h3><u>#random</u></h3><br><strong>@hiboss1: </strong>@hiboss1 has joined the channel<br><strong>@sanjay: </strong>@sanjay has joined the channel<br><h3><u>#troubleshooting</u></h3><br><strong>@ankit.raj.singh: </strong>@ankit.raj.singh has joined the channel<br><strong>@elon.azoulay: </strong>We are about to upgrade to pinot-0.4.0 - do you recommend going to head or just cutting it at the 0.4.0 release commit?<br><strong>@elon.azoulay: </strong>Any notable config changes, or k8s changes we should be aware of? We're on pinot-0.3.0 now<br><strong>@damianoporta: </strong>Nooooo I have just upgraded my custom aggregation function :smile: did you change the API?<br><strong>@damianoporta: </strong>:joy:<br><strong>@g.kishore: </strong>@elon.azoulay I would go with 0.4.0 unless you need any feature in master<br><strong>@quietgolfer: </strong>Sorry, I think I've asked before (I lost my slack history).  Is there an easy way to have Pinot take the realtime inputs and automatically run data ingestion jobs to populate the offline tables?  Mostly checking to see if I can shortcut some work for a v1 deliverable.  
I assume there is probably a simple setup to output the Kafka topic for 1 day, split the data, and run batch ingestion jobs.<br><strong>@g.kishore: </strong>Yes, it’s doable, but there is no such tool.<br><strong>@g.kishore: </strong>You can download the real-time segments, use the Pinot segment reader to read multiple segments, generate a new offline segment, and push it.<br>
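The "split the data" step mentioned in this thread could look roughly like the Python sketch below. It covers only the bucketing of records by UTC day, and it assumes each record is a dict with an epoch-millisecond time column, here hypothetically named `timestampMs`. Reading the real-time segments and generating and pushing the offline segment would rely on Pinot's own tooling and is not shown.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_by_utc_day(records, time_column="timestampMs"):
    """Group records by the UTC calendar day of their timestamp.

    `time_column` is a hypothetical column name holding epoch milliseconds.
    Returns a dict mapping an ISO date string to the records for that day.
    """
    buckets = defaultdict(list)
    for record in records:
        # Integer division avoids float rounding near day boundaries.
        seconds = record[time_column] // 1000
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
        buckets[day.isoformat()].append(record)
    return dict(buckets)


# Illustrative records only; the ids and timestamps are made up.
records = [
    {"id": "a", "timestampMs": 1595297570989},
    {"id": "b", "timestampMs": 1595289599999},
    {"id": "c", "timestampMs": 1595289600000},
]
daily = bucket_by_utc_day(records)
for day in sorted(daily):
    print(day, [r["id"] for r in daily[day]])
```

Each daily bucket would then be the input for one offline segment build, so that a day of data can be pushed as a unit.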