Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/07/03 02:00:08 UTC

Apache Pinot Daily Email Digest (2020-07-02)

#general

@mayanks: Hi Pinot Community: As we plan ahead for the coming quarters, we have created this poll so the community can vote on the features you'd like to see. I encourage you to suggest new features as well as upvote existing ones you find useful. This will help us prioritize feature requests. <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMRmWRpSwTQ1R-2Fu3Wa3xNjOCPY1ytDhgp-2BzNaYnys36gFGsl3_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxFCuQcE7jioWTVo9XCE6GjE8XafoH7jIkXUj-2BNBgmy1Cq3eFkS1L0HbFT1Nr-2B2aTYFPkbtnTAkijEVxVsFQh39LMolwjEU9CqMk5PYs-2BkiPSV8GVgnkbaPOWQpdboDPGijaV-2BPPL0rwUOthJqQrffeorZnjrBCTh4nO1x2Tzc9ycxKRYIOEyl4FAbo-2B1EtIvw-3D>
@xcodeleopard: @xcodeleopard has joined the channel

#random

@xcodeleopard: @xcodeleopard has joined the channel

#troubleshooting

@jackie.jxt: @pradeepgv42 What is the total size of your data? To answer this query, the servers need to scan the whole table.
@pradeepgv42: It’s close to ~4G
@pradeepgv42: Do indexingConfig changes applied to an existing table update the old segments?
@pradeepgv42: Also, would keeping a min/max per segment help?
`columnMinMaxValueGeneratorMode: TIME`
@g.kishore: Some of them, such as the inverted index, apply to old segments.
@g.kishore: However, the original encoding cannot be changed.
@g.kishore: You can use minion to perform such tasks.
@pradeepgv42: (not urgent, ptal when you guys get a chance, sorry for the late night ping)
Also, when I tried adding “timestampMillis”, my timestamp column, to the table (note that

```
{
  "REALTIME": {
    "tableName": "tablename_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeColumnName": "timestampMillis",
      "schemaName": "search",
      "timeType": "MILLISECONDS",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "INPUT",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "<broker_nodes>:9092",
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.desired.size": "80M",
        "realtime.segment.flush.autotune.initialRows": "700000",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      },
      "noDictionaryColumns": [
        "timestampMillis"
      ],
      "enableDefaultStarTree": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "metadata": {
      "customConfigs": {}
    }
  }
}
```

I am seeing this NullPointerException; it works fine when I choose a different string column. Should noDictionaryColumns only contain string/bytes fields?
```
Could not build segment
java.lang.NullPointerException: null
        at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:393) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
        at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:360) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
        at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
        at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:199) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
        at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:141) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
```
@fx19880617: I think this is because Pinot segment creation uses the timestamp column's min/max values from the dictionary to set the segment name and write the segment metadata (start/end time).
@fx19880617: Since it's configured as a no-dictionary column, there is no dictionary to read those values from, hence the NPE.
@cadthecoder: @cadthecoder has joined the channel
@mhomaid: @mhomaid has joined the channel

#pinot-dev

@mayanks: We seem to have a dead link on the landing page <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMRoeOIJ0cu5Wh4ZHlIWCwHKAcuYFOJS2aSFZbaEBwYXKuguQ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxFCuQcE7jioWTVo9XCE6GjrfJFjXv2uGu4XwwqP33a2YDHLRfgF5YPWZUv1nmoyYIHHcw2UYVSU2OUME8bc5DnnTptniAFffGXyj0m2oG7qhz2t-2F8p8J8IrBF77qM58VUay56G5VdHRLIMLGAkNMz1EcwRnKp1DGTiMYDFgzUdF5HasoWyI7GUsPSBX1I2zbE-3D>. Scroll to the bottom of the page and under `Docs` click `Administration`. Note, this is *not* from <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdplKYwVDwt8ZlTEj-2BvPJR-2B9uwf8_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxFCuQcE7jioWTVo9XCE6GjMt3kS69VDa5Js0j8i9-2FXEQvd0xXU4t0qRJhIccGcti7SBP-2FkaqA2JIDYrbdaP936UDBMzAh8GQYxUgDA8dLA7ky0pCdL69cPl-2FG3ERn0o4gO0AYxqvX-2BmOFi4sF9XxMDmFDdVANMziB0NYdIfk6FaxOWbyj2M8ArhvJ5vZqF4tU-3D>. cc @kennybastani
@mayanks: The dead link is: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdou-2FBKHPSn4-2FRIlDcgopXBWmvf7DkfNOEh8VN4Pvdy96Q-3D-3DaJco_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxFCuQcE7jioWTVo9XCE6GjfGi2hxWrLopb2g4THhmmhcnTplcrM69WOJaKnswrbEyU0K7JSL1bdNTbU3fK6YQOMTpZU-2BZqqdpWl3SZNn1yX7q8MIZYLpX1iQcs4QQwt9NdeQV3epGwYWcksRkWJhand-2F8shJHcUPwweucqdi-2Fg6S-2Ff8LAWtHgjO79ZmIbiDUE-3D>
@kennybastani: @mayanks Thanks for pointing that out. I've temporarily fixed the dead link. We'll make a more permanent fix soon.
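
Returning to the noDictionaryColumns NPE in #troubleshooting: a minimal sketch of a workaround, assuming @fx19880617's diagnosis is correct (segment creation reads the time column's min/max from its dictionary). It keeps the time column dictionary-encoded by dropping "timestampMillis" from noDictionaryColumns. Only that key of tableIndexConfig is shown; every other setting, including streamConfigs, is assumed to stay exactly as in the posted config.

```
{
  "tableIndexConfig": {
    "noDictionaryColumns": []
  }
}
```

Other string columns can still be listed in noDictionaryColumns; the poster reported that choosing a different string column built segments without the error.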