Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/07/07 02:00:08 UTC

Apache Pinot Daily Email Digest (2020-07-06)

#general

@yash.agarwal: Is it possible to use multiple buckets for S3PinotFs? We have limitations to the amount of data we can store in a single bucket.
@mayanks: @kharekartik ^^
@kharekartik: Hi @yash.agarwal, currently it is not possible. Let me take a look into what can be done.
@g.kishore: @yash.agarwal what kind of limitation do you have?
@yash.agarwal: @g.kishore We have our buckets limited to 1 TB and 2 million objects, and we are looking to deploy a cluster well over 50 TB.
@g.kishore: got it, let me see how we can support the multiple buckets.
@yash.agarwal: Sure. Do let me know if I can do anything to help :slightly_smiling_face:
@g.kishore: would love to get your help, created <#C016ZKW1EPK|s3-multiple-buckets>

#troubleshooting

@somanshu.jindal: Hi, if I want to use a ZooKeeper cluster for a production setup, can I specify all the ZooKeeper hosts when starting the various Pinot components like controller, broker, etc.?
@yash.agarwal: @yash.agarwal has joined the channel
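For reference on the ZooKeeper question above: with the stock pinot-admin.sh launcher, the -zkAddress flag takes a standard ZooKeeper connect string, so all quorum hosts can be listed comma-separated. A minimal sketch; the host names, ports, and cluster name are hypothetical.

```# Hypothetical three-node ZooKeeper quorum; the same connect string is passed to every component.
ZK_QUORUM="zk-1:2181,zk-2:2181,zk-3:2181"

bin/pinot-admin.sh StartController -zkAddress "${ZK_QUORUM}" -clusterName PinotCluster -controllerPort 9000
bin/pinot-admin.sh StartBroker     -zkAddress "${ZK_QUORUM}" -clusterName PinotCluster
bin/pinot-admin.sh StartServer     -zkAddress "${ZK_QUORUM}" -clusterName PinotCluster```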
@somanshu.jindal: I need help with hardware requirements for the various components, like cores, memory, etc.
Also, which components are memory intensive, IO intensive, CPU intensive, etc.?
Currently I am thinking of:
• Controller - 2
• Broker - 2
• Servers - 3 (for realtime ingestion)
• Zookeeper (should I go with standalone or cluster?)
As far as I know, segments are stored on servers and the controller (segment store), right?
@yash.agarwal: Is it possible to use multiple buckets for S3PinotFs? We have limitations to the amount of data we can store in a single bucket.
@g.kishore: @somanshu.jindal For prod, here is a good setup:
```Controller
- min 2 (for fault tolerance), ideal 3
- 4 core, 4 GB (disk space should be sufficient for logs and temp segments) - 100 GB

Broker
- min 2, add more nodes later as needed to scale
- 4 core, 4 GB (disk space should be sufficient for logs) - 10 GB min

Zookeeper (cluster mode)
- min 3 (this is where the entire cluster state is stored)
- 4 core, 4 GB, disk space sufficient to store logs, transaction logs and snapshots. If you can afford it, go with SSD; if not, spinning disk will be fine. 100 GB

Pinot server
- min 2 (this is where the segments will be stored); you can add more servers anytime without downtime
- 8 core, 16 GB, SSD boxes (pick any size that works for your use case: 500 GB to 2 TB or even more)
- If you are running on cloud, you can use mounted SSD instead of local SSD```
@pyne.suvodeep: @pyne.suvodeep has joined the channel
@pradeepgv42: QQ, wondering how difficult it would be to include timestampNanos as part of the time column in Pinot?
(Is it just a matter of Pinot parsing and understanding that the timestamp is in nanos, or are there more assumptions around it?)

I believe currently only up to `millis` is supported. The context is that we have system-level events (think a stream of syscalls)
and want to be able to store the nanos timestamp to fix the order among them; it's also used by other systems in our infrastructure.

Currently I am storing the nanos value as a separate column and created a `millis` column to serve as the time column,
wondering if I can avoid storing the duplicate info if the feature is simple enough to add.
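For context, a minimal sketch of this workaround, assuming Pinot's standard schema format; the schema and column names are hypothetical. The millis column serves as the table's time column, while the raw nanos value rides along as an ordinary long dimension.

```{
  "schemaName": "syscall_events",
  "dimensionFieldSpecs": [
    { "name": "event_ts_nanos", "dataType": "LONG" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "event_ts_millis",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}```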
@g.kishore: IMO, nanos cannot be used as a timestamp
@g.kishore: irrespective of Pinot supporting that datatype
@g.kishore: nanos is mainly used to measure relative times
@elon.azoulay: FYI, we have a table which already exists and I wanted to add a sorted-column index, but I'm getting "bad request 400". Nothing in the controller logs. Can you see what's wrong with the following?
@elon.azoulay: ```curl -f -k -X POST --header 'Content-Type: application/json' -d '@realtime.json' ${CONTROLLER}/tables```
@elon.azoulay: ```{
      "tableName": "oas_integration_operation_event",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "operation_ts",
        "timeType": "SECONDS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "7",
        "segmentPushType": "APPEND",
        "segmentPushFrequency": "daily",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "oas_integration_operation_event",
        "replicasPerPartition": "3",
        "timeType": "SECONDS"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [ "service_slug", "operation_type", "operation_result", "store_id"],
        "sortedColumn": ["operation_ts"],
        "noDictionaryColumns": [],
        "aggregateMetrics": "false",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "LowLevel",
          "stream.kafka.topic.name": "oas-integration-operation-completion-avro",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.decoder.prop.schema.registry.rest.url": "<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYVQ6HO-2FNd3WXo8sCVuFwfT0-3DT4te_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVyRGy4jlW5PJgT6jQqbHf3TPoY-2FqgmxDrNxIDcaah2om0KvbgMcFLGXrE8ZfpBNvOa9cIJododz1I6dFs45CFYTkxvtRRBjmslWphjLH4q6H1lFMXjU7Oa0hAjVJFMuO-2BC0ULgQjrczkzjbMYZ8ac8tFMZprfJvJ5lZlXAH5d4-2FE-3D>",
          "stream.kafka.zk.broker.url": "XXXX/",
          "stream.kafka.broker.list": "XXXX:9092",
          "realtime.segment.flush.threshold.time": "6h",
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.desired.size": "200M",
          "stream.kafka.consumer.prop.auto.isolation.level": "read_committed",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
          "stream.kafka.consumer.prop.group.id": "oas_integration_operation_event-load-pinot-llprb",
          "stream.kafka.consumer.prop.client.id": "XXXX"
        },
        "starTreeIndexConfigs":  [{ "dimensionsSplitOrder": [ "service_slug", "store_id", "operation_type", "operation_result" ], "functionColumnPairs": [ "PERCENTILEEST__operation_latency_ms", "AVG__operation_latency_ms", "DISTINCTCOUNT__store_id", "COUNT__store_id", "COUNT__operation_type" ] }, { "dimensionsSplitOrder": [ "service_slug", "store_id" ], "functionColumnPairs": [ "COUNT__store_id", "COUNT__operation_type" ] }]
      },
      "metadata": {
        "customConfigs": {}
      }
}```
@mayanks: IIRC, uploading segments to realtime tables was not possible (a while back, but not sure if it continues to be the case).
@elon.azoulay: This is just updating the spec for the table
@mayanks: can you try swagger?
@elon.azoulay: Sure
@elon.azoulay: Oh, thanks! Looks like I can't change the time type for the time column, i.e. segmentsConfig.timeType
@mayanks: Makes sense, that could be backward incompatible.
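For reference, a sketch of the create-vs-update distinction in the controller REST API, assuming the standard paths: POST /tables creates a new table, while an existing table's config is updated via PUT /tables/{tableName} (the endpoint the Swagger UI exposes for updates).

```# Create a new table (what the original command does):
curl -X POST --header 'Content-Type: application/json' -d '@realtime.json' "${CONTROLLER}/tables"

# Update the config of an existing table instead:
curl -X PUT --header 'Content-Type: application/json' -d '@realtime.json' \
  "${CONTROLLER}/tables/oas_integration_operation_event"```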
#presto-pinot-streaming

@elon.azoulay: Here's a link to the design doc: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F2yNvEnb3JHma9TbSCfyfAx-2FVOn7Bt885qSK47uf3MFF-2FhL8qplE-2FLYisjbzJXY-2FUB7YCnAiPcrkdz5y054MsHzlsZTBtUMD-2BcUlK45ORI42w-3D-3DdUo8_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVh4mtLbu51UvID-2BpIVeVfHHAkz-2BQGywKBCG-2BuczerYFmfsSw-2BaUhWdf5KrlyQBpdgjghNzrbFX8rvY73d4ST7SlokoYDYRdCoOTGb1ArYbbIkXTayr2aC97n0VXZH4chsCkI8vMD05ZPq-2FvzlmlID-2FWYWayA-2FwE2RKIfyz6P47zs-3D>
@g.kishore: @jackie.jxt can you please take a look at this?
@jackie.jxt: Sure
@g.kishore: @elon.azoulay need access
@elon.azoulay: Try this one: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F2yNvEnb3JHma9TbSCfyfAx-2FVOn7Bt885qSK47uf3MFF-2FhL8qplE-2FLYisjbzJXY-2FUB7YCnAiPcrkdz5y054MsHzlsZTBtUMD-2BcUlK45ORI42w-3D-3D32ZZ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVquxchxi3QlvwYIA1-2FNYdsWIcFvbIHp6nKWfN04ATBV0yJvPGfj63ENLE4TNmKIg-2BcbJT6F3swY6J8adylMAjX7HFQOXlImxxHKo7cX7oqBOq-2BDPxsm1a5e4fBK7n4PpmlT6r4qZMmM16VR4YCnDU4w0ygo9mC2b-2BJwiMNVWoK98-3D>
@g.kishore: can you write a few sentences on why we need this and what's the current design?
@g.kishore: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMWR9hf84-2BEJYpip6YlEfWjHMb3DE3DtTnj4lc7ywiNxn8nE0KD6t23Jqnbnkq1-2Fazw-3D-3D_TmA_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVzSBoU3zH4HjRuVheDvC3EgsKYdEk1Y6sJnY9wsmnoKBRjducBzXmsKfeziONk-2BOyIWDjmSFdd1orV6HvzPyxRynSRgZCN5CvD8J3b1YDJphT3Nc3t10nYBybTrYtMgwY6TWsi-2B0Dtu-2Fmo7DmxIVTnkAUvz5OUTUwdy7ZFd9iAvI-3D>
@g.kishore: use this diagram
@g.kishore: (image attachment not included in the plain-text digest)
@g.kishore: today we are in unary streaming
@g.kishore: and we want to move to server streaming
@g.kishore: advantages
• less memory pressure on Pinot server
@g.kishore: • Presto workers can start working as soon as chunks arrive
@elon.azoulay: Sure
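A minimal sketch of the unary vs. server-streaming distinction being discussed, in gRPC IDL terms; the service and message names are hypothetical, not Pinot's actual API.

```syntax = "proto3";

// Hypothetical names for illustration only.
message QueryRequest  { bytes payload = 1; }
message QueryResponse { bytes payload = 1; }

service PinotQueryServer {
  // Unary (today): the server buffers the full result and returns it in a single response.
  rpc Submit(QueryRequest) returns (QueryResponse);

  // Server streaming (proposed): the server sends result chunks as they are produced,
  // so the Presto worker can start consuming before the whole result is materialized.
  rpc SubmitStreaming(QueryRequest) returns (stream QueryResponse);
}```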

#s3-multiple-buckets

@g.kishore: @g.kishore has joined the channel
@yash.agarwal: @yash.agarwal has joined the channel
@kharekartik: @kharekartik has joined the channel
@singalravi: @singalravi has joined the channel
@kharekartik: @g.kishore Is there support for multiple directories for FS? If yes, we can extend that to multiple buckets.
@kharekartik: @yash.agarwal How do you want to split data across buckets?
@g.kishore: @kharekartik No, I was thinking that if users can provide a list of subfolders/S3 buckets, we can pick one randomly or hash it based on the segment name
@kharekartik: Randomly at the time of creating the segments?
@kharekartik: Wouldn't that disrupt the query execution?
@g.kishore: no, we just store the URI along with segment metadata in ZK
@g.kishore: it can point to anything
@g.kishore: actually, this is a problem only with real-time, where we create the URI
@g.kishore: with batch ingestion, the user can provide any URI
@yash.agarwal: We don't have any specific requirement around how to split data across buckets.
@pradeepgv42: @pradeepgv42 has joined the channel
@kharekartik: OK. Then I believe the change needs to be done in the handling of the ingestion config, and then picking a random directory while creating segments. The S3 filesystem implementation won't need any change unless the buckets are located in different regions.
@yash.agarwal: all the buckets are co-located.
@g.kishore: Yash, is this realtime or offline?
@yash.agarwal: Right now it is only offline.
@g.kishore: then you don't need anything for now
@g.kishore: I am guessing you will use the ingestion job to generate the segments
@vallamsetty: @vallamsetty has joined the channel
@yash.agarwal: Yeah, I realised that too. I am very new to this, so sorry for any troubles :slightly_smiling_face:
@g.kishore: no worries, this is a good feature to have. If you don't mind, can you create an issue?
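For completeness, a trimmed sketch of the batch ingestion job spec referred to above, showing that the input and output locations are plain URIs and can therefore point at whatever buckets are needed. The bucket names, paths, and region are hypothetical, and the usual executionFrameworkSpec/recordReaderSpec/tableSpec/pinotClusterSpecs sections are omitted for brevity.

```jobType: SegmentCreationAndTarPush
inputDirURI: 's3://raw-bucket-1/events/2020/07/06/'
outputDirURI: 's3://segment-bucket-2/pinot-segments/myTable/'
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'```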