Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/05/04 02:00:16 UTC

Apache Pinot Daily Email Digest (2021-05-03)

### _#general_

  
 **@rajathlikeslemons:** @rajathlikeslemons has joined the channel  
 **@kmvb.tau:** Hello, for real-time tables, is there a constraint that one
Kafka topic should carry data for only one table? In our case, data for
multiple tables is produced to a single topic, i.e. we have a group of topics
in Kafka, and each topic serves a set of tables based on the use case. Is it
possible to consume events for multiple tables from a single topic in Pinot?  
**@g.kishore:** Each table in Pinot is independent of the others. You can set
up multiple tables to consume from the same Kafka topic, and you can also use a
filter config to filter out the rows that don't belong to a given table (see
the sketch below)  
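For reference, such a per-table filter lives under `ingestionConfig` in the
table config. A minimal sketch, assuming a hypothetical table `orders_REALTIME`
and a hypothetical `eventType` column that tags which table a row belongs to
(rows for which the Groovy expression evaluates to true are dropped):
```
{
  "tableName": "orders_REALTIME",
  "tableType": "REALTIME",
  "ingestionConfig": {
    "filterConfig": {
      "filterFunction": "Groovy({eventType != \"order\"}, eventType)"
    }
  }
}
```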
**@ssubrama:** Watch where you are headed, though. It is easy to throw all the
data into one topic and create multiple Pinot tables, but each table will be
consuming all the data and discarding the rows it doesn't need. This can
generate a lot of garbage and get in the way of high-throughput use cases.  
**@kmvb.tau:** @ssubrama we have a group of topics. 1. Since we shard at both
the user level and the table level, maintaining a separate topic for every
table is not scalable for us. 2. Configuring the same topic for multiple tables
will add unnecessary I/O load on the Kafka servers.  
 **@jmeyer:** Hello :wave: Has anyone got experience with Pinot on ADLS (Gen
2)? Specifically: • Any idea of the minimum IOPS for running Pinot smoothly
under lowish load? (i.e. is a standard Storage account "enough"? If so, how
"far" can we push it?) • Is it recommended to create a dedicated PVC for
`controller.local.temp.dir`?  
**@fx19880617:** • Pinot uses ADLS as the deep store (for backup), so it's not
on your query path; all segments are copied to the Pinot servers' local disks.
• `controller.local.temp.dir` is mostly used as a temporary data store for
segments uploaded to the controller; it's not required if you use ADLS as the
deep store, so keeping it on local temp should be OK. • The Pinot server uses a
PVC to serve data, so please make sure you give it enough disk space, and use
SSD for query performance. On Azure, by default we use a standard AzureDisk as
the PVC; you can also try AzureFile if you don't care much about the perf.
(See the config sketch below.)  
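For context, pointing the controller at ADLS Gen2 as the deep store is done
through configs along these lines. A rough sketch for the 0.7.x plugin; the
data dir path, account name, access key, and file-system name are placeholders,
and the server needs matching segment-fetcher entries:
```
pinot.controller.data.dir=adl2://path/to/segment/store
pinot.controller.storage.factory.class.adl2=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.controller.storage.factory.adl2.accountName=<account-name>
pinot.controller.storage.factory.adl2.accessKey=<access-key>
pinot.controller.storage.factory.adl2.fileSystemName=<file-system-name>
pinot.controller.segment.fetcher.protocols=file,http,adl2
pinot.controller.segment.fetcher.adl2.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```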
**@jmeyer:** Thanks @fx19880617  
**@jmeyer:** Does a Standard SSD with 500 IOPS seem enough for lowish loads, or
is it not a good idea? I get that we could have better performance (latency &
concurrency) with higher-grade disks, of course  
**@fx19880617:** It depends on your use case; a standard SSD is good for
typical use cases.  
**@jmeyer:** We'll start with that, thanks again  
 **@kmvb.tau:** A few doubts regarding streaming data: 1. Pinot supports data
ingestion via a streaming (Kafka) or batch (Hadoop) process; is there any
direct API available for pushing data into Pinot? 2. Does Pinot have a segment
compaction process like HBase compaction? Won't creating a lot of small
segments affect query performance?  
**@mayanks:** The offline push is via HTTP POST (see the command sketch below).
There isn't a write API right now, if that's what you are asking.  
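For the record, the bundled admin tool wraps that POST. A minimal sketch,
assuming a controller at localhost:9000 and a directory of prebuilt segment
tarballs:
```
bin/pinot-admin.sh UploadSegment \
  -controllerHost localhost \
  -controllerPort 9000 \
  -segmentDir /path/to/built/segments
```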
**@mayanks:** Segment compaction is in progress and will be available
shortly  
**@kmvb.tau:** Based on my understanding, the Pinot streaming processor can
pull data from Kafka, Kinesis, etc., but Pinot doesn't have a REST layer that
accepts write requests (POST) directly. Is there any plan to support a write
REST API in the future?  
**@mayanks:** We have discussed, but there's no concrete timeline right now.
May I ask what's your use case that would need the write-api?  
 **@pedro.cls93:** Hello, the Pinot docs related to deep storage in K8s seem to
be broken, can anyone point me to the right resource?  
**@mayanks:**  
**@mayanks:**  
**@pedro.cls93:** The last link is for file import, is that relevant?  
**@pedro.cls93:** I've configured the controller & server to connect to ADLS.
I get the following exception:
```
2021/05/03 16:20:48.309 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 1.844 since launch
java.lang.RuntimeException: com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 400, "<?xml version="1.0" encoding="utf-8"?>
<Error><Code>OutOfRangeInput</Code><Message>One of the request inputs is out of range.
RequestId:45faebbd-b01e-0064-2438-40f291000000
Time:2021-05-03T16:20:48.1516232Z</Message></Error>"
	at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:324) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
Caused by: com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 400, "<?xml version="1.0" encoding="utf-8"?>
```
Does this ring any bells?  
**@mayanks:** @rkanumul ^^  
**@rkanumul:** I haven’t seen this error before.. But will spend some time on
it..  
**@mayanks:** Also, @pedro.cls93 are you using ADLS gen1 or gen2?  
 **@pedro.cls93:** Hello again, are the Pinot Helm charts published to any hub?
They don't seem to exist in any of them; are they just available in the GitHub
repo?  
**@fx19880617:** right now it’s only on github  
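For reference, installing straight from a checkout of the repo would look
roughly like this (a sketch; the chart path within the repo and the
release/namespace names are assumptions):
```
git clone https://github.com/apache/incubator-pinot.git
helm install pinot ./incubator-pinot/kubernetes/helm/pinot \
  --namespace pinot --create-namespace
```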

###  _#random_

  
 **@rajathlikeslemons:** @rajathlikeslemons has joined the channel  

###  _#feat-presto-connector_

  
 **@j.wise.hunter:** @j.wise.hunter has joined the channel  

###  _#troubleshooting_

  
 **@rajathlikeslemons:** @rajathlikeslemons has joined the channel  
 **@pedro.cls93:** Hello, has anyone successfully configured Pinot to use Azure
Storage Containers for deep storage?  
**@mayanks:** I know of ADLS gen2 based deployments  
**@mayanks:** cc @dlavoie  

###  _#pinot-dev_

  
 **@syedakram93:** @syedakram93 has joined the channel  
 **@j.wise.hunter:** @j.wise.hunter has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org