Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/05/13 02:00:19 UTC

Apache Pinot Daily Email Digest (2021-05-12)

### _#general_

  
 **@aiyer:** Hi Team -- How do you recommend to handle cases where we need to
delete a record due to gdpr/ccpa ?  
**@jackie.jxt:** This can be done by the minion `PurgeTask`  
**@jackie.jxt:** Since the record purge is custom logic, you need to implement
and plug in the task scheduler and the record purger  
**@jackie.jxt:** See `SegmentPurgerTest` and
`SimpleMinionClusterIntegrationTest` for examples  
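For reference, a rough sketch of what such a purger plug-in can look like, assuming the `RecordPurger`/`RecordPurgerFactory` interfaces nested in `SegmentPurger` and the `GenericRow.getValue` accessor (names and packages are recalled from memory; verify the exact contracts against `SegmentPurgerTest`):
```
// Hypothetical GDPR purger: drop any record whose userId is on a delete list.
// Interface and package names are assumptions to check against SegmentPurgerTest.
import java.util.Set;
import org.apache.pinot.core.minion.SegmentPurger.RecordPurger;
import org.apache.pinot.core.minion.SegmentPurger.RecordPurgerFactory;
import org.apache.pinot.spi.data.readers.GenericRow;

public class GdprRecordPurgerFactory implements RecordPurgerFactory {
  // IDs requested for deletion; in practice this would be loaded per table.
  private final Set<String> _userIdsToDelete;

  public GdprRecordPurgerFactory(Set<String> userIdsToDelete) {
    _userIdsToDelete = userIdsToDelete;
  }

  @Override
  public RecordPurger getRecordPurger(String rawTableName) {
    // Returning true purges (drops) the record when the segment is re-generated.
    return row -> _userIdsToDelete.contains((String) row.getValue("userId"));
  }
}
```
The factory then has to be registered with the minion so the purge task executor can pick it up; `SimpleMinionClusterIntegrationTest` shows how that wiring and the task scheduling are done.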
**@aiyer:** ok.. I will take a look. Thanks Jackie.  
 **@aiyer:** Another question -- • How does presto push down aggregations when
we do joins ? If it doesn't push down and it fetches the row to do the agg,
then it will be slow. My use case is a typical star schema (a fact table and
multiple dimension tables). The slice/dice will generally occur on top of
these dimensions for which i will need join capabilities. (sorry getting
questions as I am doing some tests and thinking about my usecase)  
 **@aiyer:** I would expect the aggr for the fact table to happen on pinot and
then only the mapping of the ids to the values from dim table happen in
presto.. Let me know if its not clear and i will post an example.  
**@fx19880617:** Presto will parse the query and generate the plan; if it’s a join followed by an aggregate, Presto will fetch data from both sides and only push down predicates.  
**@fx19880617:** If it’s one table agg then join, it should be simple agg
pushdown  
**@fx19880617:** You can also run explain on your query to see the Presto-generated plan and the generated PinotQuery  
**@aiyer:** ok let me try this ! Thanks Xiang Fu  
**@aiyer:**
```
presto:default> explain select sum(amount) from txn;
--------------------------------------------------------------------------------
 - Output[_col0] => [sum:double]
       Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
       _col0 := sum
   - RemoteStreamingExchange[GATHER] => [sum:double]
         Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
       - TableScan[TableHandle {connectorId='pinot_quickstart', connectorHandle='PinotTableHandle{connectorId=pinot_quickstart, schemaName=default, tableName=txn, isQue
             Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
             sum := PinotColumnHandle{columnName=sum, dataType=double, type=DERIVED}
```  
**@aiyer:** hi @fx19880617 -- What does this mean ?  
**@aiyer:** did this push down the agg?  
**@fx19880617:** you need to use the right arrow to show the full plan  
**@aiyer:** yeah so i tried these two queries --
```
select txName,sa from (select txtype,sum(amount) sa from txn group by txtype) a join txtypes b on a.txtype=b.txtype;
select txName,sum(a.amount) from txn a join txtypes b on a.txtype=b.txtype group by txName;
```  
**@aiyer:** the first one did the push down (Pinot query had the group by
logic)  
**@aiyer:** the second one did not have the group by in PinotQuery..  
**@aiyer:** Does this behavior look accurate to you ?  
**@fx19880617:** i think so  
**@aiyer:** cool.. I understand now.  
**@aiyer:** thank you !  
**@fx19880617:** this means presto parsed the queries into two different plans  
**@fx19880617:** one joins after the aggregation, the other joins first  
 **@humengyuk18:** What are the limitations when using noDictionaryColumns? I
got the following exceptions when doing an orderby on a noDictionaryColumn:
```[ { "errorCode": 200, "message":
"QueryExecutionError:\njava.lang.IndexOutOfBoundsException\n\tat
java.nio.Buffer.checkBounds(Buffer.java:571)\n\tat
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)\n\tat
org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)\n\tat
org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)\n\tat
org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)\n\tat
org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)\n\tat
org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat
org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat
org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat
org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)\n\tat
org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)\n\tat
org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)\n\tat
org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)\n\tat
org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
} ]```  
**@mayanks:** Hmm, from the query perspective everything should work. My guess is the offset overflowed, but I thought we already switched to long-based offsets. Can you provide more context?  
**@mayanks:** Cc @jackie.jxt @steotia  
**@jackie.jxt:** We are using `int` to store offset within the chunk, but in
normal case that should not overflow  
**@jackie.jxt:** @humengyuk18 Can you share the segment metadata? What is the
longest entry for this column? Does it contain special characters?  
 **@ricardo.bernardino:** Hi everyone! When using the realtime table with
upsert, is there any compaction mechanism on segments? Or will they just keep
on being created and kept forever? Thanks!  
**@mayanks:** Good question, we have discussed it and will likely decide to do
it, no concrete plan immediately that I am aware of though.  
**@ricardo.bernardino:** Thanks for the reply! I was under the impression that
on some design document it was mentioned that there would be some background
task that would purge the segments of stale entries  
**@mayanks:** Yes, currently that purge job is there for GDPR, and uses the Pinot Minion framework. We need to create one that purges stale entries  
 **@mohitdubey95:** @mohitdubey95 has joined the channel  
 **@patidar.rahul8392:** Is there any way to generate the schema JSON file for a Pinot table from sample JSON data? I have data for 250+ columns in a Kafka topic, and right now I am writing the JSON schema file for the Pinot table manually. Kindly suggest if there is any way to generate it directly from sample data and use it as the schema file for Pinot.  
**@mayanks:** There's one for avro:  
**@mayanks:** Perhaps you can look at the code and see if you can contribute
one for JSON?  
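For reference, the Avro-based generator mentioned above is a pinot-admin command; a sketch of the invocation (flag names recalled from memory, so check them against the command's `-help` output):
```
bin/pinot-admin.sh AvroSchemaToPinotSchema \
  -avroSchemaFile /path/to/sample.avsc \
  -pinotSchemaName myTable \
  -outputDir /tmp/pinot-schemas \
  -dimensions dim1,dim2 \
  -metrics metric1 \
  -timeColumnName ts
```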
 **@aiyer:** Question -- Is there any limit to the number of tenants we can
have on a single cluster ? Eg - is 5000 tenants too much ?  
**@mayanks:** Depends on the cluster size. Also, why do you need 5000 tenants?  
**@aiyer:** i was thinking that way purging would be simpler, once the tenant
is gone, we can simply purge the related tables and segments..  
**@aiyer:** also to have isolation..  
**@mayanks:** So you will have 5000 tables?  
**@mayanks:** And I didn't get the purging part, how is that simple. You could
just delete the table?  
**@aiyer:** yes we could delete the table for the tenant..  
 **@xiong.juliette:** @xiong.juliette has joined the channel  
 **@kmvb.tau:** For most time-series/audit data, the time criterion is the basic one. E.g., for one year of data with segments created on a daily basis, there will be 365 segments per year. Even queries that access only the last month or the last week will be scheduled to scan all segments, including unnecessary ones. Is it possible to maintain min/max values of the primary time column in the table metadata? Maintaining time-column metadata would help broker-side segment pruning, similar to partitioning.  
**@mayanks:** Pinot already does that and prunes segments based on min-max
time stamp in the segment metadata.  
**@kmvb.tau:** so a query that accesses last week's data (7 segments) will be scheduled to scan only 7 segments? Does segment pruning happen at the broker level itself or at the server level?  
**@mayanks:** We have some pruning that happens at the broker level and some at the server level  
**@mayanks:** Yes, only 7 days of segments will be processed. Also, Pinot has sorted and inverted indexes that can be used to further avoid scanning all data inside these 7 segments  
**@kmvb.tau:** 1\. Based on my understanding from the documentation, partitioning helps segment pruning at the broker level itself. 2\. For a last-week query, all 365 segments will be scheduled by the broker; only 7 segments will be processed on the server, and the remaining segments will be pruned on the server based on segment metadata. 3\. My suggestion is to handle the main time-column criterion similar to the partition-column criterion, i.e. pruning at the broker level to avoid unnecessary scheduling and CPU wastage.  
**@kmvb.tau:** please let me know if my understanding is wrong  
**@mayanks:** Yes, we have optimized these based on real production use cases. There is always a balance, e.g. the broker needs to read metadata from ZK, or cache it, so that is the overhead. But these are optimizations we consider at thousands of QPS and millisecond latency. Is your use case in that range? If not, then you might be over-optimizing.  
**@kmvb.tau:** ok fine. For now, we expect only 500 qps with sub-100 ms latency. We will test and let you know if there is any issue due to over-scheduling.  
**@mayanks:** Yeah, server level pruning + partitioning + sorting + inv index
+ replica group will give you much better than that.  
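To make the above concrete, a hedged sketch of the table-config pieces involved; `memberId` and `country` are placeholder column names, and the exact field names should be checked against the table config reference:
```
"tableIndexConfig": {
  "sortedColumn": ["memberId"],
  "invertedIndexColumns": ["country"],
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {"functionName": "Murmur", "numPartitions": 8}
    }
  }
},
"routing": {
  "segmentPrunerTypes": ["partition"]
}
```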
 **@aaron:** If data ingestion jobs take a lot of memory to create a star tree
index, how can I tune that? Does maxLeafRecords affect the memory usage of the
segment creation job at all?  
**@jackie.jxt:** Yes, but that also affects the performance gain from the
star-tree  
**@aaron:** Do I need to tune maxLeafRecords based on the size of my dataset
or is the default of 10000 a sane value?  
**@aaron:** I'm asking because I can't get SegmentCreation jobs to run without
an incredible amount of GC overhead, so I'm wondering if I'm doing something
wrong  
**@jackie.jxt:** 10k is usually good  
**@jackie.jxt:** Do you know how many dimensions are included in the star-
tree?  
**@aaron:** My dimensionsSplitOrder has 7 items, and I've got like ~20
functionColumnPairs  
**@jackie.jxt:** I see. 7 dimensions are not much. Any high cardinality ones?  
**@jackie.jxt:** If you observe lots of GC, increasing the memory limit might
help  
**@aaron:** I don't think anything is super high cardinality; one of them
could have maybe a few tens of k of values though  
**@aaron:** By increasing the memory limit you mean the java heap size a la
-Xms and -Xmx?  
**@aaron:** I'm currently running with `-Xms32G -Xmx32G`  
**@aaron:** And I'm also limiting the segment generation parallelism to 4  
**@jackie.jxt:** Hmm, that's already quite high  
**@aaron:** I have verbose GC logging on and I see a lot of this:
```
2021-05-12T19:10:45.404+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921277K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268646K(30922752K), [Metaspace: 55657K->55657K(59392K)], 55.0908616 secs] [Times: user=1220.86 sys=14.76, real=55.08 secs]
2021-05-12T19:11:40.497+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921277K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268646K(30922752K), [Metaspace: 55657K->55657K(59392K)], 52.7552240 secs] [Times: user=1260.30 sys=13.89, real=52.75 secs]
2021-05-12T19:12:33.252+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921279K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268648K(30922752K), [Metaspace: 55657K->55657K(59392K)], 47.7370731 secs] [Times: user=1237.77 sys=9.23, real=47.74 secs]
```  
**@jackie.jxt:** Are you using the on-heap or off-heap mode?  
**@aaron:** Not sure :grimacing: What is that and how can I find out?  
**@jackie.jxt:** Do you use the spark job to create the segment?  
**@aaron:** No, I'm running it via the docker image  
**@jackie.jxt:** Oh, with the minion task?  
**@jackie.jxt:** In that case it is off-heap  
**@jackie.jxt:** Can you try further reducing the parallelism and see if the
GC becomes better?  
**@aaron:** Not with minion either; I'm just running this on the command line  
**@jackie.jxt:** I see. Then maybe just reduce the parallelism and see if the
GC goes down  
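If the segments are built with the standalone `LaunchDataIngestionJob` command, the parallelism knob is, as far as I recall, `segmentCreationJobParallelism` in the ingestion job spec YAML (verify against the job spec reference):
```
# In the ingestion job spec YAML: lower this to reduce the number of
# concurrent segment builds and the corresponding heap pressure.
segmentCreationJobParallelism: 1
```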
**@aaron:** Is there such thing as a too-big segment creation job?  
**@jackie.jxt:** What's the size of your input file and the output segment?  
**@aaron:** The input is about 80 parquet files, 16 GB in total  
**@aaron:** Not sure how big the output segment is because it's never
succeeded :open_mouth:  
**@jackie.jxt:** In that case, can you start with single threaded?  
**@jackie.jxt:** 200MB per file in average is not too large  
**@aaron:** Ok so I looked into this a little more -- the cardinality of my
dimensions all together is 60,000,000  
**@aaron:** Like, if I multiply the cardinality of each dimension  
**@aaron:** Is that ridiculous?  
**@jackie.jxt:** Not too ridiculous, but chances are the star-tree won't get
much compression after removing the dimension  
**@jackie.jxt:** If you can get one segment generated, we can check the
segment metadata and see how many extra records generated for star-tree  
**@aaron:** Ok cool  
**@aaron:** Btw I realized I had `enableDefaultStarTree` enabled so it was
also building one across all dimensions, so I set that to false  
 **@sleepythread:** Need some feedback on the star tree index.
```
"tableIndexConfig" : {
  "starTreeIndexConfigs": [{
    "maxLeafRecords": 1000,
    "functionColumnPairs": ["DISTINCT_COUNT_HLL__user_id", "COUNT__dt"],
    "dimensionsSplitOrder": ["dt", "dim1", "dim2", "dim3", "dim4"]
  }],
  "enableDynamicStarTreeCreation" : true
},
```
This is to optimise the following queries.
```
select dt, DISTINCT_COUNT_HLL(user_id) FROM TABLE GROUP BY dt
select dt, count(1) FROM TABLE GROUP BY dt
select dt, dim2, DISTINCT_COUNT_HLL(user_id) FROM TABLE where dim1 = 3 GROUP BY dt, dim2
select dt, dim2, count(1) FROM TABLE where dim1 = 3 GROUP BY dt, dim2
```
dim1, dim2, dim3 and dim4 do not have very high cardinality. user_id has the biggest cardinality.  
**@mayanks:** Seems good to me. @jackie.jxt?  
**@jackie.jxt:** Yeah, lgtm  
**@sleepythread:** Thanks  
 **@yupeng:** @xd Nice talk at  today! A Pinot table of PB size is amazing..  
**@mayanks:** Is there a recording available?  
**@yupeng:** yes, you can view it after registration  
**@mayanks:** I am registered, will find it.  
**@xd:** Thanks. Hope our experience can help other Pinot enthusiasts!  
**@xd:** Supposedly this link will direct you there, if you register:  
**@mayanks:** :thankyou:  
**@mayanks:** Great talk @xd, just watched it. Largest Pinot table in the
world is quite an accomplishment. Congratulations!  

###  _#random_

  
 **@mohitdubey95:** @mohitdubey95 has joined the channel  
 **@xiong.juliette:** @xiong.juliette has joined the channel  

###  _#troubleshooting_

  
 **@chxing:** Hi All, when using Pinot 0.7.1 I found this error in the log  
 **@chxing:** ```Grpc port is not set for instance:
Controller_10.252.125.84_9000```  
**@fx19880617:** are you using presto? to enable grpc port in pinot server,
please add configs below to your pinot server configs then restart pinot
servers: ``` pinot.server.grpc.enable=true pinot.server.grpc.port=8090```  
**@chxing:** I’m not using Presto now, so is it enabled by default in Pinot 0.7.1?  
**@jackie.jxt:** You may ignore it then  
**@jackie.jxt:** Though we should check why we log it for the controller  
**@chxing:** yes, seems it should be a bug?  
**@chxing:** I also got Admin port is not set for instance: Broker_sj1-pinot-
controller-broker-01_8099 in controller log  
**@chxing:** After setting the conf, I still see the error log in the controller ```pinot.server.grpc.enable=true pinot.server.grpc.port=8090```  
 **@chxing:** Do you know how to fix it? thx  
 **@patidar.rahul8392:** Is there any option to remove a dead server from the Pinot UI?  
 **@patidar.rahul8392:** I don't want to show the last server with status Dead on the Pinot UI.  
**@jackie.jxt:** @npawar ^^ Do we support dropping server via the UI?  
**@npawar:** no  
**@npawar:** you’ll have to untag it, then remove it from the cluster  
**@npawar:** to untag:  
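The untag-then-remove flow goes through the controller REST API; a rough sketch of the calls, with the instance name, controller address, and the empty tag value as placeholders (endpoints recalled from memory, confirm them in the controller's Swagger UI):
```
# 1. Untag the server so no new segments get assigned to it
curl -X PUT "http://localhost:9000/instances/Server_1.2.3.4_8098/updateTags?tags="

# 2. Drop the instance once it no longer hosts any segments
curl -X DELETE "http://localhost:9000/instances/Server_1.2.3.4_8098"
```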
**@aiyer:** Hi Team -- I am trying to test some simple joins with Presto. Seeing this issue for one of the tables even when I just do a select * on that table.
```
java.lang.NullPointerException: null value in entry: Server_172.18.0.3_7000=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
    at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
    at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
    at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:276)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:241)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
    at com.facebook.presto.$gen.Presto_0_254_SNAPSHOT_2999330____20210512_100627_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```  
 **@aiyer:** I have two tables txn (fact) and txtypes (dimension) .. Just
added a handful of rows (2 to 3 rows) to check the query plan..  
 **@aiyer:** but when i query txtypes , i am getting this NPE..  
 **@aiyer:** on pinot, the select * is working fine for this table.  
 **@aiyer:** pls help  
**@aiyer:** these are the images i am using  
**@aiyer:** I pulled latest pinot, still same error..  
**@mayanks:** I think PrestoSQL/Trino may be more up to date than prestodb
@fx19880617 ?  
**@aiyer:** i used the image from the document... Is there some other image i
should use?  
**@mayanks:** Oh then ignore my comment.  
**@mayanks:** Let me check and get back  
**@aiyer:** sure  
**@aiyer:** Any luck @mayanks ? Or is there anything else I can try to
circumvent this problem?  
**@fx19880617:** can you check if your pinot server config has
```pinot.server.instance.currentDataTableVersion=2```  
**@fx19880617:** I think there was a recent pinot server upgrade, which makes the internal data protocol version newer than what presto supports  
**@aiyer:** how can i check this ?  
**@aiyer:** is it in the UI  
**@aiyer:**?  
**@fx19880617:** for k8s  
**@aiyer:** I am running in docker on my local  
**@fx19880617:** check ```kubectl get configmaps pinot-live-server-config -n
pinot -o yaml```  
**@fx19880617:** replace `-n pinot` to your namespace  
**@aiyer:** i am not running on kubernetes  
**@fx19880617:** if just docker  
**@fx19880617:** then I guess it doesn’t have  
**@fx19880617:** how do you start the pinot docker?  
**@aiyer:**
```
docker run \
  --network=pinot-demo \
  --name pinot-quickstart \
  -p 9000:9000 \
  -d apachepinot/pinot:latest QuickStart \
  -type hybrid
```  
**@aiyer:** from here  
**@fx19880617:** ic, then this one should have no such configs  
**@fx19880617:** can you try image `apachepinot/pinot:0.7.1`  
**@aiyer:** ok sure..  
**@aiyer:** Yeah with this the NPE is gone and i am able to query from
presto... but the Query console on pinot UI is going blank..  
**@aiyer:** is there any way to get the latest pinot working with this ? Not
sure of what features i will miss by using 0.7.0..  
**@aiyer:** getting this blank query screen.  
**@fx19880617:** hmm, can you try to clean the cache or try in another
browser?  
**@aiyer:** Yeah just tried that..  
**@aiyer:** it worked in incognito..  
**@aiyer:** I will try the join on presto now!! Thank you.  
**@fx19880617:** :thumbsup:  
**@aiyer:** ignore the last message.. i will investigate more and get back.  
**@fx19880617:** Can you try this ```SET SESSION
pinot.limit_larger_for_segment=200000000; SELECT ...```  
**@fx19880617:** I thought default pinot.limit_larger_for_segment should be
2147483647  
**@aiyer:** let me try  
**@aiyer:** that didn't make any change.. actually the join is doing something
strange.. its not giving the correct result..  
**@aiyer:**
```
presto:default> select txName,sum(a.amount) from txn a left join txtypes b on a.txtype=b.txtype group by txName;
 txName  |       _col1
---------+-------------------
 Invoice | 2453.240119934082
(1 row)

Query 20210512_173300_00076_imcrr, FINISHED, 1 node
Splits: 100 total, 100 done (100.00%)
0:00 [0 rows, 62B] [0 rows/s, 157B/s]

presto:default> select txName,sa from (select txtype,sum(amount) sa from txn group by txtype) a join txtypes b on a.txtype=b.txtype;
 txName  |         sa
---------+--------------------
 Invoice | 2539.4801235198975
(1 row)

Query 20210512_173312_00077_imcrr, FINISHED, 1 node
Splits: 67 total, 67 done (100.00%)
0:00 [0 rows, 42B] [0 rows/s, 210B/s]
```  
**@aiyer:** the second query 's result is correct...  
**@aiyer:** First query is ignoring 2 rows..  
**@fx19880617:** hmm, how large is table txn  
**@aiyer:** its a very small table.. just 4 records in txn and 2 records in
txtypes..  
**@fx19880617:** then try to do join all and see?  
**@fx19880617:** ```select txName, a.amount, a.txtype from txn a left join
txtypes b on a.txtype=b.txtype ```  
**@aiyer:**
```
presto:default> select txName, a.amount, a.txtype from txn a left join txtypes b on a.txtype=b.txtype ;
 txName  |       amount       | txtype
---------+--------------------+--------
 Invoice | 2342.1201171875    |      1
 Invoice | 111.12000274658203 |      1
(2 rows)

Query 20210512_173837_00085_imcrr, FINISHED, 1 node
Splits: 68 total, 68 done (100.00%)
0:00 [0 rows, 62B] [0 rows/s, 244B/s]

presto:default> select txName, a.amount, a.txtype from txn a left join txtypes b on a.txtype=b.txtype limit 100;
 txName  | amount  | txtype
---------+---------+--------
 Invoice | 2342.12 |      1
 Invoice |   65.12 |      1
 Invoice |   21.12 |      1
 Invoice |  111.12 |      1
(4 rows)
```  
**@aiyer:** without limit it shows 2 records..  
**@aiyer:** with limit , it shows all 4  
**@fx19880617:** hmm  
**@fx19880617:** what’s generated pinot query for without limit case  
**@aiyer:** ```GeneratedPinotQuery{query=SELECT amount, txtype FROM
txn__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1,
format=SQL, table=txn, expectedColumnIndices=[], groupByClauses=0, ```  
**@fx19880617:** can you check presto config in your docker container log  
**@aiyer:** which config?  
**@fx19880617:** what’s the config for ```pinot.limit-large-for-segment```  
**@fx19880617:** you can search log for this  
**@fx19880617:** I think it should be set as 1  
**@aiyer:** not able to find this in the docker logs..  
**@fx19880617:** for quickstart we set this to 1
```
connector.name=pinot
pinot.controller-urls=pinot-quickstart:9000
pinot.controller-rest-service=pinot-quickstart:9000
pinot.limit-large-for-segment=1
pinot.allow-multiple-aggregations=true
pinot.use-date-trunc=true
pinot.infer-date-type-in-schema=true
pinot.infer-timestamp-type-in-schema=true
```  
**@fx19880617:** so it’s intentional  
**@aiyer:** ok got it.. so is this something I can reset ?  
**@fx19880617:** you can try to create this `pinot_quickstart.properties` file  
**@fx19880617:** then mount it to docker container  
**@fx19880617:** also the session config should work  
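A rough sketch of the mount approach for the `pinot_quickstart.properties` catalog file; the container name, image tag, and in-container catalog path are assumptions to adjust for your setup:
```
# Mount the catalog file into the presto container; the target path below is a
# guess at the image's etc/catalog directory and should be verified for your image.
docker run -d --network=pinot-demo -p 8080:8080 --name presto \
  -v $(pwd)/pinot_quickstart.properties:/home/presto/etc/catalog/pinot_quickstart.properties \
  apachepinot/pinot-presto:latest
```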
**@aiyer:** ok i will try the session config first..  
**@aiyer:** actually i had set it earlier when you pointed it out. ```SET SESSION pinot.limit_larger_for_segment=200000000;```  
**@aiyer:** but that didnt work  
**@fx19880617:** SET SESSION pinot.limit_larger_for_segment=2147483647;  
**@fx19880617:** try this  
**@fx19880617:** then send the query again  
**@aiyer:** ok  
**@fx19880617:** you can check the explain for that  
**@aiyer:** didn't work.. i guess the session setting is not getting picked up..  
**@fx19880617:** hmm  
**@fx19880617:** which presto image are you using  
**@aiyer:** ```243aa15aff9d```  
**@fx19880617:** so explain doesn’t give the right limit?  
**@aiyer:** correct that still has limit 1  
**@fx19880617:** I tried on my side, it gives me the
```GeneratedPinotQuery{query=SELECT DestStateName FROM
airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__
LIMIT 1000,```  
**@fx19880617:** when I do ```presto:default> SET SESSION
pinot.limit_larger_for_segment=1000;```  
**@fx19880617:** hmm  
**@aiyer:** hmm .. not sure whats wrong.. are you using the same image?  
**@fx19880617:** I think as long as you can set the session config  
**@fx19880617:** then it should be fine  
**@aiyer:** ok.. another thing.. I tried this as well...
```
presto:default> select * from txn;
       amount       |      id       | txtype | tenant |           ts
--------------------+---------------+--------+--------+-------------------------
 111.12000274658203 | 101_1_2020102 |      1 |    101 | 2021-05-12 12:45:20.744
 2342.1201171875    | 101_1_2020105 |      1 |    101 | 2021-05-12 12:44:50.744
(2 rows)
```  
**@aiyer:** i should have gotten 5 records..  
**@aiyer:** but only got 2..  
**@aiyer:** but i tried select * from airlineStats, that is giving me multiple
records...  
**@fx19880617:** hmm  
**@fx19880617:** which catalog are you using  
**@fx19880617:** what’s the explain on this query  
**@aiyer:** ```query=SELECT AirTime FROM airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1```  
**@aiyer:** ```PinotQuery{query=SELECT AirTime FROM
airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__
LIMIT 1, format=SQL, table=airlineStats, expectedColumnIn```  
**@fx19880617:** I think it’s getting 1 record per segment  
**@fx19880617:** then merge  
**@fx19880617:** but this session config doesn’t work really interesting  
**@fx19880617:** can you try presto image: ```apachepinot/pinot-
presto:0.254-SNAPSHOT-54a7ec79a3-20210512```  
**@aiyer:** yeah something strange..  
**@aiyer:** yeah i can try  
**@aiyer:** same result ..  
**@aiyer:** i set the session as well..  
**@fx19880617:** hmm  
**@fx19880617:** then change the config file and mount it  
**@aiyer:** ok.. in production, how will this value be decided?  
**@aiyer:** i work out of india time zone.. so i will test this config file
thing tomorrow morning my time and update here..  
**@fx19880617:** yes  
**@fx19880617:** in prod, you anyway need to have your own config file and set
it accordingly  
**@aiyer:** is there anything in the docs that talks about this? I would like
to understand how to set it up..  
 **@mohitdubey95:** @mohitdubey95 has joined the channel  
 **@xiong.juliette:** @xiong.juliette has joined the channel  
 **@avasudevan:** I am trying to connect `superset` to Pinot using `` But, I
am getting the below error… ```ERROR: (builtins.NoneType) None (Background on
this error at: )``` Could anyone assist?  
**@avasudevan:** Dockers running in my local  
**@chinmay.cerebro:** you might want to double check broker host and port and
make sure its externally addressable using something like `-p 9000:9000`  
**@chinmay.cerebro:** from your screenshot - doesn't look like that's the case  
**@fx19880617:** broker port is 8000 I think  
**@chinmay.cerebro:** @fx19880617 should we update the URL in this example:  
**@chinmay.cerebro:** > it says : ```  
**@fx19880617:** ic  
**@fx19880617:** it should be 8099 I think  
**@fx19880617:** let me update it  
**@chinmay.cerebro:** and also mention explicitly that it has to be externally addressable  
**@chinmay.cerebro:** or everything is on the same network bridge  
**@fx19880617:** it’s an example  
**@fx19880617:** in the doc, it mentions broker port  
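For reference, the SQLAlchemy URI Superset needs points at the broker (8099 by default), with the controller passed as a query parameter; a sketch assuming the `pinotdb` dialect and a local Docker setup:
```
pinot://localhost:8099/query/sql?controller=http://localhost:9000/
```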
**@avasudevan:** Thanks guys! That worked.  

###  _#minion-improvements_

  
 **@jackie.jxt:** Added this github issue:  
**@jackie.jxt:** @npawar Please take a look and see if the proposed solution
is valid  
 **@jackie.jxt:** It should also solve the derived column problem  