You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/12/02 02:00:18 UTC
Apache Pinot Daily Email Digest (2020-12-01)

### _#general_

  
 **@whatatrip888:** I am working on load test using Jmeter in pinot tables.
Currently I am getting an exception related to connection pooling. Response
message:java.sql.SQLException: Cannot create PoolableConnectionFactory (null)
java.sql.SQLException: Cannot create PoolableConnectionFactory (null) at
org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:669)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:308)
[ApacheJMeter_jdbc.jar:5.3] at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:127)
[ApacheJMeter_jdbc.jar:5.3] at
org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:205)
[ApacheJMeter_core.jar:5.3] at
org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:380)
[ApacheJMeter_core.jar:5.3] at java.lang.Thread.run(Unknown Source)
[?:1.8.0_271]  
**@fx19880617:** How do you query Pinot ? Which client are you using? Did you
see any exceptions in Pinot controller or broker ?  
**@whatatrip888:** @fx19880617 @npawar @g.kishore From Jmeter, JDBC client
used to querying pinot. There is no exceptions in Pinot controller or broker.
The error from jmeter like this: 2020-12-01 19:55:11,767 INFO
o.a.j.e.StandardJMeterEngine: Running the test! 2020-12-01 19:55:11,767 INFO
o.a.j.s.SampleEvent: List of sample_variables: [] 2020-12-01 19:55:11,769 INFO
o.a.p.c.PinotDriver: Initiating connection to database for url: jdbc:
2020-12-01 19:55:11,804 INFO o.a.p.c.Connection: Creating connection to broker
list [instance-3:8099] 2020-12-01 19:55:11,812 ERROR
o.a.j.p.j.c.DataSourceElement: Error preinitializing the connection pool:
demo@494046791 java.sql.SQLException: Cannot create PoolableConnectionFactory
(null) at
org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:669)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:308)
[ApacheJMeter_jdbc.jar:5.3] at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:127)
[ApacheJMeter_jdbc.jar:5.3] at
org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:205)
[ApacheJMeter_core.jar:5.3] at
org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:380)
[ApacheJMeter_core.jar:5.3] at java.lang.Thread.run(Unknown Source)
[?:1.8.0_271] Caused by: java.sql.SQLFeatureNotSupportedException at
org.apache.pinot.client.base.AbstractBaseConnection.getAutoCommit(AbstractBaseConnection.java:116)
~[pinot-jdbc-client-0.5.0.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
at
org.apache.commons.dbcp2.DelegatingConnection.getAutoCommit(DelegatingConnection.java:391)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.PoolableConnectionFactory.activateObject(PoolableConnectionFactory.java:116)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:117)
~[commons-dbcp2-2.7.0.jar:2.7.0] at
org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
~[commons-dbcp2-2.7.0.jar:2.7.0] ... 7 more 2020-12-01 19:55:11,813 INFO
o.a.j.g.u.JMeterMenuBar: setRunning(true, *local*) 2020-12-01 19:55:11,817
INFO o.a.j.e.StandardJMeterEngine: Starting ThreadGroup: 1 : Thread Group
2020-12-01 19:55:11,818 INFO o.a.j.e.StandardJMeterEngine: Starting 1 threads
for group Thread Group. 2020-12-01 19:55:11,818 INFO
o.a.j.e.StandardJMeterEngine: Thread will continue on error 2020-12-01
19:55:11,818 INFO o.a.j.t.ThreadGroup: Starting thread group... number=1
threads=1 ramp-up=1 delayedStart=false 2020-12-01 19:55:11,818 INFO
o.a.j.t.ThreadGroup: Started thread group number 1 2020-12-01 19:55:11,818
INFO o.a.j.e.StandardJMeterEngine: All thread groups have been started
2020-12-01 19:55:11,838 INFO o.a.j.t.JMeterThread: Thread started: Thread
Group 1-1 2020-12-01 19:55:11,839 INFO o.a.p.c.PinotDriver: Initiating
connection to database for url: jdbc: 2020-12-01 19:55:11,871 INFO
o.a.p.c.Connection: Creating connection to broker list [instance-3:8099]
2020-12-01 19:55:11,878 INFO o.a.j.t.JMeterThread: Thread is done: Thread
Group 1-1 2020-12-01 19:55:11,879 INFO o.a.j.t.JMeterThread: Thread finished:
Thread Group 1-1 2020-12-01 19:55:11,879 INFO o.a.j.e.StandardJMeterEngine:
Notifying test listeners of end of test 2020-12-01 19:55:11,879 INFO
o.a.j.g.u.JMeterMenuBar: setRunning(false, *local*)  
**@fx19880617:** @kharekartik do you have any idea on this?  
**@kharekartik:** @whatatrip888 Which version of pinot and jdbc are you using?  
 **@dungnt:** @dungnt has joined the channel  
 **@gloetscher:** Hey everyone ! I’m currently investigating Apache Pinot, and
after reading a good chunk of the documentation, I have a couple of questions.
• If this , it’s said that if you lose all your controller, your cluster will
still be able to answer to read queries (but not write queries, obviously).
Then, if a new controller is started, it says that the cluster will recover
and will be then available again for write queries. That supposed that all
cluster states are stored somewhere. I suppose that “somewhere” is Zookeeper ?
• Offline servers are responsible to host segments. Let’s say we have only one
replica for a given segment, and the offline server hosting it dies. Will
Helix discover that and will ask another offline server to download the same
segment, in order to make it available again to the brokers ? • Where can I
find some information about the resource requirements (mainly CPU / memory)
for controllers / brokers / realtime servers / offline servers ? Thanks for
your help !  
**@tanmay.movva:** > Will Helix discover that and will ask another offline
server to download the same segment, in order to make it available again to
the brokers I don’t think this happens automatically, when a server has died
or left the cluster, then we need to trigger the server rebalancing for that
lost table so that the segments are hosted by other servers.  
**@tanmay.movva:** > will be then available again for write queries Rather
than seeing it as a write query, I think it is better to see it as new
segments won’t be published. Because the servers won’t be able to commit the
segment(as controller is involved in segment completion protocol) even though
they would be able to ingest data. You can read about the protocal here -  
**@gloetscher:** > I don’t think this happens automatically, when a server has
died or left the cluster, then we need to trigger the server rebalancing for
that lost table so that the segments are hosted by other servers. OK so I
guess it’ll be managed by replicas, then  
**@gloetscher:** > Rather than seeing it as a write query, I think it is
better to see it as new segments won’t be published. Because the servers won’t
be able to commit the segment(as controller is involved in segment completion
protocol) even though they would be able to ingest data. Thanks for the
clarification. Yeah, when I said “write queries”, what I meant was “the
possibility to make a segment available in Pinot”. But as long as the “cold”
storage remains available, you can still upload fresh data in it. It’ll just
not be available in the Pinot cluster.  
**@g.kishore:** When a server dies and if you are running on k8s, a new
container is created that will download the segments and start serving  
**@gloetscher:** yeah, but let’s say we have another server available with the
correct tags, will it be picked up as the new host for the missing segment ?  
**@g.kishore:** What’s the name of new server (logical)  
**@gloetscher:** sorry, I’m not sure to understand you.  
**@gloetscher:** also, I’m only starting with Pinot, so I may be missing some
important points  
**@g.kishore:** When a segment is uploaded, we assign it to one or mor
servers. That mapping is stored in Helix  
**@g.kishore:** That mapping will only change in the following scenario • add
more servers and invoke rebalance • Untag a server and invoke rebalance  
**@g.kishore:** Untag is a way to say that this server should not host
segments for this table anymore  
**@g.kishore:** Note that if a server dies, it’s tag is still maintained in
Helix/zookeeper  
**@g.kishore:**  
**@g.kishore:** This video should help  
**@gloetscher:** I’ll have a look, thanks !  
 **@srsurya122:** tried to execute the pinot start controller cmd It got the
following error could you please help me with this?  
**@taranrishit1234:** +1  
**@fx19880617:** I think this is the issue in windows. Can you try to run
Pinot on a Linux OS or try to run it using docker ?  
 **@srsurya122:**  
 **@myeole:** I am trying to run Pinot in Kubernetes but seeing following
error. Any suggestions ? helm install -n pinot-quickstart kafka
incubator/kafka --set replicas=1 Error: failed to download "incubator/kafka"
(hint: running `helm repo update` may help)  
**@fx19880617:** you need to install the repo  
**@fx19880617:** ``` helm repo add incubator ```  
**@fx19880617:** ```helm repo add stable ```  
**@fx19880617:** this is typical init cmd for helm after you install it  
**@fx19880617:**  
**@joao.comini:** Oh, `@here` is so scary :fearful:  
**@fx19880617:** Agreed  
**@fx19880617:** you can go to <#C011C9JHN7R|troubleshooting> channel for
questions :wink:  
**@myeole:** sure Thanks  
 **@zjinwei:** @zjinwei has joined the channel  

###  _#random_

  
 **@dungnt:** @dungnt has joined the channel  
 **@zjinwei:** @zjinwei has joined the channel  

###  _#troubleshooting_

  
 **@tanmay.movva:** Hello, I have added updated the table with indexing config
to add indices on some columns. After this I triggered `Reload All Segments`
to apply the indexes. When I try to check the `Reload Status` , I get this
error on the UI ```Table type : REALTIME not yet supported.```  
**@tanmay.movva:** I have checked the tableIndexConfig in the IDEAL STATE of
the table and the indices were present. Is this the source of truth for me to
check if the indexing has been applied or not? If not, how can I check if the
indexing has been applied to that table?  
**@jackie.jxt:** There are 2 ways: 1\. Make a query `select count(*) from
table where col = 123` and check the `numEntriesScannedInFilter`, if it is 0,
then that means the inverted index on `col` is present 2\. Log in to the
server and check the segment directory under data directory. In the
`index_map` file you should be able to find the inverted index for that column
if it exists  
**@tanmay.movva:** Thanks @jackie.jxt :slightly_smiling_face:  
 **@tanmay.movva:** Hello, when I am querying min/max of a column which is not
present in the table, pinot returns Infinity/-infinity. Shouldn’t the ideal
behaviour be to throw an error saying the column is not present?  
**@fx19880617:** do you mean query a column which doesn't exist ? I also feel
the behavior should be throw exception by saying field not found/ evaluated.  
**@fx19880617:** Can you create a github issue for this  
**@tanmay.movva:** Sure.  
**@fx19880617:** This may also be extended to other aggregation functions I
guess  
**@tanmay.movva:** Yes. Same scenario for other aggregations also.  
**@tanmay.movva:** @fx19880617  
**@fx19880617:** cool! thanks!  
 **@srsurya122:** @srsurya122 has joined the channel  
 **@zjinwei:** @zjinwei has joined the channel  
 **@amitchopra:** @amitchopra has joined the channel  
 **@yupeng:** hey, any good way to optimize such query ```SELECT
hour_start_timestamp_utc FROM downtime WHERE (secondsSinceEpoch > 1606247126)
ORDER BY secondsSinceEpoch DESC, hour_start_timestamp_utc DESC LIMIT 1``` it
scans the past 1 week of data but return only 1 record. since the table is
large, it ends up scanning about 100 million records per query, and takes
seconds query output is like ```{ "selectionResults": { "columns":
["hour_start_timestamp_utc"], "results": [ ["2020-12-01 09:00:00"] ] },
"exceptions": [], "numServersQueried": 9, "numServersResponded": 9,
"numSegmentsQueried": 1059, "numSegmentsProcessed": 1059,
"numSegmentsMatched": 18, "numConsumingSegmentsQueried": 0, "numDocsScanned":
142101504, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter":
284203008, "numGroupsLimitReached": false, "totalDocs": 7374174837,
"timeUsedMs": 3522, "segmentStatistics": [], "traceInfo": {},
"minConsumingFreshnessTimeMs": 0 }```  
**@mayanks:** Perhaps add another predicate on secondsSinceEpoch < xxx  
**@mayanks:** That should allow for pruning more segments. If the code doesn't
do that today, would be a good feature to add.  
**@yupeng:** you mean the range indexing?  
**@mayanks:** Well range based pruning (based on metadata)  
**@mayanks:** so if minTime = a and maxTime = b (in metadata), then segments
where querty is not between a and be can be pruned out  
**@mayanks:** We do have a time segment pruner, I don't recall if it can
handle ranges  
**@yupeng:** got it  
**@yupeng:** does it require to have both lower and upper bounds?  
**@fx19880617:** use max(hour_start_timestamp_utc)?  
**@ken:** If the segments are time-based, and you set the table config’s
columnMinMaxValueGeneratorMode to something that gave you min/max for the
secondsSinceEpoch column, then would Pinot optimize out the segments that
contain no possible rows?  
**@mayanks:** @yupeng I assumed your predicate timestamp was for 1 week ago.
If so, that would prune out data from older than 1 week, but would still scan
data for the last one week  
**@mayanks:** @fx19880617 using `max` would still scan, unless we can somehow
prune out segments  
**@yupeng:** @fx19880617 tried max, but got error ``` "message":
"QueryExecutionError:\njava.lang.NumberFormatException: For input string:
\"2020-10-15 09:00:00\"\n\tat
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)\n\tat
sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)\n\tat
java.lang.Double.parseDouble(Double.java:538)\n\tat
org.apache.pinot.core.segment.index.readers.StringDictionary.getDoubleValue(StringDictionary.java:58)\n\tat
org.apache.pinot.core.operator.query.DictionaryBasedAggregationOperator.getNextBlock(DictionaryBasedAggregationOperator.java:66)\n\tat
org.apache.pinot.core.operator.query.DictionaryBasedAggregationOperator.getNextBlock(DictionaryBasedAggregationOperator.java:43)\n\tat
org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat
org.apache.pinot.core.operator.CombineOperator$1.runJob(CombineOperator.java:105)\n\tat
org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)\n\tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)\n\tat
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)"```  
**@mayanks:** Yeah, seems that works on numeric data only  
**@fx19880617:** it’s not for string, if that column is number then should be
ok  
**@fx19880617:** then you can use time_coverter function to convert it to
human readable format  
**@yupeng:** that would lead to scan as well?  
**@fx19880617:** if it’s a full segment match, then we should just read it
from dictionary  
**@yupeng:** @mayanks that’s right, we wanted to get latest record, but not
sure when it was last generated  
**@mayanks:** I don't think max helps here, unless we find a way to avoid
segment scans (pruning)  
**@yupeng:** so use a wider range to be safe  
**@mayanks:** Why do you go back 7 days for the latest record?  
**@yupeng:** to be safe  
**@yupeng:** it’s used to track the most recent down time  
**@mayanks:** Your data is time ordered?  
**@yupeng:** need to check with the user, but i hope we can assume so  
**@mayanks:** If you just say max without any predicates then it will work (I
think that is what Xiang was suggesting)?  
**@mayanks:** without any predicates, we dont' scan, we just look at the
metadata  
**@yupeng:** right  
**@yupeng:** we tried that query, and got the error above  
**@mayanks:** Try time-convert as Xiang suggested  
 **@graham:** @graham has joined the channel  
 **@pradeepgv42:** Hi, we are trying NFS as deep store for pinot in an off
cloud setting, wondering if there’s anything we should be careful about? Also,
did anybody try MinIO as deepstore for pinot?  
**@mayanks:** We use NFS too, works well so far. The only issue is pushes of
large data size can take longer since all payload has to go through controller  
**@pradeepgv42:** got it thanks  

###  _#presto-pinot-streaming_

  
 **@dungnt:** @dungnt has joined the channel  

###  _#pinot-docs_

  
 **@g.kishore:** @fx19880617 we forgot to add docs for case statement  
**@fx19880617:**  
 **@ken:** On  it doesn’t list BOOLEAN as a data type, and on  it doesn’t say
what the default value would be for BOOLEANs. Should I fix that? And BOOLEAN
gets converted to a STRING, right? Any other similar “alias” types that are
missing?  
**@fx19880617:** I don’t think we support boolean, it will be converted to
string with dictionary.  
**@ken:** I see the “BOOLEAN” field type described on the page (“Data type of
the dimension column. Can be STRING, BOOLEAN, INT, LONG, DOUBLE, FLOAT,
BYTES”). Also, I’m using it in my schema. I would assume I’d get an error if
it wasn’t supported (which is different from getting auto-converted to a
string).  
**@fx19880617:** ah ic, this is for the internal implementation details,
basically we treat bool same as string field  
**@ken:** Is any validation done? Or is it just blindly converted to string?  
**@fx19880617:** data validation is there  
 **@ken:** On , it’s not clear which (if any) of the properties must exist in
the table json structure. E.g. `enableDefaultStarTree` is described, but it’s
not in my table json copied from `pinot-quickstart`. And for any that are
optional, the default value should be called out. E.g. I assume
`enableDefaultStarTree` has a default value of `false`.  
**@chinmay.cerebro:** good point. We didn't have a separate column called
default value for this table since the original fields did not really have
any. We can either update the description or just create a new column for this  
**@ken:** Is every property optional?  
**@ken:** (in that if the json doesn’t contain it, then there’s a default that
gets used)  
**@chinmay.cerebro:** Yes I think every property here is optional  
**@chinmay.cerebro:** there is no default for most of them  
**@chinmay.cerebro:** eg: inverted/sorted/bloom columns are empty if nothing
is specified. loadMode, nullHandlingEnabled etc have default values  
**@ken:** Got it, thanks  

###  _#presto-pinot-connector_

  
 **@dungnt:** @dungnt has joined the channel  

###  _#test-channel_

  
 **@kennybastani:** @kennybastani has joined the channel  
 **@karinwolok1:** @karinwolok1 has joined the channel  
 **@kennybastani:** test  
 **@kennybastani:** Did you get the notification?  
 **@karinwolok1:** yes  
 **@karinwolok1:** I got it earlier from someone which is why I want to change
it  
 **@karinwolok1:** I am looking on slack settings and it doesn't show that
option, although when I google it - it says there is a way. I am looking for
it but it doesn't exist on my end  
 **@kennybastani:** Okay, I think that the workspace owner (Kishore or Ananth)
has to change this setting  
 **@kennybastani:**  
 **@karinwolok1:** Ok I will tell Kishore  
\--------------------------------------------------------------------- To
unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org For additional commands,
e-mail: dev-help@pinot.apache.org