Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/06/17 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-06-16)

### _#general_

  
 **@hashhar:** @hashhar has joined the channel  
 **@mbracke:** Hi! Is there a way to write a where clause to match entries
that do not match a given regular expression? Using `not` just results in an
error message.  
**@fx19880617:** I think NOT with regex_match is not supported; the current solution is to negate the regex itself (I know that's sometimes hard)  
**@fx19880617:** We should add NOT support for REGEX_MATCH  
**@fx19880617:** can you create a github issue  
**@mbracke:** OK, thanks.  
**@mbracke:** Isn't this issue similar:  ? It's on REGEXP_LIKE, but that's
what I'm using.  
**@fx19880617:** True, I think we can use the same issue  
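
Until NOT support lands, one workaround is to fold the negation into the pattern itself, since `REGEXP_LIKE` uses Java regular expressions. A sketch; the table and column names below are made up:

```sql
-- Unsupported: SELECT * FROM myTable WHERE NOT REGEXP_LIKE(url, '.*debug.*')
-- Workaround: negate via a negative lookahead inside the pattern itself:
SELECT *
FROM myTable
WHERE REGEXP_LIKE(url, '^(?!.*debug).*$')
LIMIT 10
```

The `(?!...)` lookahead fails the match whenever the string contains the unwanted pattern, giving the effect of `NOT REGEXP_LIKE`.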
 **@keweishang:** Hi team, I downloaded  and followed the `Manual cluster
setup` ()’s `Using launcher scripts` section. I ran ```export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-controller.log"
bin/pinot-admin.sh StartController \
  -zkAddress localhost:2191 \
  -controllerPort 9000``` to start the controller. However, the controller logs lots of WARNs like the following (in the thread), and the controller web UI comes up blank. May I have some help please? Thanks. The docker version works for me, but I want to install Pinot on our EC2 nodes for further PoC.  
**@keweishang:** Controller WARN logs when starting the controller: ```Jun 16, 2021 4:52:07 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:9000]
Jun 16, 2021 4:52:07 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2021/06/16 16:52:12.717 INFO [Reflections] [main] Reflections took 5310 ms to scan 1 urls, producing 65540 keys and 128519 values
2021/06/16 16:52:12.772 WARN [Reflections] [main] could not get type for name org.apache.commons.digester.AbstractObjectCreationFactory from any class loader
org.reflections.ReflectionsException: could not get type for name org.apache.commons.digester.AbstractObjectCreationFactory
    at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:390) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.reflections.Reflections.expandSuperTypes(Reflections.java:381) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.reflections.Reflections.<init>(Reflections.java:126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.classes(BeanConfig.java:276) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.scanAndRead(BeanConfig.java:240) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.setScan(BeanConfig.java:221) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.api.ControllerAdminApiApplication.setupSwagger(ControllerAdminApiApplication.java:101) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:78) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:421) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:283) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
Caused by: java.lang.ClassNotFoundException: org.apache.commons.digester.AbstractObjectCreationFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_242]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_242]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_242]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:1.8.0_242]
    at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:388) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    ... 18 more
2021/06/16 16:52:12.799 WARN [Reflections] [main] could not get type for name org.apache.log4j.EnhancedPatternLayout from any class loader
org.reflections.ReflectionsException: could not get type for name org.apache.log4j.EnhancedPatternLayout
    at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:390) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.reflections.Reflections.expandSuperTypes(Reflections.java:381) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.reflections.Reflections.<init>(Reflections.java:126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.classes(BeanConfig.java:276) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.scanAndRead(BeanConfig.java:240) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at io.swagger.jaxrs.config.BeanConfig.setScan(BeanConfig.java:221) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.api.ControllerAdminApiApplication.setupSwagger(ControllerAdminApiApplication.java:101) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:78) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:421) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:283) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.EnhancedPatternLayout
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_242]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_242]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_242]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:1.8.0_242]
    at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:388) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    ... 18 more```  
**@mayanks:** Does curl to controller work?  
**@mayanks:** cc @fx19880617  
**@keweishang:** Yes, curl to controller’s 9000 port works  
**@mayanks:** Yeah so I think the cluster is up, not sure about the UI issue.  
**@keweishang:** Yeah, I think it's a UI issue too. I tried different browsers; none worked. Are these `ClassNotFoundException` WARNs normal?  
**@mayanks:** Don’t recall seeing them but if the cluster is up and behaving
well, then not sure. Also don’t think that could cause UI issue  
**@keweishang:** I can access the  page, but not the  page (blank)  
**@mayanks:** what about help page (swagger)?  
**@mayanks:** I have tagged Xiang in case he has seen this issue. If not,
perhaps we should file an issue  
**@keweishang:** All the above pages work; only  doesn't (blank). Sure, let's wait for Xiang's feedback on it. I can file an issue later if it's really a bug.  
**@fx19880617:** I remember the root cause is that the UI requires the broker/server to also be up in order to be shown. It was fixed recently:  
**@mayanks:** Thanks @fx19880617  
**@fx19880617:** should be fixed in next release  
**@keweishang:** Thanks! Indeed, starting broker + server has solved the issue
:+1:  
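
For completeness, with the same launcher scripts the broker and server can be brought up alongside the controller, along these lines (using the same ZooKeeper address as in the controller command above):

```bash
# Broker and server join the cluster through the same ZooKeeper ensemble:
bin/pinot-admin.sh StartBroker -zkAddress localhost:2191
bin/pinot-admin.sh StartServer -zkAddress localhost:2191
```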
 **@mark.needham:** Hi, I'm trying to learn how to use dimension tables, but
I'm doing something wrong, though I'm not sure what. I have a `regions` dim table
and a `cases` normal table. I then run this query: ```select areaName,
lookUp('regions', 'Region', 'LTLAName', areaName) from cases limit 10``` But
the error message says it can't find the lookup function: ```[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Unsupported function: lookup with 4 parameters\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:189)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:56)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:52)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:83)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:94)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:234)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)"
  }
]``` Any ideas?  
**@jmeyer:** Not an expert, but I remember making it work a few days ago, and your query looks okay to me. What version of Pinot are you using?  
**@kulbir.nijjer:** Yes, @jmeyer is right. @mark.needham Support for the lookup UDF join was only added in version 0.7.1.  From the error message, it's not able to find the required code, which means you're running an older Pinot version.  
**@mark.needham:** Aha, cool! Yeah, I had it using the docker 'latest' tag, but the first time I ran it, it picked up version 0.6.0.  
**@mark.needham:**  
**@mark.needham:** pinned it to 0.7.1 now :slightly_smiling_face:  
**@kulbir.nijjer:** Cool!  
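
For anyone else hitting this, pinning the image tag avoids silently picking up a stale `latest`. For example, a sketch using the public Docker image:

```bash
# Pull an explicit release instead of relying on whatever 'latest' resolves to:
docker pull apachepinot/pinot:0.7.1
# e.g. run the bundled quickstart against that exact version:
docker run -p 9000:9000 apachepinot/pinot:0.7.1 QuickStart -type batch
```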
**@mark.needham:** Presumably on this query it's doing the lookup for every single row and therefore repeating the same lookup lots of times? Is there a way I can get it to do the aggregation by area name first and then do the lookup afterwards, so there are fewer lookups to do?  
**@mark.needham:** (reason I ask is that the query time is 10x more with the
lookup than without)  
**@jackie.jxt:** @mark.needham Yes, you are right: the lookup is performed on a per-row basis because it is currently modeled as a transform. Can you please file an issue for the optimization of deferring the lookup? Also add the feature contributor: @canerbalci  
 **@hamoop:** @hamoop has joined the channel  
 **@prasanna.gsl:** @prasanna.gsl has joined the channel  
 **@keweishang:** Hi team, the query to return the earliest row's timestamp
`select DATETIMECONVERT(MIN(created), '1:MILLISECONDS:EPOCH',
'1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as
min_created from delivery_order limit 1` failed with the following error (in the Slack thread). The `created` column is of type: ```{
  "name": "created",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:MILLISECONDS"
}``` Interestingly, the query `select DATETIMECONVERT(created, '1:MILLISECONDS:EPOCH', '1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as min_created from delivery_order limit 1` without `MIN()` works fine. May I have some advice? Thanks.  
**@keweishang:** ```ProcessingException(errorCode:450, message:InternalError: java.io.IOException: Failed : HTTP error code : 500
    at org.apache.pinot.controller.api.resources.PinotQueryResource.sendPostRaw(PinotQueryResource.java:302)
    at org.apache.pinot.controller.api.resources.PinotQueryResource.sendRequestRaw(PinotQueryResource.java:340)
    at org.apache.pinot.controller.api.resources.PinotQueryResource.getQueryResponse(PinotQueryResource.java:222)
    at org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:137)
    at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391))```  
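
If the limitation here is that a transform like `DATETIMECONVERT` cannot wrap an aggregation like `MIN` in this version (an assumption based on the 500 above), one workaround is to fetch the raw epoch minimum and format it on the client side. A sketch:

```sql
-- Fetch the raw epoch-millis minimum; convert it to a readable
-- timestamp in the client instead of inside the query:
SELECT MIN(created) AS min_created_millis
FROM delivery_order
```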
 **@qianbo.wang:** @qianbo.wang has joined the channel  
 **@steotia:** Hi All, we published the blog post today that I had referred to
in yesterday's talk  
**@nishanth:** @nishanth has joined the channel  
 **@jai.patel856:** Pinot Upsert Question: Upsert is supported only for realtime tables. That's fine. The time column is used to determine the order of the updates, to choose the latest one. What time is used to determine when to evict a row (visible or not)? The documents tend to point to segment age to determine when to evict messages. In practice it seems to evict based on when the row was actually imported. What's the expected behavior for a realtime (upsert) table?  
**@jackie.jxt:** If a row is not updated by another row with a newer timestamp, then it will expire along with the segment containing it. The segment is expired based on the latest timestamp within the segment and the retention config  
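
To make the coupling concrete: both update ordering and segment retention read the same time column. A rough sketch of the relevant realtime table-config pieces (the column name and retention values here are placeholders, and the exact layout may vary by version):

```json
{
  "upsertConfig": { "mode": "FULL" },
  "segmentsConfig": {
    "timeColumnName": "updatedAt",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30"
  }
}
```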
**@jai.patel856:** We currently have a convention where our rows are versioned with a number. We're using this as our time column. Part of the reasoning for this is to ensure that if we reprocess our Flink job, we won't overwrite rows in Pinot with old data. But the ordering and the retention are both controlled by the time column, correct? Is there a good mechanism to control ordering more directly, rather than relying on the same time column used for retention?  
**@jackie.jxt:** IIUC, the requirement is the same as this issue: ?  
**@jackie.jxt:** Currently it is not supported yet  
 **@ken:** For an offline (batch-generated) table, if I don’t specify a
`segmentIngestionFrequency`, then are `APPEND` and `REFRESH` values for
`segmentIngestionType` essentially equivalent?  
**@mayanks:** Is this a hybrid table? Not specifying the frequency might mess up the time boundary, depending on the time unit.  
**@ken:** Just OFFLINE  
**@ken:** I guess the meta-question is: what happens if I create a new version of an existing segment file for an offline table and do a metadata push? I'm assuming that's a refresh, and Pinot will handle it correctly.  
**@mayanks:** Another place it is used is for interval check for validation.  
**@mayanks:** Even for an APPEND table, you can refresh any segment at any time  
**@mayanks:** That is how backfill works  
**@ken:** So if I’ve got an offline table segment that I update on a daily
basis, what’s the recommended settings? use `REFRESH` with a
`segmentIngestionFrequency` of 1 day?  
**@mayanks:** REFRESH is typically used for a full refresh of the data. These tables typically don't have a time column. If either one is not true for you, you might be OK with just APPEND  
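
For reference, these settings live in the table's ingestion config; a daily APPEND setup might look like the sketch below (field values are illustrative, and the exact location can differ by Pinot version):

```json
"ingestionConfig": {
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY"
  }
}
```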
**@ken:** And what guarantees does Pinot provide (if any) for what happens to
queries that are executing when an updated segment is being reloaded?  
**@mayanks:** Single segment update is atomic. As in a query will either see
old or new segment, not a partially updated segment.  
**@mayanks:** If you are refreshing a bunch of segments, then you can have a
situation where some segments are refreshed and others are not  
**@ken:** Thanks, good to know.  
**@ken:** Though I’m still curious about the meaning of
`segmentIngestionFrequency` for an OFFLINE table. Why does Pinot care if I
update every day or every week?  
**@mayanks:** It is only used in two places (based on what I see with a quick grep of the code): ```1. Time boundary (only applies to hybrid tables).
2. There are checks that ensure data is pushed as expected (for operational monitoring).```  
**@ken:** OK, guess I need to dig into the operational monitoring stuff more -
thanks  
**@mayanks:** Yeah, think of this situation - Your user of Pinot thinks data
is being pushed to it daily, but their pipeline has been failing (and they
didn't notice it). The first thing they would do is ask the question - "Why is
Pinot not showing my latest data?" We build some checks to ensure we can
automatically detect this situation.  

###  _#random_

  
 **@hashhar:** @hashhar has joined the channel  
 **@hamoop:** @hamoop has joined the channel  
 **@prasanna.gsl:** @prasanna.gsl has joined the channel  
 **@qianbo.wang:** @qianbo.wang has joined the channel  
 **@nishanth:** @nishanth has joined the channel  

###  _#troubleshooting_

  
 **@e-ramirez:** Hi, I am evaluating Pinot for possible production use in my company. I am encountering a problem with the backup/restore feature; I'd appreciate it if anyone can help. Here is my setup. Kubernetes: EKS 1.20.4. Pinot version: 0.7.1. I enabled S3 as deep storage, then ingested Parquet data from S3. The data loaded fine and I can query the expected data from Pinot. Next I simulated replacing the cluster by uninstalling all pods and their related volumes (therefore losing all state), but kept the segment files in the S3 segment location (therefore the backup is intact in the deep store). Then I reinstalled the cluster and reconfigured the tables. I was expecting the servers to automatically fetch the segments from the deep store as mentioned in a previous post, but that does not seem to be happening. Am I missing a step? Thanks in advance.  
**@g.kishore:** You cannot undeploy zookeeper  
**@g.kishore:** Zookeeper stores the metadata/list of segments  
**@e-ramirez:** Thanks for the reply. May I know what the steps should be in case I have to replace the cluster? Should I keep a backup of ZooKeeper and restore it to the new cluster?  
**@g.kishore:** Yes  
**@g.kishore:** Or upload all the segments again to the new cluster using the upload API call  
**@g.kishore:** It can be a simple script over the segments in S3  
**@e-ramirez:** Got it. Looking at the `UploadSegment` command, the parameter `segmentDir` requires a local path. This means I have to download the segments first in order to upload them. Is there a way to use the previous cluster's S3 segment path as the source location for the new cluster's upload?  
**@g.kishore:** Use uri based or metadata based push  
**@e-ramirez:** awesome. Thanks. I will try this.  
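
For example, a URI-based push boils down to pointing the controller at each segment's existing S3 location instead of re-uploading the bytes. A rough sketch (host, bucket, and table name are placeholders):

```bash
# For each segment tarball already sitting in S3, ask the controller
# to register it by URI rather than streaming the file up:
curl -X POST \
  -H "UPLOAD_TYPE:URI" \
  -H "DOWNLOAD_URI:s3://my-bucket/segments/myTable/segment_0.tar.gz" \
  "http://pinot-controller:9000/v2/segments?tableName=myTable"
```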
**@mayanks:** Do you have realtime component as well?  
**@e-ramirez:** I can think of several use-cases where Pinot might be useful to us. • As the main backend of our analytics dashboard. Currently we are using Druid, GreenPlum, TiDB, etc., but each one has drawbacks. • As one of the data sources for our machine learning jobs. Currently we are using Athena or direct files from S3, but Athena has an upper bound on throughput, while raw S3 files are too limited. • As a backend sink for Kafka to complement our real-time prediction in production serving.  
**@mayanks:** Yeah, these sound like great use cases for Pinot. We are here to
help you use Pinot successfully for these.  
 **@hashhar:** @hashhar has joined the channel  
 **@jainendra1607tarun:** Hello everyone, I am running Presto to query Pinot, and the presto-pinot connector throws an exception when there is no data returned by Pinot. An example query is: ```select * from pinot.default.mytable
where datekey='2021-04-19 00:00:00' limit 10``` This query returns an empty result in Pinot itself, as expected. The exception in Presto is: ```java.lang.IllegalStateException: Expected at least one row to be present
    at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
    at com.facebook.presto.pinot.PinotBrokerPageSourceSql.populateFromQueryResults(PinotBrokerPageSourceSql.java:118)
    at com.facebook.presto.pinot.PinotBrokerPageSourceBase.lambda$issueQueryAndPopulate$0(PinotBrokerPageSourceBase.java:327)
    at com.facebook.presto.pinot.PinotUtils.doWithRetries(PinotUtils.java:39)
    at com.facebook.presto.pinot.PinotBrokerPageSourceBase.issueQueryAndPopulate(PinotBrokerPageSourceBase.java:312)
    at com.facebook.presto.pinot.PinotBrokerPageSourceBase.getNextPage(PinotBrokerPageSourceBase.java:222)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:252)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_256_SNAPSHOT_5059796____20210616_162510_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)``` Is this a bug, or am I missing some configuration?  
**@fx19880617:** This is a bug that @dharakkharod is working on.  
**@fx19880617:** we should have it fixed soon  
 **@patidar.rahul8392:** Hi Team, I am ingesting realtime data from Kafka and displaying it in a Superset dashboard. In the dashboard I have one slice where I display events from the last 5 minutes based on my timestamp column. For testing purposes I pushed one event into Kafka that was already available (a duplicate). As soon as I pushed the data into Kafka, it showed up in Pinot within milliseconds, but it's not reflected on the dashboard side in the last-5-minutes count. So my question is: will it take some time to reflect on the dashboard side, or will duplicate records not show in the last-5-minutes count at the dashboard? @mayanks  
**@mayanks:** Can you check what query the dashboard is firing at Pinot, and compare it with the query you used to verify that the event is in Pinot?  
**@patidar.rahul8392:**  
**@patidar.rahul8392:** @mayanks  
**@mayanks:** can you manually run superset query directly on pinot?  
**@patidar.rahul8392:** Okay  
**@mayanks:** My guess is the Superset query is filtering out the second record. You'll need to compare the rows against the predicate to see why that is happening  
**@patidar.rahul8392:** My bad @mayanks. I am pushing events whose current_timestamp is older than 5 minutes, and in Superset I set the interval to 5 minutes, so that is probably the issue.  
**@mayanks:** :+1:  
 **@hamoop:** @hamoop has joined the channel  
 **@dharakkharod:** @dharakkharod has joined the channel  
 **@prasanna.gsl:** @prasanna.gsl has joined the channel  
 **@mateus.oliveira:** Hello team, I need help with something. I'm trying to load some data from an S3 bucket into Pinot, but it gives me this error: ```Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: , is recursive: true
Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:166) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:186) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
Caused by: java.lang.IllegalArgumentException
    at sun.nio.fs.UnixFileSystem.getPathMatcher(UnixFileSystem.java:288) ~[?:1.8.0_292]
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:175) ~[pinot-batch-ingestion-standalone-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
    ... 4 more``` This is my job spec: ```executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: ''
includeFileNamePattern: '*.json'
outputDirURI: ''
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
      endpoint: ''
      accessKey: 'access'
      secretKey: 'key'
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'bank'
pinotClusterSpecs:
  - controllerURI: ''```  
**@aaron:** Try `includeFileNamePattern: 'glob:**/*.json'`  
**@mayanks:** Yeah ^^. Seems it is failing here in the code: ```if (_spec.getIncludeFileNamePattern() != null) {
  includeFilePathMatcher = FileSystems.getDefault().getPathMatcher(_spec.getIncludeFileNamePattern());
}```  
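
The reason the pattern matters: the matcher is applied to the file's path, not just its bare name, so a pattern without a directory component matches nothing under a bucket prefix. A corrected job-spec fragment (the bucket and prefix below are placeholders):

```yaml
inputDirURI: 's3://my-bucket/path/to/input/'  # placeholder
# 'glob:**/*.json' crosses directory boundaries; 'glob:*.json' only matches
# a file name with no directory component in front of it:
includeFileNamePattern: 'glob:**/*.json'
```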
**@mateus.oliveira:** I don't receive any error anymore, but it is not creating segments  
**@mateus.oliveira:** ```Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: , is recursive: true
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Listed 0 files from URI: , is recursive: true
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@106cc338] for table bank```  
**@fx19880617:** Can you try Aaron’s suggestion?  
**@fx19880617:** Try `includeFileNamePattern: 'glob:**/*.json'`  
**@fx19880617:** I feel the pattern doesn’t match any file  
**@mateus.oliveira:** Sure, I tried it and I have no more errors, but it is not creating segments  
**@mateus.oliveira:** Could be; I will take a look at the files  
**@fx19880617:** ic  
**@fx19880617:** what’s your file names/paths?  
**@mateus.oliveira:** ```bank_2021_5_19_11_33_43.json```  
**@fx19880617:** hmm  
**@mateus.oliveira:** It even reads the 8 files, as the log message shows, but it is weird  
**@fx19880617:** have you set this ```schemaURI: ''
tableConfigURI: ''```  
**@fx19880617:** under `tableSpec:`  
**@mateus.oliveira:** no but I will do it now  
**@mateus.oliveira:** Nothing; it doesn't create the segments and the table is empty. I will review the schema, maybe it's something related to that  
**@mateus.oliveira:** ```SegmentGenerationJobSpec: !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
authToken: null
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner, segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner, segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
failOnEmptySegment: false
includeFileNamePattern: glob:*.json
inputDirURI: 
jobType: SegmentCreationAndTarPush
outputDirURI: 
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: ''}
pinotFSSpecs:
- className: org.apache.pinot.plugin.filesystem.S3PinotFS
  configs: {region: us-east-1, endpoint: '', accessKey: YOURACCESSKEY, secretKey: YOURSECRETKEY}
  scheme: s3
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader, configClassName: null, configs: null, dataFormat: json}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: '', tableConfigURI: '', tableName: bank}
tlsSpec: null
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: , is recursive: true
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Listed 0 files from URI: , is recursive: true
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@63f259c3] for table bank
root@pinot-controller-0:/opt/pinot#```  
**@mateus.oliveira:** the output of job execution  
**@fx19880617:** hmm, ok  
**@fx19880617:** ```includeFileNamePattern: glob:*.json```  
**@fx19880617:** `'glob:**/*.json'`  
**@fx19880617:** not `'glob:*.json'`  
**@mateus.oliveira:** It works, @fx19880617! Thank you and @aaron for the help  
**@mayanks:** @mateus.oliveira Curious, was this a documentation issue (as in, was it not clear enough)?  
**@mayanks:** If so, any suggestions on how to improve it?  
**@mateus.oliveira:** In this case it was my mistake, but it would be great if you guys could detail the configs a little more. For example, this pattern part wasn't in the document, at least not the S3 one; even repeating this info a little would be great. But beyond that, it was not a documentation problem, it was my mistake  
**@mayanks:** I see, thanks  
**@kulbir.nijjer:** @mateus.oliveira BTW, `endpoint` is an AWS S3-specific client config, not the Pinot controller address, so the current setting is invalid (the AWS SDK is probably overriding it automatically based on region). You are probably fine not specifying it at all: ```endpoint: ''``` In case you are interested in valid values:  
**@fx19880617:** It might be a different S3-compatible FS endpoint, like MinIO?  
**@kulbir.nijjer:** Yes, good point, it can be, depending on the object store backend you are integrating with. Generally for AWS S3 access it's only needed for advanced use cases.  
 **@qianbo.wang:** @qianbo.wang has joined the channel  
 **@nishanth:** @nishanth has joined the channel  
 **@nishanth:** Hello  
 **@chxing:** Hi @jackie.jxt, can Pinot support a connection pool, like Druid's?  
 **@chxing:** We want to use a connection pool for our Java service  
 **@jackie.jxt:** It doesn't support connection pooling, but Pinot does have a JDBC connector. @fx19880617 can you share more info about the JDBC connector?  
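
For context, the Pinot JDBC client speaks the standard `java.sql` API, so in principle it can sit behind any standard JDBC connection pool. A minimal sketch; the controller address and table name below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PinotJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Pinot JDBC driver (from the pinot-jdbc-client artifact).
        Class.forName("org.apache.pinot.client.PinotDriver");
        // The JDBC URL points at the controller; host/port are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:pinot://localhost:9000");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM myTable")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```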

###  _#docs_

  
 **@e-ramirez:** @e-ramirez has joined the channel  

###  _#pinot-dev_

  
 **@nishanth:** @nishanth has joined the channel  

###  _#community_

  
 **@nishanth:** @nishanth has joined the channel  

###  _#announcements_

  
 **@nishanth:** @nishanth has joined the channel  

###  _#presto-pinot-streaming_

  
 **@hashhar:** @hashhar has joined the channel  

###  _#aggregate-metrics-change_

  
 **@nishanth:** @nishanth has joined the channel  

###  _#presto-pinot-connector_

  
 **@hashhar:** @hashhar has joined the channel  

###  _#getting-started_

  
 **@mark.needham:** @mark.needham has joined the channel  
 **@hamoop:** @hamoop has joined the channel  

###  _#debug_upsert_

  
 **@e-ramirez:** @e-ramirez has joined the channel  

###  _#pinot-docsrus_

  
 **@e-ramirez:** @e-ramirez has joined the channel  