Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/03/31 02:00:27 UTC

Apache Pinot Daily Email Digest (2022-03-30)

### _#general_

  
 **@rsibanez89:** @rsibanez89 has joined the channel  
 **@erik.bergsten:** @erik.bergsten has joined the channel  
 **@erik.bergsten:** Question about tiered storage + deep storage. We want to
use Pinot to query mostly fresh data (the last week or so) and then store data
long term on S3. We have deployed Pinot in our k8s cluster using the helm chart,
and configuring the controller to back up to S3 seems to work. If I understand
it correctly, though, this only backs up data that is also stored on local
disks, and we want to keep local disk usage low! Can we use tiered storage and
configure one server to use S3 and another server to use local disks?  
**@g.kishore:** Hi Erik, this feature is only available in StarTree version of
Pinot.  
**@erik.bergsten:** Oh! So these pages in the docs  and  are not the same as
that StarTree feature?  
**@g.kishore:** The concept is the same.. open source supports HDFS, SSD, EBS,
etc. as tiers  
**@g.kishore:** StarTree version adds S3 as a tier  
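[Editor's note: for reference, the OSS tier setup described above is declared per table via `tierConfigs` in the table config. A minimal sketch, assuming time-based tiering; the tier names, ages, and server tags here are made-up examples:]

```json
"tierConfigs": [
  {
    "name": "hotTier",
    "segmentSelectorType": "time",
    "segmentAge": "7d",
    "storageType": "pinot_server",
    "serverTag": "ssd_OFFLINE"
  },
  {
    "name": "coldTier",
    "segmentSelectorType": "time",
    "segmentAge": "30d",
    "storageType": "pinot_server",
    "serverTag": "hdd_OFFLINE"
  }
]
```

Segments older than the configured age are moved to servers carrying the matching tag, which is how "SSD vs HDD/EBS servers" tiering works in OSS Pinot.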
**@nizar.hejazi:** @g.kishore What about using S3 for deep storage (not tiered
storage), is it also only supported in StarTree version?  
**@g.kishore:** no, that is available in OSS version as well  
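[Editor's note: for context, S3 as deep storage in OSS Pinot is enabled through the controller config. A minimal sketch, assuming a bucket `my-pinot-segments` and region `us-east-1` (both placeholders):]

```properties
# Deep store location (placeholder bucket/path)
controller.data.dir=s3://my-pinot-segments/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
# Register the S3 PinotFS implementation for the s3:// scheme
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```

As discussed above, this is a backup/restore location for segments, not a queryable S3 tier.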
 **@ysuo:** When I run an explain plan for a query multiple times, I get
different query results, and the *totalDocs* number is different. A very
simple query, like `explain plan for select * from table_name`. Any idea why
that is?  
**@richard892:** hi can you share the plans?  
**@mayanks:** also cc: @amrish.k.lal  
 **@mathew.pallan:** @mathew.pallan has joined the channel  
 **@dh258:** @dh258 has joined the channel  
 **@ralph.debusmann967:** @ralph.debusmann967 has joined the channel  

###  _#random_

  
 **@rsibanez89:** @rsibanez89 has joined the channel  
 **@erik.bergsten:** @erik.bergsten has joined the channel  
 **@mathew.pallan:** @mathew.pallan has joined the channel  
 **@dh258:** @dh258 has joined the channel  
 **@ralph.debusmann967:** @ralph.debusmann967 has joined the channel  

###  _#troubleshooting_

  
 **@rsibanez89:** @rsibanez89 has joined the channel  
 **@erik.bergsten:** @erik.bergsten has joined the channel  
 **@deemish2:** Hi team, I am using pinot-0.10.0 to execute a Pinot ingestion
job with an Avro file and it fails with: `Caused by:
java.lang.UnsupportedOperationException: Unsupported Avro type: NULL`. It
works fine with a JSON file. Can anyone please help with this issue?  
**@tisantos:** Hi Deepak, would you happen to have the avro schema of your
input data? Seems like there might be an invalid field with a type `NULL`.  
**@deemish2:** The schema is the same for JSON and Avro  
**@deemish2:** It is having a problem with this field: `{ "name":
"countryInfo", "dataType": "STRING" }`  
**@deemish2:** There is no value in the data; it should come through with the
default value (NULL) after the data is populated in the Pinot table  
**@tisantos:** The exception is thrown for the `viceusage` field. Can you
confirm what the avro type is for that field?  
**@deemish2:** `{ "name": "viceusage", "dataType": "STRING" }`  
**@deemish2:** The error occurs where the dataType is STRING and there is no
data for that field  
**@tisantos:** This exception is happening in the part of the code that is
purely parsing your Avro schema. If the `viceusage` field can be null, then it
should be an avro UNION type of `["null", "long"]` . I don't think that is
currently the case.  
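[Editor's note: in other words, a nullable field must be declared as a union in the Avro schema itself, not just given a default in the Pinot schema. Since `viceusage` maps to a STRING Pinot column, the Avro field would presumably look something like this sketch:]

```json
{
  "name": "viceusage",
  "type": ["null", "string"],
  "default": null
}
```

A plain `"type": "null"` (or a missing value in a non-union field) is what produces the `Unsupported Avro type: NULL` error during schema parsing.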
 **@mathew.pallan:** @mathew.pallan has joined the channel  
 **@mathew.pallan:** Hi, I am evaluating Apache Pinot and wanted to understand
the deep storage options while deploying on Azure. From the docs, it seems
like Azure Blob Storage is not supported and Azure Data Lake Storage has to be
used as the deep storage. Can you please confirm the same? Also, is PinotFS
the abstraction used for deep storage as well? The docs mention PinotFS in the
context of importing data, hence this query.  
**@g.kishore:** That’s right.. only the ADLS implementation is available as of
now.. PinotFS is used for the deep storage abstraction as well.. you can look
at the ADLS implementation and add/contribute an Azure Blob Storage FS
implementation  
**@mark.needham:** this is the interface that you have to implement if you
want to do that  
**@mark.needham:** This is the Azure data lake implementation -  
**@mathew.pallan:** Thanks @g.kishore @mark.needham for the quick response  
**@mathew.pallan:** @g.kishore I got a bit confused by the comment above
that says - PinotFS is used for the deep storage abstraction as well. Does it
imply that there are other abstractions too, and PinotFS is the new
approach (or one of the approaches) to abstracting deep storage?  
**@g.kishore:** my bad, pinotFS is the abstraction to interact with deep
storage.  
**@mathew.pallan:** Thanks, clear now :thumbsup:  
**@mayanks:** @mathew.pallan To add more context, ADLS gen2 is built on top of
ABS (if I am not wrong). And it has better abstraction in terms of what
PinotFS needs, so we went with ADLS instead of ABS.  
 **@elon.azoulay:** Hi, if we make a change to the controller app, how do we
build it? i.e. `npm install` and `npm run build`?  
**@elon.azoulay:** Want to do a pr, but it seems that some additional steps
need to be done first. If you have a chance lmk. Thanks!  
 **@dh258:** @dh258 has joined the channel  
 **@dh258:** Hi all, while deploying Pinot on an on-prem Tanzu K8s cluster I’m
running into an issue where helm is unable to pull the zookeeper image from
the registry due to its rate limiting:
```
Normal  Pulling  19s (x3 over 63s) kubelet  Pulling image "zookeeper:3.5.5"
Warning Failed   17s (x3 over 61s) kubelet  Failed to pull image "zookeeper:3.5.5": rpc error: code = Unknown desc = failed to pull and unpack image "": failed to copy: httpReaderSeeker: failed open: unexpected status code : 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading:
Warning Failed   17s (x3 over 61s) kubelet  Error: ErrImagePull
Normal  BackOff  3s (x3 over 60s)  kubelet  Back-off pulling image "zookeeper:3.5.5"
Warning Failed   3s (x3 over 60s)  kubelet  Error: ImagePullBackOff
```
To get around it I put the zookeeper image into our internal image registry,
but I'm not sure how to modify the helm chart to point it there. Can you
assist in getting this resolved?  
**@diana.arnos:** You need to overwrite the `image` property in the values
file:
```yaml
image:
  repository: 
  tag: 0.10.0
```
and run helm like:
```
helm upgrade --install pinot-test pinot/pinot \
  --timeout 5m --wait --atomic \
  --values values.yaml \
  --values values-${DEPLOY_ENVIRONMENT}.yaml
```
**@dlavoie:**
```
--set zookeeper.image.registry=<your-registry> \
--set zookeeper.image.repository=<your-repository>
```
to be specific about zookeeper. The dependent chart has the following default
values:
```yaml
## Bitnami Zookeeper image version
## ref: 
image:
  registry: 
  repository: bitnami/zookeeper
  tag: 3.7.0-debian-10-r56
```
**@dlavoie:** @dh258 ^^  
**@dh258:** With these changes, do I need to put all the images used in the
Pinot install in my registry, or just zookeeper?  
 **@dlavoie:** Pinot is using a child chart for zookeeper. You can configure
the nested helm chart with `zookeeper.x` values  
**@dh258:** Can you be more specific what changes I need to make where?  
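[Editor's note: putting the suggestions above together, a minimal override for just the zookeeper image might look like this in `values.yaml`; the registry host is a placeholder for the internal registry:]

```yaml
zookeeper:
  image:
    registry: registry.internal.example.com   # placeholder: your internal registry
    repository: bitnami/zookeeper
    tag: 3.5.5
```

Only the zookeeper sub-chart is overridden here; the Pinot image itself is controlled separately by the top-level `image.repository` / `image.tag` values shown earlier in the thread.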
 **@ralph.debusmann967:** @ralph.debusmann967 has joined the channel  
 **@luisfernandez:** does anyone know why i may get the following error from
the broker `"message": "2 servers [pinot-server-1_O, pinot-server-0_O] not
responded"` I just changed the version of my broker trying to downgrade the
entire system to `0.9.3`  
**@luisfernandez:** anyone familiar with this error?
```
QueryExecutionError:
java.lang.RuntimeException: Caught exception while running CombinePlanNode.
	at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:146)
	at org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:41)
	at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)
	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:296)
...
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:135)
	... 15 more
...
Caused by: java.lang.NullPointerException
	at org.apache.pinot.core.util.trace.TraceContext.registerThreadToRequest(TraceContext.java:140)
	at org.apache.pinot.core.util.trace.TraceCallable.call(TraceCallable.java:41)
	... 8 more
```
**@luisfernandez:** btw a restart of the servers made this go away, but I’m
curious why it happened in the first place lol  

###  _#getting-started_

  
 **@rsibanez89:** @rsibanez89 has joined the channel  
 **@erik.bergsten:** @erik.bergsten has joined the channel  
 **@mathew.pallan:** @mathew.pallan has joined the channel  
 **@dh258:** @dh258 has joined the channel  
 **@ralph.debusmann967:** @ralph.debusmann967 has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org