Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/06/24 02:00:20 UTC

Apache Pinot Daily Email Digest (2021-06-23)

### _#general_

**@srikanth:** @srikanth has joined the channel
**@zsolt:** Is this section of the docs still accurate? I can't find any
usage of the property in the code. > *Real-Time Pinot table:* In case of
real-time tables, make sure the "_pinot.server.instance.reload.consumingSegment_"
config is set to true inside . Without this, the current consuming segment(s)
will not reflect the default null value for newly added columns.
**@mayanks:** You are right, seems like the default value was switched to true
in this PR.
**@mayanks:** This was just merged a couple of days ago. The doc is still as
per the prior official release.
**@mayanks:** cc @jackie.jxt
**@zsolt:** I mean I grepped through the history and I couldn't find a
reference actually reading the config value. I don't think it was ever
actually wired. It was weird because we haven't set it and adding fields
worked.
**@mayanks:** Are you saying it works with the 0.7.1 release?
**@zsolt:** yes
**@jackie.jxt:** @zsolt The next consuming segment will always get the updated
schema. The feature is about whether the current consuming segment can add the
new columns on the fly via reload
**@zsolt:** I see, I've found that the usage is through a different constant
**@zsolt:** There's a comment: > // Whether to reload consuming segment on
scheme update. Will change default behavior to true when this feature is
stabilized
I assume this has happened already.
**@jackie.jxt:** Yes, it happened less than a week ago :wink:
**@specsek:** @specsek has joined the channel
**@hsaini:** @hsaini has joined the channel
**@vlum:** @vlum has joined the channel
**@mercyshans:** Hi, does Pinot provide any benefit for use cases where no
aggregation is applied in queries (meaning there are no metric columns, only
dimension columns)? And is that the reason why metric columns do not allow
other types such as String?
**@mayanks:** Do you mean you want to just fetch a bunch of rows from Pinot
without any aggregation or group-by?
**@mercyshans:** @mayanks Yes, or a combined usage (like a table that serves
both aggregation queries and queries that fetch just a few rows)
**@mayanks:** Yes, you can do so. Especially, if you have both aggregation and
selection queries, both will work fine with Pinot
**@mayanks:** What you want to avoid is cases where you just have select *
queries that are simply fetching millions of records per query, and there are
no aggregation queries. In that case, you are not really utilizing the power
of Pinot
**@mercyshans:** Makes sense. What about string type support for metric
columns, why is it not supported? If I just want to count the values of such
a metric, a string type should be reasonable, right?
**@mayanks:** You can do count(*) anytime.
**@mayanks:** Metric columns are usually ones where you would do something
like `sum(metric)`
**@mercyshans:** OK, so I need to make it a dimension column, and avoid
indexing by adding it to `noDictionaryColumns`, correct?
**@mayanks:** No
**@mayanks:** You just do something like `select count(<col>) from myTable
where <col> != null`
**@mayanks:** Note, Pinot does not support nulls natively yet, so null values
are replaced by a default value, which you will have to filter out
**@mayanks:** You don't need to do anything special here for just getting
count
**@mercyshans:** Yeah, but I do not want to create an index on this column
since I will not filter or aggregate on it. How do I avoid the indexing?
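The point made above, that null values are stored as a default value which then has to be filtered out before counting, can be illustrated with a small sketch that does not depend on Pinot at all. The placeholder string `"null"`, the helper name `count_non_default`, and the sample rows are assumptions for illustration; the actual default depends on the column's type and schema settings.

```python
# Assumed placeholder that stands in for missing values. This is an
# illustrative choice, not read from any real schema.
DEFAULT_NULL_VALUE = "null"


def count_non_default(values, default=DEFAULT_NULL_VALUE):
    """Count entries that are real values rather than the null placeholder."""
    return sum(1 for v in values if v != default)


# Hypothetical column contents: two of the five rows were missing at ingestion.
rows = ["a", "null", "b", "null", "c"]
print(count_non_default(rows))
```

A plain `count(*)` would report all five rows here, which is why the filter on the default value matters when only real values should be counted.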
**@karinwolok1:** Hi all! Would love for you to join us tomorrow for this
event! We have some great speakers from LinkedIn, DoorDash, Confluent,
Microsoft, Decodable, Stripe and more!
**@qianbo.wang:** Hi, I have a question on . Can we use the return value of
this function in a `group by` statement? Thanks in advance.
**@mayanks:** Looking at the code, Lookup is implemented as a
TransformFunction, which can be applied to group by.
**@qianbo.wang:** thanks

### _#random_

**@srikanth:** @srikanth has joined the channel
**@specsek:** @specsek has joined the channel
**@hsaini:** @hsaini has joined the channel
**@vlum:** @vlum has joined the channel

### _#troubleshooting_

**@srikanth:** @srikanth has joined the channel
**@jmeyer:** Hello, is it possible to generate segment names that follow the
input file names? Say I generate 10 files for 10 "ids", I'd want the segment
names to contain these ids, so that they can be replaced later by generating
another segment with the same name, e.g. `ID1.parquet -> prefix_ID1.segment`.
Any way to make this work using `segmentNameGeneratorSpec.type`? Maybe using a
particular file structure like `data/ID/file.parquet`? Thanks!
**@mayanks:** The default naming scheme already generates names friendly to
overwrite at a later point in time, right?
**@mayanks:** For example, <tableName>_<minTime>_<maxTime>_<id>
**@mayanks:** If you regenerate data for a date partitioned folder, you will
get consistent names, as long as the number of files is unchanged.
**@jmeyer:** What if I don't have a time column ? :smile:
**@jmeyer:** Can I use an "id" partitioned folder then?
**@mayanks:** I think for refresh use case the convention is <tableName>_<id>
**@jmeyer:** So using a structure like so
```
basedir/id1/file.parquet
basedir/id2/file.parquet
```
would generate segments with names
```
<table_name>_id1.segment
<table_name>_id2.segment
```
? :slightly_smiling_face:
**@jmeyer:** Meaning that regenerating those files would easily replace
previous segments for the same "ids"
**@mayanks:** No. the id I am referring to is just a sequence number generated
on the fly.
**@mayanks:** Just curious, why don't you have a time column? Is it a pure
refresh use case?
**@jmeyer:** Ah I see.. Any way to make it easy to replace segments based on a
user provided id ?
**@jmeyer:** > Just curious, why don't you have a time column? Is it a pure
refresh use case? It is a table I'm using with `IN_SUBQUERY`
:slightly_smiling_face:
**@jmeyer:** So it's more of a dimension table
**@jmeyer:** Hence why no time column
**@jmeyer:** I guess I can cheat and map ids -> time, but that sounds kind of
hacky ^^
**@mayanks:** I'll have to check the code. But in the worst case, the name
generator is a very simple interface, and your use case seems like one that
others might need, so it might be good to implement, if not already supported.
**@mayanks:** Care to take a look?
**@jmeyer:** Looks pretty simple indeed. Can't seem to find a "suitable"
strategy for my use case though
**@mayanks:** Ah, the interface does not provide a way to specify input file
name.
**@mayanks:** Might be worth discussing in a broader forum via a github issue.
**@mayanks:** I do see your use case to be a good one to support.
**@jmeyer:** Interesting, I'll do that
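The naming idea discussed in this thread, deriving a stable segment name from the input file so that regenerating the file replaces the same segment, can be sketched independently of Pinot. Both helper functions below are hypothetical and are not part of Pinot's name generator interface, which, as noted above, does not currently receive the input file name.

```python
from pathlib import PurePosixPath


def segment_name_from_input(input_path: str, prefix: str) -> str:
    """Derive a deterministic name from the input file's base name.

    Hypothetical helper: "data/ID1.parquet" with prefix "prefix"
    maps to "prefix_ID1".
    """
    stem = PurePosixPath(input_path).stem
    return f"{prefix}_{stem}"


def segment_name_from_parent_dir(input_path: str, table_name: str) -> str:
    """Variant for layouts like basedir/id1/file.parquet: use the folder name."""
    parent = PurePosixPath(input_path).parent.name
    return f"{table_name}_{parent}"


print(segment_name_from_input("data/ID1.parquet", "prefix"))
print(segment_name_from_parent_dir("basedir/id2/file.parquet", "myTable"))
```

Because the name depends only on the path and not on a sequence number generated on the fly, rerunning the job for the same "id" would yield the same name, which is the property the refresh use case needs.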
**@specsek:** @specsek has joined the channel
**@specsek:** Greetings! Is there a stable version of the helm chart to run?
I installed the latest (0.7.1) but all the components crash with messages like
the following:
```
Unrecognized VM option 'PrintGCDateStamps'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```
**@mayanks:** @xiangfu0 ^^
**@xiangfu0:** oh, are you using k8s ?
**@xiangfu0:** can you try to remove the `PrintGCDateStamps` flag from the
javaOpts in the `values.yaml` file?
**@xiangfu0:** I mean, that is because we upgraded to Java 11 for those configs
**@xiangfu0:** You can also try to use image tag: `0.7.1-jdk11`
**@specsek:** I had to remove all the PrintGC statements to get it to run, and
there are other errors though:
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ERROR StatusLogger File not found in file system or classpath: /opt/pinot/conf/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.pinot.spi.plugin.PluginClassLoader (file:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar) to method java.net.URLClassLoader.addURL(java.net.URL)
WARNING: Please consider reporting this to the maintainers of org.apache.pinot.spi.plugin.PluginClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
17:41:43.847 [main] ERROR org.apache.pinot.spi.plugin.PluginManager - Failed to load plugin [pinot-gcs] from dir [/opt/pinot/plugins/pinot-file-system/pinot-gcs]
java.lang.IllegalArgumentException: object is not an instance of declaring class
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.pinot.spi.plugin.PluginClassLoader.<init>(PluginClassLoader.java:50) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.createClassLoader(PluginManager.java:196) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.load(PluginManager.java:187) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:157) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:123) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.<init>(PluginManager.java:104) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.spi.plugin.PluginManager.<clinit>(PluginManager.java:46) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:182) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-8ce1309844a639e5d441f4173e289459c4a5d918]
17:41:43.868 [main] ERROR org.apache.pinot.spi.plugin.PluginManager - Failed to load plugin [pinot-adls] from dir [/opt/pinot/plugins/pinot-file-system/pinot-adls]
```
**@specsek:** and yes, I am using k8s :+1:
**@xiangfu0:** i see, is this on 0.7.1 image or 0.7.1-jdk11 image?
**@specsek:** 0.7.1-jdk11
**@xiangfu0:** ok, i’ll take a look
**@specsek:** thanks!:pray:
**@xiangfu0:** meanwhile you can try to use jdk8 image:
**@xiangfu0:** with the old jvmOpts
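The workaround described in this thread, removing the `PrintGC` options from `javaOpts` so that a Java 11 JVM starts, can be sketched as a small filter. The function name `drop_print_gc_flags` and the sample option string are assumptions for illustration and are not copied from the actual helm chart. The filter mirrors what was reported to work (dropping every option that mentions `PrintGC`) rather than an authoritative list of options removed in Java 11.

```python
def drop_print_gc_flags(java_opts: str) -> str:
    """Remove every whitespace-separated JVM option that mentions PrintGC."""
    kept = [opt for opt in java_opts.split() if "PrintGC" not in opt]
    return " ".join(kept)


# Hypothetical javaOpts value, similar in shape to what a values.yaml might hold.
old_opts = "-Xms256M -Xmx1G -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
print(drop_print_gc_flags(old_opts))
```

The filtered string would then go back into the `javaOpts` entry of `values.yaml`; the alternative mentioned above is to keep the old options and run a JDK 8 image instead.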
**@hsaini:** @hsaini has joined the channel
**@vlum:** @vlum has joined the channel
**@sheetalarun.kadam2:** Hello! I am using the Presto Pinot Python connector
to query Pinot. I have a requirement for a regex-type predicate on one of the
dimensions. I created a text index on the dimension. Will this help
performance? Will the query be able to use TEXT_MATCH?
**@mayanks:** You can run explain on the Presto query; that should show the
Pinot query. You can check whether it is using text match.
**@sheetalarun.kadam2:** Ohh, sorry if this seems dumb, I am new to
Presto-Pinot. I did try explain <query>, but the output does not specify any
query plan details. How do I check it?

### _#pinot-dev_

**@s.azimigehraz:** @s.azimigehraz has joined the channel
**@syedakram93:** Is it possible to provide a snapshot tar with the current
code? Above 7.1
**@dlavoie:** `mvn clean install -DskipTests -Pbin-dist`
**@xiangfu0:** For a snapshot, you can use the docker images we published, or
you need to build and publish them yourself
**@xiangfu0:** also, i know that is able to build and publish dependencies in
case you wanna try.
**@syedakram93:** or 8.0 tar
**@evan.galpin:** @evan.galpin has joined the channel

### _#getting-started_

**@s.azimigehraz:** @s.azimigehraz has joined the channel