Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/27 02:00:47 UTC

Apache Pinot Daily Email Digest (2021-01-26)

### _#general_

  
 **@swaroop.aj:** @swaroop.aj has joined the channel  
 **@niels.it.berglund:** Hey everyone! As @karinwolok1 asked what brought me
here, I'd like to introduce myself. I am Niels, I am based in Durban, South
Africa, and I work as a Software Architect at Derivco. I've been looking at Druid and the like for a while, and recently came across Pinot. I'm here to see what I can learn, and what Pinot can do for me (and the company).  
 **@kohei:** @kohei has joined the channel  
 **@alfredk.j:** @alfredk.j has joined the channel  
**@alfredk.j:** Hi all! I'd like to introduce myself. I'm Alfred and I'm working as a Senior Developer at UST in Singapore. I'm here to listen and learn more about Pinot.  
 **@jjyliu:** @jjyliu has joined the channel  
**@tymm:** Hi, I am using the AddTable command to upload a new schema etc. to Pinot in Docker. How do I update the table, e.g. add/remove columns, after I have added it? Thanks  
**@g.kishore:** You can use the cluster manager UI to edit schema  
**@mateus.oliveira:** @mateus.oliveira has joined the channel  
 **@leerho:** @leerho has joined the channel  
 **@leerho:** Hello, this is Lee Rhodes of Apache DataSketches. I would be
interested in some feedback as to how you are using our library … what
sketches are you using, what you feel works well, and constructive feedback on
what could work better, or what problems you would like sketches to address?  
**@mayanks:** Hello @leerho. One of our count-distinct functions has a Theta Sketch based implementation (we have an HLL based function as well). We like Theta sketches for their capability to perform set operations. However, our biggest challenge is getting accuracy under control (especially in the case of intersections of unevenly sized sets). Anything you guys can do for that would be really helpful.  
**@leerho:** Yes, accuracy of distinct counts of intersections (and differences) of sampled sets is difficult, and it is not a shortcoming of the algorithm per se. It can be proven mathematically that no matter what streaming algorithm you use, if you end up with sampled sets of the original domain, the accuracy of your estimate can be very poor compared with the accuracy of a union operation. We knew this from the outset. This is why we provide the getUpperBound() and getLowerBound() methods that you can use as a tool to warn you, after the fact, if the accuracy goes beyond what you consider to be acceptable. For example, with a theta sketch configured with K=4096 (logK=12), its accuracy on a single stream or from a merge (union) will asymptote to about +/- 3.1% with 95% confidence: 2 / sqrt(4096) = .03125. What you can do: after any intersection or difference operation, check how much the expected error has changed by computing `((getUpperBound(2) / getEstimate()) - 1) * sqrt(K)/2`. This is the factor by which your intersection error exceeds the nominal RSE of the sketch. If this results in a 2, that means the estimated error of that operation is about twice as large, or in this case about +/- 6.25% (at 95% confidence). At least this allows you to monitor the relative error of intersections and even determine which operations caused the largest increase in error. You can also try scheduling the sequence of your set operations so that all of your intersections occur either early in the sequence or at the end. Depending on your data, you might find that reordering the sequence helps. Other than that, know that the intersection error of the theta sketches approaches the theoretical limit of what is possible given a streaming algorithm and limited space. I hope this helps.  
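A minimal Java sketch of that post-intersection error check, assuming the `org.apache.datasketches.theta` API (in some library versions the intersection call is `update(sketch)` rather than `intersect(sketch)`; the data and the `relativeErrorFactor` helper are purely illustrative):
```
import org.apache.datasketches.theta.CompactSketch;
import org.apache.datasketches.theta.Intersection;
import org.apache.datasketches.theta.SetOperation;
import org.apache.datasketches.theta.Sketch;
import org.apache.datasketches.theta.UpdateSketch;

public class ThetaIntersectionErrorCheck {
  private static final int K = 4096; // nominal entries (logK = 12)

  // Factor by which the observed error exceeds the sketch's nominal RSE (1/sqrt(K)).
  static double relativeErrorFactor(Sketch result) {
    return ((result.getUpperBound(2) / result.getEstimate()) - 1) * Math.sqrt(K) / 2;
  }

  public static void main(String[] args) {
    UpdateSketch a = UpdateSketch.builder().setNominalEntries(K).build();
    UpdateSketch b = UpdateSketch.builder().setNominalEntries(K).build();
    for (long i = 0; i < 10_000_000L; i++) { a.update(i); }          // large set
    for (long i = 9_990_000L; i < 10_050_000L; i++) { b.update(i); } // small set, uneven overlap

    Intersection intersection = SetOperation.builder().buildIntersection();
    intersection.intersect(a);
    intersection.intersect(b);
    CompactSketch result = intersection.getResult();

    System.out.printf("estimate=%.0f bounds=[%.0f, %.0f] errorFactor=%.2f%n",
        result.getEstimate(), result.getLowerBound(2), result.getUpperBound(2),
        relativeErrorFactor(result));
  }
}
```
If the printed errorFactor comes out well above 1, the intersection result should be treated as low-confidence, exactly as described above.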
**@mayanks:** Yes, this is helpful. Thanks  
**@leerho:** Do you have any interest in some of the other sketches:
quantiles, frequent items, etc?  
**@mayanks:** The interest is usually generated by Pinot users. Once we see
our users asking for these, we are happy to add those into Pinot.  
**@leerho:** We have found that very few system users are even aware that
these capabilities exist. We would be glad to work with you to promote the
possible leveraging of our DataSketches library to your users. There are lots
of ways to do this.  
**@ken:** Hi @leerho I haven’t spent any time seriously thinking about this,
but I always wondered if there was a faster way to approximate LLR (log-
likelihood ratio) using sketch-like methods (other than just using sketches
for approximate counts). I’ve found LLR to be a very useful way to surface
outliers in a dataset, but doing the exact computation (say, via map-reduce)
can be painful.  
**@mayanks:** We recently solved an audience-matching use case at LinkedIn using the DataSketches impl in Pinot. We talked about it in one of our meetups, and I am in the process of publishing a LinkedIn blog on the same.  
**@mayanks:** Happy to collaborate  
**@leerho:** I’ll have to do some research on LLR. Nonetheless, we have used
both Frequent Items and Quantiles for finding outliers as well.  
**@leerho:** We would be glad to help you with your blog and/or meetups with materials and tutorials. Let us know how we can help.  
**@karinwolok1:** And if you do anything like that, keep us posted - we'd be
happy to cross publish / promote :slightly_smiling_face:  
**@leerho:** We are actually preparing a Press-release with ASF about our
recent graduation. It would be great if you folks could give us a couple of
sentences of how useful DataSketches has been for Pinot!  
**@leerho:** Something with the format: “QUOTE,” said NAME, TITLE at COMPANY.
“…MORE…”  
 **@apadhy:** @apadhy has joined the channel  
 **@shipengxie:** @shipengxie has joined the channel  
 **@jeff:** @jeff has joined the channel  

###  _#random_

  
 **@swaroop.aj:** @swaroop.aj has joined the channel  
 **@kohei:** @kohei has joined the channel  
 **@alfredk.j:** @alfredk.j has joined the channel  
 **@jjyliu:** @jjyliu has joined the channel  
 **@mateus.oliveira:** @mateus.oliveira has joined the channel  
 **@leerho:** @leerho has joined the channel  
 **@apadhy:** @apadhy has joined the channel  
 **@shipengxie:** @shipengxie has joined the channel  
 **@karinwolok1:** Random question, @niels.it.berglund - are you related to
Tim? Haha  
**@dlavoie:** I had the same question but didn't dare to ask :smile:  
**@karinwolok1:** I mean, he said hello in the random group - and I did see
him in the meetup on Thursday. Haha  
**@dlavoie:** Coincidence? I think not :stuck_out_tongue:  
**@karinwolok1:** life works in mysterious wayssss  
**@mayanks:** Wow, I had the same question too  
**@karinwolok1:** Maybe he's the Peyton to the Eli (Manning)  
**@karinwolok1:** Niels! EVERYONE WANTS TO KNOW hahahaha  
 **@jeff:** @jeff has joined the channel  

###  _#troubleshooting_

  
 **@elon.azoulay:** I can't seem to select a virtual column from the pinot
query console, is it supported? i.e. ```select $segmentName from <table> limit
10```  
**@g.kishore:** @jackie.jxt ^^  
**@jackie.jxt:** Are you running the latest version? What is the query
response?  
**@elon.azoulay:** It's just no rows; I'm running 0.5.0. IIRC I have done this before and it worked.  
**@elon.azoulay:** empty response  
**@elon.azoulay:** same for `$docId`  
**@elon.azoulay:** but I'm using query console  
**@elon.azoulay:** tried both pql and sql  
**@jackie.jxt:** There was a fix for the virtual column, let me check  
**@jackie.jxt:**  
**@jackie.jxt:** Available in 0.6.0:joy:  
**@elon.azoulay:** Ok, upgrading :rolling_on_the_floor_laughing: thanks:)  
 **@swaroop.aj:** @swaroop.aj has joined the channel  
 **@kohei:** @kohei has joined the channel  
 **@alfredk.j:** @alfredk.j has joined the channel  
 **@jjyliu:** @jjyliu has joined the channel  
 **@mateus.oliveira:** @mateus.oliveira has joined the channel  
**@ken:** I’m trying to use the map-reduce job to build segments. In HadoopSegmentGenerationJobRunner.packPluginsToDistributedCache, there’s this code:
```
File pluginsTarGzFile = new File(PINOT_PLUGINS_TAR_GZ);
try {
  TarGzCompressionUtils.createTarGzFile(pluginsRootDir, pluginsTarGzFile);
} catch (IOException e) {
  LOGGER.error("Failed to tar plugins directory", e);
  throw new RuntimeException(e);
}
job.addCacheArchive(pluginsTarGzFile.toURI());
```
This creates a `pinot-plugins.tar.gz` file in the Pinot distribution directory, which is on my server. But as the Hadoop DistributedCache documentation states, “The `DistributedCache` assumes that the files specified via urls are already present on the `FileSystem` at the path specified by the url and are accessible by every machine in the cluster.”  
**@ken:** So what you get is this error: `.FileNotFoundException: File
file:/path/to/distribution/apache-pinot-incubating-0.7.0-SNAPSHOT-bin/pinot-
plugins.tar.gz does not exist`  
**@ken:** I think the job needs to use the staging directory (in HDFS) for
this file (and any others going into the distributed cache).  
**@g.kishore:** what is the fix?  
**@ken:** I think the tar file (in snippet above) should be generated in a
temp dir, and then uploaded to the staging directory. and the staging dir URI
is what’s added to the distributed cache  
**@ken:** I think this might only be an error path through the code when a
plugins dir is explicitly provided…trying without it now  
**@g.kishore:** what do you mean by upload to staging directory  
**@g.kishore:** I thought the addCacheArchive code is getting executed on the gateway node  
**@ken:** As part of the job spec file, you include a `stagingDir`
configuration.  
**@g.kishore:** so stagingDir should be on HDFS?  
**@ken:** And yes, the addCacheArchive() gets called on the server where you start the job. Which is why it has to be provided a URI to a file that’s available on every slave server. So it can’t be a local path.  
**@g.kishore:** we thought that happens automatically  
**@ken:** And yes, stagingDir should be on HDFS (when running distributed).
And if you don’t specify it as such, the job fails (as it should) because it’s
not using the same file system as the input/output directories.  
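A minimal sketch of the relevant job-spec fields when running distributed (the HDFS URIs are illustrative, and the exact placement of `stagingDir`, shown here under `extraConfigs`, may differ between Pinot versions):
```
executionFrameworkSpec:
  name: 'hadoop'
  extraConfigs:
    stagingDir: 'hdfs://namenode:8020/pinot/staging'   # must be on the same file system as input/output
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://namenode:8020/data/events/'
outputDirURI: 'hdfs://namenode:8020/pinot/segments/events/'
```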
**@ken:** From the DistributedCache JavaDocs: “Applications specify the files, via urls (hdfs:// or http://) to be cached via the `JobConf`.”  
**@g.kishore:** got it! is this how it was from day one?  
**@ken:** It will work if you run locally, of course, because the local file is accessible to the mappers  
**@ken:** Or if every server has the shared drive mounted that contains the Pinot distribution  
**@ken:** Those are the only reasons why I think it could work as-is now  
**@ken:** Maybe @fx19880617 has some insights, I think he wrote this code. I
could be reading it wrong, of course…  
**@g.kishore:** what you are saying makes sense, but I thought job launcher
pushes this to worker nodes  
**@g.kishore:** looks like it’s more of a pull from the worker node  
**@ken:** It’s a bit confusing…if you use the standard Hadoop command line `-files` parameter (as an example), then the standard Hadoop tool framework will copy the file(s) to HDFS first, before adding them to the JobConf as HDFS paths. In the Pinot code, you need to do this first step (of copying to HDFS) yourself.  
**@ken:** And then the Hadoop slaves will take care of copying these cache
files from HDFS to a local directory (that part you don’t have to do anything
special for)  
**@g.kishore:** > then the standard Hadoop tool framework will copy the file(s) to HDFS first
That’s what I thought would happen when we do it via code. Do you know which staging directory it will copy it to?  
**@ken:** Each Hadoop job has a “staging” directory in the cluster  
**@ken:** There’s a job-specific directory inside of that, where the archives
(jar files), etc get copied  
**@ken:** Taking off for a bit, I might file a PR for this  
**@g.kishore:** thanks  
**@fx19880617:** Thanks @ken  
**@ken:** I just filed , looking at a fix now.  
**@fx19880617:** Thanks!  
**@fx19880617:** I made a change:  
**@fx19880617:** in the branch  
**@fx19880617:** can you help validate whether this one works? Then you can submit a PR to fix it!  
**@ken:** Funny, looks very similar to what I’ve done:
```
protected void packPluginsToDistributedCache(Job job, PinotFS outputDirFS, URI stagingDirURI) {
  File pluginsRootDir = new File(PluginManager.get().getPluginsRootDir());
  if (pluginsRootDir.exists()) {
    try {
      File pluginsTarGzFile = File.createTempFile("pinot-plugins", ".tar.gz");
      TarGzCompressionUtils.createTarGzFile(pluginsRootDir, pluginsTarGzFile);
      // Copy to staging directory
      Path cachedPluginsTarball =
          new Path(stagingDirURI.toString(), SegmentGenerationUtils.PINOT_PLUGINS_TAR_GZ);
      outputDirFS.copyFromLocalFile(pluginsTarGzFile, cachedPluginsTarball.toUri());
      job.addCacheArchive(cachedPluginsTarball.toUri());
    } catch (Exception e) {
      // (error handling and closing braces were cut off in the original paste)
      throw new RuntimeException(e);
    }
  }
}
```  
**@ken:** Working on a way to unit test…  
**@fx19880617:** :thumbsup:  
**@ken:** I’ve also got a change to `addDepsJarToDistributedCache`, which has
the same issue  
**@ken:** I’m hoping to try it out tonight. brb  
 **@leerho:** @leerho has joined the channel  
 **@apadhy:** @apadhy has joined the channel  
 **@shipengxie:** @shipengxie has joined the channel  
**@pabraham.usa:** All Pinot Server pods keep crashing with the following error. Has anyone come across this before?
```
[Times: user=0.02 sys=0.00, real=0.00 secs]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f104649b6ff, pid=1, tid=0x00007ee665d06700
#
# JRE version: OpenJDK Runtime Environment (8.0_282-b08) (build 1.8.0_282-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.282-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x15c6ff]
#
# Core dump written. Default location: /opt/pinot/core or core.1
#
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid1.log
```  
**@dlavoie:** Can you share some details about your `Xmx` and available off
heap setup?  
**@pabraham.usa:** @dlavoie ```jvmOpts: "-Xms512M -Xmx4G -XX:+UseG1GC
-XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
-Xloggc:/dev/stdout -XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 "```  
**@dlavoie:** How much RAM is available to that box?  
**@pabraham.usa:** Also seeing some WARNs:
```
my-pinot-controller-0 controller 2021/01/26 22:17:09.409 WARN [CallbackHandler] [ZkClient-EventThread-29-my-pinot-zk-cp-zookeeper.logging.svc.cluster.local:2181] Callback handler received event in wrong order. Listener: org.apache.helix.controller.GenericHelixController@362617cf, path: /pinot-quickstart/INSTANCES/Server_my-pinot
my-pinot-controller-0 controller 2021/01/26 22:17:12.039 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-quickstart-(83ee1db3_TASK)] Fail to read record for paths: {/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/51a431e9-c0e1-4e08-9bcb-aee747608526=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/f9005198-fe85-4cad-84e8-0df918de95d9=-101
```  
**@pabraham.usa:** The box has 26G RAM  
**@dlavoie:** Server requires 50% heap and 50% non heap  
**@dlavoie:** Depending on the segments you have within a server, it’s
important to have memory boundaries.  
**@dlavoie:** Is it a multi-tenant box?  
**@pabraham.usa:** Sort of; it's single-tenant on k8s, as I restricted it with memory boundaries. 26G is fully available to Pinot.  
**@dlavoie:** ok, then make sure your k8s memory request is 2x your JVM Xmx  
**@dlavoie:** that’s a rule of thumb  
**@dlavoie:** for servers of course  
**@dlavoie:** Controller and Broker may configure a Xmx that is nearly maxing
the k8s memory request  
**@pabraham.usa:** I increased the xmx to 8G now, the mem req is 26G  
**@dlavoie:** That’s not what i meant  
**@dlavoie:** if you have a 4G XMX, ensure you configure a
`resources.request.memory: 8Gi` to your pod.  
**@dlavoie:** By default, if there’s no limit, the pod will think it can use
up to 26G of non heap  
**@dlavoie:** Until the container runtime says no, no, no  
**@pabraham.usa:** That can happen even if it is 8G right?  
**@dlavoie:** You need to think about two memory configurations.  
**@dlavoie:** The pod is a “VM” and the JVM runs inside it. When working with off-heap memory, the JVM asks the OS how much off-heap it can use. If the pod is configured without a memory limit, it will tell the JVM that 26G is available.  
**@dlavoie:** That 26G will not be reserved  
**@dlavoie:** because others pods will also think they can use that.  
**@dlavoie:** so, having a pod with a hard 8G limit, will garantee that the
JVM will not go over the fence.  
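A minimal sketch of that pairing in Helm-style values for the Pinot server (the key layout here is an assumption based on a typical Pinot Helm chart; adjust to your chart and environment):
```
server:
  jvmOpts: "-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
  resources:
    requests:
      memory: "8Gi"   # roughly 2x Xmx: half heap, half off-heap / mmap
    limits:
      memory: "8Gi"   # hard limit so the off-heap sizing sees 8Gi, not the node's full RAM
```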
**@pabraham.usa:** ahh ok, I am actually using AWS node with 122GB of RAM and
26 GB is mem request  
**@pabraham.usa:** 26GB mem request for single Pinot server  
**@dlavoie:** Ok!  
**@dlavoie:** then you can even bump it to 12GB xmx if the server pod has 26Gi
request  
**@pabraham.usa:** Thanks @dlavoie, I increased the xmx from 4G to 8G and
servers are up  
**@pabraham.usa:** I can use 12G  
**@dlavoie:** the rule of thumb is ~50%  
**@pabraham.usa:** ok, how about -Xms12G -Xmx12G?  
**@dlavoie:** matching Xms is always a good thing in a container environment; that memory is requested by the pod anyway, so it would otherwise just go to waste  
**@pabraham.usa:** great Thanks for the help here  
**@pabraham.usa:** All the segments are in a bad status and search is not working, so I have to restore from backup. If the issue was OOM then I assume it might have caused some rogue segments. Now finding those and changing the offset in ZooKeeper to skip them will be hard..!!!  
**@dlavoie:** Are segments being reloaded?  
**@dlavoie:** They will remain in a bad state until everything is reloaded.  
**@pabraham.usa:** Ohh, triggered a reload now, let's see how it goes  
**@dlavoie:** watch your CPU and Disk IO, that’s a good tell of what’s
happening  
**@pabraham.usa:** yes both spiked, especially disk  
**@pabraham.usa:** and now all segments came back to good and Pinot is trying
to catchup with stream.  
**@pabraham.usa:** catching up very slowly though  
**@dlavoie:** _cries in 5-hour segment reloads_  
 **@jeff:** @jeff has joined the channel  

###  _#pinot-docs_

  
 **@ken:** I had a few issues/questions about the batch data ingestion
documentation, I’m guessing mostly looking for input from @fx19880617  
**@ken:** When using the example job spec for Hadoop, this line caused a problem: `# 'glob:**\/*.avro' will include all the avro files under inputDirURI recursively.` Looks like even though it’s commented out, the parser complains about the `\` character, as in ```SimpleTemplateScript1.groovy: 37: unexpected char: '\' @ line 37, column 13.```  
**@fx19880617:** ah, yes, please delete that  
**@ken:** Also I think the last section (currently `Tunning`, should be
`Tuning`) only applies to running a batch job locally, as setting `JAVA_OPTS`
doesn’t impact Hadoop jobs, right?  
**@fx19880617:** I feel it could be an issue with the groovy lib version  
**@ken:** Right - I guess I could file an issue about that  
**@fx19880617:** Thanks!  
**@fx19880617:** for hadoop, there is a way to set slave executor size  
**@fx19880617:** I will make the changes  
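For reference, a minimal Java sketch of sizing the mapper containers on the Hadoop side, where `JAVA_OPTS` does not apply, using the standard MapReduce configuration keys (how or whether Pinot exposes these through its job spec is not shown here):
```
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapperMemoryExample {
  public static Job newSizedJob() throws IOException {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 4096);     // container size requested from YARN
    conf.set("mapreduce.map.java.opts", "-Xmx3276m"); // mapper JVM heap, ~80% of the container
    return Job.getInstance(conf, "pinot-segment-generation");
  }
}
```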