Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/07 02:00:29 UTC

Apache Pinot Daily Email Digest (2021-10-06)

### _#general_

  
 **@prashant.pandey:** Hi folks, I am trying to build a Docker image from
source as: ```./docker-build.sh pinot:new-range-index master``` This gives me
an error: ```executor failed running [/bin/sh -c git clone ${PINOT_GIT_URL}
${PINOT_BUILD_DIR} && cd ${PINOT_BUILD_DIR} && git checkout ${PINOT_BRANCH} &&
mvn install package -DskipTests -Pbin-dist -Pbuild-shaded-jar
-Dkafka.version=${KAFKA_VERSION} -Djdk.version=${JDK_VERSION} && mkdir -p
${PINOT_HOME}/configs && mkdir -p ${PINOT_HOME}/data && cp -r
pinot-distribution/target/apache-pinot-*-bin/apache-pinot-*-bin/* ${PINOT_HOME}/. &&
chmod +x ${PINOT_HOME}/bin/*.sh]: exit code: 1``` Is there anything I need to
do or configure to fix this?  
**@xiangfu0:** Can you try: ```./docker-build.sh pinot:new-range-index master
2.0 11```  
**@xiangfu0:** seems jdk8 build has some issue, will try to fix that  
**@prashant.pandey:** It’s building, will update if this works. Thanks
@xiangfu0  
**@xiangfu0:** or you can manually set the JDK image to jdk 11 to build with
jdk8  
**@xiangfu0:** In short this is just a wrapper script on top of ```docker
build --no-cache -t ${DOCKER_TAG} --build-arg PINOT_BRANCH=${PINOT_BRANCH}
--build-arg PINOT_GIT_URL=${PINOT_GIT_URL} --build-arg
KAFKA_VERSION=${KAFKA_VERSION} --build-arg JAVA_VERSION=${JAVA_VERSION} -f
Dockerfile .``` You can set it by yourself  
**@xiangfu0:** and modify Dockerfile accordingly :stuck_out_tongue:  
**@prashant.pandey:** Hey @xiangfu0, setting the Java version to 11 gives me
the same error. Were you able to run it?  
**@xiangfu0:** Hmm, do you have the full log?  
**@sabhi8226:** @sabhi8226 has joined the channel  
 **@dadelcas:** Hi, is batch ingestion the only way to deal with late data
arrivals in a hybrid table at the moment? Are there any plans to add some sort
of out-of-window ingestion support?  
 **@ken:** I’ve always been fascinated by sketch algorithms. Here’s a  that
talks about a related issue: how do you estimate the result of a query, when
the filter condition (the predicate) is very expensive to compute? Their
solution is to create disjoint groups (stratification) using a (faster) proxy
for the expensive predicate.  
 **@flagiron2:** @flagiron2 has joined the channel  

###  _#random_

  
 **@sabhi8226:** @sabhi8226 has joined the channel  
 **@flagiron2:** @flagiron2 has joined the channel  

###  _#troubleshooting_

  
 **@sabhi8226:** @sabhi8226 has joined the channel  
 **@anu110195:** Can we update the transformation function of a field in Pinot?  
**@mayanks:** In general backward incompatible changes are prohibited. What’s
the use case where you want to do this?  
**@anu110195:** We found a bug in a transformation function for a field.
Basically a "." was missing, so we want to update that.  
**@anu110195:** I understand segments are immutable :no_mouth:, just looking
for an easy solution to fix it.  
 **@deemish2:** Hi all, I am working on a Spark ingestion job that pushes the
previous day's data every day. It works fine locally using this command: bin/
-jobSpecFile ${PINOT_DIR}/ingestionJobSpec.yaml -values date=`date -v-1d +%F`
where date is used in the includeFileNamePattern parameter:
includeFileNamePattern: 'glob:**/{date}/*.avro'. When I run the job through
spark-submit with this command: $SPARK_HOME/bin/spark-submit --class
org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master
"local[2]" \ --deploy-mode client --conf
"spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins
\ -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml"
\ --conf
"spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:/Users/deemish2/apache-pinot-0.8.0-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.8.0-shaded.jar:/Users/deemish2/apache-pinot-0.8.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-0.8.0-shaded.jar"
\ local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar
\ -jobSpecFile ${PINOT_DIR}/SparkingestionJobSpec.yaml -values date=`date -v-1d +%F`
it fails with: Caused by: java.lang.IllegalArgumentException: Positive number of
partitions required. It looks like the argument -values date=`date -v-1d +%F`
works only with bin/pinot-ingestion.sh. Please help me run this Spark
ingestion job to push the previous day's data into Pinot.  
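
A side note on the date substitution in the message above: `date -v-1d +%F` is the BSD/macOS form, while GNU `date` (most Linux hosts) uses `-d`. The following is a minimal sketch, assuming the job may be launched from either platform, that resolves yesterday's date once so the value can be inspected before it is passed to `-values`. The function name `yesterday` and the variable `INGEST_DATE` are hypothetical, and this only covers the date computation; it is not claimed to resolve the `Positive number of partitions required` error.

```shell
# Print yesterday's date as YYYY-MM-DD.
# Tries the GNU form first, then falls back to the BSD/macOS form.
yesterday() {
  date -d "yesterday" +%F 2>/dev/null || date -v-1d +%F
}

# Resolve the value once so it can be checked before launching the job.
INGEST_DATE="$(yesterday)"
echo "date=${INGEST_DATE}"
```

Printing the resolved value first makes it easy to confirm that a directory matching `{date}` actually exists in the input location before the job is submitted.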
 **@luisfernandez:** can I add partitioning to a table that is already
existing/ingesting data? and can partitioning work by itself without the
groupReplicas? would it speed up queries?  
**@g.kishore:** yes, existing segments won't benefit but new segments will use
that information to speed up queries  
**@luisfernandez:** thank you! that would explain why old segments still seem
to be slow  
**@luisfernandez:** is that true for any kind of indexing?  
**@mayanks:** Yes, this is true for any kind of indexing @luisfernandez.
Availability of metadata/indexing is per segment, and query execution makes
use of it as long as it is available.  
**@g.kishore:** I was about to say it's not true :slightly_smiling_face: but I
think there are multiple parts to your question  
**@g.kishore:** Most indexes will start showing performance improvements once
you change the table config and invoke the segment reload API on the controller  
**@mayanks:** Ah yes, there are multiple parts indeed. What I described in my
previous response was a case where older segments did not have the desired
index, and newer ones did. In that case, new segments will use the indexes.
However, like @g.kishore mentioned, you can invoke segment reload api to
rebuild indexes for existing segments as well.  
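
The reload step described above can be sketched roughly as follows. This assumes the controller exposes the reload endpoint at `POST /segments/{tableName}/reload` and listens on `localhost:9000`; the table name `myTable`, the `CONTROLLER_URL` variable, and the helper `reload_url` are placeholders for illustration. The snippet only prints the request so it can be reviewed first.

```shell
# Assumption: controller address; override via the environment if needed.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Build the URL that asks the controller to reload all segments of a table.
reload_url() {
  printf '%s/segments/%s/reload' "$CONTROLLER_URL" "$1"
}

# Print the request instead of sending it; run the printed command to reload.
printf "curl -X POST '%s'\n" "$(reload_url myTable)"
```

The reload should be issued after the table config change has been saved, since the segments are rebuilt against the config in effect at reload time.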
 **@will.gan:** Hi, I tried to kick off a rebalance for one of my tables
(moving tenant), but afterward I saw that the idealstate wasn't correct. While
before I had two replicas for each segment on a set of servers, now each
segment only had 1 replica on the same set of servers (not even the new
servers I was trying to move them to), with the exception of the most recent
segment that got moved and has the correct number of replicas. Does anyone
know what the issue might be? FYI this table is being actively queried.  
**@jackie.jxt:** Did you by any chance change the replication of the table?  
**@jackie.jxt:** You can try a dry run and see the ideal state generated  
**@will.gan:** I have since tried to change it to get back to the original
state, but I had not changed it when I originally did the rebalance.  
**@will.gan:** Hmm I ran the rebalance from the controller UI and when I did
it showed me the generated idealstate which was correct along with the
"Rebalance in progress (check controller logs for updates)" stuff  
**@will.gan:** but yeah, the idealstate in the Zookeeper browser was different
afterwards  
**@jackie.jxt:** That is definitely unexpected  
**@jackie.jxt:** Can you try again with dry-run enabled?  
**@jackie.jxt:** Oh, before that, let's make sure you are checking the ideal
state instead of the external view?  
**@jackie.jxt:** Ideal state should always have 2 replicas, while external
view might have 1 replica during the transitioning phase  
**@will.gan:** Yup, it's the idealstate. It's weird because
`"simpleFields"."REPLICAS"` is 3 for me right now but most segments only have
1.  
**@will.gan:** @jackie.jxt Here are some screenshots. After the dry run in the
first picture, I ran it normally and it reported the rebalance as in progress;
the second picture shows the current idealstate.  
**@jackie.jxt:** Can you try the rebalance again and see if the new replica is
added?  
**@will.gan:** no change unfortunately. Is it possible that since there are
segments with only 1 replica now nothing will work unless I set downtime to
true?  
**@jackie.jxt:** Can you please check the controller log and see why the ideal
state is not updated? The log should be under the class `TableRebalancer`  
**@will.gan:** @jackie.jxt hmm is this the right way to get that
`cat pinot-all.log | grep TableRebalancer` ?  
**@jackie.jxt:** Synced offline. Currently `TableRebalancer` cannot rebalance
the table without downtime when there is only 1 replica in the current
assignment. Submitted a fix for this:  
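
For reference, the dry-run and downtime options discussed in this thread can also be exercised through the controller REST API instead of the UI. This is a hedged sketch, assuming the rebalance endpoint is `POST /tables/{tableName}/rebalance` with `type`, `dryRun`, and `downtime` query parameters; the table name, table type, controller address, and the helper `rebalance_url` are placeholders. It only prints the requests.

```shell
# Assumption: controller address; override via the environment if needed.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Usage: rebalance_url TABLE TYPE DRY_RUN DOWNTIME
rebalance_url() {
  printf '%s/tables/%s/rebalance?type=%s&dryRun=%s&downtime=%s' \
    "$CONTROLLER_URL" "$1" "$2" "$3" "$4"
}

# Step 1: dry run, to inspect the target assignment without changing anything.
printf "curl -X POST '%s'\n" "$(rebalance_url myTable OFFLINE true false)"

# Step 2: real run. downtime=true is the case raised above for segments
# that are down to a single replica; it makes segments briefly unavailable.
printf "curl -X POST '%s'\n" "$(rebalance_url myTable OFFLINE false true)"
```

Comparing the dry-run output with the ideal state stored in Zookeeper is a quick way to see whether a real run actually applied the generated assignment.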
**@flagiron2:** @flagiron2 has joined the channel  

###  _#pinot-dev_

  
 **@dongxiaoman:** @dongxiaoman has joined the channel  

###  _#getting-started_

  
 **@barana:** @barana has joined the channel  
 **@dongxiaoman:** @dongxiaoman has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org