Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/04/16 02:00:24 UTC

Apache Pinot Daily Email Digest (2022-04-15)

### _#general_

  
 **@dprotsenko.repos:** @dprotsenko.repos has joined the channel  
**@harish.bohara:** Hi, I have a setup that uses the
apachepinot/pinot:0.10.0-SNAPSHOT-df1c2681fd-20220207-jdk11 Docker image to
run my Pinot Servers. I am running 4 pinot-servers on 4 m5a.large instances; it
is the only container running on each box (I have given the JVM 6GB in the JVM args).
What I expect: the ingestion rate should stay constant. What I observed:
when I start this cluster, ingestion is > 5000 RPS. However, it drops below
100 RPS after about a day of continuous running. If I restart all the pinot-servers, the
ingestion rate goes back to 5K+ RPS. Am I missing some setting that is
causing this degradation?  
**@kharekartik:** Hi Harish, can you share the tableConfig and schema for this?  
**@harish.bohara:** ```
{
  "schemaName": "aggregate_v1",
  "dimensionFieldSpecs": [
    { "name": "ext_event_type", "dataType": "STRING" },
    { "name": "dim__channel", "dataType": "STRING" },
    { "name": "dim__pipeline", "dataType": "STRING" },
    { "name": "dim__internal", "dataType": "STRING" },
    { "name": "provider", "dataType": "STRING" },
    { "name": "status", "dataType": "STRING" },
    { "name": "year", "dataType": "INT" },
    { "name": "month", "dataType": "INT" },
    { "name": "day", "dataType": "INT" },
    { "name": "hour", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "eventTime", "dataType": "TIMESTAMP", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
  ]
}
{
  "tableName": "aggregate_v1",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "aggregate_v1",
    "replication": "2",
    "timeColumnName": "eventTime",
    "allowNullTimeValue": false,
    "replicasPerPartition": "2",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30"
  },
  "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant", "tagOverrideConfig": {} },
  "tableIndexConfig": {
    "noDictionaryColumns": [],
    "invertedIndexColumns": [],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "MY TOPIC",
      "stream.kafka.broker.list": "MY BROKER",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.segment.size": "100M"
    },
    "rangeIndexColumns": [],
    "rangeIndexVersion": 1,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "loadMode": "MMAP",
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": true,
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [ "ext_event_type", "dim__channel", "status" ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [ "COUNT__*" ],
        "maxLeafRecords": 1
      }
    ]
  },
  "metadata": {}, "quota": {}, "routing": {}, "query": {},
  "ingestionConfig": {
    "transformConfigs": [
      { "columnName": "year", "transformFunction": "year(eventTime, 'Asia/Kolkata')" },
      { "columnName": "month", "transformFunction": "month(eventTime, 'Asia/Kolkata')" },
      { "columnName": "day", "transformFunction": "day(eventTime, 'Asia/Kolkata')" },
      { "columnName": "hour", "transformFunction": "hour(eventTime, 'Asia/Kolkata')" }
    ]
  },
  "isDimTable": false,
  "upsertConfig": {}
}```  
**@kharekartik:** @npawar can you help here? The flush configs seem to be
normal as well.  
**@navina:** @harish.bohara how do you restart your servers? And have you
checked whether the segments are rolling over? Asking because I notice 2 things:
1. flush threshold rows = 0 (not sure what the expected behavior is; probably
it ignores the row threshold), and flush threshold time is 24h. 2. the stream's offset reset
criterion is set to `smallest`. I am guessing that when you restart you don't have
completed segments that are flushed to disk. So, before a restart, can you check
whether the table has completed segments?  
**@npawar:** Thresholds look right (0 means use size-based and ignore row-based,
like Navina said). My hunch is also the same, so just adding on top of what
Navina said: when you set `smallest`, Pinot consumes from the earliest offset in
Kafka, hence the high rate of ingestion in the beginning, which then
stabilizes. And when you restart, Pinot will not flush what was in memory;
instead it reconsumes it when it comes back up, hence the high rate of ingestion
after a restart. It's not a cause for concern, unless you are noticing lag in
event arrival.  
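
For reference, here is a minimal sketch (not something posted in the thread) of the `streamConfigs` keys under discussion, assuming one wanted new consumers to start from the latest Kafka offset (`largest`) instead of replaying history from the earliest (`smallest`); the flush keys are copied from the table config above, where rows = 0 means flushing is driven by segment size (100M) or time (24h):
```"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.prop.auto.offset.reset": "largest",
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.time": "24h",
  "realtime.segment.flush.segment.size": "100M"
}```
With `largest`, a partition that has no checkpointed offset starts consuming at the end of the topic, so a fresh table skips the initial catch-up burst; partitions that already have committed segments should still resume from their saved offsets.  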
 **@sbang:** @sbang has joined the channel  

###  _#random_

  
 **@dprotsenko.repos:** @dprotsenko.repos has joined the channel  
 **@sbang:** @sbang has joined the channel  

###  _#troubleshooting_

  
 **@dprotsenko.repos:** @dprotsenko.repos has joined the channel  
 **@sbang:** @sbang has joined the channel  
**@ysuo:** Hi, one Pinot server shows dead status. Is there any way to make
it recover from this status? :rolling_on_the_floor_laughing:  
**@npawar:** Restart is the only way I can think of to recover here. But
capture the logs beforehand so we can check why it went dead  
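
If that server runs in Docker like the setups elsewhere in this digest (an assumption; adjust for your deployment), one way to capture the logs before restarting is, for example:
```# "pinot-server-0" is a hypothetical container name; substitute your own.
$ docker logs pinot-server-0 > pinot-server-0.log 2>&1
$ docker restart pinot-server-0```  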
 **@diogo.baeder:** Hi folks, I'm having issues running batch ingestion in my
local experiments with Pinot using docker-compose. More in this thread.  
**@diogo.baeder:** Here's what I get, with the command I run at the top:
```$ docker-compose -f docker-compose-databases.yml exec pinot-controller bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/ingestion/weights.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release```
and here are the ingestion job files visible inside the controller:
```$ docker-compose -f docker-compose-databases.yml exec pinot-controller ls -lh /config/ingestion/
total 12K
-rw-r--r-- 1 1000 1000 956 Apr 15 21:34 brands_metrics.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:37 filters.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:31 weights.yaml```
any ideas what's going on?  
**@xiaoman:** The warnings themselves are not fatal, so they don't help with finding out what
is happening. Can you try getting a shell into that pinot-controller container and looking at
the Pinot logs?  
**@diogo.baeder:** Sure! Hold on...  
**@diogo.baeder:** These are the only non-INFO log lines I found when trying to run the batch jobs:
```2022/04/15 22:02:59.975 INFO [ControllerPeriodicTask] [pool-7-thread-7] Processing 3 tables in task: OfflineSegmentIntervalChecker
2022/04/15 22:02:59.981 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/filters_OFFLINE does not exist
2022/04/15 22:02:59.991 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/brands_metrics_OFFLINE does not exist
2022/04/15 22:02:59.994 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/weights_OFFLINE does not exist```
but I'm not sure what's going on - shouldn't Pinot just
create the segments automatically, when running those jobs?  
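
One way to double-check whether the job actually created and pushed any segments (a suggestion, not something tried in the thread) is to list the table's segments through the controller REST API, for example:
```# Assumes the controller is reachable on its default port 9000 from where this is run.
$ curl http://localhost:9000/segments/weights```
If that comes back empty, it would line up with the "/SEGMENTS/weights_OFFLINE does not exist" warning above.  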
**@diogo.baeder:** Here's my job spec for `weights`:
```executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
outputDirURI: '/opt/pinot/data'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'weights'
  schemaURI: ''
pinotClusterSpecs:
  - controllerURI: ''```
and the data I have in that directory is in .json files, each containing a
list of JSON dictionaries/maps/objects (not JSONL; just a simple JSON file
whose top-level element is a list/array)  
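
If the JSONRecordReader in this Pinot version expects one JSON object per line rather than a top-level array (an assumption worth verifying against the 0.10.0 docs), the files could be converted to JSONL with jq, for example:
```# Emits each element of the top-level array as one compact JSON object per line (JSONL).
$ jq -c '.[]' /sensitive-data/outputs/weights/br/20130415.json > /tmp/20130415.jsonl```  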
**@xiaoman:** We can try to look through the log lines and see how far the job
has progressed. For example, is the log line in  seen?  
**@xiaoman:** The next checkpoint could be  
**@diogo.baeder:** Here are the full Controller logs for the time when I try
to run the load job:  
**@xiaoman:** Right now, based on the code, it seems that if the directory/path
points to a place where there is no input file, Pinot does not print out a
warning log  If that's the case, it could be a good chance for you to submit a
code change to Pinot too  
**@xiaoman:** The log seems to be cut short.  
**@xiaoman:** It's very unlikely, but is the Pinot process alive? Could someone else
have killed the process?  
**@diogo.baeder:** But here are the relevant configs for the location of the input files:
```inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'```
and here are the files, for example:
```root@d7ae8b63b8e1:/opt/pinot/logs# find /sensitive-data/outputs/weights/ -name *2013041*.json
/sensitive-data/outputs/weights/br/20130415.json
/sensitive-data/outputs/weights/br/20130418.json
/sensitive-data/outputs/weights/br/20130416.json
/sensitive-data/outputs/weights/br/20130419.json
/sensitive-data/outputs/weights/br/20130417.json
/sensitive-data/outputs/weights/br/20130414.json
/sensitive-data/outputs/weights/br/20130413.json```  
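
As an aside, and only an assumption rather than something confirmed in the thread: the glob in includeFileNamePattern may be matched against the full file path rather than the path relative to inputDirURI, in which case a pattern anchored at 'weights/' could silently match nothing. Pointing inputDirURI at the table's own directory and using the recursive pattern shown in the Pinot docs sidesteps that question, e.g.:
```inputDirURI: '/sensitive-data/outputs/weights/'
includeFileNamePattern: 'glob:**/*.json'```  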
**@diogo.baeder:** Lemme check  
**@diogo.baeder:** All instances alive, yes  
**@xiaoman:** The Pinot log stopped being produced (if you copied the file
properly); it has only 86 lines.  
**@xiaoman:** If even logs are not being written, something is wrong; maybe Pinot has
not even started yet  
**@diogo.baeder:** Yeah, I noticed the truncation there, but that's what shows
up in my terminal, not sure why  
**@diogo.baeder:** I'll check the RAM usage  
**@xiaoman:** In my past experience, such truncation of logs means the process
was killed, so that's why I brought it up. Another possibility is that the Pinot
process is stuck in an infinite loop, which is very unlikely. Or maybe the error
log is redirected somewhere else? Maybe check the log4j2.xml file?  
**@diogo.baeder:** It's using ~16G, and I have 32G total. So not that much.  
**@diogo.baeder:** Let me try taking a look at those, hold on.  
**@diogo.baeder:** The process hasn't been killed; it's been running for more
than 1h. Dunno if a thread has been killed, though.  
**@diogo.baeder:** Is it normal for the Controller to spawn hundreds of
threads?  
**@diogo.baeder:** Alright, I truncated the whole log file, checked the log4j
configs and they seem just fine, and I ran the batch jobs again - same issue,
the logs are somehow truncated. Let me restart Pinot from scratch with
everything clean and try to run the jobs again. I did notice a stupid bug in
my ingestable data that basically invalidates much of the data I have, but at
least some of it should have been processed.  
**@diogo.baeder:** I started from scratch, cleaned up all the directories related
to those instances, created the tables, and truncated the log file so that it would
only contain the output from the ingestion job, and I still get truncated logs
after running it, with no useful lines telling me what went wrong  
**@diogo.baeder:** Anyway, I'll fix my issue with the dates I'm generating,
maybe that by itself fixes the issue, but I have low confidence that it will.  
**@diogo.baeder:** Thanks for the help!  

###  _#getting-started_

  
 **@dprotsenko.repos:** @dprotsenko.repos has joined the channel  
 **@sbang:** @sbang has joined the channel  