Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/12/13 02:00:15 UTC

Apache Pinot Daily Email Digest (2020-12-12)

### _#feat-presto-connector_

  
 **@alexj.nich:** @alexj.nich has joined the channel  

###  _#troubleshooting_

  
 **@michael:** I've noticed that if I delete segments from the UI, it only
removes them from ZK, not from deep storage. The next time I run an ingestion
job for the table, unrelated to the deleted segments, it re-adds them to the
table. Is this expected? Am I missing something?  
**@g.kishore:** I don't think the delete call deletes it from the deep store.
We delete it from the deep store only when the retention manager kicks in
(which is based on the retention set in the table config)  
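For reference, the retention the retention manager enforces comes from the `segmentsConfig` section of the table config. A minimal sketch with illustrative values (not taken from the table in this thread):

```
"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "30",
  "replication": "1"
}
```

With a setting like this, segments whose time range ends more than 30 days in the past would be purged by the retention manager, including their deep-store copies.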
**@michael:** Is it expected that the ingestion job adds the deleted segments
back? The deleted segments even have a different prefix than the one the job is
running with  
**@g.kishore:** > The next time I run an ingestion job for the table, unrelated
to the deleted segments, it re-adds them to the table. Is this expected? Am I
missing something?
This is not expected. Can you show the segments list and the ingestion job
spec?  
**@michael:**
```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: ''
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: ''
segmentCreationJobParallelism: 4
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
      endpoint: ''
      accessKey: 'pinot'
      secretKey: 'pinot!!!'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'mm'
  schemaURI: ''
  tableConfigURI: ''
pinotClusterSpecs:
  - controllerURI: ''
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
segmentNameGeneratorSpec:
  type: normalizedDate
  configs:
    segment.name.prefix: 'mm_batch_test'
```  
**@michael:** ```{ "id": "mm_OFFLINE", "simpleFields": { "BATCH_MESSAGE_MODE":
"false", "IDEAL_STATE_MODE": "CUSTOMIZED", "INSTANCE_GROUP_TAG": "mm_OFFLINE",
"MAX_PARTITIONS_PER_INSTANCE": "1", "NUM_PARTITIONS": "3", "REBALANCE_MODE":
"CUSTOMIZED", "REPLICAS": "1", "STATE_MODEL_DEF_REF":
"SegmentOnlineOfflineStateModel", "STATE_MODEL_FACTORY_NAME": "DEFAULT" },
"mapFields": { "mm_batch_test_2020-11-19_2020-11-19_0": {
"Server_172.20.0.6_8098": "ONLINE" }, "mm_batch_test_2020-11-19_2020-11-19_1":
{ "Server_172.20.0.6_8098": "ONLINE" },
"mm_batch_test_2020-11-19_2020-11-19_2": { "Server_172.20.0.6_8098": "ONLINE"
} }, "listFields": {} }```  
**@michael:** after running the job:  
**@michael:** ```{ "id": "mm_OFFLINE", "simpleFields": { "BATCH_MESSAGE_MODE":
"false", "IDEAL_STATE_MODE": "CUSTOMIZED", "INSTANCE_GROUP_TAG": "mm_OFFLINE",
"MAX_PARTITIONS_PER_INSTANCE": "1", "NUM_PARTITIONS": "7", "REBALANCE_MODE":
"CUSTOMIZED", "REPLICAS": "1", "STATE_MODEL_DEF_REF":
"SegmentOnlineOfflineStateModel", "STATE_MODEL_FACTORY_NAME": "DEFAULT" },
"mapFields": { "mm_batch1_test_2020-11-19_2020-11-19_0": {
"Server_172.20.0.6_8098": "ONLINE" },
"mm_batch1_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE"
}, "mm_batch2_test_2020-11-19_2020-11-19_0": { "Server_172.20.0.6_8098":
"ONLINE" }, "mm_batch2_test_2020-11-19_2020-11-19_1": {
"Server_172.20.0.6_8098": "ONLINE" }, "mm_batch_test_2020-11-19_2020-11-19_0":
{ "Server_172.20.0.6_8098": "ONLINE" },
"mm_batch_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE"
}, "mm_batch_test_2020-11-19_2020-11-19_2": { "Server_172.20.0.6_8098":
"ONLINE" } }, "listFields": {} }```  
**@michael:** picked up the old deleted segments from other batch jobs  
**@michael:** I was expecting it to just replace the existing mm_batch_test
segments  
**@g.kishore:** Maybe because the old segments are still there in the output
folder of the ingestion job?  
**@michael:** yes I see that  
**@michael:** what's the purpose of the ingestion job output directory, and of
leaving the output files there?  
**@g.kishore:** I don’t see any reason  
**@g.kishore:** We should delete it... also, in Spark mode the task directory
gets deleted automatically after the task is run.. that's probably why we don't
delete it explicitly  
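If leftover output is the cause, the `outputDirURI` (blanked out in the spec above) would presumably contain something like the listing below after a few runs with different prefixes; since the job type is `SegmentCreationAndUriPush`, every segment tarball found under that directory gets pushed, not just the ones created by the current run. The bucket and paths here are placeholders:

```
s3://<output-bucket>/mm/mm_batch_test_2020-11-19_2020-11-19_0.tar.gz
s3://<output-bucket>/mm/mm_batch_test_2020-11-19_2020-11-19_1.tar.gz
s3://<output-bucket>/mm/mm_batch1_test_2020-11-19_2020-11-19_0.tar.gz   (left over from an earlier run)
s3://<output-bucket>/mm/mm_batch2_test_2020-11-19_2020-11-19_0.tar.gz   (left over from an earlier run)
```

Clearing the output location (or pointing each run at its own `outputDirURI`) before re-running should avoid re-pushing the stale segments.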
**@g.kishore:** Mind filing an issue?  
**@michael:** Sure  
**@michael:** thank you  
**@ssubrama:** The delete API should delete it right away, not just when the
retention manager kicks in. It will move it to the deleted folder inside your
deep store (I forget whether this is based on config), where it will reside for
some number of days and then be removed. It is a bug if the segments still
show up on a new table. It is possible if the new table is added within a very
short time of deletion, because it takes a few seconds for the segments to be
deleted and then for the Helix external view to stabilize. So we always advise
creating tables with a different name  
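On the "(I forget whether this is based on config)" part: as far as I know, the window is controlled by a controller property along these lines; treat the exact default as an assumption to verify against your Pinot version:

```
controller.deleted.segments.retentionInDays=7
```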
**@michael:** I was deleting individual segments of the table and never saw
them removed from deep storage. Are the servers responsible for the deletion
task?  
**@ssubrama:** Nope, they should get deleted when you delete the segments.
Like I said, they are moved into a folder called `Deleted_Segments/tableName`  
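So for this table, the deleted copies should turn up somewhere like the path below in deep storage (the bucket is a placeholder, and whether the raw table name or `mm_OFFLINE` appears in the path is worth double-checking):

```
s3://<deep-store-bucket>/Deleted_Segments/mm/
```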