Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/07/01 02:00:08 UTC
Apache Pinot Daily Email Digest (2020-06-30)
<h3><u>#general</u></h3><br><strong>@somanshu.jindal: </strong>@somanshu.jindal has joined the channel<br><strong>@somanshu.jindal: </strong>Hi all, I was trying realtime ingestion in pinot following the docs. <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdqF7hckgVpJ77N6aIHLFxaXdh8R-2FkcEA4nQ11ltv-2BHIwbrzcyzAWB-2FXjfYIvyo3q0eRZGiLxRWgQRBEMyeB-2FaiIj5v9mTaEZwdJoWK221JX4Q-3D-3DNkyB_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeW67ym-2BIAwcGPWd1wyGN7Ea3Yzf8bkrMyke-2BYhmP9MvcIojmZ3LEIYjm1MBlncCD-2FINf5B51keWh1NfmJRtG9AfZsChBNT2EsubWBpEBwXb1Ltm4l-2FGKAR-2Fl3yn0BnAh4FZ1mExwSIwkkBjb9EMskuSxewr7QeuxF3M4eVfG1rys-3D>
In the query console I am unable to query the timestamp field and am getting errors. Any idea why this is happening?<br><strong>@npawar: </strong>I have changed the column name to “timstampInEpoch” in the Getting started pages, so that folks don’t hit this error again @somanshu.jindal<br><h3><u>#random</u></h3><br><strong>@somanshu.jindal: </strong>@somanshu.jindal has joined the channel<br><h3><u>#troubleshooting</u></h3><br><strong>@jackie.jxt: </strong>The ideal state looks correct. What is the query you sent? Is the data partitioned by a column?<br><strong>@pradeepgv42: </strong>```select count(distinct(<column_name>)) from <table> ```
column_name % 64 is how the producer decides which kafka partition to send the data to<br><strong>@jackie.jxt: </strong>Without any filter the query should hit all segments<br><strong>@jackie.jxt: </strong>Can you paste the external view of the table?<br><strong>@pradeepgv42: </strong><br><strong>@pradeepgv42: </strong>Ah this is interesting, there are some segments in ERROR state<br><strong>@pradeepgv42: </strong>And I only see 32 segments here<br><strong>@jackie.jxt: </strong>You might need to open the server log to see what's going wrong with the ERRORed segments<br><strong>@pradeepgv42: </strong>yup yup trying that<br><strong>@pradeepgv42: </strong>curious, what does the external view imply?<br><strong>@jackie.jxt: </strong>You can think of Pinot cluster management as a state machine<br><strong>@jackie.jxt: </strong>Ideal state is the desired state, external view is the actual state<br><strong>@jackie.jxt: </strong>What command did you use to get the ideal state/external view?<br><strong>@pradeepgv42: </strong>swagger api<br><strong>@pradeepgv42: </strong>on the controller<br><strong>@pradeepgv42: </strong>ah there’s a `Caused by: java.lang.OutOfMemoryError: Direct buffer memory`<br><strong>@pradeepgv42: </strong>I have two servers consuming for this kafka topic, the ideal segment size set to 150MB, and initial segment sizes are ~70MB per partition
So, that leaves me at max (150MB * 32) ≈ 4.8G, or if we need both segments while swapping, (150MB * 64) ≈ 9.6G
Machine size is 16G and I didn’t change the default setting of the pinot servers<br><strong>@pradeepgv42: </strong>Any suggestions on how to think about amount of memory to allocate? or machine size?<br><strong>@g.kishore: </strong>@npawar can you point him to the provisioning tool<br><strong>@npawar: </strong>I don't think there's a doc for that, but this blog has all the details: <https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMULmwXpUy0vBvQDjipJvea-2B1E47iEO0mvL4H3-2FQjn-2FwdnaGaShQy-2FTaAueBS0jK6Lr5Jzj8AtV42fXa0BudKOGwYhVjpaTjPmO9-2FDBNumbyEFoyqsfU1tUQs9sN-2FV9E3-2FiGwVvP0-2BD5HZ2ZFTuoenI0AERuzLEJKH7MgaAhAPyBkg_eg_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCesJQ-2B3qlBjwsGvkfuOSHKJa9bFI1OmzOBMAUwY8dnFqZwpktusOzM4Avt-2Fy0WB-2FTfzfkv2EgG7LJUhgqgHtheJ-2BpDycD-2BvcIdaHdr0YZjmzHxgGEYn9RfAgOqtTwLE4eVjsYt16Cti3O4C7fujxQZqjsYHd-2BK-2F6hNAWE-2Fr60ZRTA-3D><br><strong>@pradeepgv42: </strong>thanks will go over it<br><strong>@quietgolfer: </strong>I have a Kubernetes batch job that runs a LaunchDataIngestionJob. If the job fails, the kubernetes job is still marked as succeeded and completed. This seems like a bug. I'd expect it to indicate that the job failed.
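[Editor's note] The back-of-envelope memory estimate in the thread above can be checked in shell; the partition count (32) and segment size threshold (150MB) are taken from the thread, and the 2x factor assumes both old and new segments are held during a swap:

```shell
# Rough direct-memory estimate for the consuming segments discussed above.
# Inputs from the thread: 32 Kafka partitions, ~150 MB segment size threshold,
# and up to 2x the segments resident while a completed segment is being swapped.
partitions=32
segment_mb=150
steady_mb=$(( partitions * segment_mb ))        # 4800 MB steady state
swap_mb=$(( partitions * 2 * segment_mb ))      # 9600 MB (~9.6 GB) worst case
echo "steady: ${steady_mb} MB, swap worst case: ${swap_mb} MB"
```

On a 16G machine with default JVM settings, ~9.6G of direct buffers can exhaust the default direct-memory cap, which matches the `java.lang.OutOfMemoryError: Direct buffer memory` seen above; raising `-XX:MaxDirectMemorySize` on the server JVM (or reducing the segment threshold) is one lever, though the exact sizing guidance is what the provisioning blog below covers.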
``` kubectl get pods --namespace $NAMESPACE
NAME                              READY   STATUS      RESTARTS   AGE
...
pinot-populate-local-data-hwpdm   0/1     Completed   0          14s```
```kubectl logs --namespace $NAMESPACE pinot-populate-local-data-hwpdm
...
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
...```
```kubectl describe --namespace $NAMESPACE pod/pinot-populate-local-data-hwpdm
...
Status: Succeeded```<br><strong>@quietgolfer: </strong>```# TODO - is outputDirURI set correctly?
apiVersion: v1
kind: ConfigMap
metadata:
  name: pinot-local-data-config
data:
  local_batch_job_spec.yaml: |-
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/home/pinot/local-raw-data/'
    outputDirURI: '/tmp/metrics/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'json'
      className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
    tableSpec:
      tableName: 'metrics'
      schemaURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7RXtXGgZxp5hTRHO-2BY-2B6tuCTOWdECu2nvqZ7jax-2FyFgbw-3D-3DbQGJ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeXSTl4QKLSvcODYEFcFQ2XbO1ZxQGtBNzkmb8oTZu8gyvTsQvytzCcTMuNcz8XoyW57DtD-2B7-2B3IUWHpBABhVI26176s9vuvZlnRF-2Fnme5vzlhS2DSW-2FfHLwUIzEvKYdcUqn7g3qfmZ-2BojnhIpLw83ku9IOFnwtd-2B3cxr1V728kCI-3D>'
      tableConfigURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7RXtXGgZxp5hTRHO-2BY-2B6tuCaOm-2FZ-2BCvPv7tFdFPs-2Fbz5A-3D-3DWaKw_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeu4phGPDJc-2F69-2BUz1JueV8543d2bqTd5VNWVeUIgYIqr3tOwdn4Spkza4ShRU-2FiC2TlB0LK7EFVWKDoPFprXdYhx0SVdXdwqtnsv5R-2BQcluyvXolFzqZ3PVRN5M-2FnPoA5yLQQYC7PTtaY44hAgFGxdy0QRMjBICnIa2d8fqVLONI-3D>'
    pinotClusterSpecs:
      - controllerURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7R7meK-2BGMYOH71C9ZJr7fDkHtzN_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeXqkPGQ6TVVkkqHsyARtcxElaxvYabOXaNHDjm4W7Li5fv8GXpTdMs6cRTCoAakW9Q8eV4giBrhd-2BnTK4Ttp4-2FafdjnsiZMIrjyv1wuITX7lAzLxte-2F1T-2FCCM1jACgicO-2FcmJX6-2F8XWQsziQ-2BMqF53cVXI2PfwFqMpqGat0BbgZM-3D>'
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-populate-local-data
spec:
  template:
    spec:
      containers:
        - name: pinot-populate-local-data
          image: apachepinot/pinot:0.4.0
          args: [ "LaunchDataIngestionJob", "-jobSpecFile", "/home/pinot/pinot-config/local_batch_job_spec.yaml" ]
          volumeMounts:
            - name: pinot-local-data-config
              mountPath: /home/pinot/pinot-config
            - name: pinot-local-data
              mountPath: /home/pinot/local-raw-data
      restartPolicy: OnFailure
      volumes:
        - name: pinot-local-data-config
          configMap:
            name: pinot-local-data-config
        - name: pinot-local-data
          hostPath:
            path: /my/local/path
  backoffLimit: 100```<br><strong>@quietgolfer: </strong>This isn't blocking me but I'd imagine this would lead to quality bugs in production.<br><strong>@fx19880617: </strong>I will take a look, it would be helpful if you can paste the stacktrace or create an issue<br><strong>@fx19880617: </strong>so I can check why the job is not failing<br>