Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/04 09:44:04 UTC
[GitHub] [pinot] stym06 opened a new issue, #8460: Null in all columns on Batch ingesting ORC data from S3
stym06 opened a new issue, #8460:
URL: https://github.com/apache/pinot/issues/8460
Hey guys,
I've been trying to ingest ORC data stored on S3 using the Pinot ingestor with the command below:
`./pinot-admin.sh LaunchDataIngestionJob -jobSpecFile batch-job-standalone-spec.yaml`
### Ingestion job spec
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentCreationAndMetadataPush
inputDirURI: 's3://test-bucket/dev/pinot-input-new/'
outputDirURI: 's3://test-bucket/dev/pinot/axon_entity.db/segments-v2'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: ap-southeast-1
recordReaderSpec:
  dataFormat: 'orc'
  className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'
tableSpec:
  tableName: 'user_base_fact'
  schemaURI: 'http://localhost:9000/tables/user_base_fact/schema'
  tableConfigURI: 'http://localhost:9000/tables/user_base_fact'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```
The job completes, but every column in the resulting Pinot table is null:
[Screenshot 2022-04-04 at 3.13.38 PM: https://user-images.githubusercontent.com/20970728/161518329-3fa4f1c0-cced-4294-bbcd-0ff2382a3a3a.png]
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] stym06 closed issue #8460: Null in all columns on Batch ingesting ORC data from S3
Posted by GitBox <gi...@apache.org>.
stym06 closed issue #8460: Null in all columns on Batch ingesting ORC data from S3
URL: https://github.com/apache/pinot/issues/8460
[GitHub] [pinot] KKcorps commented on issue #8460: Null in all columns on Batch ingesting ORC data from S3
Posted by GitBox <gi...@apache.org>.
KKcorps commented on issue #8460:
URL: https://github.com/apache/pinot/issues/8460#issuecomment-1087364095
Schema fields do not seem to match ORC field names.
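A quick way to confirm this kind of mismatch is to diff the column names declared in the Pinot schema (the one served at the job spec's `schemaURI`) against the field names in the ORC file. The sketch below assumes the reader matches columns by exact name, so any schema column with no identically named ORC field would come back empty, which is consistent with every column being null. The schema contents and ORC field names are hypothetical; the ORC names could be read with, for example, `pyarrow.orc.ORCFile(path).schema.names`, but that step is left out so the sketch stays self-contained.

```python
import json

# Hypothetical schema for illustration; the real one can be fetched from
# the controller URL given as schemaURI in the job spec.
SCHEMA_JSON = """
{
  "schemaName": "user_base_fact",
  "dimensionFieldSpecs": [
    {"name": "userId", "dataType": "STRING"},
    {"name": "city", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "orders", "dataType": "LONG"}
  ]
}
"""


def schema_columns(schema: dict) -> set[str]:
    """Collect column names from the field-spec lists of a Pinot schema."""
    names = set()
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        for spec in schema.get(key, []):
            names.add(spec["name"])
    return names


def diff_columns(pinot_columns: set[str], orc_fields: list[str]):
    """Return Pinot columns absent from the ORC file, plus case-only near misses."""
    orc_set = set(orc_fields)
    missing = sorted(pinot_columns - orc_set)
    # Map lower-cased ORC names back to their original spelling.
    by_lower = {name.lower(): name for name in orc_fields}
    hints = {col: by_lower[col.lower()] for col in missing if col.lower() in by_lower}
    return missing, hints


if __name__ == "__main__":
    # Hypothetical ORC field names, e.g. as read from the file's schema.
    orc_fields = ["user_id", "City", "orders"]
    missing, hints = diff_columns(schema_columns(json.loads(SCHEMA_JSON)), orc_fields)
    for col in missing:
        hint = f" (ORC has '{hints[col]}')" if col in hints else ""
        print(f"Pinot column '{col}' not found in ORC file{hint}")
```

Any column this reports needs to be renamed on one side (in the Pinot schema or in the ORC data) before re-running the ingestion job.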
[GitHub] [pinot] stym06 commented on issue #8460: Null in all columns on Batch ingesting ORC data from S3
Posted by GitBox <gi...@apache.org>.
stym06 commented on issue #8460:
URL: https://github.com/apache/pinot/issues/8460#issuecomment-1087396255
Thanks @KKcorps. Changing the column names worked!