Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/04 09:44:04 UTC

[GitHub] [pinot] stym06 opened a new issue, #8460: Null in all columns on Batch ingesting ORC data from S3

stym06 opened a new issue, #8460:
URL: https://github.com/apache/pinot/issues/8460

   Hey guys,
   I've been trying to ingest ORC data stored on S3 with a Pinot batch ingestion job, using the command below:
   `./pinot-admin.sh LaunchDataIngestionJob -jobSpecFile batch-job-standalone-spec.yaml`
   
   ### Ingestion job spec
   ```
   executionFrameworkSpec:
     name: 'standalone'
     segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
     segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
     segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
     segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
   jobType: SegmentCreationAndMetadataPush
   inputDirURI: 's3://test-bucket/dev/pinot-input-new/'
   outputDirURI: 's3://test-bucket/dev/pinot/axon_entity.db/segments-v2'
   overwriteOutput: true
   pinotFSSpecs:
     - scheme: s3
       className: org.apache.pinot.plugin.filesystem.S3PinotFS
       configs:
         region: ap-southeast-1
   recordReaderSpec:
     dataFormat: 'orc'
     className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'
   tableSpec:
     tableName: 'user_base_fact'
     schemaURI: 'http://localhost:9000/tables/user_base_fact/schema'
     tableConfigURI: 'http://localhost:9000/tables/user_base_fact'
   pinotClusterSpecs:
     - controllerURI: 'http://localhost:9000'
   pushJobSpec:
     pushParallelism: 2
     pushAttempts: 2
     pushRetryIntervalMillis: 1000
   ```
   
   The job completes successfully, but every column in the Pinot table is null:
   [Screenshot (2022-04-04): the Pinot table showing null values in all columns]
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] stym06 closed issue #8460: Null in all columns on Batch ingesting ORC data from S3

stym06 closed issue #8460: Null in all columns on Batch ingesting ORC data from S3
URL: https://github.com/apache/pinot/issues/8460


[GitHub] [pinot] KKcorps commented on issue #8460: Null in all columns on Batch ingesting ORC data from S3

KKcorps commented on issue #8460:
URL: https://github.com/apache/pinot/issues/8460#issuecomment-1087364095

   The field names in the Pinot schema do not seem to match the column names in the ORC files.
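KKcorps's diagnosis can be checked before re-running the job. The sketch below (Python, standard library only) collects the field names declared in a Pinot schema, such as the JSON the controller returns at the job spec's `schemaURI` (`http://localhost:9000/tables/user_base_fact/schema`), and compares them with the column names of an ORC file. It assumes the ORC column names have already been read some other way, for example with `pyarrow.orc.ORCFile(path).schema.names` on a locally downloaded file. That step, the function names, and the example column names are illustrative assumptions, not part of Pinot. The sketch only inspects the `dimensionFieldSpecs`, `metricFieldSpecs` and `dateTimeFieldSpecs` lists and ignores the older `timeFieldSpec` block.

```python
"""Compare a Pinot schema's field names with the column names of an ORC file.

Sketch only: the ORC column names are passed in as a plain list, so this
code does not depend on any ORC library. All names here are hypothetical.
"""
import json

# Pinot schemas list their columns under these keys (timeFieldSpec is ignored).
FIELD_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def pinot_field_names(schema: dict) -> list:
    """Return every column name declared in a Pinot schema dict, in order."""
    names = []
    for key in FIELD_SPEC_KEYS:
        for spec in schema.get(key, []):
            names.append(spec["name"])
    return names


def report_mismatches(schema: dict, orc_columns: list) -> dict:
    """Split schema fields into matched / missing, plus case-insensitive hints."""
    orc_set = set(orc_columns)
    # Lower-cased ORC name -> original spelling, to spot case-only differences.
    orc_by_lower = {c.lower(): c for c in orc_columns}
    matched, missing, hints = [], [], {}
    for name in pinot_field_names(schema):
        if name in orc_set:
            matched.append(name)
        else:
            missing.append(name)
            candidate = orc_by_lower.get(name.lower())
            if candidate is not None:
                hints[name] = candidate
    return {"matched": matched, "missing": missing, "case_hints": hints}


if __name__ == "__main__":
    # Hypothetical schema and ORC columns, not taken from the issue.
    schema = json.loads("""{
      "schemaName": "user_base_fact",
      "dimensionFieldSpecs": [{"name": "userId", "dataType": "STRING"}],
      "metricFieldSpecs": [{"name": "totalOrders", "dataType": "LONG"}],
      "dateTimeFieldSpecs": [{"name": "eventTime", "dataType": "LONG",
                              "format": "1:MILLISECONDS:EPOCH",
                              "granularity": "1:MILLISECONDS"}]
    }""")
    orc_columns = ["userid", "total_orders", "eventTime"]
    print(report_mismatches(schema, orc_columns))
```

Any name under `missing` is a Pinot column that likely cannot be populated from the ORC data. `case_hints` points at ORC columns whose names differ only in letter case, which are easy to overlook.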


[GitHub] [pinot] stym06 commented on issue #8460: Null in all columns on Batch ingesting ORC data from S3

stym06 commented on issue #8460:
URL: https://github.com/apache/pinot/issues/8460#issuecomment-1087396255

   Thanks @KKcorps. Changing the column names worked!
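The thread does not say whether the ORC columns or the Pinot schema were renamed. If it was the schema, the change can be scripted. The Python sketch below returns a copy of a schema dict with field names replaced according to a mapping. The `renames` mapping and the function name are hypothetical; you would build the mapping yourself, for example from a comparison of schema fields against ORC columns. The sketch assumes the table is being set up fresh and does not talk to a Pinot controller.

```python
"""Rewrite a Pinot schema so its field names match the ORC column spelling.

Sketch only: `renames` is a hypothetical old-name -> new-name mapping, and
nothing here uploads the result to a Pinot controller.
"""
import copy
import json

# Pinot schema keys that hold lists of column specs (timeFieldSpec is ignored).
FIELD_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def rename_schema_fields(schema: dict, renames: dict) -> dict:
    """Return a deep copy of `schema` with field names replaced via `renames`.

    Fields not mentioned in `renames` keep their original name; the input
    dict is left unmodified.
    """
    updated = copy.deepcopy(schema)
    for key in FIELD_SPEC_KEYS:
        for spec in updated.get(key, []):
            spec["name"] = renames.get(spec["name"], spec["name"])
    return updated


if __name__ == "__main__":
    # Hypothetical schema and mapping, not taken from the issue.
    schema = {
        "schemaName": "user_base_fact",
        "dimensionFieldSpecs": [{"name": "userId", "dataType": "STRING"}],
        "metricFieldSpecs": [{"name": "totalOrders", "dataType": "LONG"}],
    }
    renames = {"userId": "userid", "totalOrders": "total_orders"}
    print(json.dumps(rename_schema_fields(schema, renames), indent=2))
```

The table config can also refer to columns by name (for example in index settings), so any such references would need the same renaming before the table is recreated.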
