Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/17 00:41:00 UTC

[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5873: Add Hadoop counters for detecting schema mismatch

mayankshriv commented on a change in pull request #5873:
URL: https://github.com/apache/incubator-pinot/pull/5873#discussion_r471181999



##########
File path: pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentCreationMapper.java
##########
@@ -353,8 +368,71 @@ protected void addAdditionalSegmentGeneratorConfigs(SegmentGeneratorConfig segme
       int sequenceId) {
   }
 
+  public void validateSchema(SegmentGeneratorConfig segmentGeneratorConfig, RecordReader recordReader) {
+    if (recordReader instanceof AvroRecordReader) {

Review comment:
       Seems like we will have to either write pairwise validators (pinot-avro, pinot-orc, pinot-json, etc.), or write pairwise schema converters (avro->pinot, orc->pinot, json->pinot), in which case the schema validator only has to compare two Pinot schemas (one provided as input, the other derived from the source format). At this point I see pros and cons in both, but I'm leaning towards the former since it provides dedicated validation between formats.
   
   However, with either approach, I'd recommend creating interfaces/impls. For example, an interface for the validator (with pairwise impls), or an interface for the converter (with pairwise converters, where the validator just works over the interface).
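   
   As a rough illustration of that recommendation, a minimal sketch of a validator interface with pairwise implementations follows. Every name in it (SchemaValidator, the schema stubs, AvroSchemaValidator) is a hypothetical placeholder rather than an existing Pinot class; real code would operate on the Pinot Schema and the format-specific source schema instead.
   
    // Hypothetical sketch only -- these names do not exist in Pinot.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    public class SchemaValidationSketch {
    
      // Stand-in for the Pinot schema: column name -> declared type.
      static final class PinotSchemaStub {
        final Map<String, String> columns = new HashMap<>();
        PinotSchemaStub put(String name, String type) {
          columns.put(name, type);
          return this;
        }
      }
    
      // Stand-in for a format-specific source schema (e.g. read from an Avro file).
      static final class SourceSchemaStub {
        final Map<String, String> fields = new HashMap<>();
        SourceSchemaStub put(String name, String type) {
          fields.put(name, type);
          return this;
        }
      }
    
      // The interface the mapper codes against: one implementation per input format.
      interface SchemaValidator {
        // Returns human-readable mismatch descriptions; an empty list means the schemas agree.
        List<String> validate(SourceSchemaStub source, PinotSchemaStub pinot);
      }
    
      // Pairwise implementation for Avro-like sources; ORC/JSON impls would mirror it.
      static final class AvroSchemaValidator implements SchemaValidator {
        @Override
        public List<String> validate(SourceSchemaStub source, PinotSchemaStub pinot) {
          List<String> mismatches = new ArrayList<>();
          for (Map.Entry<String, String> column : pinot.columns.entrySet()) {
            String sourceType = source.fields.get(column.getKey());
            if (sourceType == null) {
              mismatches.add("Column missing in input: " + column.getKey());
            } else if (!sourceType.equals(column.getValue())) {
              mismatches.add("Type mismatch for " + column.getKey() + ": input=" + sourceType
                  + ", pinot=" + column.getValue());
            }
          }
          return mismatches;
        }
      }
    
      public static void main(String[] args) {
        SourceSchemaStub avro = new SourceSchemaStub().put("userId", "LONG").put("country", "INT");
        PinotSchemaStub pinot = new PinotSchemaStub().put("userId", "LONG").put("country", "STRING");
    
        SchemaValidator validator = new AvroSchemaValidator();
        // In the mapper, each reported mismatch is where a Hadoop counter would be incremented.
        validator.validate(avro, pinot).forEach(System.out::println);
      }
    }
   
   The converter variant would look much the same, except the interface would return a derived Pinot schema and a single comparator would then handle validation over two Pinot schemas.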
   
   

##########
File path: pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentCreationMapper.java
##########
@@ -243,14 +257,15 @@ protected void map(LongWritable key, Text value, Context context)
     addAdditionalSegmentGeneratorConfigs(segmentGeneratorConfig, hdfsInputFile, sequenceId);
 
     _logger.info("Start creating segment with sequence id: {}", sequenceId);
-    SegmentIndexCreationDriver driver = new SegmentIndexCreationDriverImpl();
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();

Review comment:
       Seems like we are breaking the interface here; what's the reasoning for that? Either the API should be justified as part of the interface, or the design is broken somehow.
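   
   As a small illustration of the alternatives, here is a sketch where Driver, DriverImpl, and Stats are placeholders (not the real SegmentIndexCreationDriver types): either promote the new accessor onto the interface, or keep the variable typed to the interface and narrow only at the call site that needs the impl-only method.
   
    // Illustrative placeholders only -- not the real SegmentIndexCreationDriver API.
    public class DriverTypingSketch {
    
      static final class Stats {
        final long rowsWithSchemaMismatch;
        Stats(long rowsWithSchemaMismatch) {
          this.rowsWithSchemaMismatch = rowsWithSchemaMismatch;
        }
      }
    
      // Option 1: if the accessor is genuinely part of the contract, add it to the
      // interface so the caller can keep coding against the interface type.
      interface Driver {
        void build();
        Stats getStats();
      }
    
      static final class DriverImpl implements Driver {
        @Override
        public void build() {
          // segment creation would happen here
        }
    
        @Override
        public Stats getStats() {
          return new Stats(0);
        }
      }
    
      public static void main(String[] args) {
        // Caller keeps the interface type; no concrete class leaks into the mapper.
        Driver driver = new DriverImpl();
        driver.build();
    
        // Option 2: if the interface must stay unchanged, narrow explicitly at the
        // one call site that needs the impl-only method instead of retyping the variable.
        if (driver instanceof DriverImpl) {
          Stats stats = ((DriverImpl) driver).getStats();
          System.out.println("rows with schema mismatch: " + stats.rowsWithSchemaMismatch);
        }
      }
    }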




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org