You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/03/17 21:40:30 UTC

[GitHub] [druid] suneet-s opened a new pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

suneet-s opened a new pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530
 
 
   WIP - pushing to see what tests fail. Will update the description in the next patch
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `MyFoo`
    * `OurBar`
    * `TheirBaz`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398064095
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
 ##########
 @@ -270,6 +273,17 @@ public boolean hasNext()
           theEvent.put(metric, value);
         }
       }
+
+      /*
 
 Review comment:
   Could you change this to single-line comments? We don't usually use multi-line comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398077218
 
 

 ##########
 File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/ITIndexerTest.java
 ##########
 @@ -34,16 +34,25 @@
   private static final String INDEX_QUERIES_RESOURCE = "/indexer/wikipedia_index_queries.json";
   private static final String INDEX_DATASOURCE = "wikipedia_index_test";
 
+  private static final String INDEX_WITH_TIMESTAMP_TASK = "/indexer/wikipedia_with_timestamp_index_task.json";
+  // TODO: add queries that validate timestamp is different from the __time column since it is a dimension
 
 Review comment:
   Would you open an issue for this instead of TODO?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398084237
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
 ##########
 @@ -270,6 +273,17 @@ public boolean hasNext()
           theEvent.put(metric, value);
         }
       }
+
+      /*
+       * Timestamp is added last because we expect that the time column will always be a date time object.
+       * If it is added earlier, it can be overwritten by metrics or dimenstions with the same name.
+       *
+       * If a user names a metric or dimension `__time` it will be overwritten. This case should be rare since
 
 Review comment:
   I'm worried about log explosion. since this is done per row. I'd have to add explicit checking outside of this next block. Maybe in the constructor? Would that be visible enough in the logs?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398102803
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
 ##########
 @@ -270,6 +273,17 @@ public boolean hasNext()
           theEvent.put(metric, value);
         }
       }
+
+      /*
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh merged pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
ccaominh merged pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398105147
 
 

 ##########
 File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/ITIndexerTest.java
 ##########
 @@ -34,16 +34,25 @@
   private static final String INDEX_QUERIES_RESOURCE = "/indexer/wikipedia_index_queries.json";
   private static final String INDEX_DATASOURCE = "wikipedia_index_test";
 
+  private static final String INDEX_WITH_TIMESTAMP_TASK = "/indexer/wikipedia_with_timestamp_index_task.json";
+  // TODO: add queries that validate timestamp is different from the __time column since it is a dimension
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398066456
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
 ##########
 @@ -270,6 +273,17 @@ public boolean hasNext()
           theEvent.put(metric, value);
         }
       }
+
+      /*
+       * Timestamp is added last because we expect that the time column will always be a date time object.
+       * If it is added earlier, it can be overwritten by metrics or dimenstions with the same name.
+       *
+       * If a user names a metric or dimension `__time` it will be overwritten. This case should be rare since
 
 Review comment:
   Hmm, I think this overwriting should never happen, but it could happen for some reason in practice, e.g., user mistake. How about logging a warning if there are duplicate column names? Doc could say some kind of warning messages can be printed if there are duplicates.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9530: DruidSegmentReader should work if timestamp is specified as a dimension
URL: https://github.com/apache/druid/pull/9530#discussion_r398089315
 
 

 ##########
 File path: indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java
 ##########
 @@ -270,6 +273,17 @@ public boolean hasNext()
           theEvent.put(metric, value);
         }
       }
+
+      /*
+       * Timestamp is added last because we expect that the time column will always be a date time object.
+       * If it is added earlier, it can be overwritten by metrics or dimenstions with the same name.
+       *
+       * If a user names a metric or dimension `__time` it will be overwritten. This case should be rare since
 
 Review comment:
   Ah good point. Now I think we need some schema validation for ingestion which could probably be done in `DataSchema`. But this would be a larger issue than the bug this PR fixes, and I'm ok with adding it later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org