You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/28 22:08:21 UTC

[GitHub] [druid] joykent99 opened a new pull request #10332: Fix stringFirst/stringLast rollup during ingestion

joykent99 opened a new pull request #10332:
URL: https://github.com/apache/druid/pull/10332


   Fixes #7243 .
   
   ### Description
   
   This PR fixes the issues with stringFirst/stringLast rollup during ingestion, based on the suggestions described in #7243. A test is added to trigger the rollup code path for stringFirst/stringLast.
   
   
   This PR also enables earliest/latest aggregator to operate on these columns during query time.
   
   #### Fixed the bug ...
   #### Renamed the class ...
   #### Added a forbidden-apis entry ...
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `MyFoo`
    * `OurBar`
    * `TheirBaz`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-689074103


   @joykent99 the way I resolve merge conflicts is like this
   
   ```
   git fetch upstream
   git merge upstream/master
   # Resolve merge conflicts manually
   # ...
   git push origin <branch>
   ```
   
   This way you don't need to force push, so the comment history is still referenceable in the PR. I think the change looks good to me, just going to do a once over today and post final comments.
   
   Thanks again for fixing this!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482597454



##########
File path: processing/src/test/java/org/apache/druid/query/aggregation/first/StringFirstAggregationTest.java
##########
@@ -115,7 +117,7 @@ public void testCombine()
   {
     SerializablePairLongString pair1 = new SerializablePairLongString(1467225000L, "AAAA");
     SerializablePairLongString pair2 = new SerializablePairLongString(1467240000L, "BBBB");
-    Assert.assertEquals(pair2, stringFirstAggFactory.combine(pair1, pair2));
+    Assert.assertEquals(pair1, stringFirstAggFactory.combine(pair1, pair2));

Review comment:
       This is the fix to the existing test for `StringFirstAggregatorFactory.combine()` fixes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-685839699


   @suneet-s, I have added some integration tests in this PR. It looks like the CI build failed due to license errors. Do you have any suggestions to how to resolve these errors? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bavaria95 edited a comment on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
bavaria95 edited a comment on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704324620


   @joykent99 suneet-s
   
   does the line `First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.` in docs still correct after this MR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #10332:
URL: https://github.com/apache/druid/pull/10332


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-683529929


   @joykent99 Thanks for the contribution! I haven't looked through the approach yet, but I notice there aren't any integration tests added for this change. Can you add one for the bug reported in #7243. This way we can be sure not to break this feature going forward.
   
   I think you can do this by modifying one of the tests in `ITIndexerTest`. Consider adding a stringFirst and stringLast aggregation to `/indexer/wikipedia_index_task.json` and then modifying one of the queries in `/indexer/wikipedia_index_queries.json` to test that the aggregation is correct.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482591914



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),
           InferTypes.RETURN_TYPE,
           OperandTypes.or(
               OperandTypes.NUMERIC,
               OperandTypes.BOOLEAN,
               OperandTypes.sequence(
                   "'" + aggregatorType.name() + "(expr, maxBytesPerString)'\n",
-                  OperandTypes.STRING,
+                  OperandTypes.ANY,

Review comment:
       When we specify "stringFirst/stringLast" in ingestion MetricsSpec, the type for the columns stored in segments is `OTHER`. With `STRING` type here, the SQL will report an error when trying to call `earliest/latest` on those columns.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482593295



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),
           InferTypes.RETURN_TYPE,
           OperandTypes.or(
               OperandTypes.NUMERIC,
               OperandTypes.BOOLEAN,
               OperandTypes.sequence(
                   "'" + aggregatorType.name() + "(expr, maxBytesPerString)'\n",
-                  OperandTypes.STRING,
+                  OperandTypes.ANY,

Review comment:
       If you have any suggestions on where is a good place to add a test for this, that would be great!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-689005783


   @suneet-s Please let me know if you have more comments on this PR. It looks like @gianm also found the issue with StringFirstAggregatorFactory.combine() function, which is what caused the above conflicts. Should I pull in the new changes in the master to my branch, and force push it? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r485137792



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),
           InferTypes.RETURN_TYPE,
           OperandTypes.or(
               OperandTypes.NUMERIC,
               OperandTypes.BOOLEAN,
               OperandTypes.sequence(
                   "'" + aggregatorType.name() + "(expr, maxBytesPerString)'\n",
-                  OperandTypes.STRING,
+                  OperandTypes.ANY,

Review comment:
       Usually I sql add tests to `CalciteQueryTest` You can see `testEarliestAggregators` as an example. You could write a test in here to validate the SQL translation works, but I think you need to update `CalciteTests#INDEX_SCHEMA_DIFFERENT_DIM3_M1_TYPES` to add an earliest / latest aggregator. Then you should be able to test a SQL test in `CalciteQueryTest` - hope this helps




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482569314



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),

Review comment:
       We should reuse this instance across all calls to `toDruidAggregation`

##########
File path: integration-tests/src/test/resources/indexer/wikipedia_merge_index_task.json
##########
@@ -0,0 +1,70 @@
+{
+    "type": "index",
+    "spec": {
+        "dataSchema": {
+            "dataSource": "%%DATASOURCE%%",
+            "metricsSpec": [
+                {
+                    "type": "count",
+                    "name": "count"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "added",
+                    "fieldName": "added"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "deleted",
+                    "fieldName": "deleted"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "delta",
+                    "fieldName": "delta"
+                },
+                {
+                    "type": "stringFirst",
+                    "name": "first_user",
+                    "fieldName": "user"
+                },
+                {
+                    "type": "stringLast",
+                    "name": "last_user",
+                    "fieldName": "user"
+                }

Review comment:
       Can we add these to `wikipedia_index_task.json` instead. This way we don't need to run another integration test which can be quite slow
   
   You will probably want to do something very similar to #9277

##########
File path: processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregatorFactory.java
##########
@@ -188,7 +188,7 @@ public Comparator getComparator()
   @Override
   public Object combine(Object lhs, Object rhs)
   {
-    return TIME_COMPARATOR.compare(lhs, rhs) > 0 ? lhs : rhs;
+    return TIME_COMPARATOR.compare(lhs, rhs) < 0 ? lhs : rhs;

Review comment:
       wow! Is there a test for this somewhere?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),
           InferTypes.RETURN_TYPE,
           OperandTypes.or(
               OperandTypes.NUMERIC,
               OperandTypes.BOOLEAN,
               OperandTypes.sequence(
                   "'" + aggregatorType.name() + "(expr, maxBytesPerString)'\n",
-                  OperandTypes.STRING,
+                  OperandTypes.ANY,

Review comment:
       shouldn't this remain `STRING` ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bavaria95 edited a comment on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
bavaria95 edited a comment on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704324620


   @joykent99 @suneet-s
   
   does the line `First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.` in docs still correct after this MR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-683766612


   Thanks @suneet-s for the tip on adding the integration tests. I will work on it and update the PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-685863211


   > @suneet-s, I have added some integration tests in this PR. It looks like the CI build failed due to license errors. Do you have any suggestions to how to resolve these errors? Thanks.
   
   I re-triggered the job. I've seen it be flaky before, not sure why. Looks like everything is passing now 🎉 
   
   I'll try to get some time to review this in the next couple days


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482590046



##########
File path: processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregatorFactory.java
##########
@@ -188,7 +188,7 @@ public Comparator getComparator()
   @Override
   public Object combine(Object lhs, Object rhs)
   {
-    return TIME_COMPARATOR.compare(lhs, rhs) > 0 ? lhs : rhs;
+    return TIME_COMPARATOR.compare(lhs, rhs) < 0 ? lhs : rhs;

Review comment:
       Yes. The existing test case was testing the wrong behavior. The fix to the test is also in this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482590660



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
##########
@@ -227,14 +257,14 @@ public Aggregation toDruidAggregation(
           aggregatorType.name(),
           null,
           SqlKind.OTHER_FUNCTION,
-          ReturnTypes.ARG0,
+          new EarliestLatestReturnTypeInference(0),

Review comment:
       Sounds good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704344568


   > @joykent99 @suneet-s
   > 
   > does the line `First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.` in docs still correct after this MR?
   
   @bavaria95 Can you point me to a link of the doc you mentioned please. See #7599 Long / Double / Float first/last aggs still can not be used at ingestion time


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704351526


   > > @joykent99 @suneet-s
   > > does the line `First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.` in docs still correct after this MR?
   > 
   > @bavaria95 Can you point me to a link of the doc you mentioned please. See #7599 Long / Double / Float first/last aggs still can not be used at ingestion time
   
   That's correct -- Long / Double / Float first/last agg still can not be used at ingestion time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-683168180


   This is the first time I submitted a patch to Druid. Any comments and suggestions are appreciated!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bavaria95 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
bavaria95 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704324620


   @joykent99 does the line `First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.` in docs still correct after this MR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bavaria95 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
bavaria95 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704400533






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-689212372


   @suneet-s I have merged in the changes in master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-704406915


   > @joykent99 but firstString/lastString should be working fine?
   > 
   > Could you please explain how to use it inside the query? Let's say we added a metric to be `lastString` aggregator during ingestion time. And this metric has `OTHER` type:
   > 
   > How can it be queried, something like say `SELECT * WHERE test='JarBot'`?
   
   Since the data for `lastString` aggregator stored in segments are in `SerializablePair`, to get the final result, we issue the query like this:
   
   ```
   select <your_grouping_column>, latest(test, 1024)
   from wikipedia-test
   group by <your_grouping_column>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482594653



##########
File path: integration-tests/src/test/resources/indexer/wikipedia_merge_index_task.json
##########
@@ -0,0 +1,70 @@
+{
+    "type": "index",
+    "spec": {
+        "dataSchema": {
+            "dataSource": "%%DATASOURCE%%",
+            "metricsSpec": [
+                {
+                    "type": "count",
+                    "name": "count"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "added",
+                    "fieldName": "added"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "deleted",
+                    "fieldName": "deleted"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "delta",
+                    "fieldName": "delta"
+                },
+                {
+                    "type": "stringFirst",
+                    "name": "first_user",
+                    "fieldName": "user"
+                },
+                {
+                    "type": "stringLast",
+                    "name": "last_user",
+                    "fieldName": "user"
+                }

Review comment:
       I also added these in `wikipedia_index_task.json`, which led me to discover the issue with `StringFirstAggregatorFactory.combine()` below.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on pull request #10332:
URL: https://github.com/apache/druid/pull/10332#issuecomment-686085286


   > I don't fully understand how Combiners work, so I'll need to spend more time thinking about how this might impact other things. But I commented on stuff I have a little bit of a better understanding of.
   
   Thanks for taking a look, @suneet-s!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] joykent99 commented on a change in pull request #10332: Fix stringFirst/stringLast rollup during ingestion

Posted by GitBox <gi...@apache.org>.
joykent99 commented on a change in pull request #10332:
URL: https://github.com/apache/druid/pull/10332#discussion_r482589444



##########
File path: integration-tests/src/test/resources/indexer/wikipedia_merge_index_task.json
##########
@@ -0,0 +1,70 @@
+{
+    "type": "index",
+    "spec": {
+        "dataSchema": {
+            "dataSource": "%%DATASOURCE%%",
+            "metricsSpec": [
+                {
+                    "type": "count",
+                    "name": "count"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "added",
+                    "fieldName": "added"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "deleted",
+                    "fieldName": "deleted"
+                },
+                {
+                    "type": "doubleSum",
+                    "name": "delta",
+                    "fieldName": "delta"
+                },
+                {
+                    "type": "stringFirst",
+                    "name": "first_user",
+                    "fieldName": "user"
+                },
+                {
+                    "type": "stringLast",
+                    "name": "last_user",
+                    "fieldName": "user"
+                }

Review comment:
       The issue described in #7243 requires a trigger of merging of two index files, which will in turn cause a rollup of two rows. I wasn't able to trigger that in `wikipedia_index_task.json`. This is why I added a new task to capture this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org