You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/03 17:14:10 UTC

[GitHub] [druid] gianm opened a new pull request #10233: Add "offset" parameter to the Scan query.

gianm opened a new pull request #10233:
URL: https://github.com/apache/druid/pull/10233


   It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672682361


   > I've converted the patch to a draft.
   
   I've updated the patch to stabilize Scan result order, meaning that it is ready to review. A new test, ScanQueryResultOrderingTest, verifies that for three test segments, the result order is stable at all possible server distributions, ordering, and limit settings.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668456943


   I'd like to get #10235 merged first, then use the StableLimitingSorter from there in this patch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672811713


   This pull request **introduces 1 alert** when merging 09af824e1a78eea91978b6a52cce9f8f013193c4 into e273264332f5d1e619c50c6ee9727058ccdfb9b2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-3b52efbc134b5ff1588264c5bc52c26b0067a542)
   
   **new alerts:**
   
   * 1 for Inconsistent equals and hashCode


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672894052


   > I'm not sure why this warning keeps firing. I tried to add a suppression, but it doesn't seem to work…
   
   It looks correct to me... According to lgtm, the suppressions are only picked up after the change is merged, so I wouldn't worry about the warning - https://lgtm.com/help/lgtm/alert-suppression
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668173929


   Hmm, I just realized that this method of paginating is a bad idea if you're doing time ordering and lots of rows have the same timestamp. It uses a `java.util.PriorityQueue` internally, which isn't stable with regard to rows with the same timestamp. This is fine if you're only doing one query, but it means that offset-based pagination will not be stable either. More thought and test coverage is needed to make sure we handle this case properly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekrb19 commented on a change in pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
abhishekrb19 commented on a change in pull request #10233:
URL: https://github.com/apache/druid/pull/10233#discussion_r469421557



##########
File path: processing/src/main/java/org/apache/druid/query/scan/ScanQuery.java
##########
@@ -202,12 +210,38 @@ public int getBatchSize()
     return batchSize;
   }
 
-  @JsonProperty
+  /**
+   * Offset for this query; behaves like SQL "OFFSET". Zero means no offset. Negative values are invalid.
+   */
+  @JsonProperty("offset")
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public long getScanRowsOffset()
+  {
+    return scanRowsOffset;
+  }
+
+  /**
+   * Offset for this query; behaves like SQL "LIMIT". Will always be positive. {@link Long#MAX_VALUE} is used in

Review comment:
       Same comment as https://github.com/apache/druid/pull/10235#discussion_r469416176. :-) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672896870


   > It looks correct to me... According to lgtm, the suppressions are only picked up after the change is merged, so I wouldn't worry about the warning - https://lgtm.com/help/lgtm/alert-suppression
   
   Ah, I didn't realize that. I was expecting it to respect the suppressions from the particular branch we're on. OK, I'll stop worrying about it then.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm merged pull request #10233:
URL: https://github.com/apache/druid/pull/10233


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-673141482


   This pull request **introduces 1 alert** when merging 2edebad0fbf8fc77426075d1a25fa957fb315192 into d36a0f61da2cf1001738fb3bf54231a1ac7e6678 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-ff50e338d88be393b7ccfe5e4cc81ecd2b9c1d81)
   
   **new alerts:**
   
   * 1 for Inconsistent equals and hashCode


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10233:
URL: https://github.com/apache/druid/pull/10233#discussion_r469551515



##########
File path: processing/src/main/java/org/apache/druid/query/scan/ScanQuery.java
##########
@@ -202,12 +210,38 @@ public int getBatchSize()
     return batchSize;
   }
 
-  @JsonProperty
+  /**
+   * Offset for this query; behaves like SQL "OFFSET". Zero means no offset. Negative values are invalid.
+   */
+  @JsonProperty("offset")
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public long getScanRowsOffset()
+  {
+    return scanRowsOffset;
+  }
+
+  /**
+   * Offset for this query; behaves like SQL "LIMIT". Will always be positive. {@link Long#MAX_VALUE} is used in

Review comment:
       Yes, you're right, fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm edited a comment on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm edited a comment on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672682361


   I've updated the patch to stabilize Scan result order, meaning that it is ready to review. A new test, ScanQueryResultOrderingTest, verifies that for three test segments, the result order is stable at all possible server distributions, ordering, and limit settings.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668336437


   This pull request **introduces 1 alert** when merging 488788e0de14baa8b7da425a7a44e94b582d2877 into 34a41137522fca463c6af1df5edb2f41297491f5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-396f7ed8735e7124ae9757fb90d704a17c51be90)
   
   **new alerts:**
   
   * 1 for Inconsistent equals and hashCode


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-672885092


   > This pull request **introduces 1 alert** when merging [09af824](https://github.com/apache/druid/commit/09af824e1a78eea91978b6a52cce9f8f013193c4) into [e273264](https://github.com/apache/druid/commit/e273264332f5d1e619c50c6ee9727058ccdfb9b2) - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-3b52efbc134b5ff1588264c5bc52c26b0067a542)
   > 
   > **new alerts:**
   > 
   >     * 1 for Inconsistent equals and hashCode
   
   I'm not sure why this warning keeps firing. I tried to add a suppression, but it doesn't seem to work…
   
   Here it is, if anyone's got ideas: https://github.com/apache/druid/pull/10233/files#diff-b67af07d1cac62b329208a8397a0544fR410


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekrb19 commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
abhishekrb19 commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668245599


   > I was thinking of starting with Scan and GroupBy. Timeseries could make sense too. TopN, I'm not as sure about, because the fact that offset needs to bump up limit means that it will start to increase memory use pretty quickly as you go to the later pages.
   
   @gianm, yes, that makes sense. Also, I was wondering if in the future we could have a stateful offset implementation where the broker could cache the query/result set from a previous request for a short duration. This could potentially be leveraged in subsequent requests without having to look at the entire result set every time and throw away the rows that are not needed. I think this could get tricky for identical rows, but would be computationally _less_ resource intensive?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668226110


   This pull request **introduces 1 alert** when merging b3f4f83e497c78cbab646f058b3bc13028cb75f4 into 34a41137522fca463c6af1df5edb2f41297491f5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-1db408ed53e602c8c1de7bd00e4dbe65b7e47bbd)
   
   **new alerts:**
   
   * 1 for Inconsistent equals and hashCode


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668365464


   This pull request **introduces 1 alert** when merging 015bd2adbdc1818ad95db99c2a4dcb5e81a8b3c3 into 34a41137522fca463c6af1df5edb2f41297491f5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-84f63824ec7872906633bcf3489ca3a396f6646f)
   
   **new alerts:**
   
   * 1 for Inconsistent equals and hashCode


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm edited a comment on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm edited a comment on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668173929


   Hmm, I just realized that this method of paginating is a bad idea if you're doing time ordering and lots of rows have the same timestamp. It uses a `java.util.PriorityQueue` internally, which isn't stable with regard to rows with the same timestamp. This is fine if you're only doing one query, but it means that offset-based pagination will not be stable either. More thought and test coverage is needed to make sure we handle this case properly.
   
   I've converted the patch to a draft.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekrb19 commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
abhishekrb19 commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668154640


   @gianm, this is a great addition! Do you think it also makes sense to add `OFFSET` to other query types, namely timeseries/groupBy/topN queries? 
   
   I could see timeseries queries returning large result sets, so I think it'd be useful to throw away stuff that is not needed by a client, thereby reducing the # of bytes transferred over the wire. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668170209


   > @gianm, this is a great addition! Do you think it also makes sense to add `OFFSET` to other query types, namely timeseries/groupBy/topN queries?
   > 
   > I could see timeseries queries returning large result sets, so I think it'd be useful to throw away stuff that is not needed by a client, thereby reducing the # of bytes transferred over the wire. What do you think?
   
   I was thinking of starting with Scan and GroupBy. Timeseries could make sense too. TopN, I'm not as sure about, because the fact that offset needs to bump up limit means that it will start to increase memory use pretty quickly as you go to the later pages.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10233: Add "offset" parameter to the Scan query.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10233:
URL: https://github.com/apache/druid/pull/10233#issuecomment-668227398


   > This pull request **introduces 1 alert** when merging [b3f4f83](https://github.com/apache/druid/commit/b3f4f83e497c78cbab646f058b3bc13028cb75f4) into [34a4113](https://github.com/apache/druid/commit/34a41137522fca463c6af1df5edb2f41297491f5) - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-1db408ed53e602c8c1de7bd00e4dbe65b7e47bbd)
   > 
   > **new alerts:**
   > 
   >     * 1 for Inconsistent equals and hashCode
   
   This is intentional, btw. I added a suppression for the IntelliJ inspection; I'm not sure yet how to suppress the LGTM check.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org