You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/05/11 04:45:37 UTC

[GitHub] [druid] clintropolis opened a new pull request, #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

clintropolis opened a new pull request, #14236:
URL: https://github.com/apache/druid/pull/14236

   ### Description
   This function adds a new `ARRAY_TO_MV` function which provides the opposite behavior of `MV_TO_ARRAY` to allow coercing any type of `ARRAY` into a plain `VARCHAR`. The intention of this function is to assist in the process of migrating tables that currently use native multi-value `STRING` columns to `ARRAY<STRING>`. 
   
   `ARRAY_TO_MV` allows cluster operators to begin ingesting true native `ARRAY` type columns, but can use this function to wrap these array types to use as a crutch to make them exhibit classic multi-value dimension (implicit unnest when grouping, etc) so that applications using Druid can continue with their current behavior. As the apps are updated to properly begin querying these columns with `ARRAY` semantics, existing MVD columns can instead be switched to use `MV_TO_ARRAY`, which has been modified to now accept `ARRAY` types to pass through as `ARRAY<STRING>`. Note that I didn't change the `MV_TO_ARRAY` of casting to `ARRAY<STRING>`, so only MVDs and the `ARRAY<STRING>` type they have become should be wrapped with this function. Other types of `ARRAY` when used with `MV_TO_ARRAY` will be coerced to `ARRAY<STRING>`, which likely isn't desirable behavior.
   
   Right now `ARRAY_TO_MV` exists purely as an expression virtual column, as a follow-up I will be adding a dedicated virtual column for both `ARRAY_TO_MV` and `MV_TO_ARRAY` so that coercing in either direction can be nearly as performant as the native column usage, including supporting indexes for fast filtering. Making migration not come with a penalty which is very important longer term.
   
   Deferring release notes specifically for this in favor of this being part of a larger section on mvd -> array migration in 27 release.
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on PR #14236:
URL: https://github.com/apache/druid/pull/14236#issuecomment-1543552218

   >I'm concerned about how good of a shim layer this really is given that it comes with a performance penalty. Likely, even with this, it's not really recommended to switch forward yet because of the performance penalty (or fear of one).
   
   >And, between the work of adding the virtual column versus doing an exhaustive validation that there isn't really a significant performance penalty, I think we should spend time on adding the virtual column and not try to quantify how much performance penalty comes from doing this as an expression.
   
   Totally agree, I plan to add specialized virtual column implementation for this function and also `MV_TO_ARRAY` asap, though this function would need to exist anyway, so this is primarily to start experimenting with an MVD to ARRAY migration path while I keep working on that stuff.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis closed pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis closed pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays
URL: https://github.com/apache/druid/pull/14236


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #14236:
URL: https://github.com/apache/druid/pull/14236#discussion_r1189122859


##########
sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java:
##########
@@ -1260,6 +1260,26 @@
     }
   }
 
+  @Test
+  public void testNamedParameterBinding() throws SQLException
+  {
+    try (PreparedStatement statement = client.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = :val OR dim1 = :otherval")) {
+      statement.setString(1, "abc");
+      statement.setString(2, "def");
+      Map<String, Object> namedParamMap = new HashMap<>();

Review Comment:
   ## Container contents are never accessed
   
   The contents of this container are never accessed.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4933)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #14236:
URL: https://github.com/apache/druid/pull/14236


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "abhishekagarwal87 (via GitHub)" <gi...@apache.org>.
abhishekagarwal87 commented on PR #14236:
URL: https://github.com/apache/druid/pull/14236#issuecomment-1541340362

   I am surprised that you didn't name this function `cool_arrays_to_sad_mvs`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #14236: add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14236:
URL: https://github.com/apache/druid/pull/14236#discussion_r1189159753


##########
sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java:
##########
@@ -1260,6 +1260,26 @@
     }
   }
 
+  @Test
+  public void testNamedParameterBinding() throws SQLException
+  {
+    try (PreparedStatement statement = client.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = :val OR dim1 = :otherval")) {
+      statement.setString(1, "abc");
+      statement.setString(2, "def");
+      Map<String, Object> namedParamMap = new HashMap<>();

Review Comment:
   accidentally pushed an experiment that i never meant to commit 😅 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org