Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/08/18 19:24:01 UTC

[GitHub] [spark] MaxGekk opened a new pull request, #42564: [WIP][SPARK-44840][SQL] Make array_insert 1-based for negative indexes

MaxGekk opened a new pull request, #42564:
URL: https://github.com/apache/spark/pull/42564

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #42564: [SPARK-44840][SQL] Make array_insert 1-based for negative indexes

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42564:
URL: https://github.com/apache/spark/pull/42564#issuecomment-1686030237

   @cloud-fan @peter-toth @srielau @srowen Could you review this PR, please?




[GitHub] [spark] srielau commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "srielau (via GitHub)" <gi...@apache.org>.
srielau commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300174130


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   The behavior is inconsistent with itself.
   If `array_insert()` (and other functions) were 0-based, then it would make sense that 1 is the second element and -1 the second-to-last element.
   So to fix this function we either have to make it 0-based or change the behavior for negative indexes.
   Arguments against going 0-based:
   1. 1-based was, and remains, a conscious decision.
   2. We can agree (I think) that negative indexes are less common than positive ones.
   3. Our own docs don't actually have negative-index examples, and the instructions in the docs are ambiguous.
   
   Lastly, the function is relatively new, so the blast radius is limited.
   
   Re the proposed (existing) semantic "it goes 'before' the position": I don't think we have anything like that anywhere.
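
   The two negative-index conventions under discussion can be modeled in plain Python (a hypothetical sketch of the semantics, not Spark's implementation; it ignores Spark's null-padding for out-of-range positions):

   ```python
   def array_insert(arr, pos, elem, legacy_negative=False):
       """Sketch of array_insert: 1-based positions, 0 is invalid.

       legacy_negative=True models the pre-change (0-based-from-the-back)
       behavior; False models the behavior proposed in this PR.
       """
       if pos == 0:
           raise ValueError("INVALID_INDEX_OF_ZERO")
       out = list(arr)
       if pos > 0:
           out.insert(pos - 1, elem)             # 1-based from the front
       elif legacy_negative:
           out.insert(len(out) + pos, elem)      # 0-based from the back (old)
       else:
           out.insert(len(out) + pos + 1, elem)  # 1-based from the back (new)
       return out

   # Old behavior: -1 inserts *before* the last element.
   assert array_insert([1, 2, 3], -1, 9, legacy_negative=True) == [1, 2, 9, 3]
   # New behavior: -1 appends, mirroring how 1 prepends.
   assert array_insert([1, 2, 3], 1, 9) == [9, 1, 2, 3]
   assert array_insert([1, 2, 3], -1, 9) == [1, 2, 3, 9]
   ```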






[GitHub] [spark] srowen commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300199429


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   Is this the argument: If I insert at position X, then afterwards, element_at X should be that new element?
   Then 0-based vs 1-based indices are not really relevant; as it happens, everyone agrees that -1 is the last element, not -0. If that's the argument then I agree.
   
   If the semantics are meant to be: insert before position X, then the current behavior is correct.
   
   If the semantics are meant to be: insert before X, or after X if X is negative (i.e. 'before' but in reverse) then the change is also correct. But that doesn't sound right.
   
   The docs say 'at position X' which leads me to believe the first interpretation is what was meant, but it doesn't really relate to 0- or 1-based indices.
   
   I agree the blast radius is small, but that cuts both ways.
   
   Can we really say this is a correctness fix? You'd have to argue that everyone expected the new 'correct' behavior but was getting the old one. But both are plausible, and user code using negative indexes necessarily relies on the current behavior now. 
   
   Is there some other context that makes this urgent for _3.5_ which is already at RC2?
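
   The first interpretation above — insert at position X, then `element_at(X)` returns the new element — can be checked concretely with a plain-Python model of Spark's 1-based `element_at` (an illustrative sketch, not Spark code):

   ```python
   def element_at(arr, pos):
       # Model of Spark SQL's element_at: 1-based, negatives count from the end.
       if pos == 0:
           raise ValueError("INVALID_INDEX_OF_ZERO")
       return arr[pos - 1] if pos > 0 else arr[pos]

   # Proposed semantics: array_insert([1, 2, 3], -1, 9) appends, so the
   # inserted element is what element_at(-1) retrieves afterwards.
   proposed = [1, 2, 3, 9]
   assert element_at(proposed, -1) == 9

   # Old semantics: the same call produced [1, 2, 9, 3], where element_at(-1)
   # is still the original last element, not the one just inserted.
   legacy = [1, 2, 9, 3]
   assert element_at(legacy, -1) == 3
   assert element_at(legacy, -2) == 9
   ```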





[GitHub] [spark] MaxGekk commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1302006261


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   @srowen You are ok to merge it to branch-3.5, correct?





[GitHub] [spark] cloud-fan commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300354938


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   Note: `array_insert` with position parameter 0 fails with `INVALID_INDEX_OF_ZERO`. It's nonsense that you can put an element at the beginning with position 1, but you can't put an element at the end with any negative position. This inconsistency is confusing and sometimes makes the function hard to use.
   
   The reason to fix it in 3.5 is to reduce the impact. We don't want to have one more release (3.5.0) that has the wrong behavior.
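
   The symmetry argument made concrete (a minimal hypothetical model of the fixed behavior, not Spark's implementation): with 1-based indexes on both ends, position 1 prepends, position -1 appends, and 0 is rejected.

   ```python
   def insert_1based(arr, pos, elem):
       # Fixed behavior: 1-based from either end; 0 is invalid.
       if pos == 0:
           raise ValueError("INVALID_INDEX_OF_ZERO")
       i = pos - 1 if pos > 0 else len(arr) + pos + 1
       return arr[:i] + [elem] + arr[i:]

   assert insert_1based([1, 2, 3], 1, 0) == [0, 1, 2, 3]   # 1 prepends ...
   assert insert_1based([1, 2, 3], -1, 4) == [1, 2, 3, 4]  # ... and -1 appends
   ```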





[GitHub] [spark] MaxGekk closed pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes
URL: https://github.com/apache/spark/pull/42564




[GitHub] [spark] srowen commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300359886


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   That's a good observation. Personally I buy the argument that the insert position should be where it can be retrieved subsequently. OK I can see the argument to change it in 3.5.0 to limit affected versions. 





[GitHub] [spark] cloud-fan commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300065665


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   If we treat it as a correctness bug, then it's fine to fix the result by default, but providing a legacy config to fallback.
   
   To me, it does seem like a correctness bug, as `array_insert` uses 1-based positive index, but 0-based negative index.
   
   There are other examples as well: 
   ```
   Since Spark 3.3, the `histogram_numeric` function in Spark SQL returns an output type of an array of structs (x, y), where the type of the 'x' field in the return value is propagated from the input values consumed in the aggregate function. In Spark 3.2 or earlier, 'x' always had double type. Optionally, use the configuration `spark.sql.legacy.histogramNumericPropagateInputType` since Spark 3.3 to revert back to the previous behavior.
   
   In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
   ```
   
   We don't wait for the major version bump to fix the wrong behaviors.





[GitHub] [spark] srowen commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1299996947


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   I don't think we can change behavior in a minor release. The default has to be to keep the current behavior.
   The default could change in 4.0.





[GitHub] [spark] srowen commented on a diff in pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on code in PR #42564:
URL: https://github.com/apache/spark/pull/42564#discussion_r1300100758


##########
docs/sql-migration-guide.md:
##########
@@ -28,6 +28,7 @@ license: |
 - Since Spark 3.5, Spark thrift server will interrupt task when canceling a running statement. To restore the previous behavior, set `spark.sql.thriftServer.interruptOnCancel` to `false`.
 - Since Spark 3.5, Row's json and prettyJson methods are moved to `ToJsonUtil`.
 - Since Spark 3.5, the `plan` field is moved from `AnalysisException` to `EnhancedAnalysisException`.
+- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.

Review Comment:
   As an aside - I'm still not convinced this is 'incorrect'. The current behavior matches Python's, for example. Is the only other example Snowflake, where inserting at -1 means 'end of list'? If insert's semantics are "insert before position x" then the current behavior seems correct: the last element is element -1, and it ends up in front of it. 
   
   What is the problem, what can't you do, what confusion does the current behavior cause? This has been the behavior for a while and doesn't seem to have been an issue.
   
   We can debate this, but, this is a behavior change, so the bar should be high - especially if suggesting this must happen in a minor version!
   
   The reason I think this is actually more problematic to change is:
   - _anyone_ using negative indices and array_insert will be broken by this change; you _have_ to rely on the 'wrong' behavior to use it, so are expecting it, unlike other correctness bugs
   - The change in behavior is potentially subtle; your query doesn't fail, it just silently starts returning different results, with numbers or strings in a different place than you intend
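
   For reference, Python's built-in `list.insert` uses exactly the 0-based-from-the-back convention mentioned above, so the pre-change behavior did match Python:

   ```python
   xs = [1, 2, 3]
   xs.insert(-1, 9)           # Python inserts before the element at index -1
   assert xs == [1, 2, 9, 3]  # the new element lands second-to-last, not last
   ```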
   





[GitHub] [spark] MaxGekk commented on pull request #42564: [SPARK-44840][SQL] Make `array_insert()` 1-based for negative indexes

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42564:
URL: https://github.com/apache/spark/pull/42564#issuecomment-1688673650

   Merging to master. The PR causes some conflicts in `branch-3.5`. I will open a separate PR for 3.5.
   Thank you @srielau @cloud-fan @srowen @peter-toth for review.

