You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/21 00:43:20 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

dongjoon-hyun opened a new pull request #29172:
URL: https://github.com/apache/spark/pull/29172


   …
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661609307


   Thank you for the review with the alternative.
   Also, cc @gatorsmile 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661632212


   Thank you, @HyukjinKwon .
   
   @cloud-fan and @HyukjinKwon . I updated Scala-2.13 code consistently with the same code logically.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661504976


   We need to fix this fundamental bug first.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already using modified `originalMap` at `-` operation. Technically, `originalMap` is immutable, but  `+` operation can modify it too in the same reason. In short, both `-` and `+` operation should return the desirable map object for case-insensitivity. Also, this PR makes the class more consistent.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already modifying `originalMap` at `-` operation. Technically, `originalMap` is immutable, but  `+` operation can modify it too in the same reason. In short, both `-` and `+` operation should return the desirable map object for case-insensitivity. Also, this PR makes the class more consistent.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661637178


   I thought it's better to be safe always for `originalMap` to contain the original map as is, and now it's broken. `originalMap` is an exposed property in `CaseInsensitiveMap`.
   
   Now once you combine `CaseInsensitiveMap` with another entry, e.g. `caseInsensitiveMap + ("option" -> 1)`, the `originalMap` removes the duplicates such as `Option`, `oPtion`, etc.
   
   Maybe I am too much cautious about it. I agree that it's not going to cause an external problem for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661654861


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126219/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661648937






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661503811


   **[Test build #126209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661637178


   I thought it's better to be safe to always for `originalMap` to contain the original map as is, and now it's broken. `originalMap` is an exposed property in `CaseInsensitiveMap`.
   
   Now once you combine `CaseInsensitiveMap` with another entry, e.g. `(caseInsensitiveMap + ("option" -> 1)`, the `originalMap` removes the duplicates such as `Option`, `oPtion`,etc.
   
   Maybe I am too much cautious about it. I agree that it's not going to cause an external problem for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29172:
URL: https://github.com/apache/spark/pull/29172#discussion_r457773370



##########
File path: sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala
##########
@@ -40,7 +40,7 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
     keyLowerCasedMap.contains(k.toLowerCase(Locale.ROOT))
 
   override def +[B1 >: T](kv: (String, B1)): Map[String, B1] = {
-    new CaseInsensitiveMap(originalMap + kv)
+    new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(kv._1)) + kv)

Review comment:
       Please note that this is basically reusing the same logic for `deletion` at line 49.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661637178


   I thought it's better to be safe to always for `originalMap` to contain the original map as is, and now it's broken. `originalMap` is an exposed property in `CaseInsensitiveMap`.
   
   Now once you combine `CaseInsensitiveMap` with another entry, e.g. `(caseInsensitiveMap + ("option" -> 1)`, the `originalMap` removes the duplicates such as `Option`, `oPtion`,etc.
   
   Maybe I am too much cautious about it. I agree that it's not going to cause a problem for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661632576


   **[Test build #126219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661628411


   Thank you, @cloud-fan . BTW, hold on one second. I found that I need to update Scala-2.13 code path~


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661604611


   Okay, I realised that why it's tricky. My concern about this patch is, `originalMap` will lose the duplicated instances when it adds now.
   
   I came up with a bit different approach with this:
   
   ```diff
   diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
   index a7b3d08ec4d..0d8691e3f05 100644
   --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
   +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
   @@ -238,7 +238,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
            Some("paths" -> objectMapper.writeValueAsString(paths.toArray))
          }
   
   -      val finalOptions = sessionOptions ++ extraOptions.toMap ++ pathsOption
   +      val finalOptions = collection.immutable.ListMap.empty[String, String] ++
   +        sessionOptions ++ extraOptions ++ pathsOption
          val dsOptions = new CaseInsensitiveStringMap(finalOptions.asJava)
          val (table, catalog, ident) = provider match {
            case _: SupportsCatalogOptions if userSpecifiedSchema.nonEmpty =>
   @@ -276,7 +277,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
            paths = paths,
            userSpecifiedSchema = userSpecifiedSchema,
            className = source,
   -        options = extraOptions.toMap).resolveRelation())
   +        options = collection.immutable.ListMap.empty[String, String] ++ extraOptions
   +      ).resolveRelation())
      }
   
      /**
   @@ -361,7 +363,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
          connectionProperties: Properties): DataFrame = {
        assertNoSpecifiedSchema("jdbc")
        // connectionProperties should override settings in extraOptions.
   -    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
   +    val params = collection.immutable.ListMap.empty[String, String] ++
   +      extraOptions ++ connectionProperties.asScala.toMap
        val options = new JDBCOptions(url, table, params)
        val parts: Array[Partition] = predicates.zipWithIndex.map { case (part, i) =>
          JDBCPartition(part, i) : Partition
   @@ -499,7 +502,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       */
      def json(jsonDataset: Dataset[String]): DataFrame = {
        val parsedOptions = new JSONOptions(
   -      extraOptions.toMap,
   +      collection.immutable.ListMap.empty[String, String] ++ extraOptions,
          sparkSession.sessionState.conf.sessionLocalTimeZone,
          sparkSession.sessionState.conf.columnNameOfCorruptRecord)
   
   @@ -553,7 +556,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       */
      def csv(csvDataset: Dataset[String]): DataFrame = {
        val parsedOptions: CSVOptions = new CSVOptions(
   -      extraOptions.toMap,
   +      collection.immutable.ListMap.empty[String, String] ++ extraOptions,
          sparkSession.sessionState.conf.csvColumnPruning,
          sparkSession.sessionState.conf.sessionLocalTimeZone)
        val filteredLines: Dataset[String] =
   @@ -879,6 +882,6 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   
      private var userSpecifiedSchema: Option[StructType] = None
   
   -  private val extraOptions = new scala.collection.mutable.HashMap[String, String]
   +  private val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String]
   
    }
   ```
   
   I checked the related codes from `DataFrameReader`, and seems this path working at this moment.
   
   But there is another problem in my patch: it introduces `ListMap.empty + map` pattern whereas the signature within Spark is `Map`. This could cause many corner cases and it's error-prone in the future.
   
   cc @cloud-fan @maropu @viirya. Do you guys have some thoughts on this?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already using modified `originalMap` at `-` operation. Technically, `originalMap` is still immutable, and  `+` operation can use modified `originalMap` too in the same reason. In short, both `-` and `+` operation should return the desirable map object for case-insensitivity with this kind of `indeterministic` behavior. Also, this PR makes the class more consistent.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661504894






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661627579


   I think this patch is easier to reason about, and we should have done it in the first place.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661654854






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661605706


   If `originalMap` doesn't matter much, I think It's good enough to go with the current PR.
   If `originalMap` matters, we should think about a different approach but can't come up with a good idea. One idea is https://github.com/apache/spark/pull/29172#issuecomment-661604611 but it has downsides such as being error-prone.
   Another possible approach is to change signature to something like `mutable.LinkedHashMap` or `immutable.ListMap` where `Map` is used in Spark but seems that'd be pretty invasive and a big change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661632576


   **[Test build #126219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661637178


   I thought it's better to be safe always for `originalMap` to contain the original map as is, and now it's broken. `originalMap` is an exposed property in `CaseInsensitiveMap`.
   
   Now once you combine `CaseInsensitiveMap` with another entry, e.g. `(caseInsensitiveMap + ("option" -> 1)`, the `originalMap` removes the duplicates such as `Option`, `oPtion`, etc.
   
   Maybe I am too much cautious about it. I agree that it's not going to cause an external problem for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29172:
URL: https://github.com/apache/spark/pull/29172


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29172:
URL: https://github.com/apache/spark/pull/29172#discussion_r457834857



##########
File path: sql/catalyst/src/main/scala-2.13/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala
##########
@@ -40,7 +40,7 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
     keyLowerCasedMap.contains(k.toLowerCase(Locale.ROOT))
 
   override def updated[B1 >: T](key: String, value: B1): Map[String, B1] = {
-    new CaseInsensitiveMap[B1](originalMap + (key -> value))
+    new CaseInsensitiveMap[B1](originalMap.filter(!_._1.equalsIgnoreCase(key)) + (key -> value))

Review comment:
       This one is also reusing the logic at `removed` at line 49. I also verified that Scala-2.13 compilation.
   
   For now, Scala-2.13 testing is not possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661654645


   **[Test build #126219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126219/testReport)** for PR 29172 at commit [`a6dc25d`](https://github.com/apache/spark/commit/a6dc25dcacd34b315bf67d8bb1d28e52f2dec3bb).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already using modified `originalMap` at `-` operation. Technically, `originalMap` is still immutable, and  `+` operation can use modified `originalMap` too in the same reason. In short, both `-` and `+` operation should return the desirable map object for case-insensitivity with this kind of `indeterministic` behavior. Also, this PR makes the `CaseInsensitiveMap` class more consistent.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661632836






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661639361


   Thank you all, @cloud-fan , @HyukjinKwon , @viirya , @maropu .
   I'll merge this PR because the first PR already passed GitHub Action. The second commit is only about `scala-2.13` which is not complied in Jenkins/GitHub Action currently. Its compilation is verified locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661648372


   **[Test build #126209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661632836






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661639927


   Merged to master/3.0. I'll make a backporting PR for `branch-2.4`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661504894






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661635735


   > originalMap will lose the duplicated instances when it adds now.
   
   Not sure why this is a concern? I think `originalMap` is immutable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already modifying `originalMap` at `-` operation. `originalMap` is not immutable already. I believe `+` operation can modify it too in the same reason. Both `-` and `+` operation returns the desirable `CaseInsensitiveMap`.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661616433


   For @HyukjinKwon  comments,
   - `CaseInsensitiveMap` is already using modified `originalMap` at `-` operation. Technically, `originalMap` is still immutable, and  `+` operation can use modified `originalMap` too in the same reason. In short, both `-` and `+` operation should return the desirable map object for case-insensitivity. Also, this PR makes the class more consistent.
   - In general, `DataFrameReader` [example](https://github.com/apache/spark/pull/29172#issuecomment-661604611) is orthogonal to this PR because this PR is focusing on `CaseInsensitiveMap` only.
   - For the example, I believe it's not a good practice. For that alternative approach, we need to add an implicit assumption into the Apache Spark code-base. And, it'll be fragile when we add a new code.
       - We should warn that `.toMap` should not be used always.
       - All concatenation should start with `collection.immutable.ListMap.empty[String, String]` always.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29172: [SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for addition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29172:
URL: https://github.com/apache/spark/pull/29172#issuecomment-661503811


   **[Test build #126209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126209/testReport)** for PR 29172 at commit [`27bd6d5`](https://github.com/apache/spark/commit/27bd6d55ff85e4deecbb36eb04185382cca256d3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org