Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/21 08:53:22 UTC

[GitHub] [iceberg] uncleGen opened a new pull request #2717: fix inconsistent case type of field name

uncleGen opened a new pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717


   In #2053, we made sure that queries are case insensitive, but the local cache `NAME_MAP_CACHE` in `GenericRecord` was not made case insensitive accordingly. During internal processing, the queried column or field name is converted to lower case. When field data is then fetched from a `Record` by that lower-case name, the lookup returns `null` if the field name in the schema is not all lower case.
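
   The failure mode described above can be reproduced with a plain `HashMap`, without any Iceberg classes. This is only a minimal sketch: the class name `ExactMatchLookupDemo`, the helper `buildNameToPos`, and the field names are made up for illustration, and the real `NAME_MAP_CACHE` is a Caffeine `LoadingCache` keyed by `StructType`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the name-to-position map that GenericRecord caches.
class ExactMatchLookupDemo {
  // Maps each field name to its position; keys are matched exactly (case sensitive).
  static Map<String, Integer> buildNameToPos(List<String> fieldNames) {
    Map<String, Integer> nameToPos = new HashMap<>();
    for (int pos = 0; pos < fieldNames.size(); pos += 1) {
      nameToPos.put(fieldNames.get(pos), pos);
    }
    return nameToPos;
  }

  public static void main(String[] args) {
    Map<String, Integer> nameToPos = buildNameToPos(Arrays.asList("id", "UserName"));

    // Lookup by the name exactly as declared succeeds.
    System.out.println(nameToPos.get("UserName")); // 1

    // Lookup by the lower-cased name misses, because HashMap compares keys exactly.
    System.out.println(nameToPos.get("UserName".toLowerCase(Locale.ROOT))); // null
  }
}
```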


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#discussion_r655798868



##########
File path: core/src/main/java/org/apache/iceberg/data/GenericRecord.java
##########
@@ -28,16 +28,16 @@
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.types.Types.StructType;
+import org.apache.iceberg.util.CaseInsensitiveMap;
 
 public class GenericRecord implements Record, StructLike {
   private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE =
       Caffeine.newBuilder()
       .weakKeys()
       .build(struct -> {
-        Map<String, Integer> idToPos = Maps.newHashMap();
+        Map<String, Integer> idToPos = new CaseInsensitiveMap<>();

Review comment:
       There is precedent elsewhere for using a `TreeMap` with a comparator, as mentioned by @marton-bod.
   
   Where we already use a `CaseInsensitiveMap`, it is usually the `CaseInsensitiveMap` from Spark (where it wouldn't make sense to use a `TreeMap` and a `Comparator` in such a way in Scala).
   
   If we do add a `CaseInsensitiveMap`, could it be made a little simpler, maybe even just a utility that returns the above? I agree that the above, inlined, is the canonical Java answer (it is what comes up when I google it). There are some scattered implementations, but none in libraries that we already shade, such as Guava.






[GitHub] [iceberg] rdblue closed pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717


   




[GitHub] [iceberg] uncleGen commented on a change in pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#discussion_r655343274



##########
File path: core/src/main/java/org/apache/iceberg/data/GenericRecord.java
##########
@@ -28,16 +28,16 @@
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.types.Types.StructType;
+import org.apache.iceberg.util.CaseInsensitiveMap;
 
 public class GenericRecord implements Record, StructLike {
   private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE =
       Caffeine.newBuilder()
       .weakKeys()
       .build(struct -> {
-        Map<String, Integer> idToPos = Maps.newHashMap();
+        Map<String, Integer> idToPos = new CaseInsensitiveMap<>();

Review comment:
       OK, I will consider using your suggestion once other reviewers agree that this commit makes sense.






[GitHub] [iceberg] rdblue commented on pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#issuecomment-869867090


   This is a correctness problem. The contract for name-based methods is to match the name exactly. Without a proposal for changing to case-insensitive matching, I don't think this is a good idea. I'm going to close this PR. If you'd like to pursue this, I recommend creating a case-insensitive record implementation or discussing what the behavior should be in a wider context. Simply changing the behavior is not a good idea.
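
   One possible shape for an opt-in, case-insensitive lookup that leaves the exact-match contract untouched is sketched below. It is an illustration only, not part of Iceberg and not something proposed in the thread: the class `CaseInsensitiveFieldIndex` and its method `positionOf` are hypothetical, and it works on plain field-name lists rather than on `Record` or `StructType`. It rejects names that differ only by case, since such names would be ambiguous once case is ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical opt-in index; existing exact-match code paths are not changed by it.
class CaseInsensitiveFieldIndex {
  private final Map<String, Integer> lowerCaseNameToPos = new HashMap<>();

  CaseInsensitiveFieldIndex(List<String> fieldNames) {
    for (int pos = 0; pos < fieldNames.size(); pos += 1) {
      String key = fieldNames.get(pos).toLowerCase(Locale.ROOT);
      // Names that differ only by case are ambiguous here, so fail fast instead of guessing.
      if (lowerCaseNameToPos.putIfAbsent(key, pos) != null) {
        throw new IllegalArgumentException(
            "Ambiguous field name ignoring case: " + fieldNames.get(pos));
      }
    }
  }

  // Returns the field position, or -1 when no field matches ignoring case.
  int positionOf(String name) {
    return lowerCaseNameToPos.getOrDefault(name.toLowerCase(Locale.ROOT), -1);
  }

  public static void main(String[] args) {
    CaseInsensitiveFieldIndex index =
        new CaseInsensitiveFieldIndex(Arrays.asList("id", "UserName"));
    System.out.println(index.positionOf("username")); // 1
    System.out.println(index.positionOf("missing"));  // -1

    try {
      new CaseInsensitiveFieldIndex(Arrays.asList("id", "ID"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Ambiguous field name ignoring case: ID
    }
  }
}
```

   Callers that need case-insensitive access would opt in explicitly, while callers that rely on exact matching see no change in behavior.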




[GitHub] [iceberg] uncleGen commented on a change in pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#discussion_r655343274



##########
File path: core/src/main/java/org/apache/iceberg/data/GenericRecord.java
##########
@@ -28,16 +28,16 @@
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.types.Types.StructType;
+import org.apache.iceberg.util.CaseInsensitiveMap;
 
 public class GenericRecord implements Record, StructLike {
   private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE =
       Caffeine.newBuilder()
       .weakKeys()
       .build(struct -> {
-        Map<String, Integer> idToPos = Maps.newHashMap();
+        Map<String, Integer> idToPos = new CaseInsensitiveMap<>();

Review comment:
       Makes sense. I will consider using your suggestion once this commit is accepted.








[GitHub] [iceberg] marton-bod commented on a change in pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#discussion_r655268048



##########
File path: core/src/main/java/org/apache/iceberg/data/GenericRecord.java
##########
@@ -28,16 +28,16 @@
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.types.Types.StructType;
+import org.apache.iceberg.util.CaseInsensitiveMap;
 
 public class GenericRecord implements Record, StructLike {
   private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE =
       Caffeine.newBuilder()
       .weakKeys()
       .build(struct -> {
-        Map<String, Integer> idToPos = Maps.newHashMap();
+        Map<String, Integer> idToPos = new CaseInsensitiveMap<>();

Review comment:
       Could we just use case-insensitive strings as keys instead of creating a new map implementation?
   For example, simply using a `TreeMap` with a comparator, `Map<String, Integer> idToPos = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);`, could work, I think.
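
   A minimal sketch of how such a `TreeMap` behaves on lookup follows. The class `TreeMapLookupDemo`, the helper `buildNameToPos`, and the field names are assumptions made for illustration; only `TreeMap` and `String.CASE_INSENSITIVE_ORDER` come from the suggestion itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of a name-to-position map backed by a case-insensitive TreeMap.
class TreeMapLookupDemo {
  static Map<String, Integer> buildNameToPos(List<String> fieldNames) {
    // The comparator ignores case, so keys are located regardless of the caller's casing.
    Map<String, Integer> nameToPos = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    for (int pos = 0; pos < fieldNames.size(); pos += 1) {
      nameToPos.put(fieldNames.get(pos), pos);
    }
    return nameToPos;
  }

  public static void main(String[] args) {
    Map<String, Integer> nameToPos = buildNameToPos(Arrays.asList("id", "UserName"));
    System.out.println(nameToPos.get("username"));        // 1
    System.out.println(nameToPos.get("USERNAME"));        // 1
    System.out.println(nameToPos.containsKey("missing")); // false
  }
}
```

   Two caveats apply to this sketch: lookups in a `TreeMap` are logarithmic in the number of keys rather than constant time, and names that differ only by case share a single entry, so a later `put` silently overwrites the earlier position.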







[GitHub] [iceberg] uncleGen commented on a change in pull request #2717: fix inconsistent case type of field name

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2717:
URL: https://github.com/apache/iceberg/pull/2717#discussion_r655831931



##########
File path: core/src/main/java/org/apache/iceberg/data/GenericRecord.java
##########
@@ -28,16 +28,16 @@
 import org.apache.iceberg.StructLike;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.types.Types.StructType;
+import org.apache.iceberg.util.CaseInsensitiveMap;
 
 public class GenericRecord implements Record, StructLike {
   private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE =
       Caffeine.newBuilder()
       .weakKeys()
       .build(struct -> {
-        Map<String, Integer> idToPos = Maps.newHashMap();
+        Map<String, Integer> idToPos = new CaseInsensitiveMap<>();

Review comment:
       got it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

