You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/13 16:50:08 UTC

[GitHub] [iceberg] rdblue opened a new pull request #1336: Fix schema name conflicts

rdblue opened a new pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336


   Iceberg indexes schemas by name to look up fields, but uses short names by omitting "element" and "value" names when a list or map has a struct element or value. For example, a list, `locations`, of structs `struct<lat: double, long: double>` will use names `locations.lat` instead of `locations.element.lat`.
   
   This works most of the time, but can lead to conflicts when a map value contains a field name `key`. In that case, the schema is rejected with a failure message like this: `ValidationException: Invalid schema: multiple fields for name some_map.key: 146 and 144`
   
   In addition, Spark passes names through `alterTable` that include the `value` and `element` names that are omitted.
   
   This PR fixes the problem by keeping track of two names, the full name with `element` and `value`, and secondary "short" names that omit them. Indexing now uses all of the full names and adds any short names that are not ambiguous.
   
   This change also requires indexing IDs separately rather than using a `BiMap` because IDs can have multiple names, which is not valid when inverting the `BiMap`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#issuecomment-674164207


   Ah that makes sense to me, I thought these identifiers were internal only,
   but it makes much more sense to me now!
   
   On Fri, Aug 14, 2020 at 11:45 AM Ryan Blue <no...@github.com> wrote:
   
   > Thanks for reviewing, @RussellSpitzer <https://github.com/RussellSpitzer>!
   > I'll fix those issues.
   >
   > The short names are more obvious because they don't contain a hard-coded
   > string that users need to know about. Consider the schema from the
   > description, locations list<struct<lat: double, long: double>>. How does
   > someone know to add element to reference locations.element.lat? I think
   > it's more natural to allow referencing that field as locations.lat, as
   > long as it doesn't conflict with other names.
   >
   > We've also already supported these names, so I think we should continue to
   > support them. But adding the full names removes ambiguity introduced by the
   > short names.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/pull/1336#issuecomment-674163196>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AADE2YO6LJS2MP3DFDJC6LTSAVS3DANCNFSM4P6UNE3A>
   > .
   >
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r482521323



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I added canonical here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r479702671



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I think if the doc commented read something like 
   "Short names for maps.... For example.... will produce short name 'l.x' in addition to **the canonical long name** 'l.element.x'"
   
   That small addition in the doc comment would clarify things pretty heavily for me, but I might be biased as I've read the explanation of short vs long names now. Overall I am in favor of the short names though as they feel much more natural over `l.element.x`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#issuecomment-673875360


   I'm new to this but why are we shortening names at all? Were we running into some really long strings? Seems like we could just fix this by using the full names?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#issuecomment-674163196


   Thanks for reviewing, @RussellSpitzer! I'll fix those issues.
   
   The short names are more obvious because they don't contain a hard-coded string that users need to know about. Consider the schema from the description, `locations list<struct<lat: double, long: double>>`. How does someone know to add `element` to reference `locations.element.lat`? I think it's more natural to allow referencing that field as `locations.lat`, as long as it doesn't conflict with other names.
   
   We've also already supported these names, so I think we should continue to support them. But adding the full names removes ambiguity introduced by the short names.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r470402797



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -128,9 +157,15 @@ private void addField(String name, int fieldId) {
     }
 
     Integer existingFieldId = nameToId.put(fullName, fieldId);
-    if (existingFieldId != null && !"element".equals(name) && !"value".equals(name)) {
-      throw new ValidationException(
-          "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+    ValidationException.check(existingFieldId == null,
+        "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+
+    // if the short name is not

Review comment:
       unfinished comment?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r470403915



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -128,9 +157,15 @@ private void addField(String name, int fieldId) {
     }
 
     Integer existingFieldId = nameToId.put(fullName, fieldId);
-    if (existingFieldId != null && !"element".equals(name) && !"value".equals(name)) {
-      throw new ValidationException(
-          "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+    ValidationException.check(existingFieldId == null,
+        "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+
+    // if the short name is not

Review comment:
       If the short name is not empty and we haven't already added an identical short name?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r479702671



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I think if the doc commented read something like `Short names for maps.... For example.... will produce short name 'l.x' in addition to **the canonical long name** 'l.element.x'`
   
   That small addition in the doc comment would clarify things pretty heavily for me, but I might be biased as I've read the explanation of short vs long names now. Overall I am in favor of the short names though as they feel much more natural over `l.element.x`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r470404899



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I think it would be helpful to have a little comment here explaining the mapping of both Long and Short Names in the returned map but mostly because I don't understand the short names yet. They may be more obvious to a more experienced reader of this code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r474938283



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -128,9 +157,15 @@ private void addField(String name, int fieldId) {
     }
 
     Integer existingFieldId = nameToId.put(fullName, fieldId);
-    if (existingFieldId != null && !"element".equals(name) && !"value".equals(name)) {
-      throw new ValidationException(
-          "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+    ValidationException.check(existingFieldId == null,
+        "Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
+
+    // if the short name is not

Review comment:
       If this is a nested field. For top-level fields, the short name is the same as the canonical name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r470405016



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {
+    ImmutableMap.Builder<String, Integer> builder = ImmutableMap.builder();
+    builder.putAll(nameToId);
+    // add all short names that do not conflict with canonical names
+    shortNameToId.entrySet().stream()
+        .filter(entry -> !nameToId.containsKey(entry.getKey()))
+        .forEach(builder::put);
+    return builder.build();
+  }
+
+  public Map<Integer, String> byId() {

Review comment:
       I think here it may also be good to note this isn't a bijection anymore and short names will not be returned by this function, only long names.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r482548061



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,79 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  /**
+   * Returns a mapping from full field name to ID.
+   * <p>
+   * Short names for maps and lists are included for any name that does not conflict with a canonical name. For example,
+   * a list, 'l', with of structs with field 'x' will produce short name 'l.x' in addition to 'l.element.x'.

Review comment:
       Fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
fbocse commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r482294477



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,79 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  /**
+   * Returns a mapping from full field name to ID.
+   * <p>
+   * Short names for maps and lists are included for any name that does not conflict with a canonical name. For example,
+   * a list, 'l', with of structs with field 'x' will produce short name 'l.x' in addition to 'l.element.x'.

Review comment:
       `a list, 'l', with of structs...` or just `a list, 'l', of structs ...`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r479702671



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I think if the doc commented read something like `Short names for maps.... For example.... will produce short name 'l.x' in addition to _the canonical long name_ 'l.element.x'`
   
   That small addition in the doc comment would clarify things pretty heavily for me, but I might be biased as I've read the explanation of short vs long names now. Overall I am in favor of the short names though as they feel much more natural over `l.element.x`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r479702671



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {

Review comment:
       I think if the doc commented read something like `Short names for maps.... For example.... will produce short name 'l.x' in addition to the canonical long name 'l.element.x'`
   
   That small addition in the doc comment would clarify things pretty heavily for me, but I might be biased as I've read the explanation of short vs long names now. Overall I am in favor of the short names though as they feel much more natural over `l.element.x`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1336: Fix schema name conflicts

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1336:
URL: https://github.com/apache/iceberg/pull/1336#discussion_r479703009



##########
File path: api/src/main/java/org/apache/iceberg/types/IndexByName.java
##########
@@ -25,39 +25,64 @@
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 
 public class IndexByName extends TypeUtil.SchemaVisitor<Map<String, Integer>> {
   private static final Joiner DOT = Joiner.on(".");
 
   private final Deque<String> fieldNames = Lists.newLinkedList();
+  private final Deque<String> shortFieldNames = Lists.newLinkedList();
   private final Map<String, Integer> nameToId = Maps.newHashMap();
+  private final Map<String, Integer> shortNameToId = Maps.newHashMap();
+
+  public Map<String, Integer> byName() {
+    ImmutableMap.Builder<String, Integer> builder = ImmutableMap.builder();
+    builder.putAll(nameToId);
+    // add all short names that do not conflict with canonical names
+    shortNameToId.entrySet().stream()
+        .filter(entry -> !nameToId.containsKey(entry.getKey()))
+        .forEach(builder::put);
+    return builder.build();
+  }
+
+  public Map<Integer, String> byId() {

Review comment:
       It does somewhat seem implied that this isn't a bijection (and that long names are returned) via `Canonical names, not short names are returned, for example 'list.element.field' instead of 'list.field'.`
   
   Perhaps it might be specifically useful to call this out in the `@return` via something like `@return A  map from field ID to the long / canonical field name.`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org