You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "stevenzwu (via GitHub)" <gi...@apache.org> on 2023/06/07 22:41:42 UTC

[GitHub] [iceberg] stevenzwu opened a new pull request, #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

stevenzwu opened a new pull request, #7798:
URL: https://github.com/apache/iceberg/pull/7798

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1368952270


##########
api/src/main/java/org/apache/iceberg/SortKey.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * A struct of flattened sort field values.

Review Comment:
   let's assume the table schema is like
   
   Schema schema =
           new Schema(
               Types.NestedField.required(1, "id", Types.StringType.get()),
               Types.NestedField.optional(
                   2,
                   "location",
                   Types.StructType.of(
                       Types.NestedField.required(11, "lat", Types.FloatType.get()),
                       Types.NestedField.required(12, "long", Types.FloatType.get()))));
   
   if the SortKey is applied on `id` and `lat`, the `StructTransform` only produces a flattened tuple. nested structure is not maintained anymore.
   ```
     private final Object[] transformedTuple;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#issuecomment-1727119135

   > > @stevenzwu: Extracting the partition key from the row is on the hot path of writes. Based on the code I do not see any obvious issues, but it might worth to check the performance implications of this change?
   > 
   > @pvary there is no change in the logic. simple refactor with a common base class extracted. so I don't expect performance implications
   
   Previous code had 1 less redirection/inheritance. Optimized loops can be a thing in this type of code paths, where the compiler could deduce that the different iterations of the loops are independent, and could be executed in batch. Adding more redirection/inheritance could break the optimization algorithm and instead of a parallel loop we can end up executing the loop sequentially. This could cause performance degradation. This is the only effect which might effect other users of the classes touched by this change, otherwise I think the change keeps the old behaviour correctly and introduces a new one, which we need.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1222252549


##########
api/src/main/java/org/apache/iceberg/PartitionKey.java:
##########
@@ -18,72 +18,32 @@
  */
 package org.apache.iceberg;
 
-import java.io.Serializable;
-import java.lang.reflect.Array;
-import java.util.Arrays;
 import java.util.List;
-import java.util.function.Function;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.util.SerializableFunction;
+import java.util.stream.Collectors;
 
 /**
  * A struct of partition values.
  *
  * <p>Instances of this class can produce partition values from a data row passed to {@link
  * #partition(StructLike)}.
  */
-public class PartitionKey implements StructLike, Serializable {
+public class PartitionKey extends StructTransform {

Review Comment:
   A lot of code has been moved to the non-public `StructTransform` class so that it can be reused for the new `SortKey` class



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370461928


##########
api/src/main/java/org/apache/iceberg/StructTransform.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.lang.reflect.Array;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.util.SerializableFunction;
+
+/**
+ * A struct of flattened transformed values.
+ *
+ * <p>Instances of this class can produce transformed values from a row passed to {@link
+ * #wrap(StructLike)}.
+ */
+class StructTransform implements StructLike, Serializable {
+
+  private final int size;
+  private final Accessor<StructLike>[] accessors;
+
+  @SuppressWarnings("rawtypes")
+  private final SerializableFunction[] transforms;
+
+  private final Object[] transformedTuple;
+
+  StructTransform(Schema schema, List<FieldTransform> fieldTransforms) {
+    Preconditions.checkArgument(fieldTransforms != null, "Invalid field transform list: null");
+
+    this.size = fieldTransforms.size();
+    this.accessors = (Accessor<StructLike>[]) Array.newInstance(Accessor.class, size);
+    this.transforms = new SerializableFunction[size];
+
+    for (int i = 0; i < size; ++i) {
+      int sourceFieldId = fieldTransforms.get(i).sourceFieldId();
+      Transform<?, ?> transform = fieldTransforms.get(i).transform();
+      Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId);
+      Preconditions.checkArgument(
+          accessor != null, "Cannot build accessor for field: " + schema.findField(sourceFieldId));
+      this.accessors[i] = accessor;
+      this.transforms[i] = transform.bind(accessor.type());
+    }
+
+    this.transformedTuple = new Object[size];
+  }
+
+  StructTransform(StructTransform toCopy) {
+    this.size = toCopy.size;
+    this.accessors = toCopy.accessors;
+    this.transforms = toCopy.transforms;
+
+    this.transformedTuple = new Object[size];
+    System.arraycopy(toCopy.transformedTuple, 0, this.transformedTuple, 0, size);
+  }
+
+  public void wrap(StructLike row) {
+    for (int i = 0; i < transformedTuple.length; i += 1) {
+      Function<Object, Object> transform = transforms[i];
+      transformedTuple[i] = transform.apply(accessors[i].get(row));
+    }
+  }
+
+  @Override
+  public int size() {
+    return size;
+  }
+
+  @Override
+  public <T> T get(int pos, Class<T> javaClass) {
+    return javaClass.cast(transformedTuple[pos]);
+  }
+
+  @Override
+  public <T> void set(int pos, T value) {
+    transformedTuple[pos] = value;
+  }
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder();
+    sb.append("[");
+    for (int i = 0; i < transformedTuple.length; i += 1) {
+      if (i > 0) {
+        sb.append(", ");
+      }
+      sb.append(transformedTuple[i]);
+    }
+    sb.append("]");
+    return sb.toString();
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    } else if (!(o instanceof StructTransform)) {
+      return false;
+    }
+
+    StructTransform that = (StructTransform) o;
+    return Arrays.equals(transformedTuple, that.transformedTuple);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(transformedTuple);
+  }
+
+  /**
+   * Simple POJO for source field id and transform function. {@code Pair} class is not usable here
+   * in API module, as it has Avro dep and lives in core module.

Review Comment:
   as it has a Avro dep and is in the core module.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1368918819


##########
api/src/main/java/org/apache/iceberg/StructTransform.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.lang.reflect.Array;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.util.SerializableFunction;
+
+/**
+ * A struct of flattened transformed values.
+ *
+ * <p>Instances of this class can produce sort values from a data row passed to {@link

Review Comment:
   Typo, can produce "transformed values"? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1368916210


##########
api/src/main/java/org/apache/iceberg/SortKey.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * A struct of flattened sort field values.

Review Comment:
   not sure I understand flattened in this context



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1222253224


##########
api/src/main/java/org/apache/iceberg/SortOrderComparators.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.lang.reflect.Array;
+import java.util.Comparator;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.types.Comparators;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class SortOrderComparators {
+  private SortOrderComparators() {}
+
+  /** Compare structs with the specified sort order projection */
+  public static Comparator<StructLike> forSchema(Schema schema, SortOrder sortOrder) {
+    Preconditions.checkArgument(sortOrder.isSorted(), "Invalid sort order: unsorted");
+    SortOrder.checkCompatibility(sortOrder, schema);
+    return new SortOrderComparator(schema, sortOrder);
+  }
+
+  /** Util method to chain sort direction and null order to the original comparator. */
+  private static Comparator<Object> sortFieldComparator(
+      Comparator<Object> original, SortField sortField) {
+    Comparator<Object> comparator = original;
+    if (sortField == null) {
+      return Comparators.nullsFirst().thenComparing(comparator);
+    }
+
+    if (sortField.direction() == SortDirection.DESC) {
+      comparator = comparator.reversed();
+    }
+
+    if (sortField.nullOrder() == NullOrder.NULLS_FIRST) {
+      comparator = Comparators.nullsFirst().thenComparing(comparator);
+    } else if (sortField.nullOrder() == NullOrder.NULLS_LAST) {
+      comparator = Comparators.nullsLast().thenComparing(comparator);
+    }
+
+    return comparator;
+  }
+
+  private static class SortOrderComparator implements Comparator<StructLike> {
+    private final SortKey leftKey;

Review Comment:
   included `SortOrderComparator` impl in this PR to demonstrate the usage of `SortKey`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370435653


##########
api/src/main/java/org/apache/iceberg/SortKey.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * A struct of flattened sort field values.
+ *
+ * <p>Instances of this class can produce sort values from a row passed to {@link
+ * #wrap(StructLike)}.
+ */
+public class SortKey extends StructTransform {
+  private final Schema schema;
+  private final SortOrder sortOrder;
+
+  public SortKey(Schema schema, SortOrder sortOrder) {
+    super(schema, fieldTransform(sortOrder));
+    this.schema = schema;
+    this.sortOrder = sortOrder;
+  }
+
+  private SortKey(SortKey toCopy) {
+    // only need deep copy inside StructTransform
+    super(toCopy);
+    this.schema = toCopy.schema;
+    this.sortOrder = toCopy.sortOrder;
+  }
+
+  public SortKey copy() {

Review Comment:
   This is the old method but we aren't actually copying here. Schema and Sort Order are not copies, just noting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370531354


##########
api/src/main/java/org/apache/iceberg/SortKey.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * A struct of flattened sort field values.
+ *
+ * <p>Instances of this class can produce sort values from a row passed to {@link
+ * #wrap(StructLike)}.
+ */
+public class SortKey extends StructTransform {
+  private final Schema schema;
+  private final SortOrder sortOrder;
+
+  public SortKey(Schema schema, SortOrder sortOrder) {
+    super(schema, fieldTransform(sortOrder));
+    this.schema = schema;
+    this.sortOrder = sortOrder;
+  }
+
+  private SortKey(SortKey toCopy) {
+    // only need deep copy inside StructTransform
+    super(toCopy);
+    this.schema = toCopy.schema;
+    this.sortOrder = toCopy.sortOrder;
+  }
+
+  public SortKey copy() {

Review Comment:
   You are right it is  not deep copy of everything. but it is a still `copy/clone` implementation of the `SortKey` that we deemed correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1222253224


##########
api/src/main/java/org/apache/iceberg/SortOrderComparators.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.lang.reflect.Array;
+import java.util.Comparator;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.types.Comparators;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class SortOrderComparators {
+  private SortOrderComparators() {}
+
+  /** Compare structs with the specified sort order projection */
+  public static Comparator<StructLike> forSchema(Schema schema, SortOrder sortOrder) {
+    Preconditions.checkArgument(sortOrder.isSorted(), "Invalid sort order: unsorted");
+    SortOrder.checkCompatibility(sortOrder, schema);
+    return new SortOrderComparator(schema, sortOrder);
+  }
+
+  /** Util method to chain sort direction and null order to the original comparator. */
+  private static Comparator<Object> sortFieldComparator(
+      Comparator<Object> original, SortField sortField) {
+    Comparator<Object> comparator = original;
+    if (sortField == null) {
+      return Comparators.nullsFirst().thenComparing(comparator);
+    }
+
+    if (sortField.direction() == SortDirection.DESC) {
+      comparator = comparator.reversed();
+    }
+
+    if (sortField.nullOrder() == NullOrder.NULLS_FIRST) {
+      comparator = Comparators.nullsFirst().thenComparing(comparator);
+    } else if (sortField.nullOrder() == NullOrder.NULLS_LAST) {
+      comparator = Comparators.nullsLast().thenComparing(comparator);
+    }
+
+    return comparator;
+  }
+
+  private static class SortOrderComparator implements Comparator<StructLike> {
+    private final SortKey leftKey;

Review Comment:
   included `SortOrderComparator` impl in this PR to demonstrate the usage or `SortKey`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1368952270


##########
api/src/main/java/org/apache/iceberg/SortKey.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * A struct of flattened sort field values.

Review Comment:
   let's assume the table schema is like
   ```
   Schema schema =
           new Schema(
               Types.NestedField.required(1, "id", Types.StringType.get()),
               Types.NestedField.optional(
                   2,
                   "location",
                   Types.StructType.of(
                       Types.NestedField.required(11, "lat", Types.FloatType.get()),
                       Types.NestedField.required(12, "long", Types.FloatType.get()))));
   ```
   
   if the SortKey is applied on `id` and `lat`, the `StructTransform` only produces a flattened tuple. nested structure is not maintained anymore.
   ```
     private final Object[] transformedTuple;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu merged PR #7798:
URL: https://github.com/apache/iceberg/pull/7798


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#issuecomment-1717427087

   @stevenzwu: Extracting the partition key from the row is on the hot path of writes. Based on the code I do not see any obvious issues, but it might worth to check the performance implications of this change?
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #7798: API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#issuecomment-1726664529

   > @stevenzwu: Extracting the partition key from the row is on the hot path of writes. Based on the code I do not see any obvious issues, but it might worth to check the performance implications of this change?
   
   @pvary there is no change in the logic. simple refactor with a common base class extracted. so I don't expect performance  implications
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370524541


##########
api/src/main/java/org/apache/iceberg/StructTransform.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.lang.reflect.Array;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.util.SerializableFunction;
+
+/**
+ * A struct of flattened transformed values.
+ *
+ * <p>Instances of this class can produce transformed values from a row passed to {@link
+ * #wrap(StructLike)}.
+ */
+class StructTransform implements StructLike, Serializable {
+
+  private final int size;
+  private final Accessor<StructLike>[] accessors;
+
+  @SuppressWarnings("rawtypes")
+  private final SerializableFunction[] transforms;
+
+  private final Object[] transformedTuple;
+
+  StructTransform(Schema schema, List<FieldTransform> fieldTransforms) {
+    Preconditions.checkArgument(fieldTransforms != null, "Invalid field transform list: null");
+
+    this.size = fieldTransforms.size();
+    this.accessors = (Accessor<StructLike>[]) Array.newInstance(Accessor.class, size);
+    this.transforms = new SerializableFunction[size];
+
+    for (int i = 0; i < size; ++i) {
+      int sourceFieldId = fieldTransforms.get(i).sourceFieldId();
+      Transform<?, ?> transform = fieldTransforms.get(i).transform();
+      Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId);
+      Preconditions.checkArgument(
+          accessor != null, "Cannot build accessor for field: " + schema.findField(sourceFieldId));
+      this.accessors[i] = accessor;
+      this.transforms[i] = transform.bind(accessor.type());
+    }
+
+    this.transformedTuple = new Object[size];
+  }
+
+  StructTransform(StructTransform toCopy) {
+    this.size = toCopy.size;
+    this.accessors = toCopy.accessors;
+    this.transforms = toCopy.transforms;
+
+    this.transformedTuple = new Object[size];
+    System.arraycopy(toCopy.transformedTuple, 0, this.transformedTuple, 0, size);
+  }
+
+  public void wrap(StructLike row) {
+    for (int i = 0; i < transformedTuple.length; i += 1) {
+      Function<Object, Object> transform = transforms[i];
+      transformedTuple[i] = transform.apply(accessors[i].get(row));
+    }
+  }
+
+  @Override
+  public int size() {
+    return size;
+  }
+
+  @Override
+  public <T> T get(int pos, Class<T> javaClass) {
+    return javaClass.cast(transformedTuple[pos]);
+  }
+
+  @Override
+  public <T> void set(int pos, T value) {
+    transformedTuple[pos] = value;
+  }
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder();
+    sb.append("[");
+    for (int i = 0; i < transformedTuple.length; i += 1) {
+      if (i > 0) {
+        sb.append(", ");
+      }
+      sb.append(transformedTuple[i]);
+    }
+    sb.append("]");
+    return sb.toString();
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    } else if (!(o instanceof StructTransform)) {
+      return false;
+    }
+
+    StructTransform that = (StructTransform) o;
+    return Arrays.equals(transformedTuple, that.transformedTuple);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(transformedTuple);
+  }
+
+  /**
+   * Simple POJO for source field id and transform function. {@code Pair} class is not usable here
+   * in API module, as it has Avro dep and lives in core module.

Review Comment:
   thx. will update to `as it has an Avro dep and is in the core module`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7798:
URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370454116


##########
api/src/main/java/org/apache/iceberg/StructTransform.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.lang.reflect.Array;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.util.SerializableFunction;
+
+/**
+ * A struct of flattened transformed values.
+ *
+ * <p>Instances of this class can produce transformed values from a row passed to {@link
+ * #wrap(StructLike)}.
+ */
+class StructTransform implements StructLike, Serializable {
+
+  private final int size;
+  private final Accessor<StructLike>[] accessors;
+
+  @SuppressWarnings("rawtypes")
+  private final SerializableFunction[] transforms;
+
+  private final Object[] transformedTuple;
+
+  StructTransform(Schema schema, List<FieldTransform> fieldTransforms) {
+    Preconditions.checkArgument(fieldTransforms != null, "Invalid field transform list: null");
+
+    this.size = fieldTransforms.size();
+    this.accessors = (Accessor<StructLike>[]) Array.newInstance(Accessor.class, size);
+    this.transforms = new SerializableFunction[size];
+
+    for (int i = 0; i < size; ++i) {
+      int sourceFieldId = fieldTransforms.get(i).sourceFieldId();
+      Transform<?, ?> transform = fieldTransforms.get(i).transform();
+      Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId);
+      Preconditions.checkArgument(
+          accessor != null, "Cannot build accessor for field: " + schema.findField(sourceFieldId));

Review Comment:
   minor nit: checkArgs has a "condition, string, args"  version so you don't have to do a string concat



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org