You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "gaborkaszab (via GitHub)" <gi...@apache.org> on 2023/03/29 12:45:37 UTC

[GitHub] [iceberg] gaborkaszab opened a new pull request, #5837: API,Core: Introduce metrics for data files by file format

gaborkaszab opened a new pull request, #5837:
URL: https://github.com/apache/iceberg/pull/5837

   There is an existing metric to hold the total number of non-skipped data files during a planFiles() call. This patch adds new metrics to break that number up by file format (Avro, ORC, Parquet). These new metrics could come handy for profiling queries on mixed file format tables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1381462135

   Additionally, @danielcweeks @rdblue would you mind taking a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1062450337


##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterKey.java:
##########
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+// TODO gaborkaszab: comment
+public interface MultiDimensionCounterKey {
+  String get();
+
+  int numDimensions();

Review Comment:
   I figured that we should guarantee that an instance of a MultiDimensionCounter has to work on a certain number of dimensions. I made the number of dimensions configurable through constructor and then whenever I give a MetricTag to the MultiDimensionCounter (e.g. when I call insert()) I validate that the MetricTag has the same number of dimensions as the MultiDimensionCounter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1346411031

   Thanks again for the review, @nastra!
   I believe I have addressed all of your previous comments. Another round of review would be appreciated :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1062429524


##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterKey.java:
##########
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+// TODO gaborkaszab: comment
+public interface MultiDimensionCounterKey {

Review Comment:
   I like this suggestion, also makes the name way shorter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1050492924


##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterKey.java:
##########
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+// TODO gaborkaszab: comment
+public interface MultiDimensionCounterKey {

Review Comment:
   rather than calling this `MultiDimensionCounterKey` I think a better name would be `MetricTag`. Micrometer also uses tag(s) as a term for multiple dimensions. 



##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterKey.java:
##########
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+// TODO gaborkaszab: comment
+public interface MultiDimensionCounterKey {
+  String get();
+
+  int numDimensions();

Review Comment:
   I'm a bit confused why this method is needed



##########
core/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterResultParser.java:
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.node.ArrayNode;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.Map;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.util.JsonUtil;
+
+public class MultiDimensionCounterResultParser {
+  private static final String MISSING_FIELD_ERROR_MSG =
+      "Cannot parse counter from '%s': Missing field '%s'";
+  private static final String UNIT = "unit";
+  private static final String VALUE = "value";
+  private static final String COUNTER_ID = "counter-id";

Review Comment:
   why do we use `counter-id`? Isn't this just a nested `CounterResult` where we can use `CounterResultParser` for ser/de?



##########
core/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import java.util.Map;
+import org.apache.iceberg.metrics.MetricsContext.Unit;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.immutables.value.Value;
+
+@Value.Immutable
+public abstract class MultiDimensionCounterResult {
+  public abstract Map<String, Long> counterResults();

Review Comment:
   I would have expected this to be a `Map<String, CounterResult>` since a nested counter is still a valid counter. Also aren't we losing the unit of the counter that way?



##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounter.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import java.util.Set;
+import org.apache.iceberg.metrics.MetricsContext.Unit;
+
+/** Generalized multi-dimension Counter interface. */
+public interface MultiDimensionCounter {
+
+  /** Increment the counter under a given key by 1. */
+  void increment(MultiDimensionCounterKey key);
+
+  /** Increment the counter under a given key by the provided amount. */
+  void increment(MultiDimensionCounterKey key, long amount);
+
+  /**
+   * The name of the multi-dimension counter.
+   *
+   * @return The name of the multi-dimension counter.
+   */
+  String name();
+
+  /**
+   * The keys used by this multi-dimension counter.
+   *
+   * @return The keys used by this multi-dimension counter.
+   */
+  Set<String> keys();

Review Comment:
   I think a better name would be `metricTags()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1050495424


##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterKey.java:
##########
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+// TODO gaborkaszab: comment
+public interface MultiDimensionCounterKey {
+  String get();
+
+  int numDimensions();

Review Comment:
   I'm a bit confused why this method is needed. Could you elaborate please?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1327480362

   Quick update:
   I have a PoC for the new MultiDimensionCounter<T> type. There are some rough edges but apart from the Json conversions seems to work fine. The from-Json part is a bit tricky because the code has to be aware what the generic parameter type is for the multi-dimensional counter so I had to also write this info into Json and use reflection to recreate the type. Apart from this the ScanReportParser and ScanMetricsResultParser tests fail because the order of the Json elements seems to matter for the asserts but the order of the elements in the multi-dimensional counter is not guaranteed. Working on fixing these and I'll upload new patch.
   This is where I'm now: https://github.com/gaborkaszab/iceberg/commit/b268f65f3d588f30c637aa3db3f53be2a5f4c80d


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1488724330

   I did a rebase because there has been some conflicts. Could we somehow get this PR going? @nastra already took a look few months ago. @rdblue @danielcweeks Could you please take a look?
   
   Just looping in more committers, hoping that someone might have some spare cycles, cc @pvary @szehon-ho @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1535231425

   Hey @nastra,
   First of all let me congratulate for your promotion to committer!
   Would you mind continue the review on this PR? I remember there was a point where you told me to get @danielcweeks and @rdblue involved as we would need them to merge it anyway but I guess this is no longer an issue :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab closed pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab closed pull request #5837: API,Core: Introduce metrics for data files by file format
URL: https://github.com/apache/iceberg/pull/5837


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1274711124

   Hi @rdblue,
   We have recently implemented the support for mixed file format Iceberg tables in Impala: https://issues.apache.org/jira/browse/IMPALA-10610
   One use case would be to gradually migrate the files of a table from one file format to another. (Motivation is that Impala is more performant with Parquet than ORC.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1283564932

   @nastra As you are very familiar with this area, would you mind taking a quick look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1062501894


##########
core/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import java.util.Map;
+import org.apache.iceberg.metrics.MetricsContext.Unit;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.immutables.value.Value;
+
+@Value.Immutable
+public abstract class MultiDimensionCounterResult {
+  public abstract Map<String, Long> counterResults();

Review Comment:
   My fist approach had a Map<String, CounterResult> but then I figured that the Unit is going to be the same for the underlying counters so it's enough to have a Long instead of a CounterResult as the Unit would be redundant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1062505038


##########
core/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounterResultParser.java:
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.node.ArrayNode;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.Map;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.util.JsonUtil;
+
+public class MultiDimensionCounterResultParser {
+  private static final String MISSING_FIELD_ERROR_MSG =
+      "Cannot parse counter from '%s': Missing field '%s'";
+  private static final String UNIT = "unit";
+  private static final String VALUE = "value";
+  private static final String COUNTER_ID = "counter-id";

Review Comment:
   See my above answer, Instead of a CounterResult I went for simply storing Longs as the Unit would be redundant anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1427865520

   Hey @rdblue, @danielcweeks, @nastra, would you mind taking a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1535797801

   Thanks @gaborkaszab. I think it still makes sense to have @danielcweeks and/or @rdblue review this PR as they always can provide meaningful input which ultimately makes the outcome better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
nastra commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1346417834

   thanks @gaborkaszab, will try and review it this week. You might also want to include @rdblue or @danielcweeks for a review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1380224550

   Hey @nastra,
   Would you mind taking another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1488539049

   Oops, I did a rebase with master and accidentally closed this PR. Let me re-open


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1569488106

   Hey @rdblue , @danielcweeks could you please help me how we can make this PR moving again? I've put some efforts into it and also would be a nice addition into Impala's planner.
   
   Hey @aokolnychyi, I saw you were pretty active on other PRs. I'm wondering if you have any spare cycles to take a quick look at this as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1319670877

   > the code changes themselves LGTM but I'm still not sure that this is how we'd want to represent **dimensions** in metrics as this doesn't really scale to add a new metric field for each new **dimension** (where the file format is the dimension we use here).
   > 
   > I think it would be interesting to explore we we could actually represent such dimensions in a more natural and scalable way.
   
   Let me give an update on this: At first I tried to introduce a Map<FileFormat, Counter> for this dimensional metric in ScanMetrics, but for me to make this work I ended up writing code to convert this into MAP<FileFormat, CounterResult> plus in ScanMetricsResultParser.fromJson() we still need to have the name of each filed (including the file format name) that isn't generic unfortunately. This approach seemed overcomplicated and still not 100% generic to handle automatically e.g. adding a new dimension.
   I'll give it a try to create something like a MultiDimensionCounter and let's see if it turns out better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1335193062

   Thanks for the review on https://github.com/gaborkaszab/iceberg/commit/b268f65f3d588f30c637aa3db3f53be2a5f4c80d @nastra! I think I addressed most of them.
   
   The changes since the last patch:
   - Made the MultiDimensionCounter non-generic.
   - The MultiDimensionCounter receives a name in the constructor.
   - The key for the internal counters is created from the MultiDimensionCounter name plus the name of the dimension.
   - Made a MultiDimensionCounter interface plus a DefaultMultiDimensionCounter implementation for it.
   - There is a NOOP implementation as well.
   - The DefaultMultiDimensionCounter lives in ScanMetrics but is created through MetricsContext. I believe this is in line with how the other Counters and Timers are handled.
   - The counters for the dimensions still live within the DefaultMultiDimensionCounter internally, but they are also created through MetricsContext. I figured this is the most natural way to represent the multi-dimensions.
   - The MultiDimensionCounterResult has a function to aggregate the results for all dimensions.
   - Updated the existing tests to cover the new counters, also to/from Json conversions are taken care. All tests should pass now.
   - Wrote some new tests as well to have more coverage on the new code.
   
   What is remaining:
   Currently there is only support for one dimension. I still have to figure out how this would look like having more than one dimensions. I think it has to satisfy the following requirements:
   - The number of dimensions should be decided during instantiation of the multi-counter. Maybe constructor param?
   - The increase() and value() functions should make sure that they receive as many "dimension key"s as many dimensions were decided in the constructor.
   - The key for the internal counters could be constructed from the name of the multi-counter plus appending the name of the dimensions. I think this should be enough to have unique keys when creating through MetricsContext.
   - The order of the dimensions matters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on code in PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#discussion_r1062505191


##########
api/src/main/java/org/apache/iceberg/metrics/MultiDimensionCounter.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import java.util.Set;
+import org.apache.iceberg.metrics.MetricsContext.Unit;
+
+/** Generalized multi-dimension Counter interface. */
+public interface MultiDimensionCounter {
+
+  /** Increment the counter under a given key by 1. */
+  void increment(MultiDimensionCounterKey key);
+
+  /** Increment the counter under a given key by the provided amount. */
+  void increment(MultiDimensionCounterKey key, long amount);
+
+  /**
+   * The name of the multi-dimension counter.
+   *
+   * @return The name of the multi-dimension counter.
+   */
+  String name();
+
+  /**
+   * The keys used by this multi-dimension counter.
+   *
+   * @return The keys used by this multi-dimension counter.
+   */
+  Set<String> keys();

Review Comment:
   Makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

Posted by GitBox <gi...@apache.org>.
gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1372305560

   Thanks for the comments, @nastra! I believe I've addressed all of them.
   
   (plus did a rebase with master to resolve conflicts)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org