You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/01 18:17:44 UTC

[GitHub] [iceberg] karuppayya opened a new pull request #2287: Add iceberg specific description on Spark UI

karuppayya opened a new pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287


   Add better description message on Spark UI, to the jobs spawned by Iceberg actions.
   @aokolnychyi @RussellSpitzer @rdblue 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r589880311



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       I'd like to hear @RussellSpitzer's thoughts on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r592962839



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupInfo.java
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+/**
+ * Captures information about the current job
+ * which is used for displaying on the UI
+ */
+public class JobGroupInfo implements AutoCloseable {

Review comment:
       I think we need to decide `AutoClosable` or `Supplier` approach. Either one would work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
karuppayya commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-797839585


   > I gave it a round of testing and it seems we overlooked the fact that each action will get a new counter.
   > I think the counter should be a static variable:
   
   Completely missed this, Thanks!!!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593519073



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupUtils.java
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.SparkContext$;
+
+public class JobGroupUtils {
+
+  private static final String JOB_GROUP_ID = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
+  private static final String JOB_GROUP_DESC = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
+  private static final String JOB_INTERRUPT_ON_CANCEL = SparkContext$.MODULE$.SPARK_JOB_INTERRUPT_ON_CANCEL();
+
+  private JobGroupUtils() {
+  }
+
+  public static JobGroupInfo getJobGroupInfo(SparkContext sparkContext) {
+    String groupId = sparkContext.getLocalProperty(JOB_GROUP_ID);
+    String description = sparkContext.getLocalProperty(JOB_GROUP_DESC);
+    String interruptOnCancel = sparkContext.getLocalProperty(JOB_INTERRUPT_ON_CANCEL);
+    return new JobGroupInfo(groupId, description, Boolean.parseBoolean(interruptOnCancel));
+  }
+
+  public static void setJobGroupInfo(SparkContext sparkContext, JobGroupInfo obj) {

Review comment:
       nit: `obj` -> `info`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r590629779



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       @rymurr, do you mean making `JobGroupInfo` implement `AutoCloseable` and setting the new job description in the constructor?
   
   ```
     public ... execute() {
       try (JobGroupInfo ignored = newJobGroupInfo(...)) {
         doExecute();
       }
     }
   ```
   
   If yes, both approaches work for me, I have no preference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586835087



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       We may need my refactoring, though, to fix the hierarchy. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
karuppayya commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593353444



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +55,7 @@
 
   private final SparkSession spark;
   private final JavaSparkContext sparkContext;
+  private AtomicInteger counter = new AtomicInteger();

Review comment:
       Few actions like `Spark3CreateAction`, `Spark3MigrateAction` implement `org.apache.iceberg.actions.Spark3CreateAction` and dont implement `org.apache.iceberg.actions.BaseSparkAction`.
   Hence added the counter to a common location. We cann add the counter and its getters to both `Spark3CreateAction` and `BaseSparkAction`, what do u think @aokolnychyi 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586927107



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       I'd probably add the following method to `BaseSparkAction` and make implementations call as needed:
   
   ```
   protected <T> T withJobGroupInfo(JobGroupInfo info, Runnable action) {
     JobGroupInfo previousInfo = JobGroupUtils.getJobGroupInfo(sparkContext);
     try {
       JobGroupUtils.setJobGroupInfo(sparkContext, info);
       return action.run();
     } finally {
       JobGroupUtils.setJobGroupInfo(sparkContext, previousInfo);
     }
   }
   ```
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593527646



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -64,6 +70,20 @@ protected JavaSparkContext sparkContext() {
     return sparkContext;
   }
 
+  protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier) {
+    SparkContext context = spark().sparkContext();
+    JobGroupInfo previousInfo = JobGroupUtils.getJobGroupInfo(context);
+    try {
+      JobGroupUtils.setJobGroupInfo(context, info);
+      return supplier.get();
+    } finally {
+      JobGroupUtils.setJobGroupInfo(context, previousInfo);
+    }
+  }
+
+  protected JobGroupInfo newJobGroupInfo(String groupId, String desc) {
+    return new JobGroupInfo(groupId + "-" + jobCounter.incrementAndGet(), desc, false);
+  }

Review comment:
       nit: missing empty line




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586831811



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3CreateAction.java
##########
@@ -187,4 +189,16 @@ protected String getMetadataLocation(Table table) {
   protected abstract Map<String, String> targetTableProps();
 
   protected abstract TableCatalog checkSourceCatalog(CatalogPlugin catalog);
+

Review comment:
       The difference in Catalog Apis was a big part, Spark 2 doesn't have access to CatalogPlugin for example




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r589880210



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       As much as I like having this in a single place, I am not sure this will work for actions that trigger multiple jobs. For example, data compaction @RussellSpitzer is working on will submit a job per partition and we would want to a separate description each time.
   
   I've just merged a refactoring of `BaseSparkAction` that now makes SparkContext accessible in this class.
   
   I think we can have this:
   
   ```
     protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<R> supplier) {
       JobGroupInfo previousInfo = JobGroupUtils.getJobGroupInfo(sparkContext);
       try {
         JobGroupUtils.setJobGroupInfo(sparkContext, info);
         return supplier.get();
       } finally {
         JobGroupUtils.setJobGroupInfo(sparkContext, previousInfo);
       }
     }
   ```
   
   Then, yes, we would need to modify more code in the existing actions but it should not be that bad.
   
   ```
   public ... execute() {
     JobGroupInfo info = ... 
     withJobGroupInfo(info, () -> {
       // call doExecute (which is now points to the code in execute)
     })
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-790188201


   I think this is really useful and will help our users to debug action jobs. Thanks, @karuppayya!
   
   I'd also consider assigning some counter to each job group id. For example, if somebody calls the same action two times, I'd love to see EXPIRE-SNAPSHOTS-1 and EXPIRE-SNAPSHOTS-2 as group ids. We can achieve this by keeping an atomic integer somewhere and increment it on each call.
   
   Also, I'd consider adding more information to job description (like table name and some critical component?). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r592961634



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +55,7 @@
 
   private final SparkSession spark;
   private final JavaSparkContext sparkContext;
+  private AtomicInteger counter = new AtomicInteger();

Review comment:
       I think this can be final. I'd also call it `jobCounter` or something to be specific.
   
   If we decide to keep the job counter in `BaseSparkAction`, then we can offer the following method:
   
   ```
     protected JobGroupInfo newJobGroupInfo(String groupId, String desc) {
       return new JobGroupInfo(groupId, desc + "-" + jobCounter.incrementAndGet(), false);
     }
   ```
   
   That way, we hide the complexity of assigning the job count from individual actions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r594677710



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +55,7 @@
 
   private final SparkSession spark;
   private final JavaSparkContext sparkContext;
+  private static final AtomicInteger jobCounter = new AtomicInteger();

Review comment:
       Since this is a static variable now, should we move it up and use `JOB_COUNTER` as the name?
   
   I'd also add an empty line to separate static and instance variables like this:
   
   ```
   private static final AtomicInteger JOB_COUNTER = new AtomicInteger();
   
   private final SparkSession spark;
   private final JavaSparkContext sparkContext;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586834813



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       +1 on this idea. I think we should add an abstract method to `BaseSparkAction` like @rymurr suggests:
   
   ```
   protected abstract JobGroupInfo jobGroup();
   ```
   
   Then we just need to rename `execute` to `doExecute` in actions and set `jobGroup` in `BaseSparkAction`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586921306



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       Data compaction action may submit multiple jobs (e.g. one job per partition) and we would want to have different descriptions for each job (e.g. include the partition we compact in each). Maybe, the abstract method won't work that well in that case.
   
   @RussellSpitzer, thoughts? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593423708



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -206,11 +209,16 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
 
   @Override
   public ExpireSnapshotsActionResult execute() {
-    if (streamResults) {
-      return deleteFiles(expire().toLocalIterator());
-    } else {
-      return deleteFiles(expire().collectAsList().iterator());
-    }
+    SparkContext context = spark().sparkContext();

Review comment:
       Can we simplify this a bit?
   
   ```
     @Override
      public ExpireSnapshotsActionResult execute() {
       JobGroupInfo info = withJobGroupInfo("EXPIRE", "EXPIRE-SNAPSHOTS");
       withJobGroupInfo(info, this::doExecute)
     }
   ```

##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -206,11 +209,16 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
 
   @Override
   public ExpireSnapshotsActionResult execute() {
-    if (streamResults) {
-      return deleteFiles(expire().toLocalIterator());
-    } else {
-      return deleteFiles(expire().collectAsList().iterator());
-    }
+    SparkContext context = spark().sparkContext();

Review comment:
       Can we simplify this a bit?
   
   ```
      @Override
      public ExpireSnapshotsActionResult execute() {
       JobGroupInfo info = withJobGroupInfo("EXPIRE", "EXPIRE-SNAPSHOTS");
       withJobGroupInfo(info, this::doExecute)
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593421753



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +55,7 @@
 
   private final SparkSession spark;
   private final JavaSparkContext sparkContext;
+  private AtomicInteger counter = new AtomicInteger();

Review comment:
       I think we can skip action that don't extend `BaseSparkAction` for now. I am in the process of fixing the hierarchy and we can add this logic to other actions once they extend `BaseSparkAction`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-790188201


   I think this is really useful and will help our users to debug any jobs. Thanks, @karuppayya!
   
   I'd also consider assigning some counter to each job group id. For example, if somebody calls the same action two times, I'd love to see EXPIRE-SNAPSHOTS-1 and EXPIRE-SNAPSHOTS-2 as group ids. We can achieve this by keeping an atomic integer somewhere and increment it on each call.
   
   Also, I'd consider adding more information to job description (like table name?). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586913404



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupInfo.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+/**
+ * Captures information about the current job
+ * which is used for displaying on the UI
+ */
+public class JobGroupInfo {
+  private String groupId;
+  private String description;
+  private String interruptOnCancel;
+
+  JobGroupInfo(String groupId, String desc, String interruptOnCancel) {
+    this.groupId = groupId;
+    this.description = desc;
+    this.interruptOnCancel = interruptOnCancel;
+  }
+
+  public String getGroupId() {

Review comment:
       nit: Iceberg does not use `get` prefixes for getters: `getGroupId` -> `groupId`.

##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupInfo.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+/**
+ * Captures information about the current job
+ * which is used for displaying on the UI
+ */
+public class JobGroupInfo {
+  private String groupId;
+  private String description;
+  private String interruptOnCancel;
+
+  JobGroupInfo(String groupId, String desc, String interruptOnCancel) {
+    this.groupId = groupId;
+    this.description = desc;
+    this.interruptOnCancel = interruptOnCancel;
+  }
+
+  public String getGroupId() {
+    return groupId;
+  }
+
+  public String getDescription() {

Review comment:
       same

##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupInfo.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+/**
+ * Captures information about the current job
+ * which is used for displaying on the UI
+ */
+public class JobGroupInfo {
+  private String groupId;
+  private String description;
+  private String interruptOnCancel;
+
+  JobGroupInfo(String groupId, String desc, String interruptOnCancel) {
+    this.groupId = groupId;
+    this.description = desc;
+    this.interruptOnCancel = interruptOnCancel;
+  }
+
+  public String getGroupId() {
+    return groupId;
+  }
+
+  public String getDescription() {
+    return description;
+  }
+
+  public String getInterruptOnCancel() {

Review comment:
       same




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r592962029



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -64,6 +70,28 @@ protected JavaSparkContext sparkContext() {
     return sparkContext;
   }
 
+  protected abstract R doExecute();

Review comment:
       I don't think we need `doExecute` and `jobGroup` per discussion in [this](https://github.com/apache/iceberg/pull/2287/files#r589880210) thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi closed pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi closed pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586927107



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       I'd probably add the following method to `BaseSparkAction` and make implementations call as needed:
   
   ```
   protected <T> T withJobGroupInfo(JobGroupInfo info, Runnable action) {
       JobGroupInfo previousInfo = JobGroupUtils.getJobGroupInfo(sparkContext);
       try {
         JobGroupUtils.setJobGroupInfo(sparkContext, info);
         return action.run();
       } finally {
         JobGroupUtils.setJobGroupInfo(sparkContext, previousInfo);
       }
   }
   ```
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r585427890



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3CreateAction.java
##########
@@ -187,4 +189,16 @@ protected String getMetadataLocation(Table table) {
   protected abstract Map<String, String> targetTableProps();
 
   protected abstract TableCatalog checkSourceCatalog(CatalogPlugin catalog);
+

Review comment:
       Same comment as for `BaseSparkAction` above.
   
   For my own benefit why do the spark3 actions have a different base and is this part of  what is being addressed by Antons refactor? Not a comment on this PR specifically, just me trying to learn the history.

##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -210,7 +210,8 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   }
 
   @Override
-  public ExpireSnapshotsActionResult execute() {
+  public ExpireSnapshotsActionResult doExecute() {
+    spark.sparkContext().setJobGroup("EXPIRE", "EXPIRE-SNAPSHOTS", false);

Review comment:
       I wonder if `protected JobGroupInfo jobGroup()` in `BaseSparkAction` would make it clearer to implementers that this info should be generated/supplied. Then `execute` in `BaseSparkAction` can call teh `jobGroup` method and set the context itself. I think that is more clear for non-spark experts.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586913404



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupInfo.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+/**
+ * Captures information about the current job
+ * which is used for displaying on the UI
+ */
+public class JobGroupInfo {
+  private String groupId;
+  private String description;
+  private String interruptOnCancel;
+
+  JobGroupInfo(String groupId, String desc, String interruptOnCancel) {
+    this.groupId = groupId;
+    this.description = desc;
+    this.interruptOnCancel = interruptOnCancel;
+  }
+
+  public String getGroupId() {

Review comment:
       nit: Iceberg does not use `get` prefixes for getters. So `getGroupId` -> `groupId`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-797836912


   I gave it a round of testing and it seems we overlooked the fact that each action will get a new counter.
   I think the counter should be a static variable:
   
   ```
   private static final AtomicInteger JOB_COUNTER = new AtomicInteger();
   ```
   
   Also, there seems to be a style violation. @karuppayya, could you update those, please? LGTM otherwise.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593520243



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupUtils.java
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.SparkContext$;
+
+public class JobGroupUtils {
+
+  private static final String JOB_GROUP_ID = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
+  private static final String JOB_GROUP_DESC = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();

Review comment:
       I think there is a typo: it should be `SPARK_JOB_DESCRIPTION`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593517307



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -64,6 +70,20 @@ protected JavaSparkContext sparkContext() {
     return sparkContext;
   }
 
+  protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier) {
+    SparkContext context = spark().sparkContext();
+    JobGroupInfo previousInfo = JobGroupUtils.getJobGroupInfo(context);
+    try {
+      JobGroupUtils.setJobGroupInfo(context, info);
+      return supplier.get();
+    } finally {
+      JobGroupUtils.setJobGroupInfo(context, previousInfo);
+    }
+  }
+
+  protected JobGroupInfo newJobGroupInfo(String groupId, String desc) {
+    return new JobGroupInfo(groupId, desc + "-" + jobCounter.incrementAndGet(), false);

Review comment:
       I think I made a typo in the snippet. We should probably append jobCounter to id, not desc. My bad.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593521557



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3CreateAction.java
##########
@@ -55,6 +60,7 @@
   protected static final String ICEBERG_METADATA_FOLDER = "metadata";
   protected static final List<String> EXCLUDED_PROPERTIES =
       ImmutableList.of("path", "transient_lastDdlTime", "serialization.format");
+  private AtomicInteger counter = new AtomicInteger();

Review comment:
       nit: This seems redundant.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r590631154



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       Burns on my love of scala aside, I think try with resource is a reasonable approach too




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586914671



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupUtils.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.SparkContext$;
+
+public class JobGroupUtils {
+  private JobGroupUtils() {
+  }
+
+  public static JobGroupInfo getJobGroupInfo(SparkContext sc) {

Review comment:
       Should the arg name be `sparkContext` for consistency with `setJobGroupInfo`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-788225884


   This looks like a great start! I'll take a closer look in a bit. And we should also incorporate this into the refactor that @aokolnychyi is working on.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r586831521



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3CreateAction.java
##########
@@ -187,4 +189,16 @@ protected String getMetadataLocation(Table table) {
   protected abstract Map<String, String> targetTableProps();
 
   protected abstract TableCatalog checkSourceCatalog(CatalogPlugin catalog);
+

Review comment:
       Yeah, I'll fix the hierarchy. It does not make sense right now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-790109483


   Let me take a look at this today too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r589959033



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       That would be helpful for me, I will be having various actions fire off an will need multiple descriptions. In my future thoughts for distributed planning we would similarly want something that can apply separate job group info to separate actions.
   
   Too bad this isn't Scala it would be pretty :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r590631621



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       Though, try-with-resources makes it shorter. Maybe, that's a better option.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r593518788



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupUtils.java
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.SparkContext$;
+
+public class JobGroupUtils {
+
+  private static final String JOB_GROUP_ID = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
+  private static final String JOB_GROUP_DESC = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
+  private static final String JOB_INTERRUPT_ON_CANCEL = SparkContext$.MODULE$.SPARK_JOB_INTERRUPT_ON_CANCEL();
+
+  private JobGroupUtils() {
+  }
+
+  public static JobGroupInfo getJobGroupInfo(SparkContext sparkContext) {
+    String groupId = sparkContext.getLocalProperty(JOB_GROUP_ID);
+    String description = sparkContext.getLocalProperty(JOB_GROUP_DESC);
+    String interruptOnCancel = sparkContext.getLocalProperty(JOB_INTERRUPT_ON_CANCEL);
+    return new JobGroupInfo(groupId, description, Boolean.parseBoolean(interruptOnCancel));
+  }
+
+  public static void setJobGroupInfo(SparkContext sparkContext, JobGroupInfo obj) {
+    sparkContext.setLocalProperty(JOB_GROUP_ID, obj.groupId());
+    sparkContext.setLocalProperty(JOB_GROUP_DESC, obj.description());
+    sparkContext.setLocalProperty(JOB_INTERRUPT_ON_CANCEL,
+            String.valueOf(obj.interruptOnCancel()));

Review comment:
       nit: could fit on one line




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-790475292


   > Also, I'd consider adding more information to job description (like table name and some critical component?).
   
   +1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-790188201


   I think this is really useful and will help our users to debug action jobs. Thanks, @karuppayya!
   
   I'd also consider assigning some counter to each job group id. For example, if somebody calls the same action two times, I'd love to see EXPIRE-SNAPSHOTS-1 and EXPIRE-SNAPSHOTS-2 as group ids. We can achieve this by keeping an atomic integer somewhere and increment it on each call.
   
   Also, I'd consider adding more information to job description (like table name?). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r590045533



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       `withJobGroupInfo` is starting to look more like a try-with-resource statement. I think that would be a more java-y way of expressing that. And to @RussellSpitzer's point it wouldn't require any icky Scala :stuck_out_tongue: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#issuecomment-797825441


   A few minor points and should be ready to go. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r589959033



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       That would be helpful for me, I will be having various actions fire off and will need multiple descriptions. In my future thoughts for distributed planning we would similarly want something that can apply separate job group info to separate actions.
   
   Too bad this isn't Scala it would be pretty :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r590631621



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -50,6 +53,24 @@
 
   protected abstract Table table();
 
+  protected abstract R doExecute();
+
+  protected abstract JobGroupInfo jobGroup();

Review comment:
       Thought, try-with-resources makes it shorter. Maybe, that's a better option.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya closed pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
karuppayya closed pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2287: Add iceberg specific description on Spark UI

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2287:
URL: https://github.com/apache/iceberg/pull/2287#discussion_r592962557



##########
File path: spark/src/main/java/org/apache/iceberg/spark/JobGroupUtils.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.spark.SparkContext;
+import org.apache.spark.SparkContext$;
+
+public class JobGroupUtils {

Review comment:
       What about defining the following constants and using them?
   
   ```
     private static final String JOB_GROUP_ID = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
     private static final String JOB_GROUP_DESC = SparkContext$.MODULE$.SPARK_JOB_GROUP_ID();
     private static final String JOB_INTERRUPT_ON_CANCEL = SparkContext$.MODULE$.SPARK_JOB_INTERRUPT_ON_CANCEL();
   ```
   
   This will make get/setJobGroupInfo way simpler to read.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org