Posted to issues@iceberg.apache.org by "pseudomuto (via GitHub)" <gi...@apache.org> on 2024/04/09 17:24:30 UTC

[I] NPE During RewriteDataFiles Action with Nessie [iceberg]

pseudomuto opened a new issue, #10110:
URL: https://github.com/apache/iceberg/issues/10110

   ### Apache Iceberg version
   
   1.4.3
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I'm having trouble running the RewriteDataFiles action in Spark. I have a table with ~60B records partitioned by domain and day. When I run the job, all the stages complete successfully and appear to have rewritten the data, but the final stage fails with an NPE inside Nessie's JavaHttpClient, so no commit is made to the table.
   
   I have tried enabling partial progress and tweaking the max commits, but the commit fails regardless of those settings (at least with the combinations I've tried).
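   For reference, partial progress is configured through plain string options on Iceberg's `RewriteDataFiles` action. The JDK-only sketch below builds one such option map. The key strings mirror constants on the `RewriteDataFiles` interface, while the `RewriteOptionsSketch` class and the chosen values are illustrative assumptions. A map like this could be handed to the action through its `options(...)` method.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   class RewriteOptionsSketch {
       // Key strings mirroring constants on Iceberg's RewriteDataFiles interface.
       static final String PARTIAL_PROGRESS_ENABLED = "partial-progress.enabled";
       static final String PARTIAL_PROGRESS_MAX_COMMITS = "partial-progress.max-commits";
       static final String MAX_CONCURRENT_FILE_GROUP_REWRITES = "max-concurrent-file-group-rewrites";
       static final String TARGET_FILE_SIZE_BYTES = "target-file-size-bytes";
   
       // Builds one illustrative combination; the values are examples, not recommendations.
       static Map<String, String> rewriteOptions(int maxCommits) {
           Map<String, String> opts = new LinkedHashMap<>();
           opts.put(MAX_CONCURRENT_FILE_GROUP_REWRITES, "1000");
           opts.put(TARGET_FILE_SIZE_BYTES, Long.toString(512L * 1024 * 1024)); // 536870912 bytes
           opts.put(PARTIAL_PROGRESS_ENABLED, "true");
           opts.put(PARTIAL_PROGRESS_MAX_COMMITS, Integer.toString(maxCommits));
           return opts;
       }
   
       public static void main(String[] args) {
           rewriteOptions(10).forEach((key, value) -> System.out.println(key + "=" + value));
       }
   }
   ```
   
   With partial progress enabled, the rewrite is split into several smaller commits, but each one still goes through the table's catalog client, so these settings would not be expected to help if that client itself is unusable.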
   
   Oddly enough, the import jobs, which also use Spark and Nessie with the same properties, work without issue. I'm wondering whether this is a bug, or whether anyone can shed some light on what might be going on here.
   
   **Code**
   
   ```java
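   import org.apache.iceberg.expressions.Expressions;
   import org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction;
   import org.apache.iceberg.spark.actions.SparkActions;
   
   // "spark" is an existing SparkSession with the properties (below) configured,
   // and "table" is the Iceberg table being compacted
   SparkActions.get(spark)
       .rewriteDataFiles(table)
       .option(RewriteDataFilesSparkAction.MAX_CONCURRENT_FILE_GROUP_REWRITES, "1000")
       .option(RewriteDataFilesSparkAction.TARGET_FILE_SIZE_BYTES, "536870912") // 512 MiB
       .filter(Expressions.equal(Expressions.day("occurred_at"), 19804)) // days since 1970-01-01, i.e. 2024-03-22
       .execute();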
   ```
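   The `day` filter above compares against an integer day ordinal: per the Iceberg table spec, the `day` transform yields days since 1970-01-01, so `19804` is 2024-03-22. The later driver code calls a `getEpochDay()` helper that isn't shown; `EpochDays.epochDay` below is a hypothetical stand-in, assuming the caller starts from a `java.time.LocalDate`.
   
   ```java
   import java.time.LocalDate;
   
   class EpochDays {
       // Hypothetical helper: the ordinal used by Iceberg's day transform (whole days since 1970-01-01).
       static int epochDay(LocalDate date) {
           return Math.toIntExact(date.toEpochDay());
       }
   
       public static void main(String[] args) {
           System.out.println(epochDay(LocalDate.of(2024, 3, 22))); // 19804
           System.out.println(LocalDate.ofEpochDay(19804));         // 2024-03-22
       }
   }
   ```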
   
   **Spark Properties**
   
   ```
   spark.sql.catalog.nessie.cache-enabled=false
   spark.sql.catalog.nessie.catalog-impl=org.apache.iceberg.nessie.NessieCatalog
   spark.sql.catalog.nessie.client-api-version=2
   spark.sql.catalog.nessie.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO
   spark.sql.catalog.nessie.ref=main
   spark.sql.catalog.nessie.uri=https://<domain>/api/v2
   spark.sql.catalog.nessie.warehouse=gs://<bucket>/<dir>
   spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog
   spark.sql.defaultCatalog=nessie
   spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions
   spark.sql.catalog.nessie.authentication.type=BEARER
   spark.sql.catalog.nessie.quarkus.oidc.auth-server-url=https://accounts.google.com
   spark.sql.catalog.nessie.quarkus.oidc.client-id=<google_client_id>
   spark.hadoop.parquet.enable.summary-metadata=false
   spark.sql.parquet.mergeSchema=false
   spark.sql.parquet.filterPushdown=true
   spark.sql.sources.partitionOverwriteMode=dynamic
   spark.sql.hive.metastorePartitionPruning=true
   spark.sql.files.maxPartitionBytes=1073741824
   spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
   spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
   spark.hadoop.fs.gs.auth.service.account.enable=true
   ```
   
   **Logs from the Job**
   
   ![spark-logs](https://github.com/apache/iceberg/assets/4748863/6d4aecfc-cd54-4b4f-82f1-b90ae5034cf8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047147156

   I couldn't reproduce it locally. Can you share the detailed log? Maybe there is a clue in it.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2063857060

   Closing, as it was a user error; the user confirmed that it worked after correcting it.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046317631

   Ideally the client and server should run the same version. Can you try once with matching versions?
   
   This NPE (and in a debug log, no less) is a little odd to me; I have never seen it before. Nessie doesn't distinguish between a normal commit and a compaction commit, so the problem should happen for any commit.
   
   Just waking up here. I will analyze the call stack and also discuss it with the team today.
   


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046273223

   @ajantha-bhat @jbonofre Would you be able to help out on this issue? 


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048960308

   @dimas-b: are the catalog created by Spark and the catalog created by the user using the same `JavaHttpClient` object?
   
   Isn't the culprit the static `ImplSwitch` class, whose `ImplSwitch.FACTORY.apply(config);` call uses the same instance of `JavaHttpClient`?
   
   https://github.com/projectnessie/nessie/blob/nessie-0.79.0/api/client/src/main/java/org/projectnessie/client/http/HttpClientBuilderImpl.java#L247
   
   The above code is from 0.79.0 (the latest released version).
   
   I checked the master branch code: there we now use the Apache HTTP client, and I don't see any static instance of `JavaHttpClientFactory` (https://github.com/projectnessie/nessie/pull/8224). So I think the issue is fixed in the master branch. WDYT?


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047775001

   Hmm... I do have debug logs enabled (via `spark.sparkContext().setLogLevel("DEBUG");`). Would that be sufficient?
   
   Also, to my understanding [this](https://github.com/projectnessie/nessie/blob/a87faddc2fbdd7933cd5b67a61ebd7a55c85d176/api/client/src/main/java/org/projectnessie/client/http/impl/jdk11/JavaHttpClient.java#L77) is the HTTP client in question. Sorry if I'm missing something obvious here.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048568071

   Hmm... I did notice that when I wrote it. My assumption was that SparkActions would create tasks that instantiate their own catalog instance via Spark, rather than reuse the one from my small upfront lookup code. It does make sense, though, that it would use the catalog instance attached to the table passed to the call.
   
   I really appreciate you digging into this. I'll try without closing the catalog and post the results here.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046290081

   @pseudomuto: Which version of the Nessie server are you using, and how is it deployed?


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047672499

   @pseudomuto : a couple of semi-related points :)
   
   If you're interested in refreshing auth tokens in the Spark session, you may want to use the [OAUTH2](https://github.com/projectnessie/nessie/blob/main/api/client/src/main/java/org/projectnessie/client/NessieConfigConstants.java#L364) authentication mode in the Nessie Client.
   
   With that, if you also enable DEBUG logs for Nessie packages, we should be able to get a [log message](https://github.com/projectnessie/nessie/blob/main/api/client/src/main/java/org/projectnessie/client/auth/oauth2/OAuth2Client.java#L132) when the HTTP client is closed.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048366352

   Sure thing! I've removed imports and a few other things for brevity.
   
   ```java
   private static final String NESSIE_PREFIX = "spark.sql.catalog.nessie.";
   
   Table getTable(String ...names) {
       var nessieProps = JavaConverters
           .asJava(spark.conf().getAll())
           .entrySet()
           .stream()
           .filter(entry -> entry.getKey().startsWith(NESSIE_PREFIX))
           .map(entry -> Map.entry(entry.getKey().substring(NESSIE_PREFIX.length()), entry.getValue()))
           .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
   
       try (var catalog = new NessieCatalog()) {
           catalog.initialize("nessie", nessieProps);
           return catalog.loadTable(TableIdentifier.of(names));
       } catch (IOException exc) {
           throw new RuntimeException(exc);
       }
   }
   
   // in the main method
   var table = getTable(args[0].split("\\."));
   var actions = SparkActions.get(spark);
   
   actions
       .rewriteDataFiles(table)
       .option(RewriteDataFilesSparkAction.MAX_CONCURRENT_FILE_GROUP_REWRITES, "1000")
       .option(RewriteDataFilesSparkAction.TARGET_FILE_SIZE_BYTES, "536870912")
       .filter(Expressions.equal(Expressions.day("occurred_at"), getEpochDay()))
       .execute();
   
   actions
       .rewriteManifests(table)
       .rewriteIf(file -> file.length() > MAX_MANIFEST_FILE_SIZE)
       .execute();
   ```


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047561940

   Is it possible that the commit operation races with the closing of the NessieCatalog?


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047655639

   > These settings are not needed on the client (engine) side.
   
   We have Nessie running behind IAP (Identity-Aware Proxy) in Google Cloud, and the auth token is added to the Spark properties before the code shown above. I was under the impression these settings were necessary, but I can certainly try removing them. Less is more, for sure 😄
   
   I'm going to deploy the 0.78.0 server this morning, as suggested by @ajantha-bhat, to see if that makes a difference. My gut says it won't help: the issue doesn't appear to be related to API compatibility between the client (0.78.0) and the server (0.76.3), but rather looks like what you mentioned above, where the `JavaHttpClient` is closed and a subsequent attempt is made to reuse it.
   


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047483276

   Not sure it is critical in this case, but Iceberg 1.4.3 [includes](https://github.com/apache/iceberg/blob/apache-iceberg-1.4.3/gradle/libs.versions.toml#L50) Nessie Client 0.71.0, so the Nessie Spark Extensions should probably have this version too (to rule out hypothetical binary incompatibilities).


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047816727

   In non-oauth2 modes, there's no log message on closing the client, unfortunately.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047509787

   My guess is that the `JavaHttpClient` got closed and then reused after closing.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047587036

   Side note @pseudomuto :
   
   > spark.sql.catalog.nessie.quarkus.oidc.auth-server-url=https://accounts.google.com
   > spark.sql.catalog.nessie.quarkus.oidc.client-id=<google_client_id>
   
   These settings are not needed on the client side.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2049681447

   Never mind: it is a `Function` in a static block, not a one-time assignment, so each call creates a new `JavaHttpClient`. So there is no issue in the older versions either.
   
   I think we can close this ticket.
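   The distinction above can be shown with a toy example. `Client`, `FACTORY`, and `shared()` are hypothetical names, not Nessie's actual `ImplSwitch` code: a static field holding a factory function still hands out a fresh object on every call, so closing the client of one catalog cannot affect a client created for another. Only a static field holding an instance would be shared.
   
   ```java
   import java.util.function.Function;
   
   class FactoryVsSingleton {
       // Hypothetical stand-in for an HTTP client built from a configuration string.
       static final class Client {
           final String config;
   
           Client(String config) {
               this.config = config;
           }
       }
   
       // A static *function*: every apply() call constructs a brand-new Client.
       static final Function<String, Client> FACTORY = Client::new;
   
       // A static *instance*: every caller of shared() gets the same Client.
       private static final Client SHARED = new Client("shared");
   
       static Client shared() {
           return SHARED;
       }
   
       public static void main(String[] args) {
           Client forSparkCatalog = FACTORY.apply("same-config");
           Client forUserCatalog = FACTORY.apply("same-config");
           System.out.println(forSparkCatalog == forUserCatalog); // false: two independent clients
           System.out.println(shared() == shared());              // true: one client for everyone
       }
   }
   ```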


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046306547

   In case it's relevant, the Nessie Spark extensions jar is version 0.78.0 (used with Iceberg 1.4.3).


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046298594

   Hey @ajantha-bhat! I'm running 0.76.3, deployed with the Helm chart.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2046288535

   Will check it today. 


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2047986951

   Just tried with matching server version (0.78.0). Same result unfortunately.
   
   ```
   java.lang.NullPointerException: Cannot invoke "java.net.http.HttpClient.send(java.net.http.HttpRequest, java.net.http.HttpResponse$BodyHandler)" because "this.client" is null
   	at org.projectnessie.client.http.impl.jdk11.JavaHttpClient.lambda$newRequest$0(JavaHttpClient.java:68)
   	at org.projectnessie.client.http.impl.jdk11.JavaRequest.executeRequest(JavaRequest.java:110)
   	at org.projectnessie.client.http.HttpRequest.get(HttpRequest.java:80)
   	at org.projectnessie.client.http.HttpRequestWrapper.unwrap(HttpRequestWrapper.java:72)
   	at org.projectnessie.client.http.HttpRequestWrapper.get(HttpRequestWrapper.java:52)
   	at org.projectnessie.client.rest.v2.HttpGetReference.get(HttpGetReference.java:41)
   	at org.apache.iceberg.nessie.UpdateableReference.refresh(UpdateableReference.java:46)
   	at org.apache.iceberg.nessie.NessieIcebergClient.refresh(NessieIcebergClient.java:97)
   	at org.apache.iceberg.nessie.NessieTableOperations.doRefresh(NessieTableOperations.java:86)
   	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
   	at org.apache.iceberg.SnapshotProducer.refresh(SnapshotProducer.java:354)
   	at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:219)
   	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:376)
   	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
   	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
   	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:374)
   	at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:78)
   	at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:100)
   	at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:309)
   	at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.execute(RewriteDataFilesSparkAction.java:178)
   ```


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048224986

   @pseudomuto : Would you be able to share your complete Spark driver (session) code (or as much as possible)?


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048477509

   Well, that `try (var catalog = new NessieCatalog()) {...}` block _will_ close the Nessie client associated with the table that is loaded from this catalog.
   
   I believe the catalog needs to remain open until `rewriteManifests` finishes.


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "pseudomuto (via GitHub)" <gi...@apache.org>.
pseudomuto commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2048599779

   Well, that worked like a charm. That is indeed the issue. I can't thank you guys enough!
   
   I'd like to give something back for this. Can I take a stab at https://github.com/projectnessie/nessie/issues/8316 ?


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "dimas-b (via GitHub)" <gi...@apache.org>.
dimas-b commented on issue #10110:
URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2049660446

   In this particular case, we get a `Table` object from a `NessieCatalog`, close the catalog, and then use the table in `rewriteDataFiles()` and `rewriteManifests()`. Through its `TableOperations`, the table is tied to the `JavaHttpClient` of that exact (now closed) catalog instance, hence the NPE after closing: the `close()` method sets an internal field to `null`.
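   To make the failure mode concrete, here is a self-contained toy model. `ToyCatalog`, `ToyTable`, `ToyHttpClient`, and `Transport` are hypothetical stand-ins, not Iceberg or Nessie classes; the one behavior borrowed from this thread is that closing the client nulls an internal field. With the real classes, the analogous fix is to keep the `NessieCatalog` open (for example, by opening it in the driver's `main` and running both actions inside that block) instead of closing it in `getTable`.
   
   ```java
   class CloseThenUse {
       // Hypothetical low-level transport; stands in for java.net.http.HttpClient.
       static final class Transport {
           String send(String request) {
               return "ok:" + request;
           }
       }
   
       // Hypothetical client that nulls its field on close(), as described in this thread.
       static final class ToyHttpClient implements AutoCloseable {
           private Transport client = new Transport();
   
           String send(String request) {
               return client.send(request); // NullPointerException once close() has run
           }
   
           @Override
           public void close() {
               client = null;
           }
       }
   
       // Hypothetical catalog that owns the client and shares it with every table it loads.
       static final class ToyCatalog implements AutoCloseable {
           private final ToyHttpClient http = new ToyHttpClient();
   
           ToyTable loadTable() {
               return new ToyTable(http);
           }
   
           @Override
           public void close() {
               http.close();
           }
       }
   
       static final class ToyTable {
           private final ToyHttpClient http;
   
           ToyTable(ToyHttpClient http) {
               this.http = http;
           }
   
           String commit() {
               return http.send("commit");
           }
       }
   
       // Broken shape (like the reporter's getTable): the table outlives the block that closes its catalog.
       static ToyTable loadThenClose() {
           try (ToyCatalog catalog = new ToyCatalog()) {
               return catalog.loadTable();
           }
       }
   
       // Fixed shape: all work on the table happens while the catalog is still open.
       static String loadAndCommit() {
           try (ToyCatalog catalog = new ToyCatalog()) {
               ToyTable table = catalog.loadTable();
               return table.commit();
           }
       }
   
       public static void main(String[] args) {
           System.out.println(loadAndCommit()); // ok:commit
           try {
               loadThenClose().commit();
           } catch (NullPointerException e) {
               System.out.println("NPE: client was closed before the commit");
           }
       }
   }
   ```
   
   The general rule the toy encodes: an object derived from a closeable resource should not escape that resource's scope.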


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed issue #10110: NPE During RewriteDataFiles Action with Nessie
URL: https://github.com/apache/iceberg/issues/10110

