You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/11 06:08:30 UTC

[GitHub] [iceberg] shardulm94 opened a new pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

shardulm94 opened a new pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582


   `InputSplit.getLocations()` can be called on mappers in some cases. Since both `locations` and `conf` are transient, the current code produces an NPE as `conf` is null on the mappers.
   
   The value of `locations` is not really relevant on the mappers since the tasks have already been distributed. So here we just return `ANYWHERE` when the `conf` is null. We can probably be more accurate by serializing the `locations` values if set, but I don't think its worth it.
   
   ```
   java.lang.Exception: java.lang.NullPointerException
       at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
   Caused by: java.lang.NullPointerException
       at org.apache.iceberg.mr.mapreduce.IcebergSplit.getLocations(IcebergSplit.java:69)
       .
       .
       .
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.toString(PigSplit.java:465)
       at java.lang.String.valueOf(String.java:2994)
       at java.lang.StringBuilder.append(StringBuilder.java:131)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
       at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   ```
   
   I discovered this issue while testing our internal Pig `LoadFunc` implementation which works with multiple `InputFormat`s. Iceberg's `LoadFunc` implementation in `iceberg-pig` does not have this issue as it contains its own `InputSplit` implementation which does not provide location information.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#discussion_r502938185



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -74,9 +74,11 @@ public long getLength() {
 
   @Override
   public String[] getLocations() {
-    if (locations == null) {
+    if (locations == null && conf != null) {

Review comment:
       I think a comment can help here saying `implementation of getLocations is only meant to be used on the client side during splits computation.  getLocations won't be accurate when called on worker nodes `




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#issuecomment-706656018


   cc: @guilload


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#discussion_r502938185



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -74,9 +74,11 @@ public long getLength() {
 
   @Override
   public String[] getLocations() {
-    if (locations == null) {
+    if (locations == null && conf != null) {

Review comment:
       I think a comment can help here saying `implementation of getLocations is only meant to be used during splits computation.  getLocations won't be accurate when called on worker nodes `




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#issuecomment-707849808


   Merging since @rdsr's comment was addressed. Thanks, @shardulm94!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick removed a comment on pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
kbendick removed a comment on pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#issuecomment-706658283


   Also, would it be possible to upstream the location aware work into iceberg-pig @shardulm94? Seems really important from the POV of cost.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#discussion_r503633646



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -74,9 +74,11 @@ public long getLength() {
 
   @Override
   public String[] getLocations() {
-    if (locations == null) {
+    if (locations == null && conf != null) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1582: MR: Fix NPE when InputSplit.getLocations is called on mappers

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1582:
URL: https://github.com/apache/iceberg/pull/1582#issuecomment-706658283


   Also, would it be possible to upstream the location aware work into iceberg-pig @shardulm94? Seems really important from the POV of cost.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org