You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2009/04/30 10:09:30 UTC

[jira] Created: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
-------------------------------------------------------------------------------

                 Key: HADOOP-5759
                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu
             Fix For: 0.21.0


As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: /default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
  at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
  at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:619)

When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.

Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5759:
--------------------------------------------

    Attachment: patch-5759.txt

Patch changing racknames to hostnames. 
Unit test to do "submit a job with CombineFileInputFormat "will added in HADOOP-5698. Now, I manually tested the patch by running multifile wordcount using CombineFileInputFormat

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712949#action_12712949 ] 

Amareshwari Sriramadasu commented on HADOOP-5759:
-------------------------------------------------

test-patch result:
{noformat}
     [exec]
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
{noformat}

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713407#action_12713407 ] 

Amareshwari Sriramadasu commented on HADOOP-5759:
-------------------------------------------------

test-patch and ant test passed on my machine.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759-2.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713408#action_12713408 ] 

Amareshwari Sriramadasu commented on HADOOP-5759:
-------------------------------------------------

test-patch :
{noformat}
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
{noformat}

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759-2.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5759:
--------------------------------------------

    Attachment: patch-5759-2.txt

bq. 1. Is the change to addCreatedSplit() not needed anymore?
{code}
@@ -412,7 +417,7 @@
    */
   private void addCreatedSplit(JobConf job,
                                List<CombineFileSplit> splitList, 
-                               List<String> racks, 
+                               List<String> locations, 
                                ArrayList<OneBlockInfo> validBlocks) {
     // create an input split
{code}
I changed the above, because racks are not passed to addCreatedSplit any more.

Added rackToNodes.clear() just before returning splits from getSplits, as suggested by Dhruba.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759-2.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5759:
--------------------------------------------

    Attachment: patch-5759-1.txt

Patch adding rackToNodes map incrementally.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710689#action_12710689 ] 

Jothi Padmanabhan commented on HADOOP-5759:
-------------------------------------------

The patch does not apply cleanly because of the change to the test directory structure (src/test/org.. should be src/test/mapred/org/...)

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711018#action_12711018 ] 

Jothi Padmanabhan commented on HADOOP-5759:
-------------------------------------------

Since only hosts that actually contain the valid blocks are returned in getMoreSplits with this patch, it appears that we are losing the rack level aggregation that was intended for this InputFormat.

Dhruba, could you take a look as well? 

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-5759:
--------------------------------------------

    Status: Patch Available  (was: Open)

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713012#action_12713012 ] 

dhruba borthakur commented on HADOOP-5759:
------------------------------------------

Code looks good to me.  A few minor comments:

1. Is the change to addCreatedSplit() not needed anymore?
2. Since we added a static data structure rackToNodes, it might make sense to empty it just before returning from getSplits. This will ensure that we free up that memory sooner rather than later.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712524#action_12712524 ] 

dhruba borthakur commented on HADOOP-5759:
------------------------------------------

This patch keeps the original aim of combining blocks from different hosts in the same rack into a single split. The fix that that patch attempts is to figure out where such a combined split should reside.


> Since only hosts that actually contain the valid blocks are returned in getMoreSplits with this patch,

I agree with Jothi to a certain extent. The number ofsplits remain the same before, but the possibility of scheduling them on the rack where they reside is slightly reduced because we look only at those hosts where this block belongs. It is possible to enhance this patch to create  a new data strcture called rackToNodes at the very beginning. It can be populated by iterating through all the blocks at the very beginning. 

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5759:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Amareshwari!



> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759-2.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712683#action_12712683 ] 

Jothi Padmanabhan commented on HADOOP-5759:
-------------------------------------------

bq. It is possible to enhance this patch to create a new data strcture called rackToNodes at the very beginning. It can be populated by iterating through all the blocks at the very beginning.

Agreed. However, Amareshwari and I discussed this offline and we are not sure if we want to build this rackToNodes  before getMoreSplits method as it would mean calling getBlockLocations for all the blocks twice -- once to build the rackToNodes and once in getMoreSplits to build the blockInfo maps. Could we incrementally build this map in getMoreSplits. This would have the disadvantage of having incomplete information for the first few splits.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713403#action_12713403 ] 

dhruba borthakur commented on HADOOP-5759:
------------------------------------------

+1 Code looks good.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759-1.txt, patch-5759-2.txt, patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location. 
> When I use CombineFileInputFormat as the InputFormat for job, job initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves  the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack),  instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.