Posted to hdfs-user@hadoop.apache.org by rab ra <ra...@gmail.com> on 2013/08/21 14:21:10 UTC

running map task in remote node

Hello,

Here is the newbie question of the day.

For one of my use cases, I want to use Hadoop MapReduce without HDFS.
Here, I will have a text file containing a list of file names to process.
Assume that I have 10 lines (10 files to process) in the input text file,
and I wish to generate 10 map tasks and execute them in parallel on 10
nodes. I started with a basic tutorial on Hadoop, was able to set up a
single-node Hadoop cluster, and successfully tested the wordcount code.
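
To make the intent concrete, here is a minimal sketch of the per-job
setting I have in mind. It assumes the job uses NLineInputFormat as its
input format (that would be chosen in the job driver, not shown here), and
the property name is the one I believe the new-API class reads. Both are
my assumptions and should be checked against the installed Hadoop version.

<!-- Assumption: only takes effect when the job's input format is
     NLineInputFormat; the property name may differ between versions. -->
<property>
  <name>mapreduce.input.lineinputformat.linespermap</name>
  <value>1</value>
</property>

With one line per split, a 10-line input file should yield 10 map tasks,
one per file name.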

Now I have taken two machines, A (master) and B (slave), and applied the
configuration below on these machines to set up a two-node cluster.

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop-bala/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hadoop-bala/dfs/data</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>A:9001</value>
  </property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>A:9001</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://A:9000</value>
  </property>
</configuration>


On both A and B, I have a file named ‘slaves’ containing the entry ‘B’ and
another file named ‘masters’ containing the entry ‘A’.

I have kept my input file on A. I see the map method process the input file
line by line, but all the lines are processed on A. I would expect that
processing to take place on B.

Can anyone point out where I am going wrong?


regards
rab