Posted to user@giraph.apache.org by ghufran malik <gh...@gmail.com> on 2014/03/30 12:13:12 UTC

ConnectedComponents example

Hello,

I am a final-year BSc Computer Science student using Apache Giraph for my
final-year project and dissertation, and I would very much appreciate it if
someone could help me with the following issue.

I am using Apache Giraph 1.1.0-SNAPSHOT with Hadoop 0.20.203.0 and am
having trouble running the ConnectedComponents example. I use the following
command:

 hadoop jar
/home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.ConnectedComponentsComputation -vif
org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
/user/ghufran/in/my_graph.txt -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/ghufran/outCC -w 1


I believe it gets stuck in the InputSuperstep, as the following is displayed
in the terminal while the command is running:

14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
average 108.78MB
....

which I traced back to the following if statement in the toString() method
of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):

if (isInputSuperstep()) {
  sb.append("Loading data: ");
  sb.append(verticesLoaded).append(" vertices loaded, ");
  sb.append(vertexInputSplitsLoaded).append(
      " vertex input splits loaded; ");
  sb.append(edgesLoaded).append(" edges loaded, ");
  sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");

  sb.append("; min free memory on worker ").append(
      workerWithMinFreeMemory).append(" - ").append(
      DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
      DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");

So it seems to me that the input is not being loaded correctly. I am
assuming there is something wrong with my input format class or, probably
more likely, with the graph I passed in?

I pass in a small graph in the format: vertex id, vertex value, then
neighbour ids, all separated by tabs. My graph is shown below:

1 0 2
2 1 1 3 4
3 2 2
4 3 2
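As a sketch of how such a tab-separated line would be tokenized (the class name is illustrative; the "[\t ]" separator pattern is the one that comes up later in this thread):

```java
import java.util.regex.Pattern;

public class TokenizeLine {
    public static void main(String[] args) {
        // Split on a single tab or a single space.
        Pattern separator = Pattern.compile("[\t ]");
        // A tab-separated line for vertex 2: value 1, neighbours 1, 3 and 4.
        String[] tokens = separator.split("2\t1\t1\t3\t4");
        System.out.println("id = " + tokens[0]);                    // id = 2
        System.out.println("value = " + tokens[1]);                 // value = 1
        System.out.println("neighbours = " + (tokens.length - 2));  // neighbours = 3
    }
}
```

With actual tab characters every field becomes exactly one token; extra spaces would change the token count.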

The full output from running my command is shown below. If anyone could
explain why I am not getting the expected output, I would greatly
appreciate it.

Many thanks,

Ghufran


FULL OUTPUT:


14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
specified. Ensure your InputFormat does not require one.
14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
specified. Ensure your OutputFormat does not require one.
14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled
(default), do not allow any task retries (setting mapred.map.max.attempts =
0, old value = 4)
14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
14/03/30 10:48:45 INFO
job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
writeHaltInstructions: To halt after next superstep execute:
'bin/halt-application --zkServer ghufran:22181 --zkNode
/_hadoopBsp/job_201403301044_0001/_haltComputation'
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name
=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.7.0_51
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Oracle Corporation
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bi
n/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:os.version=3.8.0-35-generic
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name
=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=ghufran:22181 sessionTimeout=60000
watcher=org.apache.giraph.job.JobProgressTracker@209fa588
14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to
server ghufran/127.0.1.1:22181. Will not attempt to authenticate using SASL
(unknown error)
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established
to ghufran/127.0.1.1:22181, initiating session
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment complete
on server ghufran/127.0.1.1:22181, sessionid = 0x1451263c44c0002,
negotiated timeout = 600000
14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
average 109.01MB
14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
average 108.78MB

Re: ConnectedComponents example

Posted by nishant gandhi <ni...@gmail.com>.
I am facing the same problem. Let me know if you find a solution for it.
I am following this blog for running my job:
http://anisnasir.wordpress.com/2014/01/03/running-a-custom-code-in-apache-giraph/



Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
That's pretty interesting. Forgot to mention, the output I get is

--3--
--4--
--5--
--6--
--7--

So it does look like something is up with Java.

Young


On Mon, Mar 31, 2014 at 5:05 PM, ghufran malik <gh...@gmail.com>wrote:

> Hmm, yeah, the only difference between my setup and yours is the Hadoop
> version you're using, and maybe the JDK. I think it's most likely something
> to do with the JDK in this respect.
>
>
> On Mon, Mar 31, 2014 at 10:01 PM, ghufran malik <gh...@gmail.com>wrote:
>
>> the output your code produced is:
>>
>> --3--
>> --4--
>> ----
>> ----
>> ----
>> --5--
>> ----
>> ----
>> ----
>> --6--
>> ----
>> ----
>> ----
>> --7--
>>
>> it's because [\t ] is a character class that matches a single tab or a
>> single space, so a run of consecutive spaces produces empty tokens between
>> the splits. Whereas if you just have [\t], it will only split on tabs.
>>
>> Thanks for clearing that up for me!
>>
>> Ghufran
>>
>>
>> On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik <gh...@gmail.com>wrote:
>>
>>> Hey,
>>>
>>> Yes when originally debugging the code I thought to check what \t
>>> actually split by and created my own test class:
>>>
>>> import java.util.regex.Pattern;
>>>
>>> class App {
>>>   private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>
>>>   public static void main(String[] args) {
>>>     String line = "1 0 2";
>>>     String[] tokens = SEPARATOR.split(line);
>>>
>>>     System.out.println(SEPARATOR);
>>>     System.out.println(tokens.length);
>>>
>>>     for (String token : tokens) {
>>>       System.out.println(token);
>>>     }
>>>   }
>>> }
>>>
>>> and the pattern split the line as I expected.
>>>
>>> I'll try your test as well to double-check.
>>>
>>>
>>> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <yo...@uwaterloo.ca>wrote:
>>>
>>>> Weird, inputs with tabs work for me right out of the box. Either the
>>>> "\t" is not the cause or it's some Java-version specific issue. Try this
>>>> toy program:
>>>>
>>>>
>>>> import java.util.regex.Pattern;
>>>>
>>>> public class Test {
>>>>   public static void main(String[] args) {
>>>>     Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>>     String[] tokens = SEPARATOR.split("3 4    5    6    7");
>>>>
>>>>     for (int i = 0; i < tokens.length; i++) {
>>>>       System.out.println("--" + tokens[i] + "--");
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Does it split the tabs properly for your Java?
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com
>>>> > wrote:
>>>>
>>>>> Yep, you're right, I believe it's a bug with all the InputFormats. I
>>>>> just checked with the Giraph 1.1.0 jar, using the
>>>>> IntIntNullVertexInputFormat and the example ConnectedComponents class,
>>>>> and it worked like a charm with just normal spacing.
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <yo...@uwaterloo.ca>wrote:
>>>>>
>>>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs? If
>>>>>> so, that bug is in all the input formats...
>>>>>>
>>>>>> Happy to help :)
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <
>>>>>> ghufran1malik@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I removed the spaces and it worked! I don't understand it, though;
>>>>>>> I'm sure the separator pattern means it splits on tab characters?
>>>>>>>
>>>>>>> Thanks for all your help; somewhat relieved now!
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Ghufran
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <yo...@uwaterloo.ca>wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>>>> userlogs say?
>>>>>>>>
>>>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>>>> instead of tabs (for your input graph)?
>>>>>>>>
>>>>>>>> Young
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <
>>>>>>>> ghufran1malik@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> No, even after I added the .txt, it gets to map 100% then drops back
>>>>>>>>> down to 50% and gives me this error:
>>>>>>>>>
>>>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input
>>>>>>>>> format specified. Ensure your InputFormat does not require one.
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>>> format vertex index type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>>> format vertex value type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>>> format edge value type is not known
>>>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is
>>>>>>>>> disabled (default), do not allow any task retries (setting
>>>>>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job:
>>>>>>>>> job_201403311622_0004
>>>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete:
>>>>>>>>> job_201403311622_0004
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:
>>>>>>>>> SLOTS_MILLIS_MAPS=1238858
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by
>>>>>>>>> all reduces waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by
>>>>>>>>> all maps waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>>>> doing:
>>>>>>>>>
>>>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs
>>>>>>>>> -cat input/*
>>>>>>>>> 1 2
>>>>>>>>> 2 1 3 4
>>>>>>>>> 3 2
>>>>>>>>> 4 2
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hmm, yeah, the only difference between my setup and yours is the Hadoop
version you're using, and maybe the JDK. I think it's most likely something
to do with the JDK in this respect.



Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
The output your code produced is:

--3--
--4--
----
----
----
--5--
----
----
----
--6--
----
----
----
--7--

It's because of the space between the \t and the closing ] in [\t ]. The
character class matches a single tab or a single space, so the input is split
on every space as well, and each extra space in a run produces an empty token
(the ---- lines above). Whereas if you just have [\t] it will split only on
tabs.
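This can be reproduced in isolation. Here is a minimal sketch (the class name
SeparatorDemo is my own) showing that the [\t ] class splits on single tabs
and single spaces alike, so a run of spaces yields empty tokens like the
---- lines above:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Demonstrates why "[\t ]" produced empty tokens: the character class
// matches a single tab OR a single space, so each extra space in a run
// creates an empty string between two adjacent delimiters.
public class SeparatorDemo {
    public static void main(String[] args) {
        Pattern tabOrSpace = Pattern.compile("[\t ]");
        Pattern tabOnly = Pattern.compile("[\t]");

        String tabInput = "3\t4\t5";      // genuinely tab-separated
        String spacedInput = "3 4    5";  // tabs replaced by runs of spaces

        System.out.println(Arrays.toString(tabOrSpace.split(tabInput)));
        // [3, 4, 5] -- single delimiters, no empty tokens
        System.out.println(Arrays.toString(tabOrSpace.split(spacedInput)));
        // [3, 4, , , , 5] -- three empty tokens from the four-space run
        System.out.println(Arrays.toString(tabOnly.split(spacedInput)));
        // [3 4    5] -- no tabs present, so nothing is split
    }
}
```

Compile with javac SeparatorDemo.java and run with java SeparatorDemo.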

Thanks for clearing that up for me!

Ghufran



Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Ah yeah, I found the answer to that question:
https://stackoverflow.com/questions/3762347/

So I don't think that bit is a bug. I'm not really sure why inputs with
tabs don't work for you. I'm using Hadoop 1.0.4 and jdk1.6.0_30 on Ubuntu
12.04 x64, if that helps you.
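The point behind that Stack Overflow answer can be checked directly. In Java
source, "\t" inside a string literal is already the tab character itself,
while "\\t" passes the two characters \ and t to the regex engine, which then
interprets them as its own tab escape. Both forms match a tab, so the double
backslash is not required (the class name TabEscapeDemo below is mine):

```java
import java.util.regex.Pattern;

// "\t" in a string literal is a real tab; "\\t" is the regex escape for
// a tab. Either way the compiled character class matches a tab character.
public class TabEscapeDemo {
    public static void main(String[] args) {
        Pattern literalTab = Pattern.compile("[\t ]");   // class holds a real tab
        Pattern regexTab   = Pattern.compile("[\\t ]");  // class holds the escape \t

        String input = "a\tb c";
        System.out.println(literalTab.split(input).length);  // 3
        System.out.println(regexTab.split(input).length);    // 3
    }
}
```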

Young



Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hey,

Yes, when originally debugging the code I thought to check what \t actually
splits on and created my own test class:

import java.util.regex.Pattern;

class App {
  private static final Pattern SEPARATOR = Pattern.compile("[\t ]");

  public static void main(String[] args) {
    String line = "1 0 2";
    String[] tokens = SEPARATOR.split(line);

    System.out.println(SEPARATOR);
    System.out.println(tokens.length);

    for (String token : tokens) {
      System.out.println(token);
    }
  }
}

and the pattern split the line the way I thought it should.

I'll try your test as well to double check



Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Weird, inputs with tabs work for me right out of the box. Either the "\t"
is not the cause or it's some Java-version specific issue. Try this toy
program:


import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
    Pattern SEPARATOR = Pattern.compile("[\t ]");
    String[] tokens = SEPARATOR.split("3 4    5    6    7");

    for (int i = 0; i < tokens.length; i++) {
      System.out.println("--" + tokens[i] + "--");
    }
  }
}


Does it split the tabs properly for your Java?

Young



Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Huh, it might be a bug in the code. Could it be that Pattern.compile has to
take "[\\t ]" (note the double backslash) to properly match tabs? If so,
that bug is in all the input formats...

Happy to help :)

Young



Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hi,

I removed the spaces and it worked! I don't understand, though. I was sure the
separator pattern meant it splits on tabs?

Thanks for all your help, I'm somewhat relieved now!

Kind regards,

Ghufran



Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Hi,

That looks like an error with the algorithm... What do the Hadoop userlogs
say?

And just to rule out weirdness, what happens if you use spaces instead of
tabs (for your input graph)?

Young



Fwd: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hey,

No, even after I added the .txt it gets to map 100%, then drops back down to
50% and gives me the error:

14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format
specified. Ensure your InputFormat does not require one.
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
vertex index type is not known
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
vertex value type is not known
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge
value type is not known
14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled
(default), do not allow any task retries (setting mapred.map.max.attempts =
0, old value = 4)
14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1


I did a check to make sure the graph was being stored correctly by doing:

ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
1 2
2 1 3 4
3 2
4 2

Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Hmm, it looks like a failure during graph loading. Did you forget a .txt in
the input path?

Young


On Mon, Mar 31, 2014 at 1:17 PM, ghufran malik <gh...@gmail.com>wrote:

> Hi,
>
> Thanks for the speedy response!
>
> It didn't work for me :(.
>
> I updated the ConnectedComponentsVertex class with yours and added in the
> new ConnectedComponentsInputFormat class. They are both in the
> giraph-examples/src/main/java/org/apache/giraph/examples package.
> To compile the examples package, I cd'd to
> ~/Downloads/giraph-folder/giraph-1.0.0/giraph-examples and ran "mvn
> compile", which resulted in BUILD SUCCESS. As a sanity
> check I checked the jar to make sure it had the
> ConnectedComponentsInputFormat class in it, and it did.
>
> I then updated my graph by taking out the vertex values so in the end I
> had:
>
>
> 1 2
> 2 1 3 4
> 3 2
> 4 2
>
> where the numbers are separated by a tab character ([\t]).
>
> The command I ran was:
>
> hadoop jar
> /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.ConnectedComponentsVertex -vif
> org.apache.giraph.examples.ConnectedComponentsInputFormat -vip
> /user/ghufran/input/my_graph -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/ghufran/giraph-output -w 1
>
>
> but I ended up with the output:
>
> 14/03/31 17:43:49 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 14/03/31 17:43:49 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 14/03/31 17:43:50 INFO mapred.JobClient: Running job: job_201403311622_0002
> 14/03/31 17:43:51 INFO mapred.JobClient:  map 0% reduce 0%
> 14/03/31 17:44:08 INFO mapred.JobClient:  map 50% reduce 0%
> 14/03/31 17:54:54 INFO mapred.JobClient:  map 0% reduce 0%
> 14/03/31 17:54:59 INFO mapred.JobClient: Job complete:
> job_201403311622_0002
> 14/03/31 17:54:59 INFO mapred.JobClient: Counters: 6
> 14/03/31 17:54:59 INFO mapred.JobClient:   Job Counters
> 14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=656429
> 14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Launched map tasks=2
> 14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Failed map tasks=1
>
> Any ideas as to why this happened? Do you think I need to update the Hadoop
> version I am using?
>
> Kind regards,
>
> Ghufran
>
>
> On Mon, Mar 31, 2014 at 5:11 PM, Young Han <yo...@uwaterloo.ca> wrote:
>
>> Hey,
>>
>> Sure, I've uploaded the 1.0.0 classes I'm using:
>> http://pastebin.com/0cTdWrR4
>> http://pastebin.com/jWgVAzH6
>>
>> They both go into giraph-examples/src/main/java/org/apache/giraph/examples
>>
>> Note that the input format it accepts is of the form "src dst1 dst2 dst3
>> ..."---there is no vertex value. So your test graph would be:
>>
>> 1 2
>> 2 1 3 4
>> 3 2
>> 4 2
>>
>> The command I'm using is:
>>
>> hadoop jar
>> "$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner \
>>     org.apache.giraph.examples.ConnectedComponentsVertex \
>>     -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
>>     -vip /user/${USER}/input/${inputgraph} \
>>     -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>     -op /user/${USER}/giraph-output/ \
>>     -w 1
>>
>> You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file
>> name since you're using Hadoop 0.20.203.
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <gh...@gmail.com>wrote:
>>
>>> Hi Young,
>>>
>>> I'd just like to say thank you first for your help; it's much appreciated!
>>>
>>> I did the sanity check and everything seems fine; I see the correct
>>> results.
>>>
>>> Yes, I hadn't noticed that before; that is strange. I don't know how that
>>> happened, as the quick start guide (
>>> https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop
>>> 0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0,
>>> and my Giraph 1.0.0 is compiled against 0.20.203.
>>>
>>> I edited the code as you said for Giraph 1.1.0 but still received the
>>> same error as before, so I thought it might be due to the Hadoop version it
>>> was compiled for. I then decided to try modifying the code in Giraph 1.0.0
>>> instead. However, since I do not have the correct input format class, and
>>> the vertex object is not instantiated in the ConnectedComponents class of
>>> Giraph 1.0.0, I was wondering if you could send me the full classes for
>>> both ConnectedComponents and the InputFormat, so that I know everything
>>> should be correct code-wise.
>>>
>>> I will be trying to implement the InputFormat class and
>>> ConnectedComponents in the meantime and if I get it working before you
>>> respond I'll update this post.
>>>
>>> Thanks
>>>
>>> Ghufran.
>>>
>>>
>>> On Sun, Mar 30, 2014 at 5:41 PM, Young Han <yo...@uwaterloo.ca>wrote:
>>>
>>>> Hey,
>>>>
>>>> As a sanity check, is the graph really loaded on HDFS? Do you see the
>>>> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
>>>> (Where hadoop is your hadoop binary).
>>>>
>>>> Also, I noticed that your Giraph has been compiled for Hadoop 1.x,
>>>> while the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>>>>
>>>> Finally, this may be completely irrelevant, but I had issues running
>>>> connected components on Giraph 1.0.0 and I fixed it by changing the
>>>> algorithm and the input format. The input format you're using on 1.1.0
>>>> looks correct to me. The algorithm change I did was to the first "if" block
>>>> in ConnectedComponentsComputation:
>>>>
>>>>     if (getSuperstep() == 0) {
>>>>       currentComponent = vertex.getId().get();
>>>>       vertex.setValue(new IntWritable(currentComponent));
>>>>       sendMessageToAllEdges(vertex, vertex.getValue());
>>>>       vertex.voteToHalt();
>>>>       return;
>>>>     }
>>>>
>>>> I forget what error this change solved, so it may not help in your case.
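Outside Giraph, the overall algorithm this superstep-0 block bootstraps (every vertex repeatedly adopting the minimum component id seen among itself and its neighbours until nothing changes) can be simulated in plain Java. The class below is a hypothetical standalone sketch, not Giraph code:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone simulation of min-label connected components: each vertex
// starts with its own id as its component (superstep 0) and then keeps
// taking the minimum of its own and its neighbours' components until
// a fixed point. Names here are illustrative, not Giraph API.
public class MinLabelDemo {
    public static Map<Integer, Integer> run(Map<Integer, int[]> adj) {
        Map<Integer, Integer> comp = new HashMap<>();
        for (int v : adj.keySet()) {
            comp.put(v, v);                      // superstep 0: own id
        }
        boolean changed = true;
        while (changed) {                        // later supersteps
            changed = false;
            for (Map.Entry<Integer, int[]> e : adj.entrySet()) {
                for (int n : e.getValue()) {
                    int min = Math.min(comp.get(e.getKey()), comp.get(n));
                    if (min < comp.get(e.getKey())) {
                        comp.put(e.getKey(), min);
                        changed = true;
                    }
                }
            }
        }
        return comp;
    }

    public static void main(String[] args) {
        // The 4-vertex test graph from this thread (undirected edges).
        Map<Integer, int[]> adj = new HashMap<>();
        adj.put(1, new int[]{2});
        adj.put(2, new int[]{1, 3, 4});
        adj.put(3, new int[]{2});
        adj.put(4, new int[]{2});
        System.out.println(run(adj)); // all four vertices end in component 1
    }
}
```

On the sample graph every vertex is reachable from vertex 1, so all four converge to component id 1, which is the output the Giraph job should eventually produce.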
>>>>
>>>> Young
>>>>
>>>>
>>>>
>>>> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <ghufran1malik@gmail.com
>>>> > wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am a final-year BSc Computer Science student who is using Apache
>>>>> Giraph for my final year project and dissertation and would appreciate very
>>>>> much if someone could help me with the following issue.
>>>>>
>>>>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>>>>> having trouble running the ConnectedComponents example. I use the following
>>>>> command:
>>>>>
>>>>>  hadoop jar
>>>>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>>>> org.apache.giraph.GiraphRunner
>>>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>>>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>>>>> /user/ghufran/in/my_graph.txt -vof
>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>>> /user/ghufran/outCC -w 1
>>>>>
>>>>>
>>>>> I believe it gets stuck in the InputSuperstep, as the following is
>>>>> displayed in the terminal while the command is running:
>>>>>
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
>>>>> ....
>>>>>
>>>>> which I traced back to the following if statement in the toString()
>>>>> method of core.org.apache.job.CombinedWorkerProgress:
>>>>>
>>>>> if (isInputSuperstep()) {
>>>>>   sb.append("Loading data: ");
>>>>>   sb.append(verticesLoaded).append(" vertices loaded, ");
>>>>>   sb.append(vertexInputSplitsLoaded).append(" vertex input splits loaded; ");
>>>>>   sb.append(edgesLoaded).append(" edges loaded, ");
>>>>>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>>>>   sb.append("; min free memory on worker ").append(
>>>>>       workerWithMinFreeMemory).append(" - ").append(
>>>>>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>>>>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>>>> }
>>>>>
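That StringBuilder code can be reproduced standalone to confirm the repeated log line is just a plain formatting of the load counters: if verticesLoaded stays 0, the same line prints over and over. The method signature and the DecimalFormat pattern below are assumptions for illustration, not the exact Giraph internals:

```java
import java.text.DecimalFormat;

// Standalone reproduction of the progress line built by the quoted
// CombinedWorkerProgress code. The signature and "#.##" pattern are
// assumptions; only the message layout follows the quoted snippet.
public class ProgressLineDemo {
    private static final DecimalFormat DECIMAL_FORMAT = new DecimalFormat("#.##");

    public static String format(long verticesLoaded, long vertexSplits,
                                long edgesLoaded, long edgeSplits,
                                int worker, double minFreeMB, double avgFreeMB) {
        StringBuilder sb = new StringBuilder();
        sb.append("Loading data: ");
        sb.append(verticesLoaded).append(" vertices loaded, ");
        sb.append(vertexSplits).append(" vertex input splits loaded; ");
        sb.append(edgesLoaded).append(" edges loaded, ");
        sb.append(edgeSplits).append(" edge input splits loaded");
        sb.append("; min free memory on worker ").append(worker)
            .append(" - ").append(DECIMAL_FORMAT.format(minFreeMB))
            .append("MB, average ").append(DECIMAL_FORMAT.format(avgFreeMB))
            .append("MB");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the "0 vertices loaded" line from the job output.
        System.out.println(format(0, 0, 0, 0, 1, 109.01, 109.01));
    }
}
```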
>>>>> So it seems to me that the input is not being loaded correctly. I am
>>>>> assuming there's something wrong with my input format class or, probably
>>>>> more likely, something wrong with the graph I passed in?
>>>>>
>>>>> I pass in a small graph in the format "vertex id, vertex value,
>>>>> neighbours", separated by tabs; my graph is shown below:
>>>>>
>>>>> 1 0 2
>>>>> 2 1 1 3 4
>>>>> 3 2 2
>>>>> 4 3 2
>>>>>
>>>>> The full output after I ran my command is shown below. If anyone could
>>>>> explain to me why I am not getting the expected output, I would greatly
>>>>> appreciate it.
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Ghufran
>>>>>
>>>>>
>>>>> FULL OUTPUT:
>>>>>
>>>>>
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>>>>> specified. Ensure your InputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>>>>> specified. Ensure your OutputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>>>>> disabled (default), do not allow any task retries (setting
>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>>>>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO
>>>>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>>>>> writeHaltInstructions: To halt after next superstep execute:
>>>>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>>>>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:
>>>>> host.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.version=1.7.0_51
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.vendor=Oracle Corporation
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.20
3.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.io.tmpdir=/tmp
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.compiler=<NA>
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name
>>>>> =Linux
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.arch=amd64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.version=3.8.0-35-generic
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:
>>>>> user.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.home=/home/ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client
>>>>> connection, connectString=ghufran:22181 sessionTimeout=60000
>>>>> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>>>>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job:
>>>>> job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection
>>>>> to server ghufran/127.0.1.1:22181. Will not attempt to authenticate
>>>>> using SASL (unknown error)
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection
>>>>> established to ghufran/127.0.1.1:22181, initiating session
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
>>>>> complete on server ghufran/127.0.1.1:22181, sessionid =
>>>>> 0x1451263c44c0002, negotiated timeout = 600000
>>>>>  14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hi,

Thanks for the speedy response!

It didn't work for me :(.

I updated the ConnectedComponentsVertex class with yours and added in the new
ConnectedComponentsInputFormat class. They are both in the
giraph-examples/src/main/java/org/apache/giraph/examples package.
To compile the examples package, I cd'd to
~/Downloads/giraph-folder/giraph-1.0.0/giraph-examples and ran "mvn
compile", which resulted in BUILD SUCCESS. As a sanity
check I checked the jar to make sure it had the
ConnectedComponentsInputFormat class in it, and it did.

I then updated my graph by taking out the vertex values so in the end I
had:

1 2
2 1 3 4
3 2
4 2

where the numbers are separated by a tab character ([\t]).

The command I ran was:

hadoop jar
/home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.ConnectedComponentsVertex -vif
org.apache.giraph.examples.ConnectedComponentsInputFormat -vip
/user/ghufran/input/my_graph -of
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/ghufran/giraph-output -w 1


but I ended up with the output:

14/03/31 17:43:49 INFO utils.ConfigurationUtils: No edge input format
specified. Ensure your InputFormat does not require one.
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
vertex index type is not known
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
vertex value type is not known
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format edge
value type is not known
14/03/31 17:43:49 INFO job.GiraphJob: run: Since checkpointing is disabled
(default), do not allow any task retries (setting mapred.map.max.attempts =
0, old value = 4)
14/03/31 17:43:50 INFO mapred.JobClient: Running job: job_201403311622_0002
14/03/31 17:43:51 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 17:44:08 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 17:54:54 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 17:54:59 INFO mapred.JobClient: Job complete: job_201403311622_0002
14/03/31 17:54:59 INFO mapred.JobClient: Counters: 6
14/03/31 17:54:59 INFO mapred.JobClient:   Job Counters
14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=656429
14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
14/03/31 17:54:59 INFO mapred.JobClient:     Launched map tasks=2
14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/03/31 17:54:59 INFO mapred.JobClient:     Failed map tasks=1

Any ideas as to why this happened? Do you think I need to update the Hadoop
version I am using?

Kind regards,

Ghufran


On Mon, Mar 31, 2014 at 5:11 PM, Young Han <yo...@uwaterloo.ca> wrote:

> Hey,
>
> Sure, I've uploaded the 1.0.0 classes I'm using:
> http://pastebin.com/0cTdWrR4
> http://pastebin.com/jWgVAzH6
>
> They both go into giraph-examples/src/main/java/org/apache/giraph/examples
>
> Note that the input format it accepts is of the form "src dst1 dst2 dst3
> ..."---there is no vertex value. So your test graph would be:
>
> 1 2
> 2 1 3 4
> 3 2
> 4 2
>
> The command I'm using is:
>
> hadoop jar
> "$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner \
>     org.apache.giraph.examples.ConnectedComponentsVertex \
>     -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
>     -vip /user/${USER}/input/${inputgraph} \
>     -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>     -op /user/${USER}/giraph-output/ \
>     -w 1
>
> You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file
> name since you're using Hadoop 0.20.203.
>
> Young
>
>
> On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <gh...@gmail.com>wrote:
>
>> Hi Young,
>>
>> I'd just like to say thank you first for your help; it's much appreciated!
>>
>> I did the sanity check and everything seems fine; I see the correct
>> results.
>>
>> Yes, I hadn't noticed that before; that is strange. I don't know how that
>> happened, as the quick start guide (
>> https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop
>> 0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0,
>> and my Giraph 1.0.0 is compiled against 0.20.203.
>>
>> I edited the code as you said for Giraph 1.1.0 but still received the
>> same error as before, so I thought it might be due to the Hadoop version it
>> was compiled for. I then decided to try modifying the code in Giraph 1.0.0
>> instead. However, since I do not have the correct input format class, and
>> the vertex object is not instantiated in the ConnectedComponents class of
>> Giraph 1.0.0, I was wondering if you could send me the full classes for
>> both ConnectedComponents and the InputFormat, so that I know everything
>> should be correct code-wise.
>>
>> I will be trying to implement the InputFormat class and
>> ConnectedComponents in the meantime and if I get it working before you
>> respond I'll update this post.
>>
>> Thanks
>>
>> Ghufran.
>>
>>
>> On Sun, Mar 30, 2014 at 5:41 PM, Young Han <yo...@uwaterloo.ca>wrote:
>>
>>> Hey,
>>>
>>> As a sanity check, is the graph really loaded on HDFS? Do you see the
>>> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
>>> (Where hadoop is your hadoop binary).
>>>
>>> Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while
>>> the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>>>
>>> Finally, this may be completely irrelevant, but I had issues running
>>> connected components on Giraph 1.0.0 and I fixed it by changing the
>>> algorithm and the input format. The input format you're using on 1.1.0
>>> looks correct to me. The algorithm change I did was to the first "if" block
>>> in ConnectedComponentsComputation:
>>>
>>>     if (getSuperstep() == 0) {
>>>       currentComponent = vertex.getId().get();
>>>       vertex.setValue(new IntWritable(currentComponent));
>>>       sendMessageToAllEdges(vertex, vertex.getValue());
>>>       vertex.voteToHalt();
>>>       return;
>>>     }
>>>
>>> I forget what error this change solved, so it may not help in your case.
>>>
>>> Young
>>>
>>>
>>>
>>> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <gh...@gmail.com>wrote:
>>>
>>>> Hello,
>>>>
>>>> I am a final-year BSc Computer Science student who is using Apache
>>>> Giraph for my final year project and dissertation and would appreciate very
>>>> much if someone could help me with the following issue.
>>>>
>>>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>>>> having trouble running the ConnectedComponents example. I use the following
>>>> command:
>>>>
>>>>  hadoop jar
>>>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner
>>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>>>> /user/ghufran/in/my_graph.txt -vof
>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>> /user/ghufran/outCC -w 1
>>>>
>>>>
>>>> I believe it gets stuck in the InputSuperstep, as the following is
>>>> displayed in the terminal while the command is running:
>>>>
>>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>> average 109.01MB
>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>> average 109.01MB
>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>> average 108.78MB
>>>> ....
>>>>
>>>> which I traced back to the following if statement in the toString()
>>>> method of core.org.apache.job.CombinedWorkerProgress:
>>>>
>>>> if (isInputSuperstep()) {
>>>>   sb.append("Loading data: ");
>>>>   sb.append(verticesLoaded).append(" vertices loaded, ");
>>>>   sb.append(vertexInputSplitsLoaded).append(" vertex input splits loaded; ");
>>>>   sb.append(edgesLoaded).append(" edges loaded, ");
>>>>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>>>   sb.append("; min free memory on worker ").append(
>>>>       workerWithMinFreeMemory).append(" - ").append(
>>>>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>>>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>>> }
>>>>
>>>> So it seems to me that the input is not being loaded correctly. I am
>>>> assuming there's something wrong with my input format class or, probably
>>>> more likely, something wrong with the graph I passed in?
>>>>
>>>> I pass in a small graph in the format "vertex id, vertex value,
>>>> neighbours", separated by tabs; my graph is shown below:
>>>>
>>>> 1 0 2
>>>> 2 1 1 3 4
>>>> 3 2 2
>>>> 4 3 2
>>>>
>>>> The full output after I ran my command is shown below. If anyone could
>>>> explain to me why I am not getting the expected output, I would greatly
>>>> appreciate it.
>>>>
>>>> Many thanks,
>>>>
>>>> Ghufran
>>>>
>>>>
>>>> FULL OUTPUT:
>>>>
>>>>
>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>>>> specified. Ensure your InputFormat does not require one.
>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>>>> specified. Ensure your OutputFormat does not require one.
>>>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>>>> disabled (default), do not allow any task retries (setting
>>>> mapred.map.max.attempts = 0, old value = 4)
>>>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>>>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>>>> 14/03/30 10:48:45 INFO
>>>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>>>> writeHaltInstructions: To halt after next superstep execute:
>>>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>>>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:
>>>> host.name=ghufran
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>> environment:java.version=1.7.0_51
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>> environment:java.vendor=Oracle Corporation
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client

Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Hey,

Sure, I've uploaded the 1.0.0 classes I'm using:
http://pastebin.com/0cTdWrR4
http://pastebin.com/jWgVAzH6

They both go into giraph-examples/src/main/java/org/apache/giraph/examples

Note that the input format it accepts is of the form "src dst1 dst2 dst3
..."---there is no vertex value. So your test graph would be:

1 2
2 1 3 4
3 2
4 2
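(As a side note, this "src dst1 dst2 ..." format can be parsed without Giraph at all. The following plain-Java sketch is only illustrative; the AdjacencyParser class and its method names are my own, not part of Giraph:)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative parser for the "src dst1 dst2 ..." adjacency-list format.
// Not part of Giraph; it only shows what each input line is expected to hold.
public class AdjacencyParser {
  public static Map<Integer, List<Integer>> parse(String[] lines) {
    Map<Integer, List<Integer>> graph = new LinkedHashMap<>();
    for (String line : lines) {
      String[] tokens = line.trim().split("\\s+");
      int src = Integer.parseInt(tokens[0]);       // first token: vertex id
      List<Integer> neighbours = new ArrayList<>();
      for (int i = 1; i < tokens.length; i++) {    // remaining tokens: out-edges
        neighbours.add(Integer.parseInt(tokens[i]));
      }
      graph.put(src, neighbours);
    }
    return graph;
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> g =
        parse(new String[] {"1 2", "2 1 3 4", "3 2", "4 2"});
    System.out.println(g.get(2));  // neighbours of vertex 2 -> [1, 3, 4]
  }
}
```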

The command I'm using is:

hadoop jar
"$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.ConnectedComponentsVertex \
    -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
    -vip /user/${USER}/input/${inputgraph} \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/${USER}/giraph-output/ \
    -w 1

You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file
name since you're using Hadoop 0.20.203.

Young


On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <gh...@gmail.com> wrote:

> Hi Young,
>
> I'd just like to say first of all, thank you for your help; it's much
> appreciated!
>
> I did the sanity check and everything seems fine; I see the correct
> results.
>
> Yes, I hadn't noticed that before; that is strange. I don't know how it
> happened, as the quick start guide (
> https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop
> 0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph
> 1.0.0, and my Giraph 1.0.0 is compiled for 0.20.203.
>
> I edited the code as you suggested for Giraph 1.1.0 but still received
> the same error, so I thought it might be due to the Hadoop version it was
> compiled for. I then decided to modify the code in Giraph 1.0.0 instead;
> however, since I do not have the correct input format class and the
> vertex object is not instantiated in the ConnectedComponents class of
> Giraph 1.0.0, I was wondering if you could send me the full classes for
> both ConnectedComponents and the InputFormat, so that I know everything
> is correct code-wise.
>
> I will be trying to implement the InputFormat class and
> ConnectedComponents in the meantime, and if I get it working before you
> respond, I'll update this post.
>
> Thanks
>
> Ghufran.
>
>
> On Sun, Mar 30, 2014 at 5:41 PM, Young Han <yo...@uwaterloo.ca> wrote:
>
>> Hey,
>>
>> As a sanity check, is the graph really loaded on HDFS? Do you see the
>> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
>> (Where hadoop is your hadoop binary).
>>
>> Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while
>> the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>>
>> Finally, this may be completely irrelevant, but I had issues running
>> connected components on Giraph 1.0.0 and I fixed it by changing the
>> algorithm and the input format. The input format you're using on 1.1.0
>> looks correct to me. The algorithm change I did was to the first "if" block
>> in ConnectedComponentsComputation:
>>
>>     if (getSuperstep() == 0) {
>>       currentComponent = vertex.getId().get();
>>       vertex.setValue(new IntWritable(currentComponent));
>>       sendMessageToAllEdges(vertex, vertex.getValue());
>>       vertex.voteToHalt();
>>       return;
>>     }
>>
>> I forget what error this change solved, so it may not help in your case.
>>
>> Young
>>
>>
>>
>> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <gh...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am a final-year BSc Computer Science student using Apache Giraph for
>>> my final-year project and dissertation, and I would very much appreciate
>>> it if someone could help me with the following issue.
>>>
>>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>>> having trouble running the ConnectedComponents example. I use the following
>>> command:
>>>
>>>  hadoop jar
>>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner
>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>>> /user/ghufran/in/my_graph.txt -vof
>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> /user/ghufran/outCC -w 1
>>>
>>>
>>> I believe it gets stuck in the InputSuperstep as the following is
>>> displayed in terminal when the command is running:
>>>
>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>> average 109.01MB
>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>> average 109.01MB
>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>> average 108.78MB
>>> ....
>>>
>>> which I traced back to the following if statement in the toString()
>>> method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):
>>>
>>> if (isInputSuperstep()) {
>>>   sb.append("Loading data: ");
>>>   sb.append(verticesLoaded).append(" vertices loaded, ");
>>>   sb.append(vertexInputSplitsLoaded).append(
>>>       " vertex input splits loaded; ");
>>>   sb.append(edgesLoaded).append(" edges loaded, ");
>>>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>>   sb.append("; min free memory on worker ").append(
>>>       workerWithMinFreeMemory).append(" - ").append(
>>>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>> }
>>>
>>> So it seems to me that the input is not being loaded correctly. I am
>>> assuming there's something wrong with my input format class or, probably
>>> more likely, with the graph I passed in.
>>>
>>> I pass in a small graph in the format: vertex id, vertex value, then
>>> neighbours, all separated by tabs. My graph is shown below:
>>>
>>> 1 0 2
>>> 2 1 1 3 4
>>> 3 2 2
>>> 4 3 2
>>>
>>> The full output after I ran my command is shown below. If anyone could
>>> explain why I am not getting the expected output, I would greatly
>>> appreciate it.
>>>
>>> Many thanks,
>>>
>>> Ghufran
>>>
>>>
>>> FULL OUTPUT:
>>>
>>>
>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>>> specified. Ensure your InputFormat does not require one.
>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>>> specified. Ensure your OutputFormat does not require one.
>>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>>> disabled (default), do not allow any task retries (setting
>>> mapred.map.max.attempts = 0, old value = 4)
>>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>>> 14/03/30 10:48:45 INFO
>>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>>> writeHaltInstructions: To halt after next superstep execute:
>>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name
>>> =ghufran
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.version=1.7.0_51
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Oracle Corporation
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.
0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=/tmp
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name
>>> =Linux
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:os.arch=amd64
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:os.version=3.8.0-35-generic
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name
>>> =ghufran
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:user.home=/home/ghufran
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client
>>> connection, connectString=ghufran:22181 sessionTimeout=60000
>>> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job:
>>> job_201403301044_0001
>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection
>>> to server ghufran/127.0.1.1:22181. Will not attempt to authenticate
>>> using SASL (unknown error)
>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection
>>> established to ghufran/127.0.1.1:22181, initiating session
>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
>>> complete on server ghufran/127.0.1.1:22181, sessionid =
>>> 0x1451263c44c0002, negotiated timeout = 600000
>>>  14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>> average 109.01MB
>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>> average 109.01MB
>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>> average 109.01MB
>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>> average 108.78MB
>>>
>>>
>>
>

Re: ConnectedComponents example

Posted by ghufran malik <gh...@gmail.com>.
Hi Young,

I'd just like to say first of all, thank you for your help; it's much
appreciated!

I did the sanity check and everything seems fine; I see the correct results.

Yes, I hadn't noticed that before; that is strange. I don't know how it
happened, as the quick start guide (
https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop
0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0,
and my Giraph 1.0.0 is compiled for 0.20.203.

I edited the code as you suggested for Giraph 1.1.0 but still received the
same error, so I thought it might be due to the Hadoop version it was
compiled for. I then decided to modify the code in Giraph 1.0.0 instead;
however, since I do not have the correct input format class and the vertex
object is not instantiated in the ConnectedComponents class of Giraph
1.0.0, I was wondering if you could send me the full classes for both
ConnectedComponents and the InputFormat, so that I know everything is
correct code-wise.

I will be trying to implement the InputFormat class and ConnectedComponents
in the meantime, and if I get it working before you respond, I'll update
this post.

Thanks

Ghufran.
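(For anyone following along: the tab-separated "id value neighbours" line format from the original post can be sanity-checked with a small framework-free parser. This is only a sketch; the VertexLine class is my own invention, not a Giraph API:)

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative holder for one line of the "id<TAB>value<TAB>neighbour..."
// format used in the original post. Not a Giraph class; names are made up.
public class VertexLine {
  public final int id;
  public final int value;
  public final List<Integer> neighbours = new ArrayList<>();

  public VertexLine(String line) {
    String[] tokens = line.split("\t");
    this.id = Integer.parseInt(tokens[0]);     // first field: vertex id
    this.value = Integer.parseInt(tokens[1]);  // second field: vertex value
    for (int i = 2; i < tokens.length; i++) {  // remaining fields: neighbour ids
      neighbours.add(Integer.parseInt(tokens[i]));
    }
  }

  public static void main(String[] args) {
    VertexLine v = new VertexLine("2\t1\t1\t3\t4");
    System.out.println(v.id + " " + v.value + " " + v.neighbours);
  }
}
```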


On Sun, Mar 30, 2014 at 5:41 PM, Young Han <yo...@uwaterloo.ca> wrote:

> Hey,
>
> As a sanity check, is the graph really loaded on HDFS? Do you see the
> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
> (Where hadoop is your hadoop binary).
>
> Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while
> the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>
> Finally, this may be completely irrelevant, but I had issues running
> connected components on Giraph 1.0.0 and I fixed it by changing the
> algorithm and the input format. The input format you're using on 1.1.0
> looks correct to me. The algorithm change I did was to the first "if" block
> in ConnectedComponentsComputation:
>
>     if (getSuperstep() == 0) {
>       currentComponent = vertex.getId().get();
>       vertex.setValue(new IntWritable(currentComponent));
>       sendMessageToAllEdges(vertex, vertex.getValue());
>       vertex.voteToHalt();
>       return;
>     }
>
> I forget what error this change solved, so it may not help in your case.
>
> Young
>
>
>
> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <gh...@gmail.com> wrote:
>
>> Hello,
>>
>> I am a final-year BSc Computer Science student using Apache Giraph for
>> my final-year project and dissertation, and I would very much appreciate
>> it if someone could help me with the following issue.
>>
>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>> having trouble running the ConnectedComponents example. I use the following
>> command:
>>
>>  hadoop jar
>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner
>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>> /user/ghufran/in/my_graph.txt -vof
>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>> /user/ghufran/outCC -w 1
>>
>>
>> I believe it gets stuck in the InputSuperstep as the following is
>> displayed in terminal when the command is running:
>>
>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>> average 109.01MB
>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>> average 109.01MB
>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>> average 108.78MB
>> ....
>>
>> which I traced back to the following if statement in the toString()
>> method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):
>>
>> if (isInputSuperstep()) {
>>   sb.append("Loading data: ");
>>   sb.append(verticesLoaded).append(" vertices loaded, ");
>>   sb.append(vertexInputSplitsLoaded).append(
>>       " vertex input splits loaded; ");
>>   sb.append(edgesLoaded).append(" edges loaded, ");
>>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>   sb.append("; min free memory on worker ").append(
>>       workerWithMinFreeMemory).append(" - ").append(
>>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>> }
>>
>> So it seems to me that the input is not being loaded correctly. I am
>> assuming there's something wrong with my input format class or, probably
>> more likely, with the graph I passed in.
>>
>> I pass in a small graph in the format: vertex id, vertex value, then
>> neighbours, all separated by tabs. My graph is shown below:
>>
>> 1 0 2
>> 2 1 1 3 4
>> 3 2 2
>> 4 3 2
>>
>> The full output after I ran my command is shown below. If anyone could
>> explain why I am not getting the expected output, I would greatly
>> appreciate it.
>>
>> Many thanks,
>>
>> Ghufran
>>
>>
>> FULL OUTPUT:
>>
>>
>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>> specified. Ensure your InputFormat does not require one.
>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>> specified. Ensure your OutputFormat does not require one.
>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>> disabled (default), do not allow any task retries (setting
>> mapred.map.max.attempts = 0, old value = 4)
>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>> 14/03/30 10:48:45 INFO
>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>> writeHaltInstructions: To halt after next superstep execute:
>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name
>> =ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.version=1.7.0_51
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.vendor=Oracle Corporation
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0
/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/tmp
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name
>> =Linux
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:os.arch=amd64
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:os.version=3.8.0-35-generic
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name
>> =ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:user.home=/home/ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection,
>> connectString=ghufran:22181 sessionTimeout=60000
>> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job:
>> job_201403301044_0001
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to
>> server ghufran/127.0.1.1:22181. Will not attempt to authenticate using
>> SASL (unknown error)
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection
>> established to ghufran/127.0.1.1:22181, initiating session
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server ghufran/127.0.1.1:22181, sessionid =
>> 0x1451263c44c0002, negotiated timeout = 600000
>>  14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>> average 109.01MB
>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>> average 109.01MB
>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>> average 109.01MB
>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>> average 108.78MB
>>
>>
>

Re: ConnectedComponents example

Posted by Young Han <yo...@uwaterloo.ca>.
Hey,

As a sanity check, is the graph really loaded on HDFS? Do you see the
correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
(Where hadoop is your hadoop binary).

Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while
the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?

Finally, this may be completely irrelevant, but I had issues running
connected components on Giraph 1.0.0 and I fixed it by changing the
algorithm and the input format. The input format you're using on 1.1.0
looks correct to me. The algorithm change I did was to the first "if" block
in ConnectedComponentsComputation:

    if (getSuperstep() == 0) {
      currentComponent = vertex.getId().get();
      vertex.setValue(new IntWritable(currentComponent));
      sendMessageToAllEdges(vertex, vertex.getValue());
      vertex.voteToHalt();
      return;
    }

I forget what error this change solved, so it may not help in your case.

Young
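(The superstep-0 change above is easier to see outside Giraph. Below is a framework-free sketch of the min-label propagation that connected components relies on; the MinLabelCC helper is my own and simulates supersteps synchronously, which is not how Giraph actually schedules them:)

```java
import java.util.Arrays;
import java.util.List;

// Framework-free simulation of min-label connected components:
// every vertex starts labelled with its own id, then repeatedly
// adopts the smallest label among itself and its neighbours.
public class MinLabelCC {
  public static int[] run(List<List<Integer>> adj) {
    int n = adj.size();
    int[] label = new int[n];
    for (int v = 0; v < n; v++) {
      label[v] = v;  // "superstep 0": label = own id
    }
    boolean changed = true;
    while (changed) {  // each iteration roughly corresponds to one superstep
      changed = false;
      int[] next = label.clone();
      for (int v = 0; v < n; v++) {
        for (int u : adj.get(v)) {
          if (label[u] < next[v]) {  // adopt a smaller neighbour label
            next[v] = label[u];
            changed = true;
          }
        }
      }
      label = next;
    }
    return label;
  }

  public static void main(String[] args) {
    // 0-indexed version of the test graph: 1-2, 2-{1,3,4}, 3-2, 4-2
    List<List<Integer>> adj = Arrays.asList(
        Arrays.asList(1),        // vertex 0 (id 1)
        Arrays.asList(0, 2, 3),  // vertex 1 (id 2)
        Arrays.asList(1),        // vertex 2 (id 3)
        Arrays.asList(1));       // vertex 3 (id 4)
    // The graph is connected, so every vertex converges to label 0.
    System.out.println(Arrays.toString(run(adj)));  // -> [0, 0, 0, 0]
  }
}
```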



On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <gh...@gmail.com> wrote:

> Hello,
>
> I am a final-year BSc Computer Science student using Apache Giraph for
> my final-year project and dissertation, and I would very much appreciate
> it if someone could help me with the following issue.
>
> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
> having trouble running the ConnectedComponents example. I use the following
> command:
>
>  hadoop jar
> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.ConnectedComponentsComputation -vif
> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
> /user/ghufran/in/my_graph.txt -vof
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/ghufran/outCC -w 1
>
>
> I believe it gets stuck in the InputSuperstep as the following is
> displayed in terminal when the command is running:
>
> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
> average 109.01MB
> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
> average 109.01MB
> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
> average 108.78MB
> ....
>
> which I traced back to the following if statement in the toString() method
> of core.org.apache.job.CombinedWorkerProgress:
>
> if (isInputSuperstep()) {
>   sb.append("Loading data: ");
>   sb.append(verticesLoaded).append(" vertices loaded, ");
>   sb.append(vertexInputSplitsLoaded).append(
>       " vertex input splits loaded; ");
>   sb.append(edgesLoaded).append(" edges loaded, ");
>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>   sb.append("; min free memory on worker ").append(
>       workerWithMinFreeMemory).append(" - ").append(
>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>
> So it seems to me that the input is not being loaded at all. I am
> assuming there's something wrong with my input format class or, more
> likely, with the graph file I passed in?
>
> I pass in a small graph in the format: vertex id, vertex value, then
> neighbour ids, all separated by tabs. My graph is shown below:
>
> 1 0 2
> 2 1 1 3 4
> 3 2 2
> 4 3 2
>
> The full output after I ran my command is shown below. If anyone could
> explain why I am not getting the expected output, I would greatly
> appreciate it.
>
> Many thanks,
>
> Ghufran
>
>
> FULL OUTPUT:
>
>
> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
> specified. Ensure your OutputFormat does not require one.
> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
> 14/03/30 10:48:45 INFO
> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
> writeHaltInstructions: To halt after next superstep execute:
> 'bin/halt-application --zkServer ghufran:22181 --zkNode
> /_hadoopBsp/job_201403301044_0001/_haltComputation'
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name=ghufran
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.version=1.7.0_51
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.vendor=Oracle Corporation
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/
bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:os.version=3.8.0-35-generic
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name=ghufran
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:user.home=/home/ghufran
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=ghufran:22181 sessionTimeout=60000
> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
> 14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to
> server ghufran/127.0.1.1:22181. Will not attempt to authenticate using
> SASL (unknown error)
> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established
> to ghufran/127.0.1.1:22181, initiating session
> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
> complete on server ghufran/127.0.1.1:22181, sessionid =
> 0x1451263c44c0002, negotiated timeout = 600000
> 14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
> average 109.01MB
> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
> average 109.01MB
> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
> average 109.01MB
> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
> average 108.78MB
>
>