Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2013/03/05 11:48:25 UTC

Re: Error with fastgen input

> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?

Actually, I wanted to add something before the BSP.setup() method is
called, to avoid executing an additional BSP job. But, in my opinion,
the current approach is enough. I think we need to collect more
experience with input partitioning in large environments. I'll do that.

BTW, I still don't know why it needs to be sorted. Is this MR-like?
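As Thomas explains further down the thread, merging n individually sorted partition files by simply concatenating them destroys the global order; a k-way merge over the runs preserves it. The following is my own illustrative sketch of such a merge over in-memory runs, not Hama code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch (not Hama code): appending n sorted runs gives
// unsorted output; a k-way merge over the runs restores one globally
// sorted sequence.
public class KWayMerge {

  // One heap entry: the current head element of run `runIndex`.
  static final class Head implements Comparable<Head> {
    final int value;
    final int runIndex;
    final int offset;

    Head(int value, int runIndex, int offset) {
      this.value = value;
      this.runIndex = runIndex;
      this.offset = offset;
    }

    public int compareTo(Head o) {
      return Integer.compare(value, o.value);
    }
  }

  // Merge n individually sorted runs into one sorted list.
  public static List<Integer> merge(int[][] sortedRuns) {
    PriorityQueue<Head> heap = new PriorityQueue<Head>();
    for (int r = 0; r < sortedRuns.length; r++) {
      if (sortedRuns[r].length > 0) {
        heap.add(new Head(sortedRuns[r][0], r, 0));
      }
    }
    List<Integer> out = new ArrayList<Integer>();
    while (!heap.isEmpty()) {
      Head h = heap.poll();        // smallest remaining head wins
      out.add(h.value);
      int next = h.offset + 1;
      if (next < sortedRuns[h.runIndex].length) {
        heap.add(new Head(sortedRuns[h.runIndex][next], h.runIndex, next));
      }
    }
    return out;
  }
}
```

The same idea applies whether the runs live in memory or in spilled files on disk; only the run iterators change.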

On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org> wrote:
> Sorry, I am increasing the scope here beyond the graph module. When we have
> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?
> Today we have a partitioning job within a job and are creating two copies of
> data on HDFS. This could be really costly. Is it possible to create or
> redistribute the partitions in local memory and initialize the record
> reader there?
> The user can run a separate job, given in the examples area, to explicitly
> repartition the data on HDFS. The deployment question is how much disk
> space gets allocated for local memory usage. Would it be a safe approach
> with the limitations?
>
> -Suraj
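Suraj's proposal above (a first superstep that scatters raw records to their owning peers, so each peer ends up with its partition in local memory instead of a second copy on HDFS) can be sketched as a plain in-memory simulation. The scatter/inbox shape here is an assumption for illustration, not the actual Hama messaging API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the Hama API): the "partitioning superstep"
// idea. In superstep 0 every peer scatters its raw input records to the
// peer that owns them (hash partitioning); afterwards each peer holds
// exactly its partition locally, with no second copy on HDFS.
public class PartitioningSuperstep {

  public static List<List<String>> scatter(List<String> rawRecords, int numPeers) {
    List<List<String>> inbox = new ArrayList<List<String>>();
    for (int i = 0; i < numPeers; i++) {
      inbox.add(new ArrayList<String>());
    }
    for (String record : rawRecords) {
      // In a real BSP job this would be a message send followed by a
      // barrier sync; here we just append to the target peer's inbox.
      int target = Math.floorMod(record.hashCode(), numPeers);
      inbox.get(target).add(record);
    }
    return inbox;
  }
}
```

The open question raised above (how much local disk/memory a peer may use) is exactly what bounds the inbox size in a real deployment.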
>
> On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> <th...@gmail.com>wrote:
>
>> Yes. Once Suraj has added merging of sorted files, we can add this to the
>> partitioner pretty easily.
>>
>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>
>> > Eh... btw, does the re-partitioned data really need to be sorted?
>> >
>> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> > <th...@gmail.com> wrote:
>> > > Now I get how the partitioning works. Obviously, if you merge n sorted
>> > > files by just appending them to one another, this will result in totally
>> > > unsorted data ;-)
>> > > Why didn't you solve this via messaging?
>> > >
>> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >
>> > >> Seems that they are not correctly sorted:
>> > >>
>> > >> vertexID: 50
>> > >> vertexID: 52
>> > >> vertexID: 54
>> > >> vertexID: 56
>> > >> vertexID: 58
>> > >> vertexID: 61
>> > >> ...
>> > >> vertexID: 78
>> > >> vertexID: 81
>> > >> vertexID: 83
>> > >> vertexID: 85
>> > >> ...
>> > >> vertexID: 94
>> > >> vertexID: 96
>> > >> vertexID: 98
>> > >> vertexID: 1
>> > >> vertexID: 10
>> > >> vertexID: 12
>> > >> vertexID: 14
>> > >> vertexID: 16
>> > >> vertexID: 18
>> > >> vertexID: 21
>> > >> vertexID: 23
>> > >> vertexID: 25
>> > >> vertexID: 27
>> > >> vertexID: 29
>> > >> vertexID: 3
>> > >>
>> > >> So this won't work correctly, then...
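The jump from 98 back to 1 in the listing above, and 29 followed by 3, is the signature of two text-sorted runs concatenated: within each run the IDs are ordered as strings, not numbers. A quick illustration of that pitfall (my own snippet, not Hama code):

```java
import java.util.Arrays;

// Hypothetical illustration (not Hama code): sorting numeric IDs as
// strings puts "3" after "29", exactly the pattern in the listing above.
public class IdOrdering {
  public static String[] sortAsText(String[] ids) {
    String[] copy = ids.clone();
    Arrays.sort(copy); // lexicographic String ordering, not numeric
    return copy;
  }
}
```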
>> > >>
>> > >>
>> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >>
>> > >>> sure, have fun on your holidays.
>> > >>>
>> > >>>
>> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >>>
>> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a
>> > >>>> holiday [1], so I'll be back next week.
>> > >>>>
>> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> > >>>>
>> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> > >>>> <th...@gmail.com> wrote:
>> > >>>> > Maybe 50 is missing from the file; I didn't observe whether all
>> > >>>> > items were added. As far as I remember, I copy/pasted the ID
>> > >>>> > logic into the fastgen; do you want to have a look into it?
>> > >>>> >
>> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >>>> >
>> > >>>> >> I guess it's a bug in fastgen when it generates the adjacency
>> > >>>> >> matrix into multiple files.
>> > >>>> >>
>> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> > >>>> >> <th...@gmail.com> wrote:
>> > >>>> >> > You have two files, are they partitioned correctly?
>> > >>>> >> >
>> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >>>> >> >
>> > >>>> >> >> It looks like a bug.
>> > >>>> >> >>
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>> > >>>> >> >> total 44
>> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 partitions
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>> > >>>> >> >> total 24
>> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <ed...@udanax.org>
>> > wrote:
>> > >>>> >> >> > yes i'll check again
>> > >>>> >> >> >
>> > >>>> >> >> > Sent from my iPhone
>> > >>>> >> >> >
>> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>> > >>>> >> thomas.jungblut@gmail.com>
>> > >>>> >> >> wrote:
>> > >>>> >> >> >
>> > >>>> >> >> >> Can you verify an observation for me please?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>> > >>>> >> >> >> part-00001, both ~2.2kb in size.
>> > >>>> >> >> >> In the partition directory below, there is only a single
>> > >>>> >> >> >> 5.56kb file.
>> > >>>> >> >> >> Is it intended for the partitioner to write a single file
>> > >>>> >> >> >> if you configured two?
>> > >>>> >> >> >> It even reads it as two files, strange huh?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >>>> >> >> >>
>> > >>>> >> >> >>> Will have a look into it.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> did work for me the last time I profiled; maybe the
>> > >>>> >> >> >>> partitioning doesn't partition correctly with this input,
>> > >>>> >> >> >>> or something else.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>> Fastgen input seems to not work for the graph examples.
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>> > >>>> >> >> >>>> /tmp/randomgraph 2
>> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> > >>>> >> >> >>>> where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2
>> > >>>> >> >> >>>> tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> > >>>> >> >> >>>> where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process
>> > >>>> >> >> >>>> : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process
>> > >>>> >> >> >>>> : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2
>> > >>>> >> >> >>>> tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process
>> > >>>> >> >> >>>> : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2
>> > >>>> >> >> >>>> tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into
>> > >>>> >> >> >>>> local:1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into
>> > >>>> >> >> >>>> local:0
>> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP
>> > >>>> >> >> >>>> execution!
>> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the
>> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > >>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > >>>> >> >> >>>>         at java.lang.Thread.run(Thread.java:722)
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> --
>> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> > >>>> >> >> >>>> @eddieyoon
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >> --
>> > >>>> >> >> Best Regards, Edward J. Yoon
>> > >>>> >> >> @eddieyoon
>> > >>>> >> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >> Best Regards, Edward J. Yoon
>> > >>>> >> @eddieyoon
>> > >>>> >>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> --
>> > >>>> Best Regards, Edward J. Yoon
>> > >>>> @eddieyoon
>> > >>>>
>> > >>>
>> > >>>
>> > >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
NOTE: this is purely my opinion on your thoughts.

> This is the change we talked about on the dev list and on JIRAs very
> extensively and chose a single design we want to implement. This requires a
> lot of code change, so I don't see how splitting that smaller (IMHO this is
> atomic enough) would be beneficial. And even if you split the stuff, it
> would add huge organizational overhead, because the number of team
> members/contributors that can work on those tasks is limited.

and,

> Sorry Edward, but our releases have been a disaster so far. I'm only here
> since 0.3.0, but none of them were scalable, well documented, or
> well tested. I have no problem with taking more time for a product, as I
> don't feel the need to deliver half-baked stuff to people who are not using
> it anyways nor providing any feedback there (which is sad reality in many
> other open source projects as well). So in my opinion we have to iterate on
> our own and not with official releases. "It is done, when it's done" is the
> usual standard and I don't think deviating from it will give any advantages
> besides pissed off users getting Hama not to work like it should.

No one noticed that AvroRPC doesn't work well for
communication-intensive jobs. If you agree with this, the cause of bad
releases is a lack of feedback from people who use Hama actively. I
don't think this will change soon. So, I'll focus more on using Hama.

The release plan for Hama 0.6.1 is just a minor release covering [1][2].

> For other partitionings and with regard to our superstep API, Suraj's idea
> of injecting a preprocessing superstep that partitions the stuff into our
> messaging system is actually the best.

Before changing the PartitioningJobRunner, I hope I can test, use, and
gain experience with it on my large cluster (there are some people who
can't use Maven because of internet security issues). Therefore, I think
there is no need to put SpillingQueue and DiskVerticesInfo into 0.6.1.

1. http://markmail.org/message/bgvojz334l76n3n7
2. http://markmail.org/thread/2yf4lkgdoreq37gn

> It is not about a skill discussion here, but I wanted to emphasize that you
> can very well work on other JIRAs instead of blocking our work on
> graph/messaging. And 23 is at least 22 more than the average of the rest of
> the team, think about that: would there be issues for newcomers? Yes there
> would! But why are you assigning them to yourself when you're not working
> actively on them?
>
> YARN is just a single umbrella issue that is "yours", there is work blocked
> on maven coding (HAMA-671) and also there is a pending patch review since
> 20/11/12 (4 months!) from me in HAMA-672, so don't tell me that you work on
> that things actively in your "full-time open sourcer" career.

Please take it if you want.

On Fri, Mar 15, 2013 at 2:47 AM, Thomas Jungblut
<th...@gmail.com> wrote:
>>
>> As you know, we have a problem of a lack of team members and contributors.
>> So we should break down every task to be as small as possible.
>
>
> Where was this task not broken into pieces?
> There are at least two tasks:
>
> - Improve GraphJobRunner memory consumption (HAMA-704, even reviewed on
> reviewboard with huge memory savings)
> - Implement SpillingQueue / SortedSpillingQueue (HAMA-644, HAMA-723
> whatever else)
>
> This is the change we talked about on the dev list and on JIRAs very
> extensively and chose a single design we want to implement. This requires a
> lot of code change, so I don't see how splitting that smaller (IMHO this is
> atomic enough) would be beneficial. And even if you split the stuff, it
> would add huge organizational overhead, because the number of team
> members/contributors that can work on those tasks is limited.
>
>> I don't know what you mean exactly. But 23 issues are almost all
>> examples, except the YARN integration tasks. If you leave, I have to
>> take over the YARN tasks. Should I wait for someone? Am I touching the
>> core module aggressively?
>
>
> It is not about a skill discussion here, but I wanted to emphasize that you
> can very well work on other JIRAs instead of blocking our work on
> graph/messaging. And 23 is at least 22 more than the average of the rest of
> the team, think about that: would there be issues for newcomers? Yes there
> would! But why are you assigning them to yourself when you're not working
> actively on them?
>
> YARN is just a single umbrella issue that is "yours", there is work blocked
> on maven coding (HAMA-671) and also there is a pending patch review since
> 20/11/12 (4 months!) from me in HAMA-672, so don't tell me that you work on
> that things actively in your "full-time open sourcer" career.
>
>> By the way, can you answer this question: are these really technical
>> conflicts, or emotional conflicts?
>
>
> If someone is usually emotional about things, it is you. Technically
> speaking, should we branch out such (big) refactoring issues to work on our
> own, or do you want to brew your own soup on trunk and have us merge all
> the stuff together? In that case, please fork your own playground
> Hama and do all the stuff you want; if something emerges successfully,
> feel free to slice a patch and submit a JIRA.
>
>> So I think we need to cut releases as often as possible.
>
>
> Sorry Edward, but our releases have been a disaster so far. I'm only here
> since 0.3.0, but none of them were scalable, well documented, or
> well tested. I have no problem with taking more time for a product, as I
> don't feel the need to deliver half-baked stuff to people who are not using
> it anyways nor providing any feedback there (which is sad reality in many
> other open source projects as well). So in my opinion we have to iterate on
> our own and not with official releases. "It is done, when it's done" is the
> usual standard and I don't think deviating from it will give any advantages
> besides pissed off users getting Hama not to work like it should.
>
> Also, your recent changes on the wiki:
>
> However, if no one responds to your patches for 3 days, you can commit then
>> review later.
>
>
> Who in the community has voted for that rule, or do you make the rules
> here? You can't talk about community in the same sentence as changing rules
> for everybody just because you like that.
> Where was the need to commit HAMA-745 without review? Why did you change
> that testcase? This is just the "tip" of the iceberg of changes you are
> doing to the trunk without the agreement of the community. We established a
> community process during the incubation (it was even written in the
> charter when graduating), so why do we not stick to it instead of laying
> out rules for our own needs, or those of your employer?
>
>> Regarding branches, maybe we all are not familiar with online
>> collaboration (or don't want to collaborate anymore). If we want to
>> walk our own ways, why do we need to be here together?
>
>
> Branching is something that is perfectly legal when something needs to be
> developed in parallel to ongoing work. We don't have much ongoing work, do
> we? So I don't think branching is usually needed when working on small
> projects, because issues can be solved by communication. But if you commit
> / plan stuff to trunk without coordinating that with people (YOU KNOW) that
> are currently working on it, then it is just a bad move.
>
>> In HAMA-704, I wanted to remove only the message map to reduce memory
>> consumption. I still don't want to talk about disk-based vertices and
>> the spilling queue at the moment. With this, I wanted to release 0.6.1
>> as a 'partitioning issue fixed and quick executable examples' version ASAP.
>>
>
> You can't say B without saying A. The problems are much deeper than you
> think they are. The message consumption is not a problem of the message
> map, but a twofold problem: vertices that are in memory although they
> don't need to be, and a not very scalable messaging system. I have told
> you that since the time we added the graph module, but I have fallen on
> deaf ears with you for more than a year.
> Yeah, and you know what? This requires a lot of changes.
>
> If you would have invested the time to work with us on the root of all
> issues instead of doing strange stuff e.G. like the partitioning jobs (in
> the hours I wasted to tell you about the technical downsides of it I
> could've built another Hadoop in FORTRAN) we could've gotten a release out
> months ago and work on other things.
>
>> If we want to sort partitioned data using the messaging system, ideas
>> should be collected.
>
>
> The idea is there and the idea works, but I guess you're not following the
> JIRAs you are +1'ing?
> Suraj is already working on the second part of the idea we divided in two,
> and instead of cock fighting with each other we should work together to
> make this happen. And not as fast as possible because you want to roll
> out a release for your employer, but because we want to improve the
> framework radically and have enough time to test it thoroughly with
> various configurations, and not just an Oracle BDA.
>
>> P.S. These comments are never helpful in developing a community.
>
>
> It is something that needs to be discussed throughout the whole project,
> and not on a single private mailing list. Community development doesn't
> start with +1'ing and smiling to everything just to keep people on board.
> Truth hurts, but is necessary to evolve something. Community starts with
> people who have a vision in making a project better, it will develop for
> itself when it is stable enough and has a bigger user base; you know,
> developers are users too. If I can't run a graph job with 1 GB of Wikipedia
> links on my laptop, this project is not very likely to be something I want
> to develop on. So our first responsibility is to make our project running
> perfectly smooth and nothing else. And that is something that must be
> discussed with people who want to develop, but can't- and we need these
> people.
> And to be honest again, we didn't have many people other than GSoC students
> who get a shitton of money for developing stuff and then walk away
> again. I count myself in now as well, mea culpa.



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
I admit I didn't describe enough info when opening the JIRA ticket, and
I didn't review patches carefully.

For example, I saw the memory issue before the 0.6 release, so I opened
HAMA-596, "Optimize memory usage of graph job". Someone uploaded a patch
there, so I dropped a +1 without review.

BTW, the problem still wasn't fixed, so I opened the same issue again as
HAMA-704, "Optimization of memory usage during message processing".
Again, someone uploaded a patch there, and I dropped a +1 without review.

Now the examples don't work, so I reported that error here, and I've
started to read the recent changes.

That's all. The problem seems to come from our review culture.

---

> If you would have invested the time to work with us on the root of all
> issues instead of doing strange stuff e.G. like the partitioning jobs (in
> the hours I wasted to tell you about the technical downsides of it I
> could've built another Hadoop in FORTRAN) we could've gotten a release out
> months ago and work on other things.

Thomas, I really don't know why you're saying this. Let's not blame
each other anymore. What do you want from me?




-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Thomas Jungblut <th...@gmail.com>.
>
> As you know, we have a problem of lack of team members and contributors.
> So we should break down every task as small as possible.


Where was this task not broken into pieces?
There are at least two tasks:

- Improve GraphJobRunner memory consumption (HAMA-704, even reviewed on
reviewboard with huge memory savings)
- Implement SpillingQueue / SortedSpillingQueue (HAMA-644, HAMA-723
whatever else)

This is the change we talked about on the dev list and in the JIRAs very
extensively, and we chose a single design that we want to implement. It
requires a lot of code change, so I don't see how splitting it any smaller
(IMHO this is atomic enough) would be beneficial. And even if you split the
work up, it would add huge organizational overhead, because the number of
team members/contributors who can work on those tasks is limited.
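Since the SortedSpillingQueue (HAMA-644) keeps coming up in this thread, here is a minimal sketch of the idea being discussed: buffer messages in memory, spill a sorted run to disk whenever a threshold is reached, and merge all runs on read. All names here are hypothetical illustrations, not Hama's actual implementation.

```python
import heapq
import os
import tempfile

class SortedSpillingQueue:
    """Sketch only: buffers messages in memory; when the buffer exceeds a
    threshold, sorts it and spills it to a run file on disk. Reading merges
    the sorted on-disk runs with the in-memory remainder."""

    def __init__(self, spill_threshold=4):
        self.spill_threshold = spill_threshold  # tiny value for demo purposes
        self.buffer = []
        self.run_files = []

    def add(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.spill_threshold:
            self._spill()

    def _spill(self):
        # Write one sorted run to a temp file and clear the buffer.
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            for m in sorted(self.buffer):
                f.write(m + "\n")
        self.run_files.append(path)
        self.buffer = []

    def read_sorted(self):
        # k-way merge of all sorted runs plus the sorted in-memory tail;
        # simply concatenating the runs would NOT yield sorted output.
        runs = [open(p) for p in self.run_files]
        streams = [(line.rstrip("\n") for line in r) for r in runs]
        streams.append(iter(sorted(self.buffer)))
        try:
            for m in heapq.merge(*streams):
                yield m
        finally:
            for r in runs:
                r.close()

q = SortedSpillingQueue()
for msg in ["v7", "v1", "v9", "v3", "v2", "v8", "v5"]:
    q.add(msg)
print(list(q.read_sorted()))  # ['v1', 'v2', 'v3', 'v5', 'v7', 'v8', 'v9']
```

The key property is that each spill is individually sorted, so a cheap streaming merge restores global order at read time without ever holding all messages in memory.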

> I don't know what you mean exactly. But those 23 issues are almost all
> examples, except the YARN integration tasks. If you leave, I have to take
> over the YARN tasks. Should I wait for someone? Am I touching the core
> module too aggressively?


This is not about skill; I wanted to emphasize that you can very well work
on other JIRAs instead of blocking our work on graph/messaging. And 23 is
at least 22 more than the average for the rest of the team. Think about it:
would there be issues for newcomers? Yes, there would! But why are you
assigning them to yourself when you're not actively working on them?

YARN is just a single umbrella issue that is "yours"; there is work blocked
on Maven coding (HAMA-671), and there is also a patch of mine pending review
since 20/11/12 (4 months!) in HAMA-672, so don't tell me that you work on
those things actively in your "full-time open sourcer" career.

> By the way, can you answer this question: are these really technical
> conflicts, or emotional conflicts?


If anyone is usually emotional about things, it is you. Technically
speaking, should we branch out such (big) refactoring issues and work on
them separately, or do you want to brew your own soup on trunk and have us
merge all the stuff together? In the latter case, please fork your own
playground Hama and do all the stuff you want; if something emerges
successfully, feel free to slice a patch and file a JIRA.

> So I think we need to cut releases as often as possible.


Sorry Edward, but our releases have been a disaster so far. I have only
been here since 0.3.0, but none of them was scalable, well documented, or
well tested. I have no problem with taking more time for a product, as I
don't feel the need to deliver half-baked stuff to people who are not using
it anyway and are not providing any feedback (which is the sad reality in
many other open source projects as well). So in my opinion we have to
iterate on our own, and not with official releases. "It is done when it's
done" is the usual standard, and I don't think deviating from it will give
any advantage besides pissed-off users finding that Hama doesn't work the
way it should.

Also, regarding your recent changes to the wiki:

> However, if no one responds to your patches for 3 days, you can commit,
> then review later.


Who in the community voted for that rule, or do you make the rules here?
You can't talk about community in the same sentence as changing the rules
for everybody just because you like it that way.
Where was the need to commit HAMA-745 without review? Why did you change
that test case? This is just the tip of the iceberg of the changes you are
making to trunk without the agreement of the community. We established a
community process during incubation (it was even written into the charter
when we graduated), so why do we not stick to it instead of laying out
rules for our own needs, or those of your employer?

> Regarding branches, maybe we are all not familiar with online
> collaboration (or don't want to collaborate anymore). If we want to
> walk our own ways, why do we need to be here together?


Branching is perfectly legal when something needs to be developed in
parallel to ongoing work. We don't have much ongoing work, do we? So I
don't think branching is usually needed on small projects, because issues
can be solved by communication. But if you commit / plan stuff to trunk
without coordinating it with people (YOU KNOW) who are currently working
on it, then it is just a bad move.

> In HAMA-704, I wanted to remove only the message map to reduce memory
> consumption. I still don't want to talk about disk-based vertices and
> the spilling queue at the moment. With this, I wanted to release 0.6.1,
> a 'partitioning issue fixed and quickly executable examples' version, ASAP.
>

You can't say B without saying A. The problems are much deeper than you
think they are. The memory consumption is not a problem of the message map,
but a twofold problem: vertices that are kept in memory although they don't
need to be, and a messaging system that is not very scalable. I have been
telling you that since we added the graph module, but it has been falling
on deaf ears with you for more than a year.
And tell you what? This requires a lot of changes.

If you had invested the time to work with us on the root of all these
issues instead of doing strange stuff, e.g. the partitioning jobs (in the
hours I wasted telling you about their technical downsides I could have
built another Hadoop in FORTRAN), we could have gotten a release out months
ago and worked on other things.

> If we want to sort partitioned data using the messaging system, ideas
> should be collected.


The idea is there and the idea works, but I guess you're not following the
JIRAs you are +1'ing? Suraj is already working on the second part of the
idea, which we divided in two, and instead of fighting each other we should
work together to make this happen. And not as fast as possible because you
want to roll out a release for your employer, but because we want to
improve the framework radically and have enough time to test it thoroughly
with various configurations, and not just an Oracle BDA.

> P.S., these comments are never helpful in developing a community.


It is something that needs to be discussed throughout the whole project,
and not on a single private mailing list. Community development doesn't
start with +1'ing and smiling at everything just to keep people on board.
Truth hurts, but it is necessary to evolve something. Community starts with
people who have a vision of making a project better; it will develop by
itself once the project is stable enough and has a bigger user base. You
know, developers are users too. If I can't run a graph job with 1 GB of
Wikipedia links on my laptop, this project is not very likely to be
something I want to develop on. So our first responsibility is to make our
project run perfectly smoothly, and nothing else. And that is something
that must be discussed with people who want to develop but can't, and we
need these people.
And to be honest again, we haven't had many people other than GSoC students
who get a shitton of money for developing stuff and then walk away again.
I count myself in now as well, mea culpa.

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hmm, okay.

The problem seems to come from our review culture.

On Fri, Mar 15, 2013 at 12:54 AM, Suraj Menon <su...@apache.org> wrote:
>> It can only be answered by patches?
>
> The answer is partly yes. As an example, please refer to the conversation
> we had in https://issues.apache.org/jira/browse/HAMA-559
> I think between me and Thomas we went back and forth between at least
> three or four designs before we finalized one. (Mind you, the read
> performance for the spilling queue was fixed later.)
>
>> can we discuss about our plan for vertices first?
>
> The design we (I thought including you) are contemplating now is to do a
> join of two sorted entities (vertices and messages). This is implied in
> the final HAMA-704 patch, which has your +1(?). With synchronized
> communication and sorted queues, you did suggest somewhere that the
> performance was slower (which was expected). So to speed up while staying
> scalable, we should do async communication and a spilled sorted queue.
> Now, this needs refactoring of everything we know in the messaging code,
> so we will have to get there patch by patch. I don't think there will be
> a redo, because most of these are building blocks for other applications
> waiting for these changes.
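The "join of two sorted entities" design described above reads naturally as a single merge pass over the vertex store and the incoming message queue, both sorted by vertex ID. A rough sketch under that assumption (hypothetical names, not the actual HAMA-704 code):

```python
def run_superstep(sorted_vertices, sorted_messages, compute):
    """One sequential pass over two streams sorted by vertex ID: each
    vertex is paired with its messages without any in-memory message map."""
    msg_iter = iter(sorted_messages)
    pending = next(msg_iter, None)
    results = []
    for vertex_id, value in sorted_vertices:
        # Drop messages addressed to IDs we have already passed (no owner).
        while pending is not None and pending[0] < vertex_id:
            pending = next(msg_iter, None)
        # Collect all messages destined for the current vertex.
        msgs = []
        while pending is not None and pending[0] == vertex_id:
            msgs.append(pending[1])
            pending = next(msg_iter, None)
        results.append(compute(vertex_id, value, msgs))
    return results

vertices = [("a", 1), ("b", 2), ("c", 3)]   # sorted by vertex ID
messages = [("a", 10), ("a", 20), ("c", 5)]  # sorted by destination ID
out = run_superstep(vertices, messages,
                    lambda vid, val, msgs: (vid, val + sum(msgs)))
print(out)  # [('a', 31), ('b', 2), ('c', 8)]
```

This is why both the partitioned vertices and the message queue need to be sorted: the join degenerates into a streaming merge with O(1) extra memory per vertex, instead of a map keyed by vertex ID.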
>
> Thanks,
> Suraj
>
>
>
> On Thu, Mar 14, 2013 at 10:01 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> This is a pure question.
>>
>> Before we discuss the issues below, can we discuss our plan for
>> vertices first? Do I need to wait until all spilling-queue-related
>> issues are fixed to see whether the messaging system can be used to
>> sort partitioned data by vertex comparator? Can it only be answered by
>> patches?
>>
>> On Thu, Mar 14, 2013 at 8:40 PM, Suraj Menon <su...@apache.org>
>> wrote:
>> > Going in line with the latest topic of the conversation.
>> > Nothing is closed here, and the JIRAs were already created for the
>> > whole thing to come into place:
>> >
>> > HAMA-644
>> > HAMA-490
>> > HAMA-722
>> > HAMA-728
>> > HAMA-707
>> > HAMA-728
>> >
>> > The JIRAs above are directly or indirectly affected by the core
>> > refactoring.
>> >
>> > -Suraj
>> >
>> >
>> > On Thu, Mar 14, 2013 at 7:03 AM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>> >
>> >> P.S., These comments are never helpful in developing community.
>> >>
>> >> "before you run riot all along the codebase: Suraj is currently
>> >> working on that stuff, so don't make it more difficult for him by
>> >> rebasing all his patches the whole time. He has the plan that we made
>> >> to get the stuff working; his part is currently missing. So don't try
>> >> to muddle around there, it will make this take longer than already
>> >> needed."
>> >>
>> >> On Thu, Mar 14, 2013 at 7:57 PM, Edward J. Yoon <ed...@apache.org>
>> >> wrote:
>> >> > In my opinion, our best action is to 1) explain the plans and edit
>> >> > them together on the wiki, and then 2) break down the implementation
>> >> > tasks as small as possible so that available people can try them in
>> >> > parallel. Then you can use the available people. Do you remember that
>> >> > I asked you to write down your plan here? -
>> >> > http://wiki.apache.org/hama/SpillingQueue If you have some time,
>> >> > please do it for me. I'll help you in my free time.
>> >> >
>> >> > Regarding branches, maybe we are all not familiar with online
>> >> > collaboration (or don't want to collaborate anymore). If we want to
>> >> > walk our own ways, why do we need to be here together?
>> >> >
>> >> > On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org>
>> >> wrote:
>> >> >> Three points:
>> >> >>
>> >> >> Firstly, apologies, because this conversation partly emanates from
>> >> >> the delay in providing the set of patches. I was not able to slice
>> >> >> off as much time as I was hoping.
>> >> >>
>> >> >> Second, I think I/we can work on separate branches. Since most of
>> >> >> these concerns can only be answered by future patches, a decision
>> >> >> can be made then. We can decide whether an svn revert is needed
>> >> >> during the process on trunk. (This is a general comment and not
>> >> >> related to a particular JIRA.)
>> >> >>
>> >> >> Third, please feel free to slice a release if it is really important.
>> >> >>
>> >> >> Thanks,
>> >> >> Suraj
>> >> >>
>> >> >> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <
>> edwardyoon@apache.org
>> >> >wrote:
>> >> >>
>> >> >>> To reduce arguing, I'm appending my opinions.
>> >> >>>
>> >> >>> In HAMA-704, I wanted to remove only the message map to reduce
>> >> >>> memory consumption. I still don't want to talk about disk-based
>> >> >>> vertices and the spilling queue at the moment. With this, I wanted
>> >> >>> to release 0.6.1, a 'partitioning issue fixed and quickly
>> >> >>> executable examples' version, ASAP. That's why I scheduled the
>> >> >>> spilling queue for the 0.7 roadmap.
>> >> >>>
>> >> >>> As you can see, issues are happening one right after another. I
>> >> >>> don't think we have to clean up all the never-ending issues. We
>> >> >>> can improve step by step.
>> >> >>>
>> >> >>> 1. http://wiki.apache.org/hama/RoadMap
>> >> >>>
>> >> >>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <
>> edwardyoon@apache.org
>> >> >
>> >> >>> wrote:
>> >> >>> > Typos ;)
>> >> >>> >
>> >> >>> >> except YARN integration tasks. If you leave here, I have to take
>> >> cover
>> >> >>> >> YARN tasks. Should I wait someone? Am I touching core module
>> >> >>> >
>> >> >>> > I have to cover YARN tasks instead of you.
>> >> >>> >
>> >> >>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <
>> >> edwardyoon@apache.org>
>> >> >>> wrote:
>> >> >>> >> Hmm, here are my opinions:
>> >> >>> >>
>> >> >>> >> As you know, we have a problem with a lack of team members and
>> >> >>> >> contributors. So we should break down every task as small as
>> >> >>> >> possible. Our best action is improving step by step. And every
>> >> >>> >> Hama-x.x.x should run well, even if it's at baby-cart level.
>> >> >>> >>
>> >> >>> >> And tech should be developed out of necessity. So I think we
>> >> >>> >> need to cut releases as often as possible. Therefore I
>> >> >>> >> volunteered to manage releases. Actually, I wanted to work only
>> >> >>> >> on QA (quality assurance) related tasks, because your code is
>> >> >>> >> better than mine and I have a cluster.
>> >> >>> >>
>> >> >>> >> However, we are currently not doing it like that. I guess there
>> >> >>> >> are many reasons. We're all not full-time open sourcers (except
>> >> >>> >> me).
>> >> >>> >>
>> >> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
>> >> >>> >>
>> >> >>> >> I don't know what you mean exactly. But those 23 issues are
>> >> >>> >> almost all examples, except the YARN integration tasks. If you
>> >> >>> >> leave, I have to take over the YARN tasks. Should I wait for
>> >> >>> >> someone? Am I touching the core module too aggressively?
>> >> >>> >>
>> >> >>> >>> Otherwise Suraj and I branch that issues away and you can play
>> >> >>> arround.l in
>> >> >>> >>> trunk how you like.
>> >> >>> >>
>> >> >>> >> I also don't know what you mean exactly but if you want, Please
>> do.
>> >> >>> >>
>> >> >>> >> By the way, can you answer about this question - Is it really
>> >> >>> >> technical conflicts? or emotional conflicts?
>> >> >>> >>
>> >> >>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
>> >> >>> >> <th...@gmail.com> wrote:
>> >> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
>> >> >>> >>> Otherwise Suraj and I branch that issues away and you can play
>> >> >>> arround.l in
>> >> >>> >>> trunk how you like.
>> >> >>> >>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <
>> >> edwardyoon@apache.org>:
>> >> >>> >>>
>> >> >>> >>>> P.S., please don't talk like that.
>> >> >>> >>>>
>> >> >>> >>>> No decisions have been made yet. And if someone has a question
>> >> >>> >>>> or missed something, you have to try to explain it here.
>> >> >>> >>>> Because this is open source, no one can say "don't touch trunk
>> >> >>> >>>> because I'm working on it".
>> >> >>> >>>>
>> >> >>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
>> >> >>> edwardyoon@apache.org>
>> >> >>> >>>> wrote:
>> >> >>> >>>> > Sorry for my quick-and-dirty small patches.
>> >> >>> >>>> >
>> >> >>> >>>> > However, we should work together in parallel. Please share
>> >> >>> >>>> > here if there is any progress.
>> >> >>> >>>> >
>> >> >>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>> >> >>> >>>> > <th...@gmail.com> wrote:
>> >> >>> >>>> >> Hi Edward,
>> >> >>> >>>> >>
>> >> >>> >>>> >> before you run riot all along the codebase: Suraj is
>> >> >>> >>>> >> currently working on that stuff, so don't make it more
>> >> >>> >>>> >> difficult for him by rebasing all his patches the whole
>> >> >>> >>>> >> time. He has the plan that we made to get the stuff
>> >> >>> >>>> >> working; his part is currently missing. So don't try to
>> >> >>> >>>> >> muddle around there, it will make this take longer than
>> >> >>> >>>> >> already needed.
>> >> >>> >>>> >>
>> >> >>> >>>> >>
>> >> >>> >>>> >>
>> >> >>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>> >> >>> >>>> >>
>> >> >>> >>>> >>> Personally, I would like to solve this issue by touching
>> >> >>> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices
>> into
>> >> >>> >>>> >>> multiple files, we can avoid huge memory consumption.
>> >> >>> >>>> >>>
>> >> >>> >>>> >>> If we want to sort partitioned data using messaging system,
>> >> idea
>> >> >>> >>>> >>> should be collected.
>> >> >>> >>>> >>>
>> >> >>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>> >> >>> >>>> edwardyoon@apache.org>
>> >> >>> >>>> >>> wrote:
>> >> >>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely
>> >> written.
>> >> >>> >>>> >>> >
>> >> >>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>> >> >>> >>>> edwardyoon@apache.org>
>> >> >>> >>>> >>> wrote:
>> >> >>> >>>> >>> >> I'm reading changes of HAMA-704 again. As a result of
>> >> adding
>> >> >>> >>>> >>> >> DiskVerticesInfo, vertices list is needed to be sorted.
>> I'm
>> >> >>> not sure
>> >> >>> >>>> >>> >> but I think this approach will bring more disadvantages
>> >> than
>> >> >>> >>>> >>> >> advantages.
>> >> >>> >>>> >>> >>
>> >> >>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>> >> >>> >>>> edwardyoon@apache.org>
>> >> >>> >>>> >>> wrote:
>> >> >>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>> >> >>> storage in
>> >> >>> >>>> >>> user space
>> >> >>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads
>> and
>> >> >>> writes.
>> >> >>> >>>> >>> This way
>> >> >>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can
>> keep
>> >> >>> vertices
>> >> >>> >>>> >>> sorted
>> >> >>> >>>> >>> >>>>>> with a single read and single write on every peer.
>> >> >>> >>>> >>> >>>
>> >> >>> >>>> >>> >>> And, as I commented JIRA ticket, I think we can't use
>> >> >>> messaging
>> >> >>> >>>> system
>> >> >>> >>>> >>> >>> for sorting vertices within partition files.
>> >> >>> >>>> >>> >>>
>> >> >>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org> wrote:
>> >> >>> >>>> >>> >>>> P.S., (number of splits = number of partitions) is
>> really
>> >> >>> confuse
>> >> >>> >>>> to
>> >> >>> >>>> >>> >>>> me. Even though blocks number is equal to desired
>> tasks
>> >> >>> number,
>> >> >>> >>>> data
>> >> >>> >>>> >>> >>>> should be re-partitioned again.
>> >> >>> >>>> >>> >>>>
>> >> >>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org> wrote:
>> >> >>> >>>> >>> >>>>> Indeed. If there are already partitioned input files
>> >> >>> (unsorted)
>> >> >>> >>>> and
>> >> >>> >>>> >>> so
>> >> >>> >>>> >>> >>>>> user want to skip pre-partitioning phase, it should
>> be
>> >> >>> handled in
>> >> >>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't
>> >> know why
>> >> >>> >>>> >>> >>>>> re-partitioned files need to be Sorted. It's only
>> about
>> >> >>> >>>> >>> >>>>> GraphJobRunner.
>> >> >>> >>>> >>> >>>>>
>> >> >>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs.
>> We
>> >> can
>> >> >>> have
>> >> >>> >>>> a
>> >> >>> >>>> >>> dedicated
>> >> >>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>> >> >>> >>>> >>> >>>>>
>> >> >>> >>>> >>> >>>>> Sorry. I don't understand exactly yet. Do you mean
>> just
>> >> a
>> >> >>> >>>> >>> partitioning
>> >> >>> >>>> >>> >>>>> job based on superstep API?
>> >> >>> >>>> >>> >>>>>
>> >> >>> >>>> >>> >>>>> By default, 100 tasks will be assigned for
>> partitioning
>> >> job.
>> >> >>> >>>> >>> >>>>> Partitioning job will create 1,000 partitions. Thus,
>> we
>> >> can
>> >> >>> >>>> execute
>> >> >>> >>>> >>> >>>>> the Graph job with 1,000 tasks.
>> >> >>> >>>> >>> >>>>>
>> >> >>> >>>> >>> >>>>> Let's assume that a input sequence file is 20GB (100
>> >> >>> blocks). If
>> >> >>> >>>> I
>> >> >>> >>>> >>> >>>>> want to run with 1,000 tasks, what happens?
>> >> >>> >>>> >>> >>>>>
>> >> >>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>> >> >>> >>>> surajsmenon@apache.org>
>> >> >>> >>>> >>> wrote:
>> >> >>> >>>> >>> >>>>>> I am responding on this thread because of better
>> >> >>> continuity for
>> >> >>> >>>> >>> >>>>>> conversation. We cannot expect the partitions to be
>> >> sorted
>> >> >>> every
>> >> >>> >>>> >>> time. When
>> >> >>> >>>> >>> >>>>>> the number of splits = number of partitions and
>> >> >>> partitioning is
>> >> >>> >>>> >>> switched
>> >> >>> >>>> >>> >>>>>> off by user[HAMA-561], the partitions would not be
>> >> sorted.
>> >> >>> Can
>> >> >>> >>>> we
>> >> >>> >>>> >>> do this
>> >> >>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>> >> >>> storage in
>> >> >>> >>>> >>> user space
>> >> >>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads
>> and
>> >> >>> writes.
>> >> >>> >>>> >>> This way
>> >> >>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can
>> keep
>> >> >>> vertices
>> >> >>> >>>> >>> sorted
>> >> >>> >>>> >>> >>>>>> with a single read and single write on every peer.
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> Just clearing confusion if any regarding superstep
>> >> >>> injection for
>> >> >>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs.
>> We
>> >> can
>> >> >>> have
>> >> >>> >>>> a
>> >> >>> >>>> >>> dedicated
>> >> >>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>> >> >>> >>>> >>> >>>>>> Say there are x splits and y number of tasks
>> >> configured by
>> >> >>> user.
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> if x > y
>> >> >>> >>>> >>> >>>>>> The y tasks are scheduled with x of them having
>> each of
>> >> >>> the x
>> >> >>> >>>> >>> splits and
>> >> >>> >>>> >>> >>>>>> the remaining with no resource local to them. Then
>> the
>> >> >>> >>>> partitioning
>> >> >>> >>>> >>> >>>>>> superstep redistributes the partitions among them to
>> >> create
>> >> >>> >>>> local
>> >> >>> >>>> >>> >>>>>> partitions. Now the question is can we
>> re-initialize a
>> >> >>> peer's
>> >> >>> >>>> input
>> >> >>> >>>> >>> based
>> >> >>> >>>> >>> >>>>>> on this new local part of partition?
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> if y > x
>> >> >>> >>>> >>> >>>>>> works as it works today.
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> Just putting my points in brainstorming.
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> -Suraj
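Suraj's x-splits/y-tasks brainstorming above can be sketched roughly as follows; `assign_splits` and `repartition` are hypothetical helpers for illustration, not Hama's actual scheduler or partitioner:

```python
def assign_splits(num_splits, num_tasks):
    """Assign input split indices to tasks round-robin; with more tasks
    than splits, some tasks start with no local input at all."""
    assignment = {t: [] for t in range(num_tasks)}
    for s in range(num_splits):
        assignment[s % num_tasks].append(s)
    return assignment

def repartition(assignment, records_by_split, num_tasks, partition_of):
    """The partitioning superstep idea: every task re-sends each of its
    records to the task that owns that record's partition, so each task
    ends up with exactly one local partition."""
    partitions = {t: [] for t in range(num_tasks)}
    for splits in assignment.values():
        for s in splits:
            for record in records_by_split[s]:
                partitions[partition_of(record) % num_tasks].append(record)
    return partitions

assignment = assign_splits(2, 4)   # 2 splits, 4 tasks: tasks 2 and 3 get none
records = {0: [3, 7, 11, 2], 1: [8, 5, 12, 6]}
parts = repartition(assignment, records, 4, lambda r: r)
print(parts[3])  # [3, 7, 11]: every record r lands on task r % 4
```

The open question from the thread remains whether a peer's record reader can be re-initialized over this new local partition after the redistribution step.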
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org>wrote:
>> >> >>> >>>> >>> >>>>>>
>> >> >>> >>>> >>> >>>>>>> I just filed here
>> >> >>> >>>> https://issues.apache.org/jira/browse/HAMA-744
>> >> >>> >>>> >>> >>>>>>>
>> >> >>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org>
>> >> >>> >>>> >>> >>>>>>> wrote:
>> >> >>> >>>> >>> >>>>>>> > Additionally,
>> >> >>> >>>> >>> >>>>>>> >
>> >> >>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
>> >> >>> inject the
>> >> >>> >>>> >>> partitioning
>> >> >>> >>>> >>> >>>>>>> >> superstep as the first superstep and use local
>> >> memory?
>> >> >>> >>>> >>> >>>>>>> >
>> >> >>> >>>> >>> >>>>>>> > Can we execute different number of tasks per
>> >> superstep?
>> >> >>> >>>> >>> >>>>>>> >
>> >> >>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org>
>> >> >>> >>>> >>> >>>>>>> wrote:
>> >> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files
>> that
>> >> >>> result
>> >> >>> >>>> from
>> >> >>> >>>> >>> the
>> >> >>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only
>> >> the
>> >> >>> >>>> partition
>> >> >>> >>>> >>> files in
>> >> >>> >>>> >>> >>>>>>> >>
>> >> >>> >>>> >>> >>>>>>> >> I see.
>> >> >>> >>>> >>> >>>>>>> >>
>> >> >>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>> >> >>> superstep
>> >> >>> >>>> API,
>> >> >>> >>>> >>> Suraj's
>> >> >>> >>>> >>> >>>>>>> idea
>> >> >>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>> >> >>> partitions the
>> >> >>> >>>> >>> stuff into
>> >> >>> >>>> >>> >>>>>>> our
>> >> >>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>> >> >>> >>>> >>> >>>>>>> >>
>> >> >>> >>>> >>> >>>>>>> >> BTW, if some garbage objects can be accumulated
>> in
>> >> >>> >>>> partitioning
>> >> >>> >>>> >>> step,
>> >> >>> >>>> >>> >>>>>>> >> separated partitioning job may not be bad idea.
>> Is
>> >> >>> there
>> >> >>> >>>> some
>> >> >>> >>>> >>> special
>> >> >>> >>>> >>> >>>>>>> >> reason?
>> >> >>> >>>> >>> >>>>>>> >>
>> >> >>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >> >>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>> >> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files
>> that
>> >> >>> result
>> >> >>> >>>> from
>> >> >>> >>>> >>> the
>> >> >>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only
>> >> the
>> >> >>> >>>> partition
>> >> >>> >>>> >>> files in
>> >> >>> >>>> >>> >>>>>>> >>> itself are sorted, thus more tasks result in
>> not
>> >> >>> sorted
>> >> >>> >>>> data
>> >> >>> >>>> >>> in the
>> >> >>> >>>> >>> >>>>>>> >>> completed file. This only applies for the graph
>> >> >>> processing
>> >> >>> >>>> >>> package.
>> >> >>> >>>> >>> >>>>>>> >>> So as Suraj told, it would be much more
>> simpler to
>> >> >>> solve
>> >> >>> >>>> this
>> >> >>> >>>> >>> via
>> >> >>> >>>> >>> >>>>>>> >>> messaging, once it is scalable (it will be very
>> >> very
>> >> >>> >>>> >>> scalable!). So the
>> >> >>> >>>> >>> >>>>>>> >>> GraphJobRunner can be partitioning the stuff
>> with
>> >> a
>> >> >>> single
>> >> >>> >>>> >>> superstep in
>> >> >>> >>>> >>> >>>>>>> >>> setup() as it was before ages ago. The
>> messaging
>> >> must
>> >> >>> be
>> >> >>> >>>> >>> sorted anyway
>> >> >>> >>>> >>> >>>>>>> for
>> >> >>> >>>> >>> >>>>>>> >>> the algorithm so this is a nice side effect and
>> >> saves
>> >> >>> us
>> >> >>> >>>> the
>> >> >>> >>>> >>> >>>>>>> partitioning
>> >> >>> >>>> >>> >>>>>>> >>> job for graph processing.
>> >> >>> >>>> >>> >>>>>>> >>>
>> >> >>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>> >> >>> superstep
>> >> >>> >>>> API,
>> >> >>> >>>> >>> Suraj's
>> >> >>> >>>> >>> >>>>>>> idea
>> >> >>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>> >> >>> partitions the
>> >> >>> >>>> >>> stuff into
>> >> >>> >>>> >>> >>>>>>> our
>> >> >>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>> >> >>> >>>> >>> >>>>>>> >>>
>> >> >>> >>>> >>> >>>>>>> >>>
>> >> >>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >> >>> >>>> >>> >>>>>>> >>>
>> >> >>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not
>> be
>> >> >>> sorted.
>> >> >>> >>>> Sorry
>> >> >>> >>>> >>> for the
>> >> >>> >>>> >>> >>>>>>> >>>> confusion. The Superstep injection is possible
>> >> with
>> >> >>> >>>> Superstep
>> >> >>> >>>> >>> API.
>> >> >>> >>>> >>> >>>>>>> There
>> >> >>> >>>> >>> >>>>>>> >>>> are few enhancements needed to make it simpler
>> >> after
>> >> >>> I
>> >> >>> >>>> last
>> >> >>> >>>> >>> worked on
>> >> >>> >>>> >>> >>>>>>> it.
>> >> >>> >>>> >>> >>>>>>> >>>> We can then look into partitioning superstep
>> >> being
>> >> >>> >>>> executed
>> >> >>> >>>> >>> before the
>> >> >>> >>>> >>> >>>>>>> >>>> setup of first superstep of submitted job. I
>> >> think
>> >> >>> it is
>> >> >>> >>>> >>> feasible.
>> >> >>> >>>> >>> >>>>>>> >>>>
>> >> >>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J.
>> Yoon <
>> >> >>> >>>> >>> edwardyoon@apache.org
>> >> >>> >>>> >>> >>>>>>> >>>> >wrote:
>> >> >>> >>>> >>> >>>>>>> >>>>
>> >> >>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue,
>> >> can we
>> >> >>> >>>> inject
>> >> >>> >>>> >>> the
>> >> >>> >>>> >>> >>>>>>> >>>> partitioning
>> >> >>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use
>> >> local
>> >> >>> memory?
>> >> >>> >>>> >>> >>>>>>> >>>> >
>> >> >>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before
>> >> calling
>> >> >>> >>>> >>> BSP.setup()
>> >> >>> >>>> >>> >>>>>>> method
>> >> >>> >>>> >>> >>>>>>> >>>> > to avoid execute additional BSP job. But,
>> in my
>> >> >>> opinion,
>> >> >>> >>>> >>> current is
>> >> >>> >>>> >>> >>>>>>> >>>> > enough. I think, we need to collect more
>> >> >>> experiences of
>> >> >>> >>>> >>> input
>> >> >>> >>>> >>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>> >> >>> >>>> >>> >>>>>>> >>>> >
>> >> >>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it need to be
>> >> Sorted?!
>> >> >>> >>>> MR-like?
>> >> >>> >>>> >>> >>>>>>> >>>> >
>> >> >>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj
>> Menon <
>> >> >>> >>>> >>> >>>>>>> surajsmenon@apache.org>
>> >> >>> >>>> >>> >>>>>>> >>>> > wrote:
Sorry, I am increasing the scope here to outside the graph module. When we have the spilling queue and the sorted spilling queue, can we inject the partitioning superstep as the first superstep and use local memory? Today we have a partitioning job within a job and are creating two copies of the data on HDFS. This could be really costly. Is it possible to create or redistribute the partitions on local memory and initialize the record reader there? The user can run a separate job, given in the examples area, to explicitly repartition the data on HDFS. The deployment question is how much disk space gets allocated for local memory usage? Would it be a safe approach with the limitations?

-Suraj
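Suraj's idea, redistributing records to their owning peers in a first superstep instead of running a separate partitioning job that rewrites the input on HDFS, can be sketched in plain Java. All names and types below are made up for illustration and are not the Hama API; a real implementation would send records via peer messaging and a sync barrier, while this toy models only the ownership function and the resulting per-peer layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a "partitioning superstep": every peer scatters its local
// input split to the owning peers; the returned map stands in for each
// peer's local storage after the sync barrier.
public class PartitionSuperstep {

  // Which peer owns a record: plain modulo hashing over the key.
  static int owner(String key, int numPeers) {
    return Math.floorMod(key.hashCode(), numPeers);
  }

  static Map<Integer, List<String>> repartition(List<List<String>> splits) {
    int numPeers = splits.size();
    Map<Integer, List<String>> local = new HashMap<>();
    for (int p = 0; p < numPeers; p++) local.put(p, new ArrayList<>());
    for (List<String> split : splits)
      for (String record : split)
        local.get(owner(record, numPeers)).add(record);  // "send" to owner
    return local;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> parts =
        repartition(List.of(List.of("a", "b", "c"), List.of("d", "a")));
    System.out.println(parts);
  }
}
```

Because ownership depends only on the key hash and the peer count, records with equal keys always land on the same peer, whichever split they came from; the record reader could then be initialized over that local data.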

On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:

yes. Once Suraj added merging of sorted files we can add this to the partitioner pretty easily.

2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
Eh,..... btw, is the re-partitioned data really necessary to be sorted?

On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
Now I get how the partitioning works; obviously, if you merge n sorted files by just appending them to each other, this will result in totally unsorted data ;-)
Why didn't you solve this via messaging?
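The failure mode Thomas describes is easy to reproduce: concatenating n sorted runs is not a merge. A minimal k-way merge sketch in plain Java (not Hama code), using a priority queue over the run heads so the smallest remaining element is always emitted next:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

public class KWayMerge {

  // Simple concatenation: this is what "appending to each other" does.
  static List<Integer> appendAll(List<List<Integer>> runs) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> run : runs) out.addAll(run);
    return out;
  }

  // Proper k-way merge: the heap always yields the run with the smallest head.
  static List<Integer> merge(List<List<Integer>> runs) {
    PriorityQueue<Queue<Integer>> heap =
        new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
    for (List<Integer> run : runs)
      if (!run.isEmpty()) heap.add(new ArrayDeque<>(run));
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Queue<Integer> smallest = heap.poll();
      out.add(smallest.poll());                     // take the global minimum
      if (!smallest.isEmpty()) heap.add(smallest);  // re-insert with new head
    }
    return out;
  }

  public static void main(String[] args) {
    // Two sorted runs, like the two part-0000x files from fastgen.
    List<List<Integer>> runs =
        List.of(List.of(1, 3, 50, 52), List.of(2, 10, 54, 61));
    System.out.println(appendAll(runs)); // [1, 3, 50, 52, 2, 10, 54, 61] - unsorted
    System.out.println(merge(runs));     // [1, 2, 3, 10, 50, 52, 54, 61]
  }
}
```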
2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>

Seems that they are not correctly sorted:

vertexID: 50
vertexID: 52
vertexID: 54
vertexID: 56
vertexID: 58
vertexID: 61
...
vertexID: 78
vertexID: 81
vertexID: 83
vertexID: 85
...
vertexID: 94
vertexID: 96
vertexID: 98
vertexID: 1
vertexID: 10
vertexID: 12
vertexID: 14
vertexID: 16
vertexID: 18
vertexID: 21
vertexID: 23
vertexID: 25
vertexID: 27
vertexID: 29
vertexID: 3

So this won't work correctly then...

2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
sure, have fun on your holidays.

2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
Sure, but if you can fix it quickly, please do. March 1 is a holiday[1], so I'll appear next week.

1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea

On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
Maybe 50 is missing from the file; I didn't observe if all items were added. As far as I remember, I copy/pasted the logic of the ID into the fastgen; want to have a look into it?

2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
I guess it's a bug of fastgen when it generates the adjacency matrix into multiple files.

On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:

You have two files; are they partitioned correctly?

2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
It looks like a bug.

edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
total 44
drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
-rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
-rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
-rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
-rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
total 24
drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
-rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
-rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
-rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
-rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
edward@udanax:~/workspace/hama-trunk$

On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
yes, i'll check again

Sent from my iPhone

On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
Can you verify an observation for me, please?

2 files are created from fastgen, part-00000 and part-00001, both ~2.2 KB in size. In the partition directory below, there is only a single 5.56 KB file.

Is it intended for the partitioner to write a single file if you configured two? It even reads it as two files, strange huh?

2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
Will have a look into it.

gen fastgen 100 10 /tmp/randomgraph 1
pagerank /tmp/randomgraph /tmp/pageout

did work for me the last time I profiled; maybe the partitioning doesn't partition correctly with the input, or it's something else.

2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
Fastgen input seems not to work for graph examples.

edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
Job Finished in 3.212 seconds
edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
hama-examples-0.7.0-SNAPSHOT-javadoc.jar  hama-examples-0.7.0-SNAPSHOT.jar
edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

--
Best Regards, Edward J. Yoon
@eddieyoon
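The IllegalArgumentException above falls out of a single forward pass that matches two sorted streams: the vertices of a partition and the incoming messages, both ordered by vertex ID. A toy model of that invariant (an illustration assumed from the error message, not the actual GraphJobRunner code):

```java
import java.util.Iterator;
import java.util.List;

public class SortedIterate {

  // Returns the number of delivered messages; throws when the message stream
  // falls behind the vertex iteration, mirroring the reported exception.
  static int deliver(List<Integer> sortedVertexIds, List<Integer> sortedMessageIds) {
    int delivered = 0;
    Iterator<Integer> it = sortedMessageIds.iterator();
    Integer msg = it.hasNext() ? it.next() : null;
    for (int vertexId : sortedVertexIds) {
      if (msg != null && msg < vertexId) {
        throw new IllegalArgumentException(
            "Messages must never be behind the vertex in ID! Current Message ID: "
                + msg + " vs. " + vertexId);
      }
      while (msg != null && msg == vertexId) {
        delivered++;                          // deliver message to this vertex
        msg = it.hasNext() ? it.next() : null;
      }
    }
    return delivered;
  }

  public static void main(String[] args) {
    // Vertices and messages both sorted: every message finds its vertex.
    System.out.println(deliver(List.of(1, 3, 50), List.of(1, 50))); // 2
    // A partition built by concatenating sorted runs starts at 50, so a
    // message for vertex 1 is "behind" and triggers the exception.
    try {
      deliver(List.of(50, 52, 54), List.of(1));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

This is why a proper k-way merge (rather than concatenation) of the partition files matters: the single-pass match only works if vertex IDs are globally sorted within each partition.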
>> >> >>> >>>> >>> >>>>
>> >> >>> >>>> >>> >>>>
>> >> >>> >>>> >>> >>>>
>> >> >>> >>>> >>> >>>> --
>> >> >>> >>>> >>> >>>> Best Regards, Edward J. Yoon
>> >> >>> >>>> >>> >>>> @eddieyoon
>> >> >>> >>>> >>> >>>
>> >> >>> >>>> >>> >>>
>> >> >>> >>>> >>> >>>
>> >> >>> >>>> >>> >>> --
>> >> >>> >>>> >>> >>> Best Regards, Edward J. Yoon
>> >> >>> >>>> >>> >>> @eddieyoon
>> >> >>> >>>> >>> >>
>> >> >>> >>>> >>> >>
>> >> >>> >>>> >>> >>
>> >> >>> >>>> >>> >> --
>> >> >>> >>>> >>> >> Best Regards, Edward J. Yoon
>> >> >>> >>>> >>> >> @eddieyoon
>> >> >>> >>>> >>> >
>> >> >>> >>>> >>> >
>> >> >>> >>>> >>> >
>> >> >>> >>>> >>> > --
>> >> >>> >>>> >>> > Best Regards, Edward J. Yoon
>> >> >>> >>>> >>> > @eddieyoon
>> >> >>> >>>> >>>
>> >> >>> >>>> >>>
>> >> >>> >>>> >>>
>> >> >>> >>>> >>> --
>> >> >>> >>>> >>> Best Regards, Edward J. Yoon
>> >> >>> >>>> >>> @eddieyoon
>> >> >>> >>>> >>>
>> >> >>> >>>> >
>> >> >>> >>>> >
>> >> >>> >>>> >
>> >> >>> >>>> > --
>> >> >>> >>>> > Best Regards, Edward J. Yoon
>> >> >>> >>>> > @eddieyoon
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>> --
>> >> >>> >>>> Best Regards, Edward J. Yoon
>> >> >>> >>>> @eddieyoon
>> >> >>> >>>>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> Best Regards, Edward J. Yoon
>> >> >>> >> @eddieyoon
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > --
>> >> >>> > Best Regards, Edward J. Yoon
>> >> >>> > @eddieyoon
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Best Regards, Edward J. Yoon
>> >> >>> @eddieyoon
>> >> >>>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best Regards, Edward J. Yoon
>> >> > @eddieyoon
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Suraj Menon <su...@apache.org>.
> It can only be answered by patches?

The answer is partly yes. As an example, please refer to the conversation
we had in https://issues.apache.org/jira/browse/HAMA-559
I think Thomas and I went back and forth between at least three or four
designs before we finalized on one. (Mind you, the read performance of
the spilling queue was fixed later.)

> can we discuss about our plan for vertices first?

The design we (I thought including you) are contemplating now is a join
of two sorted entities (vertices and messages). This is implied in the
final HAMA-704 patch, which has your +1(?). With synchronized
communication and sorted queues, you did suggest somewhere that the
performance was slower (which was expected). So, to get speed along with
scalability, we should do asynchronous communication and a spilled
sorted queue. That requires refactoring everything we know in the
messaging code, so we will have to get there patch by patch. I don't
think there would be a redo, because most of these are building blocks
for other applications waiting for these changes.
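The sorted join described above can be sketched as a minimal merge-join
of two ID-sorted streams (vertices and their incoming messages). The
class and method names here are illustrative only, not Hama APIs:

```java
import java.util.*;

// Illustrative sketch (not a Hama API): merge-join two ID-sorted streams,
// the vertices and the messages addressed to them, in one linear pass.
class SortedJoin {
  // Both vertices and msgIds must be sorted ascending; msgs[i] is the
  // payload of the message addressed to vertex msgIds[i].
  static Map<Integer, List<String>> join(int[] vertices, int[] msgIds, String[] msgs) {
    Map<Integer, List<String>> out = new LinkedHashMap<>();
    int m = 0;
    for (int v : vertices) {
      List<String> inbox = new ArrayList<>();
      while (m < msgIds.length && msgIds[m] < v) m++;               // drop strays
      while (m < msgIds.length && msgIds[m] == v) inbox.add(msgs[m++]);
      out.put(v, inbox);
    }
    return out;
  }
}
```

Because both inputs are sorted, the join is a single linear pass over
each stream, which is what makes disk-backed (spilled) vertices and
messages workable.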

Thanks,
Suraj



On Thu, Mar 14, 2013 at 10:01 AM, Edward J. Yoon <ed...@apache.org>wrote:

> This is a pure question.
>
> Before we discuss the issues below, can we discuss our plan for
> vertices first? Do I need to wait until all spilling-queue-related
> issues are fixed to see whether the messaging system can be used to
> sort partitioned data by the vertex comparator? Can that only be
> answered by patches?
>
> On Thu, Mar 14, 2013 at 8:40 PM, Suraj Menon <su...@apache.org>
> wrote:
> > Going along with the latest topic of the conversation: nothing is
> > closed here, and the JIRAs were already created for the whole thing to
> > fall into place:
> >
> > HAMA-644
> > HAMA-490
> > HAMA-722
> > HAMA-728
> > HAMA-707
> >
> > The JIRAs above are directly or indirectly affected by the core
> > refactoring.
> >
> > -Suraj
> >
> >
> > On Thu, Mar 14, 2013 at 7:03 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
> >
> >> P.S. Comments like these are never helpful in developing a community:
> >>
> >> "before you run riot all along the codebase: Suraj is currently
> >> working on that stuff - don't make it more difficult for him to rebase
> >> all his patches the whole time.
> >> He has the plan that we made to make the stuff working; his part is
> >> currently missing. So don't try to muddle around there, it will make
> >> this take longer than already needed."
> >>
> >> On Thu, Mar 14, 2013 at 7:57 PM, Edward J. Yoon <ed...@apache.org>
> >> wrote:
> >> > In my opinion, our best course of action is to 1) explain the plans
> >> > and edit them together on the wiki, and then 2) break the
> >> > implementation down into tasks as small as possible, so that the
> >> > people who are available can work on them in parallel. Do you
> >> > remember that I asked you to write down your plan here? -
> >> > http://wiki.apache.org/hama/SpillingQueue If you have some time,
> >> > please do that for me. I'll help in my free time.
> >> >
> >> > Regarding branches, maybe we are all unfamiliar with online
> >> > collaboration (or don't want to collaborate anymore). If we each
> >> > want to walk our own way, why do we need to be here together?
> >> >
> >> > On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org>
> >> wrote:
> >> >> Three points:
> >> >>
> >> >> First, apologies: this conversation partly emanates from the delay
> >> >> in providing the set of patches. I was not able to set aside as much
> >> >> time as I was hoping.
> >> >>
> >> >> Second, I think I/we can work on separate branches. Since most of
> >> >> these concerns can only be answered by future patches, a decision
> >> >> could be made then. We can decide whether an svn revert is needed
> >> >> during the process on trunk. (This is a general comment and not
> >> >> related to a particular JIRA.)
> >> >>
> >> >> Third, please feel free to cut a release if it is really important.
> >> >>
> >> >> Thanks,
> >> >> Suraj
> >> >>
> >> >> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >wrote:
> >> >>
> >> >>> To reduce arguing, I'm appending my opinions.
> >> >>>
> >> >>> In HAMA-704, I wanted to remove only the message map, to reduce
> >> >>> memory consumption. I still don't want to talk about disk-based
> >> >>> vertices and the Spilling Queue at the moment. With this, I wanted
> >> >>> to release 0.6.1, a 'partitioning issue fixed and quick executable
> >> >>> examples' version, ASAP. That's why I scheduled the Spilling Queue
> >> >>> in the 0.7 roadmap.
> >> >>>
> >> >>> As you can see, issues keep happening one right after another. I
> >> >>> don't think we have to clean up every never-ending issue at once.
> >> >>> We can improve step by step.
> >> >>>
> >> >>> 1. http://wiki.apache.org/hama/RoadMap
> >> >>>
> >> >>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >
> >> >>> wrote:
> >> >>> > Typos ;)
> >> >>> >
> >> >>> >> except YARN integration tasks. If you leave here, I have to take
> >> cover
> >> >>> >> YARN tasks. Should I wait someone? Am I touching core module
> >> >>> >
> >> >>> > I have to cover YARN tasks instead of you.
> >> >>> >
> >> >>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <
> >> edwardyoon@apache.org>
> >> >>> wrote:
> >> >>> >> Hmm, here are my opinions:
> >> >>> >>
> >> >>> >> As you know, we suffer from a lack of team members and
> >> >>> >> contributors, so we should break every task down as small as
> >> >>> >> possible. Our best course is to improve step by step, and every
> >> >>> >> Hama-x.x.x release should run well, even if it's at a baby-cart
> >> >>> >> level.
> >> >>> >>
> >> >>> >> Also, technology should be developed out of necessity, so I
> >> >>> >> think we need to cut releases as often as possible; that is why
> >> >>> >> I volunteered to manage releases. Actually, I wanted to work
> >> >>> >> only on QA (quality assurance) related tasks, because your code
> >> >>> >> is better than mine and I have a cluster.
> >> >>> >>
> >> >>> >> However, we are currently not working like that. I guess there
> >> >>> >> are many reasons; none of us is a full-time open sourcer
> >> >>> >> (except me).
> >> >>> >>
> >> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
> >> >>> >>
> >> >>> >> I don't know what you mean exactly. But 23 issues are almost
> >> examples
> >> >>> >> except YARN integration tasks. If you leave here, I have to take
> >> cover
> >> >>> >> YARN tasks. Should I wait someone? Am I touching core module
> >> >>> >> aggressively?
> >> >>> >>
> >> >>> >>> Otherwise Suraj and I will branch those issues away and you
> >> >>> >>> can play around in trunk however you like.
> >> >>> >>
> >> >>> >> I also don't know what you mean exactly, but if you want,
> >> >>> >> please do.
> >> >>> >>
> >> >>> >> By the way, can you answer this question: is this really a
> >> >>> >> technical conflict, or an emotional one?
> >> >>> >>
> >> >>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
> >> >>> >> <th...@gmail.com> wrote:
> >> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
> >> >>> >>> Otherwise Suraj and I will branch those issues away and you can
> >> >>> >>> play around in trunk however you like.
> >> >>> >>> On 14.03.2013 09:04, "Edward J. Yoon" <edwardyoon@apache.org> wrote:
> >> >>> >>>
> >> >>> >>>> P.S. Please don't say things like that.
> >> >>> >>>>
> >> >>> >>>> No decisions have been made yet. And if someone has a question
> >> >>> >>>> or missed something, you have to try to explain it here,
> >> >>> >>>> because this is open source. No one can say "don't touch trunk
> >> >>> >>>> because I'm working on it".
> >> >>> >>>>
> >> >>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
> >> >>> edwardyoon@apache.org>
> >> >>> >>>> wrote:
> >> >>> >>>> > Sorry for my quick-and-dirty small patches.
> >> >>> >>>> >
> >> >>> >>>> > However, we should work together in parallel. Please share
> >> >>> >>>> > here if there is any progress.
> >> >>> >>>> >
> >> >>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
> >> >>> >>>> > <th...@gmail.com> wrote:
> >> >>> >>>> >> Hi Edward,
> >> >>> >>>> >>
> >> >>> >>>> >> before you run riot all along the codebase: Suraj is
> >> >>> >>>> >> currently working on that stuff - don't make it more
> >> >>> >>>> >> difficult for him to rebase all his patches the whole time.
> >> >>> >>>> >> He has the plan that we made to make the stuff working; his
> >> >>> >>>> >> part is currently missing. So don't try to muddle around
> >> >>> >>>> >> there, it will make this take longer than already needed.
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
> >> >>> >>>> >>
> >> >>> >>>> >>> Personally, I would like to solve this issue by touching
> >> >>> >>>> >>> DiskVerticesInfo. If we write sorted subsets of the
> >> >>> >>>> >>> vertices into multiple files, we can avoid huge memory
> >> >>> >>>> >>> consumption.
> >> >>> >>>> >>>
> >> >>> >>>> >>> If we want to sort the partitioned data using the
> >> >>> >>>> >>> messaging system, ideas should be collected.
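A minimal sketch of that idea: buffer a bounded number of vertices, sort
the buffer, and spill it as one sorted run. Runs are modeled as
in-memory lists here, where a real implementation would write files; all
names are illustrative, not Hama's DiskVerticesInfo API:

```java
import java.util.*;

// Illustrative sketch (not Hama's DiskVerticesInfo): cap memory use by
// sorting and spilling bounded runs of vertex IDs instead of sorting
// the whole vertex set in RAM at once.
class RunSpiller {
  private final int limit;                         // max buffered vertices
  private final List<Integer> buffer = new ArrayList<>();
  final List<List<Integer>> runs = new ArrayList<>();

  RunSpiller(int limit) { this.limit = limit; }

  void add(int vertexId) {
    buffer.add(vertexId);
    if (buffer.size() >= limit) spill();
  }

  void spill() {
    if (buffer.isEmpty()) return;
    Collections.sort(buffer);                      // each run is sorted...
    runs.add(new ArrayList<>(buffer));             // ...but runs overlap in
    buffer.clear();                                // key range, so a merge
  }                                                // step is still needed
}
```

Each spilled run is internally sorted, but the runs overlap in key
range, so reading the vertices back in global order still requires a
merge over all runs.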
> >> >>> >>>> >>>
> >> >>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
> >> >>> >>>> edwardyoon@apache.org>
> >> >>> >>>> >>> wrote:
> >> >>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely
> >> written.
> >> >>> >>>> >>> >
> >> >>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
> >> >>> >>>> edwardyoon@apache.org>
> >> >>> >>>> >>> wrote:
> >> >>> >>>> >>> >> I'm reading the HAMA-704 changes again. As a result of
> >> >>> >>>> >>> >> adding DiskVerticesInfo, the vertices list needs to be
> >> >>> >>>> >>> >> sorted. I'm not sure, but I think this approach will
> >> >>> >>>> >>> >> bring more disadvantages than advantages.
> >> >>> >>>> >>> >>
> >> >>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
> >> >>> >>>> edwardyoon@apache.org>
> >> >>> >>>> >>> wrote:
> >> >>> >>>> >>> >>>>>> in loadVertices? Maybe consider whether the feature
> >> >>> >>>> >>> >>>>>> coupling storage in user space with BSP Messaging
> >> >>> >>>> >>> >>>>>> [HAMA-734] can avoid double reads and writes. This
> >> >>> >>>> >>> >>>>>> way, partitioned or not by the partitioner, it can
> >> >>> >>>> >>> >>>>>> keep vertices sorted with a single read and a single
> >> >>> >>>> >>> >>>>>> write on every peer.
> >> >>> >>>> >>> >>>
> >> >>> >>>> >>> >>> And, as I commented on the JIRA ticket, I think we
> >> >>> >>>> >>> >>> can't use the messaging system to sort vertices within
> >> >>> >>>> >>> >>> the partition files.
> >> >>> >>>> >>> >>>
> >> >>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org> wrote:
> >> >>> >>>> >>> >>>> P.S. (number of splits = number of partitions) is
> >> >>> >>>> >>> >>>> really confusing to me. Even when the number of blocks
> >> >>> >>>> >>> >>>> equals the desired number of tasks, the data should be
> >> >>> >>>> >>> >>>> re-partitioned again.
> >> >>> >>>> >>> >>>>
> >> >>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org> wrote:
> >> >>> >>>> >>> >>>>> Indeed. If there are already-partitioned (unsorted)
> >> >>> >>>> >>> >>>>> input files and the user therefore wants to skip the
> >> >>> >>>> >>> >>>>> pre-partitioning phase, it should be handled in the
> >> >>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't
> >> >>> >>>> >>> >>>>> know why the re-partitioned files need to be sorted;
> >> >>> >>>> >>> >>>>> it only concerns GraphJobRunner.
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs.
> We
> >> can
> >> >>> have
> >> >>> >>>> a
> >> >>> >>>> >>> dedicated
> >> >>> >>>> >>> >>>>>> partitioning superstep for graph applications).
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean
> >> >>> >>>> >>> >>>>> just a partitioning job based on the superstep API?
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>> By default, 100 tasks will be assigned to the
> >> >>> >>>> >>> >>>>> partitioning job. The partitioning job will create
> >> >>> >>>> >>> >>>>> 1,000 partitions; thus, we can execute the graph job
> >> >>> >>>> >>> >>>>> with 1,000 tasks.
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>> Let's assume the input sequence file is 20GB (100
> >> >>> >>>> >>> >>>>> blocks). If I want to run with 1,000 tasks, what
> >> >>> >>>> >>> >>>>> happens?
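The arithmetic in question, sketched with a plain hash partitioner
(illustrative only, not Hama's actual Partitioner interface): the
partitioning job's tasks read the 100 blocks and each routes its
vertices into one of 1,000 partition files, so the subsequent graph job
can run with 1,000 tasks.

```java
// Illustrative sketch, not Hama's Partitioner interface: route a vertex
// to one of numPartitions partition files by hashing its ID, so the
// partition count is independent of the input block count.
class HashPartition {
  static int partitionFor(int vertexId, int numPartitions) {
    // mask the sign bit so negative IDs still map into [0, numPartitions)
    return (vertexId & Integer.MAX_VALUE) % numPartitions;
  }
}
```

This decouples the number of partitions from the number of input
blocks; the cost is the extra pass over the data that this thread is
debating.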
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
> >> >>> >>>> surajsmenon@apache.org>
> >> >>> >>>> >>> wrote:
> >> >>> >>>> >>> >>>>>> I am responding on this thread for better
> >> >>> >>>> >>> >>>>>> continuity of the conversation. We cannot expect the
> >> >>> >>>> >>> >>>>>> partitions to be sorted every time. When the number
> >> >>> >>>> >>> >>>>>> of splits = the number of partitions and partitioning
> >> >>> >>>> >>> >>>>>> is switched off by the user [HAMA-561], the
> >> >>> >>>> >>> >>>>>> partitions would not be sorted. Can we do this in
> >> >>> >>>> >>> >>>>>> loadVertices? Maybe consider whether the feature
> >> >>> >>>> >>> >>>>>> coupling storage in user space with BSP Messaging
> >> >>> >>>> >>> >>>>>> [HAMA-734] can avoid double reads and writes. This
> >> >>> >>>> >>> >>>>>> way, partitioned or not by the partitioner, it can
> >> >>> >>>> >>> >>>>>> keep vertices sorted with a single read and a single
> >> >>> >>>> >>> >>>>>> write on every peer.
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> Just clearing up any confusion regarding superstep
> >> >>> >>>> >>> >>>>>> injection for partitioning. (This is outside the
> >> >>> >>>> >>> >>>>>> scope of graphs; we can have a dedicated partitioning
> >> >>> >>>> >>> >>>>>> superstep for graph applications.)
> >> >>> >>>> >>> >>>>>> Say there are x splits and y tasks configured by the
> >> >>> >>>> >>> >>>>>> user.
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> if x > y
> >> >>> >>>> >>> >>>>>> The y tasks are scheduled, some of them having one of
> >> >>> >>>> >>> >>>>>> the x splits local to them and the remaining ones
> >> >>> >>>> >>> >>>>>> having no local resource. Then the partitioning
> >> >>> >>>> >>> >>>>>> superstep redistributes the partitions among them to
> >> >>> >>>> >>> >>>>>> create local partitions. Now the question is: can we
> >> >>> >>>> >>> >>>>>> re-initialize a peer's input based on this new local
> >> >>> >>>> >>> >>>>>> part of the partition?
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> if y > x
> >> >>> >>>> >>> >>>>>> It works as it works today.
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> Just putting my points in for brainstorming.
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> -Suraj
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org>wrote:
> >> >>> >>>> >>> >>>>>>
> >> >>> >>>> >>> >>>>>>> I just filed here
> >> >>> >>>> https://issues.apache.org/jira/browse/HAMA-744
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org>
> >> >>> >>>> >>> >>>>>>> wrote:
> >> >>> >>>> >>> >>>>>>> > Additionally,
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
> >> >>> inject the
> >> >>> >>>> >>> partitioning
> >> >>> >>>> >>> >>>>>>> >> superstep as the first superstep and use local
> >> memory?
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > > Can we execute a different number of tasks
> >> >>> >>>> >>> >>>>>>> >>>> > > per superstep?
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org>
> >> >>> >>>> >>> >>>>>>> wrote:
> >> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files
> that
> >> >>> result
> >> >>> >>>> from
> >> >>> >>>> >>> the
> >> >>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only
> >> the
> >> >>> >>>> partition
> >> >>> >>>> >>> files in
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >> I see.
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
> >> >>> superstep
> >> >>> >>>> API,
> >> >>> >>>> >>> Suraj's
> >> >>> >>>> >>> >>>>>>> idea
> >> >>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
> >> >>> partitions the
> >> >>> >>>> >>> stuff into
> >> >>> >>>> >>> >>>>>>> our
> >> >>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >> BTW, if garbage objects can accumulate in the
> >> >>> >>>> >>> >>>>>>> >> partitioning step, a separate partitioning job
> >> >>> >>>> >>> >>>>>>> >> may not be a bad idea. Is there some special
> >> >>> >>>> >>> >>>>>>> >> reason?
> >> >>> >>>> >>> special
> >> >>> >>>> >>> >>>>>>> >> reason?
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >> >>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
> >> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files
> >> >>> >>>> >>> >>>>>>> >>> that result from the partitioning job must be
> >> >>> >>>> >>> >>>>>>> >>> sorted. Currently only the partition files
> >> >>> >>>> >>> >>>>>>> >>> themselves are sorted; thus more tasks result in
> >> >>> >>>> >>> >>>>>>> >>> unsorted data in the merged file. This only
> >> >>> >>>> >>> >>>>>>> >>> applies to the graph processing package.
> >> >>> >>>> >>> >>>>>>> >>> So, as Suraj said, it would be much simpler to
> >> >>> >>>> >>> >>>>>>> >>> solve this via messaging once it is scalable (it
> >> >>> >>>> >>> >>>>>>> >>> will be very, very scalable!). The
> >> >>> >>>> >>> >>>>>>> >>> GraphJobRunner can then partition the data with
> >> >>> >>>> >>> >>>>>>> >>> a single superstep in setup(), as it did ages
> >> >>> >>>> >>> >>>>>>> >>> ago. The messaging must be sorted for the
> >> >>> >>>> >>> >>>>>>> >>> algorithm anyway, so this is a nice side effect
> >> >>> >>>> >>> >>>>>>> >>> and saves us the partitioning job for graph
> >> >>> >>>> >>> >>>>>>> >>> processing.
> >> >>> >>>> >>> >>>>>>> >>>
> >> >>> >>>> >>> >>>>>>> >>> For other partitionings, and with regard to our
> >> >>> >>>> >>> >>>>>>> >>> superstep API, Suraj's idea of injecting a
> >> >>> >>>> >>> >>>>>>> >>> preprocessing superstep that partitions the
> >> >>> >>>> >>> >>>>>>> >>> stuff into our messaging system is actually the
> >> >>> >>>> >>> >>>>>>> >>> best.
> >> >>> >>>> >>> >>>>>>> >>>
> >> >>> >>>> >>> >>>>>>> >>>
> >> >>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >> >>> >>>> >>> >>>>>>> >>>
> >> >>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not
> >> >>> >>>> >>> >>>>>>> >>>> be sorted; sorry for the confusion. Superstep
> >> >>> >>>> >>> >>>>>>> >>>> injection is possible with the Superstep API. A
> >> >>> >>>> >>> >>>>>>> >>>> few enhancements are needed to make it simpler
> >> >>> >>>> >>> >>>>>>> >>>> since I last worked on it. We can then look
> >> >>> >>>> >>> >>>>>>> >>>> into the partitioning superstep being executed
> >> >>> >>>> >>> >>>>>>> >>>> before the setup of the submitted job's first
> >> >>> >>>> >>> >>>>>>> >>>> superstep. I think it is feasible.
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J.
> Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org
> >> >>> >>>> >>> >>>>>>> >>>> >wrote:
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue,
> >> can we
> >> >>> >>>> inject
> >> >>> >>>> >>> the
> >> >>> >>>> >>> >>>>>>> >>>> partitioning
> >> >>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use
> >> local
> >> >>> memory?
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before
> >> >>> >>>> >>> >>>>>>> >>>> > the BSP.setup() method is called, to avoid
> >> >>> >>>> >>> >>>>>>> >>>> > executing an additional BSP job. But, in my
> >> >>> >>>> >>> >>>>>>> >>>> > opinion, the current state is enough. I think
> >> >>> >>>> >>> >>>>>>> >>>> > we need to collect more experience with input
> >> >>> >>>> >>> >>>>>>> >>>> > partitioning in large environments. I'll do
> >> >>> >>>> >>> >>>>>>> >>>> > that.
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be
> >> >>> >>>> >>> >>>>>>> >>>> > sorted?! MR-like?
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj
> Menon <
> >> >>> >>>> >>> >>>>>>> surajsmenon@apache.org>
> >> >>> >>>> >>> >>>>>>> >>>> > wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > > Sorry, I am widening the scope here beyond
> >> >>> >>>> >>> >>>>>>> >>>> > > the graph module. When we have the spilling
> >> >>> >>>> >>> >>>>>>> >>>> > > queue and the sorted spilling queue, can we
> >> >>> >>>> >>> >>>>>>> >>>> > > inject the partitioning superstep as the
> >> >>> >>>> >>> >>>>>>> >>>> > > first superstep and use local memory?
> >> >>> >>>> >>> >>>>>>> >>>> > > Today we have a partitioning job within a
> >> >>> >>>> >>> >>>>>>> >>>> > > job and are creating two copies of the data
> >> >>> >>>> >>> >>>>>>> >>>> > > on HDFS. This could be really costly. Is it
> >> >>> >>>> >>> >>>>>>> >>>> > > possible to create or redistribute the
> >> >>> >>>> >>> >>>>>>> >>>> > > partitions in local memory and initialize
> >> >>> >>>> >>> >>>>>>> >>>> > > the record reader there?
> >> >>> >>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in
> >> >>> >>>> >>> >>>>>>> >>>> > > the examples area, to explicitly
> >> >>> >>>> >>> >>>>>>> >>>> > > repartition the data on HDFS. The
> >> >>> >>>> >>> >>>>>>> >>>> > > deployment question is: how much disk space
> >> >>> >>>> >>> >>>>>>> >>>> > > gets allocated for local memory usage?
> >> >>> >>>> >>> >>>>>>> >>>> > > Would it be a safe approach with those
> >> >>> >>>> >>> >>>>>>> >>>> > > limitations?
> >> >>> >>>> >>> >>>>>>> >>>> > >
> >> >>> >>>> >>> >>>>>>> >>>> > > -Suraj
> >> >>> >>>> >>> >>>>>>> >>>> > >
> >> >>> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas
> >> Jungblut
> >> >>> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >
> >> >>> >>>> >>> >>>>>>> >>>> > >> Yes. Once Suraj has added merging of
> >> >>> >>>> >>> >>>>>>> >>>> > >> sorted files, we can add this to the
> >> >>> >>>> >>> >>>>>>> >>>> > >> partitioner pretty easily.
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <
> >> edwardyoon@apache.org
> >> >>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > Eh... btw, does the re-partitioned data
> >> >>> >>>> >>> >>>>>>> >>>> > >> > really need to be sorted?
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
> >> >>> Jungblut
> >> >>> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > obviously, if you merge n sorted files
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > by just appending them to each other,
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > the result is totally unsorted data ;-)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
> messaging?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
> >> >>> >>>> thomas.jungblut@gmail.com
> >> >>> >>>> >>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly
> >> sorted:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work correctly, then...
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
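The failure mode shown above, and the fix, can be sketched as follows:
appending n sorted runs is not a merge, while a k-way merge that
repeatedly pulls the smallest head via a priority queue is. This is
illustrative code, not Hama's implementation:

```java
import java.util.*;

// Sketch of why concatenating per-task sorted partition files (as in the
// fastgen output above: ... 96, 98, 1, 10 ...) is not a merge, versus a
// proper k-way merge over the heads of the sorted runs.
class KWayMerge {
  // What the partitioner currently does: append runs back to back.
  static List<Integer> concat(List<List<Integer>> runs) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> r : runs) out.addAll(r);
    return out;
  }

  // A real merge: heap entries are {head value, run index, position}.
  static List<Integer> merge(List<List<Integer>> runs) {
    PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> a[0] - b[0]);
    for (int i = 0; i < runs.size(); i++)
      if (!runs.get(i).isEmpty()) heap.add(new int[]{runs.get(i).get(0), i, 0});
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);                               // emit smallest head
      int run = top[1], next = top[2] + 1;
      if (next < runs.get(run).size())               // advance that run
        heap.add(new int[]{runs.get(run).get(next), run, next});
    }
    return out;
  }
}
```

This merge step is also what a messaging-based solution would get
implicitly, if message delivery to each peer is already sorted.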
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
> >> >>> >>>> >>> thomas.jungblut@gmail.com>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
> >> >>> >>>> edwardyoon@apache.org>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> quickly, please do. March 1 is a
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> holiday [1], so I'll be back next
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> week.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 1.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM,
> >> Thomas
> >> >>> >>>> Jungblut
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com>
> wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > file; I didn't check whether all
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > items were added.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > copy/pasted the ID logic into
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > fastgen. Want to have a look into
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > it?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
> >> >>> >>>> edwardyoon@apache.org
> >> >>> >>>> >>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen,
> >> when
> >> >>> >>>> generate
> >> >>> >>>> >>> adjacency
> >> >>> >>>> >>> >>>>>>> >>>> matrix
> >> >>> >>>> >>> >>>>>>> >>>> > >> into
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> multiple files.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29
> PM,
> >> >>> Thomas
> >> >>> >>>> >>> Jungblut
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com>
> >> wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they
> >> >>> partitioned
> >> >>> >>>> >>> correctly?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >> >>> :~/workspace/hama-trunk$
> >> >>> >>>> ls
> >> >>> >>>> >>> -al
> >> >>> >>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward
> >>  4096
> >> >>>  2월 28
> >> >>> >>>> >>> 18:03 .
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root
> >> 20480
> >> >>>  2월 28
> >> >>> >>>> >>> 18:04 ..
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
> >>  2243
> >> >>>  2월 28
> >> >>> >>>> >>> 18:01
> >> >>> >>>> >>> >>>>>>> part-00000
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
> >>  28
> >> >>>  2월 28
> >> >>> >>>> >>> 18:01
> >> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
> >>  2251
> >> >>>  2월 28
> >> >>> >>>> >>> 18:01
> >> >>> >>>> >>> >>>>>>> part-00001
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
> >>  28
> >> >>>  2월 28
> >> >>> >>>> >>> 18:01
> >> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward
> >>  4096
> >> >>>  2월 28
> >> >>> >>>> >>> 18:03
> >> >>> >>>> >>> >>>>>>> partitions
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >> >>> :~/workspace/hama-trunk$
> >> >>> >>>> ls
> >> >>> >>>> >>> -al
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward
> 4096
> >> >>>  2월 28
> >> >>> >>>> >>> 18:03 .
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward
> 4096
> >> >>>  2월 28
> >> >>> >>>> >>> 18:03 ..
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward
> 2932
> >> >>>  2월 28
> >> >>> >>>> 18:03
> >> >>> >>>> >>> >>>>>>> part-00000
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward
>   32
> >> >>>  2월 28
> >> >>> >>>> 18:03
> >> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward
> 2955
> >> >>>  2월 28
> >> >>> >>>> 18:03
> >> >>> >>>> >>> >>>>>>> part-00001
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward
>   32
> >> >>>  2월 28
> >> >>> >>>> 18:03
> >> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >> >>> :~/workspace/hama-trunk$
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27
> >> PM,
> >> >>> Edward
> >> >>> >>>> <
> >> >>> >>>> >>> >>>>>>> >>>> edward@udanax.org
> >> >>> >>>> >>> >>>>>>> >>>> > >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18
> PM,
> >> >>> Thomas
> >> >>> >>>> >>> Jungblut <
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an
> >> observation
> >> >>> for me
> >> >>> >>>> >>> please?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from
> >> >>> fastgen,
> >> >>> >>>> >>> part-00000 and
> >> >>> >>>> >>> >>>>>>> >>>> > >> part-00001,
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> both
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition
> >> >>> directory,
> >> >>> >>>> there
> >> >>> >>>> >>> is only a
> >> >>> >>>> >>> >>>>>>> >>>> single
> >> >>> >>>> >>> >>>>>>> >>>> > >> > 5.56kb
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> file.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the
> >> >>> partitioner to
> >> >>> >>>> >>> write a
> >> >>> >>>> >>> >>>>>>> single
> >> >>> >>>> >>> >>>>>>> >>>> > file
> >> >>> >>>> >>> >>>>>>> >>>> > >> if
> >> >>> >>>> >>> >>>>>>> >>>> > >> > you
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two
> >> files,
> >> >>> >>>> strange
> >> >>> >>>> >>> huh?
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas
> Jungblut <
> >> >>> >>>> >>> >>>>>>> thomas.jungblut@gmail.com>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into
> it.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10
> >> >>> /tmp/randomgraph
> >> >>> >>>> 1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank
> /tmp/randomgraph
> >> >>> >>>> /tmp/pageout
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last
> >> time I
> >> >>> >>>> >>> profiled, maybe
> >> >>> >>>> >>> >>>>>>> the
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> partitioning
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with
> >> the
> >> >>> input
> >> >>> >>>> or
> >> >>> >>>> >>> something
> >> >>> >>>> >>> >>>>>>> else.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J.
> Yoon <
> >> >>> >>>> >>> edwardyoon@apache.org
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not
> >> work
> >> >>> for
> >> >>> >>>> graph
> >> >>> >>>> >>> examples.
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> edward@edward-virtualBox
> >> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> gen
> >> >>> >>>> >>> >>>>>>> >>>> > >> > fastgen
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 100 10
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
> >> >>> >>>> >>> util.NativeCodeLoader:
> >> >>> >>>> >>> >>>>>>> Unable
> >> >>> >>>> >>> >>>>>>> >>>> > to
> >> >>> >>>> >>> >>>>>>> >>>> > >> > load
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library
> for
> >> your
> >> >>> >>>> >>> platform...
> >> >>> >>>> >>> >>>>>>> using
> >> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Running
> >> >>> >>>> >>> >>>>>>> >>>> job:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >> >>> >>>> >>> bsp.LocalBSPRunner:
> >> >>> >>>> >>> >>>>>>> Setting
> >> >>> >>>> >>> >>>>>>> >>>> up
> >> >>> >>>> >>> >>>>>>> >>>> > a
> >> >>> >>>> >>> >>>>>>> >>>> > >> new
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Current
> >> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient: The
> >> >>> >>>> >>> >>>>>>> total
> >> >>> >>>> >>> >>>>>>> >>>> > number
> >> >>> >>>> >>> >>>>>>> >>>> > >> > of
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Counters: 3
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=0
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212
> >> seconds
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> edward@edward-virtualBox
> >> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> hama-examples-0.7.0-SNAPSHOT.jar
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> edward@edward-virtualBox
> >> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> >> >>> >>>> >>> >>>>>>> >>>> > pagerank
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph
> >> /tmp/pageour
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
> >> >>> >>>> >>> util.NativeCodeLoader:
> >> >>> >>>> >>> >>>>>>> Unable
> >> >>> >>>> >>> >>>>>>> >>>> > to
> >> >>> >>>> >>> >>>>>>> >>>> > >> > load
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library
> for
> >> your
> >> >>> >>>> >>> platform...
> >> >>> >>>> >>> >>>>>>> using
> >> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >> >>> >>>> >>> bsp.FileInputFormat:
> >> >>> >>>> >>> >>>>>>> Total
> >> >>> >>>> >>> >>>>>>> >>>> > input
> >> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >> >>> >>>> >>> bsp.FileInputFormat:
> >> >>> >>>> >>> >>>>>>> Total
> >> >>> >>>> >>> >>>>>>> >>>> > input
> >> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Running
> >> >>> >>>> >>> >>>>>>> >>>> job:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >> >>> >>>> >>> bsp.LocalBSPRunner:
> >> >>> >>>> >>> >>>>>>> Setting
> >> >>> >>>> >>> >>>>>>> >>>> up
> >> >>> >>>> >>> >>>>>>> >>>> > a
> >> >>> >>>> >>> >>>>>>> >>>> > >> new
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Current
> >> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient: The
> >> >>> >>>> >>> >>>>>>> total
> >> >>> >>>> >>> >>>>>>> >>>> > number
> >> >>> >>>> >>> >>>>>>> >>>> > >> > of
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Counters: 6
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.FileInputFormat:
> >> >>> >>>> >>> >>>>>>> Total
> >> >>> >>>> >>> >>>>>>> >>>> > input
> >> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.BSPJobClient:
> >> >>> >>>> >>> >>>>>>> Running
> >> >>> >>>> >>> >>>>>>> >>>> job:
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> bsp.LocalBSPRunner:
> >> >>> >>>> >>> >>>>>>> Setting
> >> >>> >>>> >>> >>>>>>> >>>> up
> >> >>> >>>> >>> >>>>>>> >>>> > a
> >> >>> >>>> >>> >>>>>>> >>>> > >> new
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> graph.GraphJobRunner: 50
> >> >>> >>>> >>> >>>>>>> >>>> > vertices
> >> >>> >>>> >>> >>>>>>> >>>> > >> > are
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >> >>> >>>> >>> graph.GraphJobRunner: 50
> >> >>> >>>> >>> >>>>>>> >>>> > vertices
> >> >>> >>>> >>> >>>>>>> >>>> > >> > are
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
> >> >>> >>>> >>> bsp.LocalBSPRunner:
> >> >>> >>>> >>> >>>>>>> >>>> Exception
> >> >>> >>>> >>> >>>>>>> >>>> > >> > during
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> BSP
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> java.lang.IllegalArgumentException:
> >> >>> >>>> >>> Messages
> >> >>> >>>> >>> >>>>>>> must
> >> >>> >>>> >>> >>>>>>> >>>> > never
> >> >>> >>>> >>> >>>>>>> >>>> > >> be
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> behind
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> the
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current
> >> Message
> >> >>> ID: 1
> >> >>> >>>> >>> vs. 50
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>>
> >> >>>
> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>>
> >> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>>
> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>>
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>>
> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>>
> >> >>> >>>>
> >> >>>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >> >>> >>>> >>> java.lang.Thread.run(Thread.java:722)
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J.
> >> Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> --
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> --
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> --
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >> > --
> >> >>> >>>> >>> >>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> > @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> > >> >
> >> >>> >>>> >>> >>>>>>> >>>> > >>
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>> > --
> >> >>> >>>> >>> >>>>>>> >>>> > Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >>>> > @eddieyoon
> >> >>> >>>> >>> >>>>>>> >>>> >
> >> >>> >>>> >>> >>>>>>> >>>>
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >>
> >> >>> >>>> >>> >>>>>>> >> --
> >> >>> >>>> >>> >>>>>>> >> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> >> @eddieyoon
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> >
> >> >>> >>>> >>> >>>>>>> > --
> >> >>> >>>> >>> >>>>>>> > Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> > @eddieyoon
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>> >>>>>>> --
> >> >>> >>>> >>> >>>>>>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>>>> @eddieyoon
> >> >>> >>>> >>> >>>>>>>
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>>
> >> >>> >>>> >>> >>>>> --
> >> >>> >>>> >>> >>>>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>>> @eddieyoon
> >> >>> >>>> >>> >>>>
> >> >>> >>>> >>> >>>>
> >> >>> >>>> >>> >>>>
> >> >>> >>>> >>> >>>> --
> >> >>> >>>> >>> >>>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>>> @eddieyoon
> >> >>> >>>> >>> >>>
> >> >>> >>>> >>> >>>
> >> >>> >>>> >>> >>>
> >> >>> >>>> >>> >>> --
> >> >>> >>>> >>> >>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >>> @eddieyoon
> >> >>> >>>> >>> >>
> >> >>> >>>> >>> >>
> >> >>> >>>> >>> >>
> >> >>> >>>> >>> >> --
> >> >>> >>>> >>> >> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> >> @eddieyoon
> >> >>> >>>> >>> >
> >> >>> >>>> >>> >
> >> >>> >>>> >>> >
> >> >>> >>>> >>> > --
> >> >>> >>>> >>> > Best Regards, Edward J. Yoon
> >> >>> >>>> >>> > @eddieyoon
> >> >>> >>>> >>>
> >> >>> >>>> >>>
> >> >>> >>>> >>>
> >> >>> >>>> >>> --
> >> >>> >>>> >>> Best Regards, Edward J. Yoon
> >> >>> >>>> >>> @eddieyoon
> >> >>> >>>> >>>
> >> >>> >>>> >
> >> >>> >>>> >
> >> >>> >>>> >
> >> >>> >>>> > --
> >> >>> >>>> > Best Regards, Edward J. Yoon
> >> >>> >>>> > @eddieyoon
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> --
> >> >>> >>>> Best Regards, Edward J. Yoon
> >> >>> >>>> @eddieyoon
> >> >>> >>>>
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >> Best Regards, Edward J. Yoon
> >> >>> >> @eddieyoon
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Best Regards, Edward J. Yoon
> >> >>> > @eddieyoon
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>> @eddieyoon
> >> >>>
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
This is a pure question.

Before we discuss the issues below, can we first discuss our plan for
vertices? Do I need to wait and see whether the messaging system can
be used to sort partitioned data by the vertex comparator until all
Spilling Queue related issues are fixed? Or can that only be answered
by patches?
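To make the sorting question concrete: the "Messages must never be behind the vertex in ID" failure quoted in this thread is exactly what happens when per-partition sorted runs are concatenated instead of merged. The sketch below is purely illustrative and uses plain Java collections, not the Hama API; the class `MergeOrderDemo` and its methods are invented names, and each inner list stands in for one sorted partition file (e.g. part-00000, part-00001).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only -- not Hama code.
public class MergeOrderDemo {

    // Appending sorted runs to each other: the result is NOT globally sorted.
    static List<Integer> concat(List<List<Integer>> runs) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> run : runs) {
            out.addAll(run);
        }
        return out;
    }

    // k-way merge via a priority queue, as a sorted spilling queue would do.
    // Heap entries are {value, runIndex, positionInRun}.
    static List<Integer> kWayMerge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] { runs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int next = top[2] + 1;
            if (next < runs.get(top[1]).size()) {
                heap.add(new int[] { runs.get(top[1]).get(next), top[1], next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sorted "partition files" of vertex IDs.
        List<List<Integer>> runs =
            Arrays.asList(Arrays.asList(1, 14, 50), Arrays.asList(3, 16, 29));
        System.out.println(concat(runs));    // ID 3 arrives "behind" ID 50
        System.out.println(kWayMerge(runs)); // globally ordered vertex IDs
    }
}
```

Whichever queue implementation lands, the invariant the graph runner needs is the second output, not the first; appending part files can never provide it.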

On Thu, Mar 14, 2013 at 8:40 PM, Suraj Menon <su...@apache.org> wrote:
> Going along with the latest topic of the conversation:
> nothing is closed here, and the JIRAs were already created for the whole
> thing to fall into place:
>
> HAMA-644
> HAMA-490
> HAMA-722
> HAMA-728
> HAMA-707
>
> The JIRAs above are directly or indirectly affected by the core
> refactoring.
>
> -Suraj
>
>
> On Thu, Mar 14, 2013 at 7:03 AM, Edward J. Yoon <ed...@apache.org> wrote:
>
>> P.S. Comments like these are never helpful in developing a community.
>>
>> "before you run riot on all along the codebase, Suraj ist currently working
>> on that stuff- don't make it more difficult for him rebasing all his
>> patches the whole time.
>> He has the plan so that we made to make the stuff working, his part is
>> currently missing. So don't try to muddle arround there, it will make this
>> take longer than already needed."
>>
>> On Thu, Mar 14, 2013 at 7:57 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > In my opinion, our best course of action is to 1) explain the plans and
>> > edit them together on the wiki, and then 2) break the implementation
>> > tasks down as small as possible so that available people can try them
>> > in parallel. That way, you can make use of the people who are
>> > available. Do you remember that I asked you to write down your plan
>> > here? - http://wiki.apache.org/hama/SpillingQueue If you have some
>> > time, please do that for me. I'll help you in my free time.
>> >
>> > Regarding branches, maybe we are all unfamiliar with online
>> > collaboration (or don't want to collaborate anymore). If we each want
>> > to walk our own way, why do we need to be here together?
>> >
>> > On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org> wrote:
>> >> Three points:
>> >>
>> >> Firstly, apologies, because this conversation partly emanates from the
>> >> delay in providing the set of patches. I was not able to slice out as
>> >> much time as I was hoping.
>> >>
>> >> Second, I think I/we can work on separate branches. Since most of
>> >> these concerns can only be answered by future patches, a decision
>> >> could be made then. We can decide whether an svn revert is needed
>> >> during the process on trunk. (This is a general comment, not related
>> >> to a particular JIRA.)
>> >>
>> >> Third, please feel free to cut a release if it is really important.
>> >>
>> >> Thanks,
>> >> Suraj
>> >>
>> >> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>> >>
>> >>> To reduce arguing, I'm appending my opinions.
>> >>>
>> >>> In HAMA-704, I wanted to remove only the message map, to reduce memory
>> >>> consumption. I still don't want to talk about disk-based vertices and
>> >>> the Spilling Queue at the moment. With this, I wanted to release 0.6.1
>> >>> ASAP as a 'partitioning issue fixed and quickly executable examples'
>> >>> version. That's why I scheduled the Spilling Queue in the 0.7 roadmap.
>> >>>
>> >>> As you can see, issues are happening one right after another. I don't
>> >>> think we have to clean up every never-ending issue. We can improve
>> >>> step-by-step.
>> >>>
>> >>> 1. http://wiki.apache.org/hama/RoadMap
>> >>>
>> >>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <edwardyoon@apache.org
>> >
>> >>> wrote:
>> >>> > Typos ;)
>> >>> >
>> >>> >> except YARN integration tasks. If you leave here, I have to take
>> cover
>> >>> >> YARN tasks. Should I wait someone? Am I touching core module
>> >>> >
>> >>> > I have to cover YARN tasks instead of you.
>> >>> >
>> >>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> >> Hmm, here are my opinions:
>> >>> >>
>> >>> >> As you know, we have a problem with a lack of team members and
>> >>> >> contributors. So we should break every task down into pieces as
>> >>> >> small as possible. Our best course is improving step-by-step. And
>> >>> >> every Hama-x.x.x release should run well, even if it's only at a
>> >>> >> baby-step level.
>> >>> >>
>> >>> >> And technology should be developed out of necessity. So I think we
>> >>> >> need to cut releases as often as possible. That's why I volunteered
>> >>> >> to manage releases. Actually, I wanted to work only on QA (quality
>> >>> >> assurance) related tasks, because your code is better than mine and
>> >>> >> I have a cluster.
>> >>> >>
>> >>> >> However, we are currently not working like that. I guess there are
>> >>> >> many reasons. None of us is a full-time open sourcer (except me).
>> >>> >>
>> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
>> >>> >>
>> >>> >> I don't know what you mean exactly. But 23 issues are almost
>> examples
>> >>> >> except YARN integration tasks. If you leave here, I have to take
>> cover
>> >>> >> YARN tasks. Should I wait someone? Am I touching core module
>> >>> >> aggressively?
>> >>> >>
>> >>> >>> Otherwise Suraj and I will branch those issues away and you can
>> >>> >>> play around in trunk however you like.
>> >>> >>
>> >>> >> I also don't know what you mean exactly, but if you want, please do.
>> >>> >>
>> >>> >> By the way, can you answer this question - is this really a
>> >>> >> technical conflict, or an emotional one?
>> >>> >>
>> >>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
>> >>> >> <th...@gmail.com> wrote:
>> >>> >>> You have 23 issues assigned. Why do you need to work on that?
>> >>> >>> Otherwise Suraj and I will branch those issues away and you can
>> >>> >>> play around in trunk however you like.
>> >>> >>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <
>> edwardyoon@apache.org>:
>> >>> >>>
>> >>> >>>> P.S., Please don't say things like that.
>> >>> >>>>
>> >>> >>>> No decisions have been made yet. And if someone has a question or
>> >>> >>>> missed something, you have to try to explain it here, because this
>> >>> >>>> is open source. No one can say "don't touch trunk because I'm
>> >>> >>>> working on it".
>> >>> >>>>
>> >>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org>
>> >>> >>>> wrote:
>> >>> >>>> > Sorry for my quick-and-dirty small patches.
>> >>> >>>> >
>> >>> >>>> > However, we should work together in parallel. Please share here
>> >>> >>>> > if there is any progress.
>> >>> >>>> >
>> >>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>> >>> >>>> > <th...@gmail.com> wrote:
>> >>> >>>> >> Hi Edward,
>> >>> >>>> >>
>> >>> >>>> >> before you run riot all along the codebase, Suraj is currently
>> >>> >>>> >> working on that stuff - don't make him rebase all his patches
>> >>> >>>> >> the whole time.
>> >>> >>>> >> He has the plan that we made to make the stuff work; his part is
>> >>> >>>> >> currently missing. So don't muddle around there, it will make
>> >>> >>>> >> this take longer than it already needs.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>> >>> >>>> >>
>> >>> >>>> >>> Personally, I would like to solve this issue by touching
>> >>> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>> >>> >>>> >>> multiple files, we can avoid huge memory consumption.
>> >>> >>>> >>>
>> >>> >>>> >>> If we want to sort the partitioned data using the messaging
>> >>> >>>> >>> system, ideas should be collected.
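[Editor's note] The "sorted sub-sets of vertices in multiple files" idea above is classic external sorting: each spill produces a sorted run, and a priority-queue merge over the runs yields globally sorted vertices without holding the whole set in memory. The sketch below is only an illustration of that technique under stated assumptions - `SortedRunMerge` and its in-memory lists are hypothetical, not Hama's actual DiskVerticesInfo API (which works on files, not lists).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of individually sorted runs.
public class SortedRunMerge {

    // Merge n sorted runs into one globally sorted list.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entry layout: { value, runIndex, offsetWithinRun }
        PriorityQueue<int[]> heap =
            new PriorityQueue<int[]>(Comparator.comparingInt(a -> a[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] { runs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();       // smallest head among all runs
            out.add(top[0]);
            int next = top[2] + 1;         // advance within that run
            List<Integer> run = runs.get(top[1]);
            if (next < run.size()) {
                heap.add(new int[] { run.get(next), top[1], next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sorted "partition files": appending them would give 1,5,9,2,4,8.
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 5, 9), Arrays.asList(2, 4, 8));
        List<Integer> merged = merge(runs);
        if (!merged.equals(Arrays.asList(1, 2, 4, 5, 8, 9))) {
            throw new AssertionError("merge is broken: " + merged);
        }
        System.out.println(merged); // [1, 2, 4, 5, 8, 9]
    }
}
```

Simply appending the runs instead of merging them is exactly what produces the unsorted "completed file" complained about elsewhere in this thread.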
>> >>> >>>> >>>
>> >>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org>
>> >>> >>>> >>> wrote:
>> >>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely
>> written.
>> >>> >>>> >>> >
>> >>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org>
>> >>> >>>> >>> wrote:
>> >>> >>>> >>> >> I'm reading the changes of HAMA-704 again. As a result of
>> >>> >>>> >>> >> adding DiskVerticesInfo, the vertices list needs to be
>> >>> >>>> >>> >> sorted. I'm not sure, but I think this approach will bring
>> >>> >>>> >>> >> more disadvantages than advantages.
>> >>> >>>> >>> >>
>> >>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org>
>> >>> >>>> >>> wrote:
>> >>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>> >>> storage in
>> >>> >>>> >>> user space
>> >>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and
>> >>> writes.
>> >>> >>>> >>> This way
>> >>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep
>> >>> vertices
>> >>> >>>> >>> sorted
>> >>> >>>> >>> >>>>>> with a single read and single write on every peer.
>> >>> >>>> >>> >>>
>> >>> >>>> >>> >>> And, as I commented JIRA ticket, I think we can't use
>> >>> messaging
>> >>> >>>> system
>> >>> >>>> >>> >>> for sorting vertices within partition files.
>> >>> >>>> >>> >>>
>> >>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org> wrote:
>> >>> >>>> >>> >>>> P.S., (number of splits = number of partitions) is really
>> >>> >>>> >>> >>>> confusing to me. Even when the number of blocks equals the
>> >>> >>>> >>> >>>> desired number of tasks, the data should still be
>> >>> >>>> >>> >>>> re-partitioned.
>> >>> >>>> >>> >>>>
>> >>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org> wrote:
>> >>> >>>> >>> >>>>> Indeed. If there are already partitioned (but unsorted)
>> >>> >>>> >>> >>>>> input files and so the user wants to skip the
>> >>> >>>> >>> >>>>> pre-partitioning phase, it should be handled in the
>> >>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't know
>> >>> >>>> >>> >>>>> why re-partitioned files need to be sorted. It's only
>> >>> >>>> >>> >>>>> about GraphJobRunner.
>> >>> >>>> >>> >>>>>
>> >>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We
>> can
>> >>> have
>> >>> >>>> a
>> >>> >>>> >>> dedicated
>> >>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>> >>> >>>> >>> >>>>>
>> >>> >>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>> >>> >>>> >>> >>>>> partitioning job based on the Superstep API?
>> >>> >>>> >>> >>>>>
>> >>> >>>> >>> >>>>> By default, 100 tasks will be assigned to the
>> >>> >>>> >>> >>>>> partitioning job. The partitioning job will create 1,000
>> >>> >>>> >>> >>>>> partitions. Thus, we can execute the graph job with
>> >>> >>>> >>> >>>>> 1,000 tasks.
>> >>> >>>> >>> >>>>>
>> >>> >>>> >>> >>>>> Let's assume that an input sequence file is 20GB (100
>> >>> >>>> >>> >>>>> blocks). If I want to run with 1,000 tasks, what happens?
>> >>> >>>> >>> >>>>>
>> >>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>> >>> >>>> surajsmenon@apache.org>
>> >>> >>>> >>> wrote:
>> >>> >>>> >>> >>>>>> I am responding on this thread for better continuity of
>> >>> >>>> >>> >>>>>> the conversation. We cannot expect the partitions to be
>> >>> >>>> >>> >>>>>> sorted every time. When the number of splits = the
>> >>> >>>> >>> >>>>>> number of partitions and partitioning is switched off by
>> >>> >>>> >>> >>>>>> the user [HAMA-561], the partitions would not be sorted.
>> >>> >>>> >>> >>>>>> Can we do this in loadVertices? Maybe the feature for
>> >>> >>>> >>> >>>>>> coupling storage in user space with BSP Messaging
>> >>> >>>> >>> >>>>>> [HAMA-734] can avoid double reads and writes. This way,
>> >>> >>>> >>> >>>>>> whether or not the data is partitioned by the
>> >>> >>>> >>> >>>>>> partitioner, we can keep vertices sorted with a single
>> >>> >>>> >>> >>>>>> read and a single write on every peer.
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> Just clearing up any confusion regarding superstep
>> >>> >>>> >>> >>>>>> injection for partitioning. (This is outside the scope
>> >>> >>>> >>> >>>>>> of graphs. We can have a dedicated partitioning
>> >>> >>>> >>> >>>>>> superstep for graph applications.)
>> >>> >>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> if x > y
>> >>> >>>> >>> >>>>>> The y tasks are scheduled with x of them having one of
>> >>> >>>> >>> >>>>>> the x splits each and the remaining with no resource
>> >>> >>>> >>> >>>>>> local to them. Then the partitioning superstep
>> >>> >>>> >>> >>>>>> redistributes the partitions among them to create local
>> >>> >>>> >>> >>>>>> partitions. Now the question is: can we re-initialize a
>> >>> >>>> >>> >>>>>> peer's input based on this new local part of the
>> >>> >>>> >>> >>>>>> partition?
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> if y > x
>> >>> >>>> >>> >>>>>> works as it works today.
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> Just putting my points out there as brainstorming.
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> -Suraj
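[Editor's note] The x-splits / y-tasks cases brainstormed above can be sketched as follows. This is purely illustrative: `assignSplits` is a made-up helper, not Hama's scheduler API, and multi-split assignment for the surplus-splits case is deliberately elided.

```java
import java.util.Arrays;

// Illustrative sketch of split-to-task assignment with x splits and y
// configured tasks: the first min(x, y) tasks each get one local split;
// any remaining tasks start with no local resource and would be fed by
// the redistributing partitioning superstep described in the thread.
public class SplitAssignment {

    // Returns, for each of the y tasks, the index of its input split,
    // or -1 for tasks that start with no local split.
    static int[] assignSplits(int x, int y) {
        int[] assignment = new int[y];
        for (int task = 0; task < y; task++) {
            assignment[task] = task < x ? task : -1;
        }
        return assignment;
    }

    public static void main(String[] args) {
        // y > x (5 tasks, 3 splits): tasks 3 and 4 start empty and wait
        // for partitions redistributed by the partitioning superstep.
        int[] a = assignSplits(3, 5);
        if (a[2] != 2 || a[3] != -1 || a[4] != -1) {
            throw new AssertionError(Arrays.toString(a));
        }
        System.out.println(Arrays.toString(a)); // [0, 1, 2, -1, -1]
    }
}
```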
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org>wrote:
>> >>> >>>> >>> >>>>>>
>> >>> >>>> >>> >>>>>>> I just filed here
>> >>> >>>> https://issues.apache.org/jira/browse/HAMA-744
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org>
>> >>> >>>> >>> >>>>>>> wrote:
>> >>> >>>> >>> >>>>>>> > Additionally,
>> >>> >>>> >>> >>>>>>> >
>> >>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
>> >>> inject the
>> >>> >>>> >>> partitioning
>> >>> >>>> >>> >>>>>>> >> superstep as the first superstep and use local
>> memory?
>> >>> >>>> >>> >>>>>>> >
>> >>> >>>> >>> >>>>>>> > Can we execute different number of tasks per
>> superstep?
>> >>> >>>> >>> >>>>>>> >
>> >>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org>
>> >>> >>>> >>> >>>>>>> wrote:
>> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>> >>> result
>> >>> >>>> from
>> >>> >>>> >>> the
>> >>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only
>> the
>> >>> >>>> partition
>> >>> >>>> >>> files in
>> >>> >>>> >>> >>>>>>> >>
>> >>> >>>> >>> >>>>>>> >> I see.
>> >>> >>>> >>> >>>>>>> >>
>> >>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>> >>> superstep
>> >>> >>>> API,
>> >>> >>>> >>> Suraj's
>> >>> >>>> >>> >>>>>>> idea
>> >>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>> >>> partitions the
>> >>> >>>> >>> stuff into
>> >>> >>>> >>> >>>>>>> our
>> >>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>> >>> >>>> >>> >>>>>>> >>
>> >>> >>>> >>> >>>>>>> >> BTW, if some garbage objects can accumulate in the
>> >>> >>>> >>> >>>>>>> >> partitioning step, a separate partitioning job may
>> >>> >>>> >>> >>>>>>> >> not be a bad idea. Is there some special reason?
>> >>> >>>> >>> >>>>>>> >>
>> >>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>> >>> >>>> >>> >>>>>>> >>> result from the partitioning job must be sorted.
>> >>> >>>> >>> >>>>>>> >>> Currently only the partition files in themselves
>> >>> >>>> >>> >>>>>>> >>> are sorted, thus more tasks result in unsorted data
>> >>> >>>> >>> >>>>>>> >>> in the completed file. This only applies to the
>> >>> >>>> >>> >>>>>>> >>> graph processing package.
>> >>> >>>> >>> >>>>>>> >>> So as Suraj said, it would be much simpler to solve
>> >>> >>>> >>> >>>>>>> >>> this via messaging, once it is scalable (it will be
>> >>> >>>> >>> >>>>>>> >>> very, very scalable!). So the GraphJobRunner can
>> >>> >>>> >>> >>>>>>> >>> partition the stuff with a single superstep in
>> >>> >>>> >>> >>>>>>> >>> setup(), as it did ages ago. The messaging must be
>> >>> >>>> >>> >>>>>>> >>> sorted anyway for the algorithm, so this is a nice
>> >>> >>>> >>> >>>>>>> >>> side effect and saves us the partitioning job for
>> >>> >>>> >>> >>>>>>> >>> graph processing.
>> >>> >>>> >>> >>>>>>> >>>
>> >>> >>>> >>> >>>>>>> >>> For other partitionings, and with regard to our
>> >>> >>>> >>> >>>>>>> >>> Superstep API, Suraj's idea of injecting a
>> >>> >>>> >>> >>>>>>> >>> preprocessing superstep that partitions the stuff
>> >>> >>>> >>> >>>>>>> >>> into our messaging system is actually the best.
>> >>> >>>> >>> >>>>>>> >>>
>> >>> >>>> >>> >>>>>>> >>>
>> >>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >>> >>>> >>> >>>>>>> >>>
>> >>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be
>> >>> >>>> >>> >>>>>>> >>>> sorted. Sorry for the confusion. The superstep
>> >>> >>>> >>> >>>>>>> >>>> injection is possible with the Superstep API.
>> >>> >>>> >>> >>>>>>> >>>> There are a few enhancements needed to make it
>> >>> >>>> >>> >>>>>>> >>>> simpler since I last worked on it. We can then
>> >>> >>>> >>> >>>>>>> >>>> look into the partitioning superstep being
>> >>> >>>> >>> >>>>>>> >>>> executed before the setup of the first superstep
>> >>> >>>> >>> >>>>>>> >>>> of the submitted job. I think it is feasible.
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org
>> >>> >>>> >>> >>>>>>> >>>> >wrote:
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue,
>> can we
>> >>> >>>> inject
>> >>> >>>> >>> the
>> >>> >>>> >>> >>>>>>> >>>> partitioning
>> >>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use
>> local
>> >>> memory?
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before
>> calling
>> >>> >>>> >>> BSP.setup()
>> >>> >>>> >>> >>>>>>> method
>> >>> >>>> >>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my
>> >>> opinion,
>> >>> >>>> >>> current is
>> >>> >>>> >>> >>>>>>> >>>> > enough. I think, we need to collect more
>> >>> experiences of
>> >>> >>>> >>> input
>> >>> >>>> >>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it need to be
>> Sorted?!
>> >>> >>>> MR-like?
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>> >>> >>>> >>> >>>>>>> surajsmenon@apache.org>
>> >>> >>>> >>> >>>>>>> >>>> > wrote:
>> >>> >>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to
>> >>> >>>> >>> >>>>>>> >>>> > > outside the graph module. When we have the
>> >>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can
>> >>> >>>> >>> >>>>>>> >>>> > > we inject the partitioning superstep as the
>> >>> >>>> >>> >>>>>>> >>>> > > first superstep and use local memory?
>> >>> >>>> >>> >>>>>>> >>>> > > Today we have a partitioning job within a job
>> >>> >>>> >>> >>>>>>> >>>> > > and are creating two copies of the data on
>> >>> >>>> >>> >>>>>>> >>>> > > HDFS. This could be really costly. Is it
>> >>> >>>> >>> >>>>>>> >>>> > > possible to create or redistribute the
>> >>> >>>> >>> >>>>>>> >>>> > > partitions in local memory and initialize the
>> >>> >>>> >>> >>>>>>> >>>> > > record reader there?
>> >>> >>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the
>> >>> >>>> >>> >>>>>>> >>>> > > examples area, to explicitly repartition the
>> >>> >>>> >>> >>>>>>> >>>> > > data on HDFS. The deployment question is how
>> >>> >>>> >>> >>>>>>> >>>> > > much disk space gets allocated for local
>> >>> >>>> >>> >>>>>>> >>>> > > memory usage? Would it be a safe approach with
>> >>> >>>> >>> >>>>>>> >>>> > > the limitations?
>> >>> >>>> >>> >>>>>>> >>>> > >
>> >>> >>>> >>> >>>>>>> >>>> > > -Suraj
>> >>> >>>> >>> >>>>>>> >>>> > >
>> >>> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas
>> Jungblut
>> >>> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >
>> >>> >>>> >>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted
>> files
>> >>> we can
>> >>> >>>> add
>> >>> >>>> >>> this to
>> >>> >>>> >>> >>>>>>> the
>> >>> >>>> >>> >>>>>>> >>>> > >> partitioner pretty easily.
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <
>> edwardyoon@apache.org
>> >>> >
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data
>> really
>> >>> >>>> necessary
>> >>> >>>> >>> to be
>> >>> >>>> >>> >>>>>>> Sorted?
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
>> >>> Jungblut
>> >>> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works,
>> >>> obviously
>> >>> >>>> if
>> >>> >>>> >>> you merge
>> >>> >>>> >>> >>>>>>> n
>> >>> >>>> >>> >>>>>>> >>>> > sorted
>> >>> >>>> >>> >>>>>>> >>>> > >> > files
>> >>> >>>> >>> >>>>>>> >>>> > >> > > by just appending to each other, this
>> will
>> >>> >>>> result in
>> >>> >>>> >>> totally
>> >>> >>>> >>> >>>>>>> >>>> > unsorted
>> >>> >>>> >>> >>>>>>> >>>> > >> > data
>> >>> >>>> >>> >>>>>>> >>>> > >> > > ;-)
>> >>> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >
>> >>> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
>> >>> >>>> thomas.jungblut@gmail.com
>> >>> >>>> >>> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly
>> sorted:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
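[Editor's note] One guess at what the vertexID listing above shows (hypothetical, inferred from the output alone): each partition file appears to be sorted by comparing IDs as text, so within a file "29" precedes "3", and the files were then appended, so the sequence restarts at "1" after "98". A small demonstration of text vs. numeric ordering:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Demonstrates why numeric vertex IDs stored as text sort in the order
// seen in the listing: lexicographic comparison puts "10" before "3".
public class IdOrdering {
    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(
            Arrays.asList("1", "3", "10", "12", "29", "50", "98"));

        ids.sort(Comparator.naturalOrder());       // lexicographic (text)
        System.out.println(ids); // [1, 10, 12, 29, 3, 50, 98]

        ids.sort(Comparator.comparingInt(Integer::parseInt)); // numeric
        System.out.println(ids); // [1, 3, 10, 12, 29, 50, 98]
    }
}
```

If that guess is right, the listing exhibits both problems at once: per-file lexicographic order, and simple concatenation of the files instead of a merge.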
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>> >>> >>>> >>> thomas.jungblut@gmail.com>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly,
>> please
>> >>> do.
>> >>> >>>> >>> March 1 is
>> >>> >>>> >>> >>>>>>> >>>> > holiday[1]
>> >>> >>>> >>> >>>>>>> >>>> > >> so
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 1.
>> >>> >>>> >>> >>>>>>>
>> >>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM,
>> Thomas
>> >>> >>>> Jungblut
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file,
>> >>> didn't
>> >>> >>>> >>> observe if all
>> >>> >>>> >>> >>>>>>> >>>> items
>> >>> >>>> >>> >>>>>>> >>>> > >> were
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> added.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I
>> copy/pasted the
>> >>> >>>> logic
>> >>> >>>> >>> of the ID
>> >>> >>>> >>> >>>>>>> into
>> >>> >>>> >>> >>>>>>> >>>> > the
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> fastgen,
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org
>> >>> >>>> >>> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen,
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> when generating an adjacency
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> matrix into multiple files.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM,
>> >>> Thomas
>> >>> >>>> >>> Jungblut
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com>
>> wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they
>> >>> partitioned
>> >>> >>>> >>> correctly?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> >>> :~/workspace/hama-trunk$
>> >>> >>>> ls
>> >>> >>>> >>> -al
>> >>> >>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward
>>  4096
>> >>>  2월 28
>> >>> >>>> >>> 18:03 .
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root
>> 20480
>> >>>  2월 28
>> >>> >>>> >>> 18:04 ..
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
>>  2243
>> >>>  2월 28
>> >>> >>>> >>> 18:01
>> >>> >>>> >>> >>>>>>> part-00000
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
>>  28
>> >>>  2월 28
>> >>> >>>> >>> 18:01
>> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
>>  2251
>> >>>  2월 28
>> >>> >>>> >>> 18:01
>> >>> >>>> >>> >>>>>>> part-00001
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
>>  28
>> >>>  2월 28
>> >>> >>>> >>> 18:01
>> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward
>>  4096
>> >>>  2월 28
>> >>> >>>> >>> 18:03
>> >>> >>>> >>> >>>>>>> partitions
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> >>> :~/workspace/hama-trunk$
>> >>> >>>> ls
>> >>> >>>> >>> -al
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096
>> >>>  2월 28
>> >>> >>>> >>> 18:03 .
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096
>> >>>  2월 28
>> >>> >>>> >>> 18:03 ..
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932
>> >>>  2월 28
>> >>> >>>> 18:03
>> >>> >>>> >>> >>>>>>> part-00000
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
>> >>>  2월 28
>> >>> >>>> 18:03
>> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955
>> >>>  2월 28
>> >>> >>>> 18:03
>> >>> >>>> >>> >>>>>>> part-00001
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
>> >>>  2월 28
>> >>> >>>> 18:03
>> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> >>> :~/workspace/hama-trunk$
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27
>> PM,
>> >>> Edward
>> >>> >>>> <
>> >>> >>>> >>> >>>>>>> >>>> edward@udanax.org
>> >>> >>>> >>> >>>>>>> >>>> > >
>> >>> >>>> >>> >>>>>>> >>>> > >> > wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM,
>> >>> Thomas
>> >>> >>>> >>> Jungblut <
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an
>> observation
>> >>> for me
>> >>> >>>> >>> please?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from
>> >>> fastgen,
>> >>> >>>> >>> part-00000 and
>> >>> >>>> >>> >>>>>>> >>>> > >> part-00001,
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> both
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition
>> >>> directory,
>> >>> >>>> there
>> >>> >>>> >>> is only a
>> >>> >>>> >>> >>>>>>> >>>> single
>> >>> >>>> >>> >>>>>>> >>>> > >> > 5.56kb
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> file.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the
>> >>> partitioner to
>> >>> >>>> >>> write a
>> >>> >>>> >>> >>>>>>> single
>> >>> >>>> >>> >>>>>>> >>>> > file
>> >>> >>>> >>> >>>>>>> >>>> > >> if
>> >>> >>>> >>> >>>>>>> >>>> > >> > you
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two
>> files,
>> >>> >>>> strange
>> >>> >>>> >>> huh?
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>> >>> >>>> >>> >>>>>>> thomas.jungblut@gmail.com>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10
>> >>> /tmp/randomgraph
>> >>> >>>> 1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
>> >>> >>>> /tmp/pageout
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last
>> time I
>> >>> >>>> >>> profiled, maybe
>> >>> >>>> >>> >>>>>>> the
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> partitioning
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with
>> the
>> >>> input
>> >>> >>>> or
>> >>> >>>> >>> something
>> >>> >>>> >>> >>>>>>> else.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>> >>> >>>> >>> edwardyoon@apache.org
>> >>> >>>> >>> >>>>>>> >
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not to
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> work for graph examples.
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>> >>> >>>> >>> >>>>>>> >>>> > >> > fastgen
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 100 10
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
>> >>> >>>> >>> util.NativeCodeLoader:
>> >>> >>>> >>> >>>>>>> Unable
>> >>> >>>> >>> >>>>>>> >>>> > to
>> >>> >>>> >>> >>>>>>> >>>> > >> > load
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for
>> your
>> >>> >>>> >>> platform...
>> >>> >>>> >>> >>>>>>> using
>> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Running
>> >>> >>>> >>> >>>>>>> >>>> job:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> >>> >>>> >>> bsp.LocalBSPRunner:
>> >>> >>>> >>> >>>>>>> Setting
>> >>> >>>> >>> >>>>>>> >>>> up
>> >>> >>>> >>> >>>>>>> >>>> > a
>> >>> >>>> >>> >>>>>>> >>>> > >> new
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Current
>> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient: The
>> >>> >>>> >>> >>>>>>> total
>> >>> >>>> >>> >>>>>>> >>>> > number
>> >>> >>>> >>> >>>>>>> >>>> > >> > of
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Counters: 3
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=0
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212
>> seconds
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> hama-examples-0.7.0-SNAPSHOT.jar
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>> >>> >>>> >>> >>>>>>> >>>> > pagerank
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph
>> /tmp/pageour
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
>> >>> >>>> >>> util.NativeCodeLoader:
>> >>> >>>> >>> >>>>>>> Unable
>> >>> >>>> >>> >>>>>>> >>>> > to
>> >>> >>>> >>> >>>>>>> >>>> > >> > load
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for
>> your
>> >>> >>>> >>> platform...
>> >>> >>>> >>> >>>>>>> using
>> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> >>> >>>> >>> bsp.FileInputFormat:
>> >>> >>>> >>> >>>>>>> Total
>> >>> >>>> >>> >>>>>>> >>>> > input
>> >>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> >>> >>>> >>> bsp.FileInputFormat:
>> >>> >>>> >>> >>>>>>> Total
>> >>> >>>> >>> >>>>>>> >>>> > input
>> >>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Running
>> >>> >>>> >>> >>>>>>> >>>> job:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> >>> >>>> >>> bsp.LocalBSPRunner:
>> >>> >>>> >>> >>>>>>> Setting
>> >>> >>>> >>> >>>>>>> >>>> up
>> >>> >>>> >>> >>>>>>> >>>> > a
>> >>> >>>> >>> >>>>>>> >>>> > >> new
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Current
>> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient: The
>> >>> >>>> >>> >>>>>>> total
>> >>> >>>> >>> >>>>>>> >>>> > number
>> >>> >>>> >>> >>>>>>> >>>> > >> > of
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Counters: 6
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.FileInputFormat:
>> >>> >>>> >>> >>>>>>> Total
>> >>> >>>> >>> >>>>>>> >>>> > input
>> >>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.BSPJobClient:
>> >>> >>>> >>> >>>>>>> Running
>> >>> >>>> >>> >>>>>>> >>>> job:
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> bsp.LocalBSPRunner:
>> >>> >>>> >>> >>>>>>> Setting
>> >>> >>>> >>> >>>>>>> >>>> up
>> >>> >>>> >>> >>>>>>> >>>> > a
>> >>> >>>> >>> >>>>>>> >>>> > >> new
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> graph.GraphJobRunner: 50
>> >>> >>>> >>> >>>>>>> >>>> > vertices
>> >>> >>>> >>> >>>>>>> >>>> > >> > are
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> >>>> >>> graph.GraphJobRunner: 50
>> >>> >>>> >>> >>>>>>> >>>> > vertices
>> >>> >>>> >>> >>>>>>> >>>> > >> > are
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
>> >>> >>>> >>> bsp.LocalBSPRunner:
>> >>> >>>> >>> >>>>>>> >>>> Exception
>> >>> >>>> >>> >>>>>>> >>>> > >> > during
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> BSP
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> java.lang.IllegalArgumentException:
>> >>> >>>> >>> Messages
>> >>> >>>> >>> >>>>>>> must
>> >>> >>>> >>> >>>>>>> >>>> > never
>> >>> >>>> >>> >>>>>>> >>>> > >> be
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> behind
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> the
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current
>> Message
>> >>> ID: 1
>> >>> >>>> >>> vs. 50
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>>
>> >>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>>
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>>
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>>
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> >
>> >>> >>>> >>> >>>>>>> >>>> > >>
>> >>> >>>> >>> >>>>>>> >>>> >
>> >>> >>>> >>> >>>>>>> >>>>
>> >>> >>>> >>> >>>>>>>
>> >>> >>>> >>>
>> >>> >>>>
>> >>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>> >>> java.lang.Thread.run(Thread.java:722)
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
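The exception above is the sorted-iteration invariant of GraphJobRunner.iterate(): vertices are walked in ascending ID order, and incoming messages are expected in the same order. As a hypothetical illustration (this is not Hama code; the class and method names are made up), the following shows why appending sorted partition chunks to each other violates that invariant, while a true merge preserves it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of the invariant: a sorted iteration like
// GraphJobRunner.iterate() expects IDs to never go backwards.
public class SortedMergeDemo {

  // True if ids never decrease, i.e. the sequence is safe to iterate.
  static boolean isMonotonic(List<Integer> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ids.get(i) < ids.get(i - 1)) {
        return false;
      }
    }
    return true;
  }

  // Naive "merge" by appending one sorted run after the other.
  static List<Integer> append(List<Integer> a, List<Integer> b) {
    List<Integer> out = new ArrayList<>(a);
    out.addAll(b);
    return out;
  }

  // Proper merge of two sorted runs via a priority queue.
  static List<Integer> merge(List<Integer> a, List<Integer> b) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    pq.addAll(a);
    pq.addAll(b);
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      out.add(pq.poll());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> run1 = List.of(0, 2, 4); // sorted chunk of one peer
    List<Integer> run2 = List.of(1, 3, 5); // sorted chunk of another peer
    System.out.println(isMonotonic(append(run1, run2))); // false: 4 -> 1 goes backwards
    System.out.println(isMonotonic(merge(run1, run2)));  // true
  }
}
```

Appending the runs yields 0, 2, 4, 1, 3, 5, so a check of this kind trips exactly like the "Current Message ID: 1 vs. 50" error above; a real merge keeps the sequence monotonic.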
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Suraj Menon <su...@apache.org>.
Going along with the latest topic of the conversation:
nothing is closed here, and the JIRAs were already created for the whole
thing to fall into place:

HAMA-644
HAMA-490
HAMA-722
HAMA-728
HAMA-707

The JIRAs above are directly or indirectly affected by the core
refactoring.

-Suraj


On Thu, Mar 14, 2013 at 7:03 AM, Edward J. Yoon <ed...@apache.org> wrote:

> P.S., These comments are never helpful in developing a community.
>
> "before you run riot on all along the codebase, Suraj ist currently working
> on that stuff- don't make it more difficult for him rebasing all his
> patches the whole time.
> He has the plan so that we made to make the stuff working, his part is
> currently missing. So don't try to muddle arround there, it will make this
> take longer than already needed."
>
> On Thu, Mar 14, 2013 at 7:57 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > In my opinion, our best course of action is to 1) explain the plans and
> > edit them together on the wiki, and then 2) break implementation tasks
> > down as small as possible so that available people can work on them in
> > parallel. That way, you can use the available people. Do you remember
> > that I asked you to write down your plan here? -
> > http://wiki.apache.org/hama/SpillingQueue If you have some time, please
> > do it. I'll help you in my free time.
> >
> > Regarding branches, maybe we are all not familiar with online
> > collaboration (or don't want to collaborate anymore). If we want to
> > walk our own ways, why do we need to be here together?
> >
> > On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org>
> wrote:
> >> Three points:
> >>
> >> Firstly, apologies, because this conversation partly emanates from the
> >> delay in providing the set of patches. I was not able to carve out as
> >> much time as I was hoping.
> >>
> >> Second, I think I/we can work on separate branches. Since most of these
> >> concerns can only be answered by future patches, a decision can be made
> >> then. We can decide whether an svn revert is needed during the process
> >> on trunk.
> >> (This is a general comment and not related to a particular JIRA.)
> >>
> >> Third, please feel free to cut a release if it is really important.
> >>
> >> Thanks,
> >> Suraj
> >>
> >> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> wrote:
> >>
> >>> To reduce arguing, I'm appending my opinions.
> >>>
> >>> In HAMA-704, I wanted to remove only the message map to reduce memory
> >>> consumption. I still don't want to talk about disk-based vertices and
> >>> the Spilling Queue at the moment. With this, I wanted to release 0.6.1
> >>> as a 'partitioning issue fixed and quickly executable examples' version
> >>> ASAP. That's why I scheduled the Spilling Queue for the 0.7 roadmap.
> >>>
> >>> As you can see, issues are happening one right after another. I don't
> >>> think we have to clean up all of the never-ending issues at once. We
> >>> can improve step-by-step.
> >>>
> >>> 1. http://wiki.apache.org/hama/RoadMap
> >>>
> >>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <edwardyoon@apache.org
> >
> >>> wrote:
> >>> > Typos ;)
> >>> >
> >>> >> except YARN integration tasks. If you leave here, I have to take
> cover
> >>> >> YARN tasks. Should I wait someone? Am I touching core module
> >>> >
> >>> > I have to cover YARN tasks instead of you.
> >>> >
> >>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>> wrote:
> >>> >> Hmm, here are my opinions:
> >>> >>
> >>> >> As you know, we have a shortage of team members and contributors.
> >>> >> So we should break every task down as small as possible. Our best
> >>> >> course of action is improving step-by-step. And every Hama-x.x.x
> >>> >> release should run well, even if it's at a baby-cart level.
> >>> >>
> >>> >> And technology should be developed out of necessity. So I think we
> >>> >> need to cut releases as often as possible. That's why I volunteered
> >>> >> to manage releases. Actually, I wanted to work only on QA (quality
> >>> >> assurance) related tasks, because your code is better than mine and
> >>> >> I have a cluster.
> >>> >>
> >>> >> However, we are currently not working like that. I guess there are
> >>> >> many reasons. None of us is a full-time open-source developer
> >>> >> (except me).
> >>> >>
> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
> >>> >>
> >>> >> I don't know what you mean exactly. But 23 issues are almost
> examples
> >>> >> except YARN integration tasks. If you leave here, I have to take
> cover
> >>> >> YARN tasks. Should I wait someone? Am I touching core module
> >>> >> aggressively?
> >>> >>
> >>> >>>> Otherwise Suraj and I will branch those issues away and you can
> >>> >>>> play around in trunk how you like.
> >>> >>
> >>> >> I also don't know what you mean exactly but if you want, Please do.
> >>> >>
> >>> >> By the way, can you answer this question - is it really a technical
> >>> >> conflict, or an emotional one?
> >>> >>
> >>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
> >>> >> <th...@gmail.com> wrote:
> >>> >>> You have 23 issues assigned.  Why do you need to work on that?
> >>> >>> Otherwise Suraj and I will branch those issues away and you can
> >>> >>> play around in trunk how you like.
> >>> >>> On 14.03.2013 at 09:04, "Edward J. Yoon" <edwardyoon@apache.org> wrote:
> >>> >>>
> >>> >>>> P.S., Please don't say things like that.
> >>> >>>>
> >>> >>>> No decisions have been made yet. And if someone has a question or
> >>> >>>> missed something, you have to try to explain it here, because this
> >>> >>>> is open source. No one can say "don't touch trunk because I'm
> >>> >>>> working on it".
> >>> >>>>
> >>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org>
> >>> >>>> wrote:
> >>> >>>> > Sorry for my quick-and-dirty small patches.
> >>> >>>> >
> >>> >>>> > However, we should work together in parallel. Please share here
> >>> >>>> > if there is any progress.
> >>> >>>> >
> >>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
> >>> >>>> > <th...@gmail.com> wrote:
> >>> >>>> >> Hi Edward,
> >>> >>>> >>
> >>> >>>> >> before you run riot on all along the codebase, Suraj ist
> currently
> >>> >>>> working
> >>> >>>> >> on that stuff- don't make it more difficult for him rebasing
> all
> >>> his
> >>> >>>> >> patches the whole time.
> >>> >>>> >> He has the plan so that we made to make the stuff working, his
> >>> part is
> >>> >>>> >> currently missing. So don't try to muddle arround there, it
> will
> >>> make
> >>> >>>> this
> >>> >>>> >> take longer than already needed.
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
> >>> >>>> >>
> >>> >>>> >>> Personally, I would like to solve this issue by touching
> >>> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
> >>> >>>> >>> multiple files, we can avoid huge memory consumption.
> >>> >>>> >>>
> >>> >>>> >>> If we want to sort partitioned data using the messaging
> >>> >>>> >>> system, ideas should be collected.
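The "sorted sub-sets of vertices into multiple files" idea is classic external sorting: sort bounded-size runs, spill each run, then k-way merge them so iteration sees globally sorted IDs while only one run (plus the merge heads) is ever in memory. A rough sketch under assumed names (`RunMergeSketch` is illustrative only, not DiskVerticesInfo's actual code; in-memory lists stand in for the spill files):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// External-sort sketch: spill sorted runs, then k-way merge them.
public class RunMergeSketch {

  // Split the unsorted input into runs of at most runSize, sorting each.
  static List<List<Integer>> spillSortedRuns(List<Integer> input, int runSize) {
    List<List<Integer>> runs = new ArrayList<>();
    for (int i = 0; i < input.size(); i += runSize) {
      List<Integer> run = new ArrayList<>(
          input.subList(i, Math.min(i + runSize, input.size())));
      Collections.sort(run); // only runSize elements in memory here
      runs.add(run);
    }
    return runs;
  }

  // Merge the sorted runs, always emitting the smallest head element.
  static List<Integer> mergeRuns(List<List<Integer>> runs) {
    // heap entry: {value, runIndex, offsetInRun}
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new int[] { runs.get(r).get(0), r, 0 });
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] e = heap.poll();
      out.add(e[0]);
      int next = e[2] + 1;
      if (next < runs.get(e[1]).size()) {
        heap.add(new int[] { runs.get(e[1]).get(next), e[1], next });
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> ids = List.of(42, 7, 13, 99, 1, 58, 24, 3);
    List<List<Integer>> runs = spillSortedRuns(ids, 3);
    System.out.println(mergeRuns(runs)); // [1, 3, 7, 13, 24, 42, 58, 99]
  }
}
```

With real spill files, each run would be a sequentially read file and the heap would hold only one record per run, which is what bounds memory.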
> >>> >>>> >>>
> >>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
> >>> >>>> edwardyoon@apache.org>
> >>> >>>> >>> wrote:
> >>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely
> written.
> >>> >>>> >>> >
> >>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
> >>> >>>> edwardyoon@apache.org>
> >>> >>>> >>> wrote:
> >>> >>>> >>> >> I'm reading the changes of HAMA-704 again. As a result of
> >>> >>>> >>> >> adding DiskVerticesInfo, the vertices list needs to be
> >>> >>>> >>> >> sorted. I'm not sure, but I think this approach will bring
> >>> >>>> >>> >> more disadvantages than advantages.
> >>> >>>> >>> >>
> >>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
> >>> >>>> edwardyoon@apache.org>
> >>> >>>> >>> wrote:
> >>> >>>> >>> >>>>>> in loadVertices? Maybe the feature for coupling storage
> >>> >>>> >>> >>>>>> in user space with BSP Messaging [HAMA-734] can avoid
> >>> >>>> >>> >>>>>> double reads and writes. This way, partitioned or
> >>> >>>> >>> >>>>>> non-partitioned by the partitioner, we can keep vertices
> >>> >>>> >>> >>>>>> sorted with a single read and a single write on every peer.
> >>> >>>> >>> >>>
> >>> >>>> >>> >>> And, as I commented on the JIRA ticket, I think we can't
> >>> >>>> >>> >>> use the messaging system for sorting vertices within
> >>> >>>> >>> >>> partition files.
> >>> >>>> >>> >>>
> >>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org> wrote:
> >>> >>>> >>> >>>> P.S., (number of splits = number of partitions) is really
> >>> >>>> >>> >>>> confusing to me. Even when the number of blocks is equal
> >>> >>>> >>> >>>> to the desired number of tasks, the data should be
> >>> >>>> >>> >>>> re-partitioned again.
> >>> >>>> >>> >>>>
> >>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org> wrote:
> >>> >>>> >>> >>>>> Indeed. If there are already partitioned input files
> >>> >>>> >>> >>>>> (unsorted) and so the user wants to skip the
> >>> >>>> >>> >>>>> pre-partitioning phase, it should be handled in the
> >>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't know
> >>> >>>> >>> >>>>> why the re-partitioned files need to be sorted. It's only
> >>> >>>> >>> >>>>> about GraphJobRunner.
> >>> >>>> >>> >>>>>
> >>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We
> can
> >>> have
> >>> >>>> a
> >>> >>>> >>> dedicated
> >>> >>>> >>> >>>>>> partitioning superstep for graph applications).
> >>> >>>> >>> >>>>>
> >>> >>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
> >>> >>>> >>> >>>>> partitioning job based on the superstep API?
> >>> >>>> >>> >>>>>
> >>> >>>> >>> >>>>> By default, 100 tasks will be assigned to the
> >>> >>>> >>> >>>>> partitioning job. The partitioning job will create 1,000
> >>> >>>> >>> >>>>> partitions. Thus, we can execute the graph job with 1,000
> >>> >>>> >>> >>>>> tasks.
> >>> >>>> >>> >>>>>
> >>> >>>> >>> >>>>> Let's assume that an input sequence file is 20 GB (100
> >>> >>>> >>> >>>>> blocks). If I want to run with 1,000 tasks, what happens?
> >>> >>>> >>> >>>>>
> >>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
> >>> >>>> surajsmenon@apache.org>
> >>> >>>> >>> wrote:
> >>> >>>> >>> >>>>>> I am responding on this thread for better continuity of
> >>> >>>> >>> >>>>>> the conversation. We cannot expect the partitions to be
> >>> >>>> >>> >>>>>> sorted every time. When the number of splits = the number
> >>> >>>> >>> >>>>>> of partitions and partitioning is switched off by the
> >>> >>>> >>> >>>>>> user [HAMA-561], the partitions would not be sorted. Can
> >>> >>>> >>> >>>>>> we do this in loadVertices? Maybe the feature for
> >>> >>>> >>> >>>>>> coupling storage in user space with BSP Messaging
> >>> >>>> >>> >>>>>> [HAMA-734] can avoid double reads and writes. This way,
> >>> >>>> >>> >>>>>> partitioned or non-partitioned by the partitioner, we can
> >>> >>>> >>> >>>>>> keep vertices sorted with a single read and a single
> >>> >>>> >>> >>>>>> write on every peer.
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> Just clearing up confusion, if any, regarding superstep
> >>> >>>> >>> >>>>>> injection for partitioning. (This is outside the scope of
> >>> >>>> >>> >>>>>> graphs. We can have a dedicated partitioning superstep
> >>> >>>> >>> >>>>>> for graph applications.)
> >>> >>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> if x > y
> >>> >>>> >>> >>>>>> The y tasks are scheduled, with x of them having each of
> >>> >>>> >>> >>>>>> the x splits and the remaining with no resource local to
> >>> >>>> >>> >>>>>> them. Then the partitioning superstep redistributes the
> >>> >>>> >>> >>>>>> partitions among them to create local partitions. Now the
> >>> >>>> >>> >>>>>> question is: can we re-initialize a peer's input based on
> >>> >>>> >>> >>>>>> this new local part of the partition?
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> if y > x
> >>> >>>> >>> >>>>>> works as it works today.
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> Just putting my points in for brainstorming.
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> -Suraj
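The redistribution superstep described above could be simulated roughly as follows. This is a hypothetical sketch, not the real Hama API: `PartitionSuperstepSketch` and `ownerOf` are made-up stand-ins for the partitioner and peer messaging, with plain lists playing the role of peer inboxes; every peer scans its local split and "sends" each record to the peer that owns its partition, so after the exchange each peer holds exactly its own partition in local memory.

```java
import java.util.ArrayList;
import java.util.List;

// Simulated partitioning superstep: splits in, local partitions out.
public class PartitionSuperstepSketch {

  // The partitioner: owner peer of a record id.
  static int ownerOf(int id, int numPeers) {
    return id % numPeers;
  }

  // Simulate the message exchange of one superstep.
  static List<List<Integer>> redistribute(List<List<Integer>> splits, int numPeers) {
    List<List<Integer>> inbox = new ArrayList<>();
    for (int p = 0; p < numPeers; p++) {
      inbox.add(new ArrayList<>());
    }
    for (List<Integer> split : splits) {          // each peer scans its split
      for (int id : split) {
        inbox.get(ownerOf(id, numPeers)).add(id); // conceptually: peer.send(owner, id)
      }
    }
    return inbox;                                 // local partitions after the sync
  }

  public static void main(String[] args) {
    // 3 splits read by 3 peers, redistributed into 2 partitions (the x > y case).
    List<List<Integer>> splits =
        List.of(List.of(4, 9), List.of(2, 7), List.of(1, 6));
    System.out.println(redistribute(splits, 2)); // [[4, 2, 6], [9, 7, 1]]
  }
}
```

Whether a peer's record reader can then be re-initialized over this new local partition is exactly the open question in the message above; the sketch only shows the data movement.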
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org>wrote:
> >>> >>>> >>> >>>>>>
> >>> >>>> >>> >>>>>>> I just filed here
> >>> >>>> https://issues.apache.org/jira/browse/HAMA-744
> >>> >>>> >>> >>>>>>>
> >>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org>
> >>> >>>> >>> >>>>>>> wrote:
> >>> >>>> >>> >>>>>>> > Additionally,
> >>> >>>> >>> >>>>>>> >
> >>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
> >>> >>>> >>> >>>>>>> >> partitioning superstep as the first superstep and use local
> >>> >>>> >>> >>>>>>> >> memory?
> >>> >>>> >>> >>>>>>> >
> >>> >>>> >>> >>>>>>> > Can we execute a different number of tasks per superstep?
> >>> >>>> >>> >>>>>>> >
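[Editor's note] For context, a "sorted spilling queue" of the kind referenced above can be sketched as: buffer records in memory, sort and spill to a temp file when the buffer fills, then merge on drain. This is an illustrative toy under those assumptions, not Hama's actual SpillingQueue implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedSpillingQueue {
    private final int limit;                       // in-memory buffer size
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spills = new ArrayList<>();

    SortedSpillingQueue(int limit) { this.limit = limit; }

    void add(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= limit) spill();
    }

    // Sort the buffer and write it out: each spill file is locally sorted.
    private void spill() throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("spill", ".txt");
        Files.write(file, buffer);
        spills.add(file);
        buffer.clear();
    }

    // Draining still needs a merge across spill files to be globally
    // sorted; here we re-read and re-sort naively for brevity.
    List<String> drainSorted() throws IOException {
        List<String> all = new ArrayList<>(buffer);
        for (Path p : spills) all.addAll(Files.readAllLines(p));
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) throws IOException {
        SortedSpillingQueue q = new SortedSpillingQueue(2);
        for (String s : new String[] {"b", "d", "a", "c"}) q.add(s);
        System.out.println(q.drainSorted()); // [a, b, c, d]
    }
}
```

The disk-space question raised below applies directly here: the spill files are the "local memory" cost of avoiding a second HDFS copy.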
> >>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org>
> >>> >>>> >>> >>>>>>> wrote:
> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
> >>> result
> >>> >>>> from
> >>> >>>> >>> the
> >>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only
> the
> >>> >>>> partition
> >>> >>>> >>> files in
> >>> >>>> >>> >>>>>>> >>
> >>> >>>> >>> >>>>>>> >> I see.
> >>> >>>> >>> >>>>>>> >>
> >>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
> >>> superstep
> >>> >>>> API,
> >>> >>>> >>> Suraj's
> >>> >>>> >>> >>>>>>> idea
> >>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
> >>> partitions the
> >>> >>>> >>> stuff into
> >>> >>>> >>> >>>>>>> our
> >>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
> >>> >>>> >>> >>>>>>> >>
> >>> >>>> >>> >>>>>>> >> BTW, if garbage objects can accumulate in the partitioning
> >>> >>>> >>> >>>>>>> >> step, a separate partitioning job may not be a bad idea. Is
> >>> >>>> >>> >>>>>>> >> there some special reason?
> >>> >>>> >>> >>>>>>> >>
> >>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
> >>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
> >>> >>>> >>> >>>>>>> >>> from the partitioning job must be sorted. Currently only
> >>> >>>> >>> >>>>>>> >>> the partition files themselves are sorted, so more tasks
> >>> >>>> >>> >>>>>>> >>> result in unsorted data in the combined file. This only
> >>> >>>> >>> >>>>>>> >>> applies to the graph processing package.
> >>> >>>> >>> >>>>>>> >>> So, as Suraj said, it would be much simpler to solve this
> >>> >>>> >>> >>>>>>> >>> via messaging, once it is scalable (it will be very, very
> >>> >>>> >>> >>>>>>> >>> scalable!). The GraphJobRunner can then partition the data
> >>> >>>> >>> >>>>>>> >>> with a single superstep in setup(), as it did ages ago.
> >>> >>>> >>> >>>>>>> >>> The messaging must be sorted anyway for the algorithm, so
> >>> >>>> >>> >>>>>>> >>> this is a nice side effect and saves us the partitioning
> >>> >>>> >>> >>>>>>> >>> job for graph processing.
> >>> >>>> >>> >>>>>>> >>>
> >>> >>>> >>> >>>>>>> >>> For other partitionings, and with regard to our superstep
> >>> >>>> >>> >>>>>>> >>> API, Suraj's idea of injecting a preprocessing superstep
> >>> >>>> >>> >>>>>>> >>> that partitions the data into our messaging system is
> >>> >>>> >>> >>>>>>> >>> actually the best.
> >>> >>>> >>> >>>>>>> >>>
> >>> >>>> >>> >>>>>>> >>>
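[Editor's note] The sorting problem discussed in this thread (appending n individually sorted partition files yields globally unsorted data) is classically solved by a k-way merge that repeatedly emits the smallest head element across all runs. A minimal sketch, with plain Java lists standing in for the partition files (not Hama's actual merger):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Merge n individually sorted runs into one globally sorted list.
    // Each heap entry is {value, runIndex, offsetInRun}, ordered by value.
    static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out.add(head[0]);                 // smallest remaining element
            int next = head[2] + 1;
            List<Integer> run = runs.get(head[1]);
            if (next < run.size()) {          // advance within that run
                heap.add(new int[] {run.get(next), head[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sorted partition files; appending them would give 1,3,5,2,4,6.
        List<List<Integer>> runs =
            Arrays.asList(Arrays.asList(1, 3, 5), Arrays.asList(2, 4, 6));
        System.out.println(merge(runs)); // globally sorted: [1, 2, 3, 4, 5, 6]
    }
}
```

The same heap-based scheme works over file readers instead of lists, which is why sorted spill files plus a merging step restore the global order that plain concatenation destroys.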
> >>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >>> >>>> >>> >>>>>>> >>>
> >>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be sorted.
> >>> >>>> >>> >>>>>>> >>>> Sorry for the confusion. The superstep injection is
> >>> >>>> >>> >>>>>>> >>>> possible with the Superstep API. There are a few
> >>> >>>> >>> >>>>>>> >>>> enhancements needed to make it simpler since I last worked
> >>> >>>> >>> >>>>>>> >>>> on it. We can then look into the partitioning superstep
> >>> >>>> >>> >>>>>>> >>>> being executed before the setup of the first superstep of
> >>> >>>> >>> >>>>>>> >>>> the submitted job. I think it is feasible.
> >>> >>>> >>> >>>>>>> >>>>
> >>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org
> >>> >>>> >>> >>>>>>> >>>> >wrote:
> >>> >>>> >>> >>>>>>> >>>>
> >>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue,
> can we
> >>> >>>> inject
> >>> >>>> >>> the
> >>> >>>> >>> >>>>>>> >>>> partitioning
> >>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use
> local
> >>> memory?
> >>> >>>> >>> >>>>>>> >>>> >
> >>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling the
> >>> >>>> >>> >>>>>>> >>>> > BSP.setup() method to avoid executing an additional BSP
> >>> >>>> >>> >>>>>>> >>>> > job. But, in my opinion, the current approach is enough.
> >>> >>>> >>> >>>>>>> >>>> > I think we need to collect more experience with input
> >>> >>>> >>> >>>>>>> >>>> > partitioning in large environments. I'll do that.
> >>> >>>> >>> >>>>>>> >>>> >
> >>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be sorted?!
> >>> >>>> >>> >>>>>>> >>>> > MR-like?
> >>> >>>> >>> >>>>>>> >>>> >
> >>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
> >>> >>>> >>> >>>>>>> surajsmenon@apache.org>
> >>> >>>> >>> >>>>>>> >>>> > wrote:
> >>> >>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside the
> >>> >>>> >>> >>>>>>> >>>> > > graph module. When we have the spilling queue and the
> >>> >>>> >>> >>>>>>> >>>> > > sorted spilling queue, can we inject the partitioning
> >>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
> >>> >>>> >>> >>>>>>> >>>> > > Today we have a partitioning job within a job and are
> >>> >>>> >>> >>>>>>> >>>> > > creating two copies of the data on HDFS. This could be
> >>> >>>> >>> >>>>>>> >>>> > > really costly. Is it possible to create or
> >>> >>>> >>> >>>>>>> >>>> > > redistribute the partitions in local memory and
> >>> >>>> >>> >>>>>>> >>>> > > initialize the record reader there?
> >>> >>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the examples
> >>> >>>> >>> >>>>>>> >>>> > > area, to explicitly repartition the data on HDFS. The
> >>> >>>> >>> >>>>>>> >>>> > > deployment question is how much disk space gets
> >>> >>>> >>> >>>>>>> >>>> > > allocated for local memory usage? Would it be a safe
> >>> >>>> >>> >>>>>>> >>>> > > approach with the limitations?
> >>> >>>> >>> >>>>>>> >>>> > >
> >>> >>>> >>> >>>>>>> >>>> > > -Suraj
> >>> >>>> >>> >>>>>>> >>>> > >
> >>> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas
> Jungblut
> >>> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
> >>> >>>> >>> >>>>>>> >>>> > >
> >>> >>>> >>> >>>>>>> >>>> > >> yes. Once Suraj has added merging of sorted files,
> >>> >>>> >>> >>>>>>> >>>> > >> we can add this to the partitioner pretty easily.
> >>> >>>> >>> >>>>>>> >>>> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <
> edwardyoon@apache.org
> >>> >
> >>> >>>> >>> >>>>>>> >>>> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> > Eh... btw, does re-partitioned data really need to
> >>> >>>> >>> >>>>>>> >>>> > >> > be sorted?
> >>> >>>> >>> >>>>>>> >>>> > >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
> >>> Jungblut
> >>> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
> >>> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works: obviously,
> >>> >>>> >>> >>>>>>> >>>> > >> > > if you merge n sorted files by just appending
> >>> >>>> >>> >>>>>>> >>>> > >> > > them to each other, this will result in totally
> >>> >>>> >>> >>>>>>> >>>> > >> > > unsorted data ;-)
> >>> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
> >>> >>>> >>> >>>>>>> >>>> > >> > >
> >>> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
> >>> >>>> thomas.jungblut@gmail.com
> >>> >>>> >>> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >
> >>> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly
> sorted:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
> >>> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
> >>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>
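[Editor's note] The tail of that listing (..., 96, 98, 1, 10, 12, ..., 29, 3) is exactly the shape you get when numeric IDs are compared as text, which suggests the IDs are being ordered lexicographically rather than numerically. A quick illustration of the effect:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LexicographicIds {
    public static void main(String[] args) {
        List<String> ids =
            new ArrayList<>(Arrays.asList("1", "3", "10", "12", "50", "98"));
        // String comparison orders digit by digit, so "10" < "3" and "12" < "3".
        ids.sort(String::compareTo);
        System.out.println(ids); // [1, 10, 12, 3, 50, 98]
    }
}
```

If vertex IDs are stored as text (e.g. as Text/String keys) but the algorithm expects numeric order, per-partition sorting alone cannot produce the expected global sequence.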
> >>> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
> >>> >>>> >>> thomas.jungblut@gmail.com>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
> >>> >>>> edwardyoon@apache.org>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it quickly, please
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> do. March 1 is a holiday[1], so I'll be back
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> next week.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM,
> Thomas
> >>> >>>> Jungblut
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file; I didn't
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > observe whether all items were added. As far
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > as I remember, I copy/pasted the ID logic
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > into fastgen. Want to have a look into it?
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
> >>> >>>> edwardyoon@apache.org
> >>> >>>> >>> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when it
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> generates an adjacency matrix into
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> multiple files.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM,
> >>> Thomas
> >>> >>>> >>> Jungblut
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com>
> wrote:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > partitioned correctly?
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >>> :~/workspace/hama-trunk$
> >>> >>>> ls
> >>> >>>> >>> -al
> >>> >>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward
>  4096
> >>>  2월 28
> >>> >>>> >>> 18:03 .
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root
> 20480
> >>>  2월 28
> >>> >>>> >>> 18:04 ..
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
>  2243
> >>>  2월 28
> >>> >>>> >>> 18:01
> >>> >>>> >>> >>>>>>> part-00000
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
>  28
> >>>  2월 28
> >>> >>>> >>> 18:01
> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward
>  2251
> >>>  2월 28
> >>> >>>> >>> 18:01
> >>> >>>> >>> >>>>>>> part-00001
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward
>  28
> >>>  2월 28
> >>> >>>> >>> 18:01
> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward
>  4096
> >>>  2월 28
> >>> >>>> >>> 18:03
> >>> >>>> >>> >>>>>>> partitions
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >>> :~/workspace/hama-trunk$
> >>> >>>> ls
> >>> >>>> >>> -al
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096
> >>>  2월 28
> >>> >>>> >>> 18:03 .
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096
> >>>  2월 28
> >>> >>>> >>> 18:03 ..
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932
> >>>  2월 28
> >>> >>>> 18:03
> >>> >>>> >>> >>>>>>> part-00000
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
> >>>  2월 28
> >>> >>>> 18:03
> >>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955
> >>>  2월 28
> >>> >>>> 18:03
> >>> >>>> >>> >>>>>>> part-00001
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
> >>>  2월 28
> >>> >>>> 18:03
> >>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> >>> :~/workspace/hama-trunk$
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27
> PM,
> >>> Edward
> >>> >>>> <
> >>> >>>> >>> >>>>>>> >>>> edward@udanax.org
> >>> >>>> >>> >>>>>>> >>>> > >
> >>> >>>> >>> >>>>>>> >>>> > >> > wrote:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM,
> >>> Thomas
> >>> >>>> >>> Jungblut <
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> me, please?
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> part-00000 and part-00001, both
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> ~2.2kb in size. In the partition
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> directory below, there is only a
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> single 5.56kb file.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> to write a single file if you
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> configured two? It even reads it as
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two files, strange huh?
> >>> >>>> >>> huh?
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
> >>> >>>> >>> >>>>>>> thomas.jungblut@gmail.com>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10
> >>> /tmp/randomgraph
> >>> >>>> 1
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
> >>> >>>> /tmp/pageout
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> profiled; maybe the partitioning
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> doesn't partition correctly with
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> the input, or something else.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
> >>> >>>> >>> edwardyoon@apache.org
> >>> >>>> >>> >>>>>>> >
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not
> work
> >>> for
> >>> >>>> graph
> >>> >>>> >>> examples.
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
> >>> >>>> >>> >>>>>>> >>>> > >> > fastgen
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> 100 10
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
> >>> >>>> >>> util.NativeCodeLoader:
> >>> >>>> >>> >>>>>>> Unable
> >>> >>>> >>> >>>>>>> >>>> > to
> >>> >>>> >>> >>>>>>> >>>> > >> > load
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for
> your
> >>> >>>> >>> platform...
> >>> >>>> >>> >>>>>>> using
> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Running
> >>> >>>> >>> >>>>>>> >>>> job:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >>> >>>> >>> bsp.LocalBSPRunner:
> >>> >>>> >>> >>>>>>> Setting
> >>> >>>> >>> >>>>>>> >>>> up
> >>> >>>> >>> >>>>>>> >>>> > a
> >>> >>>> >>> >>>>>>> >>>> > >> new
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Current
> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient: The
> >>> >>>> >>> >>>>>>> total
> >>> >>>> >>> >>>>>>> >>>> > number
> >>> >>>> >>> >>>>>>> >>>> > >> > of
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Counters: 3
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=0
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212
> seconds
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> hama-examples-0.7.0-SNAPSHOT.jar
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>> >>> >>>>>>> >>>> > >> > jar
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> >>> >>>> >>> >>>>>>> >>>> > pagerank
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph
> /tmp/pageour
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
> >>> >>>> >>> util.NativeCodeLoader:
> >>> >>>> >>> >>>>>>> Unable
> >>> >>>> >>> >>>>>>> >>>> > to
> >>> >>>> >>> >>>>>>> >>>> > >> > load
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for
> your
> >>> >>>> >>> platform...
> >>> >>>> >>> >>>>>>> using
> >>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >>> >>>> >>> bsp.FileInputFormat:
> >>> >>>> >>> >>>>>>> Total
> >>> >>>> >>> >>>>>>> >>>> > input
> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >>> >>>> >>> bsp.FileInputFormat:
> >>> >>>> >>> >>>>>>> Total
> >>> >>>> >>> >>>>>>> >>>> > input
> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Running
> >>> >>>> >>> >>>>>>> >>>> job:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >>> >>>> >>> bsp.LocalBSPRunner:
> >>> >>>> >>> >>>>>>> Setting
> >>> >>>> >>> >>>>>>> >>>> up
> >>> >>>> >>> >>>>>>> >>>> > a
> >>> >>>> >>> >>>>>>> >>>> > >> new
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Current
> >>> >>>> >>> >>>>>>> >>>> > >> supersteps
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient: The
> >>> >>>> >>> >>>>>>> total
> >>> >>>> >>> >>>>>>> >>>> > number
> >>> >>>> >>> >>>>>>> >>>> > >> > of
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Counters: 6
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=1
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.FileInputFormat:
> >>> >>>> >>> >>>>>>> Total
> >>> >>>> >>> >>>>>>> >>>> > input
> >>> >>>> >>> >>>>>>> >>>> > >> > paths
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.BSPJobClient:
> >>> >>>> >>> >>>>>>> Running
> >>> >>>> >>> >>>>>>> >>>> job:
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> bsp.LocalBSPRunner:
> >>> >>>> >>> >>>>>>> Setting
> >>> >>>> >>> >>>>>>> >>>> up
> >>> >>>> >>> >>>>>>> >>>> > a
> >>> >>>> >>> >>>>>>> >>>> > >> new
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> graph.GraphJobRunner: 50
> >>> >>>> >>> >>>>>>> >>>> > vertices
> >>> >>>> >>> >>>>>>> >>>> > >> > are
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> >>>> >>> graph.GraphJobRunner: 50
> >>> >>>> >>> >>>>>>> >>>> > vertices
> >>> >>>> >>> >>>>>>> >>>> > >> > are
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
> >>> >>>> >>> bsp.LocalBSPRunner:
> >>> >>>> >>> >>>>>>> >>>> Exception
> >>> >>>> >>> >>>>>>> >>>> > >> > during
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> BSP
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> java.lang.IllegalArgumentException:
> >>> >>>> >>> Messages
> >>> >>>> >>> >>>>>>> must
> >>> >>>> >>> >>>>>>> >>>> > never
> >>> >>>> >>> >>>>>>> >>>> > >> be
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> behind
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> the
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current
> Message
> >>> ID: 1
> >>> >>>> >>> vs. 50
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>> >>> >>>>>>> >>>> > >> >
> >>> >>>> >>> >>>>>>> >>>>
> >>> >>>> >>>
> >>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
P.S. Comments like these are never helpful in developing a community.

"before you run riot all along the codebase, Suraj is currently working
on that stuff - don't make it more difficult for him by rebasing all his
patches the whole time.
He has the plan that we made to make this stuff work; his part is
currently missing. So don't try to muddle around there, it will make this
take longer than already needed."

On Thu, Mar 14, 2013 at 7:57 PM, Edward J. Yoon <ed...@apache.org> wrote:
> In my opinion, our best course of action is to 1) explain the plans and edit
> them together on the wiki, and then 2) break the implementation down into
> tasks as small as possible, so that available people can work on them in
> parallel. Do you remember that I asked you to write down
> your plan here? - http://wiki.apache.org/hama/SpillingQueue If you
> have some time, please do that for me. I'll help you in my free time.
>
> Regarding branches, maybe we are all not familiar with online
> collaboration (or don't want to collaborate anymore). If we each want to
> walk our own way, why do we need to be here together?
>
> On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org> wrote:
>> Three points:
>>
>> Firstly, apologies, because this conversation partly emanates from the delay
>> in providing the set of patches. I was not able to carve out as much time as
>> I was hoping.
>>
>> Second, I think I/we can work on separate branches. Since most of these
>> concerns can only be answered by future patches, a decision could be made
>> then. We can decide if an svn revert is needed during the process on trunk.
>> (This is a general comment and not related to a particular JIRA.)
>>
>> Third, please feel free to cut a release if it is really important.
>>
>> Thanks,
>> Suraj
>>
>> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>
>>> To reduce arguing, I'm appending my opinions.
>>>
>>> In HAMA-704, I wanted to remove only the message map to reduce memory
>>> consumption. I still don't want to talk about disk-based vertices and
>>> the Spilling Queue at the moment. With this, I wanted to release a 0.6.1
>>> 'partitioning issue fixed and quickly executable examples' version ASAP.
>>> That's why I scheduled the Spilling Queue for the 0.7 roadmap.
>>>
>>> As you can see, issues are popping up one right after another. I don't
>>> think we have to clean up every one of these never-ending issues at once.
>>> We can improve step-by-step.
>>>
>>> 1. http://wiki.apache.org/hama/RoadMap
>>>
>>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Typos ;)
>>> >
>>> >> except YARN integration tasks. If you leave here, I have to take cover
>>> >> YARN tasks. Should I wait someone? Am I touching core module
>>> >
>>> > I have to cover YARN tasks instead of you.
>>> >
>>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
> >>> >> Hmm, here are my opinions:
>>> >>
> >>> >> As you know, we have a problem with a lack of team members and
> >>> >> contributors. So we should break down every task as small as
> >>> >> possible. Our best course of action is improving step-by-step. And every
> >>> >> Hama-x.x.x release should run well, even if it's at a baby-cart level.
>>> >>
> >>> >> And technology should be developed out of necessity. So I think we need
> >>> >> to cut releases as often as possible. That's why I volunteered to manage
> >>> >> releases. Actually, I wanted to work only on QA (quality assurance)
> >>> >> related tasks, because your code is better than mine and I have a
> >>> >> cluster.
>>> >>
> >>> >> However, we are currently not working like that. I guess there are many
> >>> >> reasons. None of us is a full-time open sourcer (except me).
>>> >>
>>> >>> You have 23 issues assigned.  Why do you need to work on that?
>>> >>
>>> >> I don't know what you mean exactly. But 23 issues are almost examples
>>> >> except YARN integration tasks. If you leave here, I have to take cover
> >>> >> YARN tasks. Should I wait for someone? Am I touching the core module
> >>> >> too aggressively?
>>> >>
> >>> >>> Otherwise Suraj and I will branch those issues away and you can play
> >>> >>> around in trunk however you like.
>>> >>
> >>> >> I also don't know exactly what you mean, but if you want to, please do.
>>> >>
> >>> >> By the way, can you answer this question - is this really a
> >>> >> technical conflict, or an emotional one?
>>> >>
>>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
>>> >> <th...@gmail.com> wrote:
> >>> >>> You have 23 issues assigned. Why do you need to work on that?
> >>> >>> Otherwise Suraj and I will branch those issues away and you can play
> >>> >>> around in trunk however you like.
>>> >>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:
>>> >>>
> >>> >>>> P.S., please don't say it like that.
>>> >>>>
> >>> >>>> No decisions have been made yet. And if someone has a question or missed
> >>> >>>> something, you have to try to explain it here, because this is open
> >>> >>>> source. No one can say "don't touch trunk because I'm working on it".
>>> >>>>
>>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>>> wrote:
> >>> >>>> > Sorry for my quick-and-dirty style of small patches.
> >>> >>>> >
> >>> >>>> > However, we should work together in parallel. Please share here if
> >>> >>>> > there is any progress.
>>> >>>> >
>>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>>> >>>> > <th...@gmail.com> wrote:
>>> >>>> >> Hi Edward,
>>> >>>> >>
> >>> >>>> >> before you run riot all along the codebase, Suraj is currently
> >>> >>>> >> working on that stuff - don't make it more difficult for him by
> >>> >>>> >> rebasing all his patches the whole time.
> >>> >>>> >> He has the plan that we made to make this stuff work; his part is
> >>> >>>> >> currently missing. So don't try to muddle around there, it will
> >>> >>>> >> make this take longer than already needed.
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>>> >>>> >>
>>> >>>> >>> Personally, I would like to solve this issue by touching
>>> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>>> >>>> >>> multiple files, we can avoid huge memory consumption.
>>> >>>> >>>
> >>> >>>> >>> If we want to sort the partitioned data using the messaging
> >>> >>>> >>> system, ideas should be collected.
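
The DiskVerticesInfo idea above - write sorted sub-sets of vertices into multiple files to avoid holding everything in memory - is essentially an external sort. Below is an illustrative sketch only (the class and method names are made up, and in-memory lists stand in for the spilled local files): sort fixed-size chunks, keep each sorted chunk as a "run", and k-way merge the runs with a priority queue so memory holds just one chunk plus one element per run.

```java
import java.util.*;

// Hypothetical sketch, not Hama's actual DiskVerticesInfo code.
public class SortedSpillSketch {
    // Sort each fixed-size chunk in memory; each sorted chunk is one "run".
    static List<List<Integer>> spillSortedRuns(List<Integer> ids, int chunkSize) {
        List<List<Integer>> runs = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            List<Integer> run =
                new ArrayList<>(ids.subList(i, Math.min(i + chunkSize, ids.size())));
            Collections.sort(run);
            runs.add(run); // in real code this run would be written to a local file
        }
        return runs;
    }

    // K-way merge: appending runs would interleave ranges; a heap keeps order.
    static List<Integer> merge(List<List<Integer>> runs) {
        // heap entries: {value, run index, position within run}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
        for (int r = 0; r < runs.size(); r++)
            if (!runs.get(r).isEmpty()) heap.add(new int[] {runs.get(r).get(0), r, 0});
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int next = top[2] + 1;
            if (next < runs.get(top[1]).size())
                heap.add(new int[] {runs.get(top[1]).get(next), top[1], next});
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(54, 3, 98, 21, 1, 78, 16, 61);
        // Globally sorted output despite only chunk-sized in-memory sorts.
        System.out.println(merge(spillSortedRuns(ids, 3)));
    }
}
```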
>>> >>>> >>>
>>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>>> >>>> edwardyoon@apache.org>
>>> >>>> >>> wrote:
>>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>>> >>>> >>> >
>>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>>> >>>> edwardyoon@apache.org>
>>> >>>> >>> wrote:
> >>> >>>> >>> >> I'm reading the changes of HAMA-704 again. As a result of adding
> >>> >>>> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm
> >>> >>>> >>> >> not sure,
> >>> >>>> >>> >> but I think this approach will bring more disadvantages than
> >>> >>>> >>> >> advantages.
>>> >>>> >>> >>
>>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>>> >>>> edwardyoon@apache.org>
>>> >>>> >>> wrote:
>>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>>> storage in
>>> >>>> >>> user space
>>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and
>>> writes.
>>> >>>> >>> This way
>>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep
>>> vertices
>>> >>>> >>> sorted
>>> >>>> >>> >>>>>> with a single read and single write on every peer.
>>> >>>> >>> >>>
> >>> >>>> >>> >>> And, as I commented on the JIRA ticket, I think we can't use
> >>> >>>> >>> >>> the messaging system
> >>> >>>> >>> >>> for sorting vertices within partition files.
>>> >>>> >>> >>>
>>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org> wrote:
> >>> >>>> >>> >>>> P.S., (number of splits = number of partitions) is really
> >>> >>>> >>> >>>> confusing to me. Even though the number of blocks is equal to
> >>> >>>> >>> >>>> the desired number of tasks, the data should be re-partitioned
> >>> >>>> >>> >>>> again.
>>> >>>> >>> >>>> should be re-partitioned again.
>>> >>>> >>> >>>>
>>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org> wrote:
>>> >>>> >>> >>>>> Indeed. If there are already partitioned input files
>>> (unsorted)
>>> >>>> and
>>> >>>> >>> so
>>> >>>> >>> >>>>> user want to skip pre-partitioning phase, it should be
>>> handled in
>>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't know why
>>> >>>> >>> >>>>> re-partitioned files need to be Sorted. It's only about
>>> >>>> >>> >>>>> GraphJobRunner.
>>> >>>> >>> >>>>>
>>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can
>>> have
>>> >>>> a
>>> >>>> >>> dedicated
>>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>>> >>>> >>> >>>>>
>>> >>>> >>> >>>>> Sorry. I don't understand exactly yet. Do you mean just a
>>> >>>> >>> partitioning
>>> >>>> >>> >>>>> job based on superstep API?
>>> >>>> >>> >>>>>
>>> >>>> >>> >>>>> By default, 100 tasks will be assigned for partitioning job.
>>> >>>> >>> >>>>> Partitioning job will create 1,000 partitions. Thus, we can
>>> >>>> execute
>>> >>>> >>> >>>>> the Graph job with 1,000 tasks.
>>> >>>> >>> >>>>>
>>> >>>> >>> >>>>> Let's assume that a input sequence file is 20GB (100
>>> blocks). If
>>> >>>> I
>>> >>>> >>> >>>>> want to run with 1,000 tasks, what happens?
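> >>> >>>> >>> >>>>>

A toy sketch of the arithmetic in this question, under the assumption of a simple hash-based partitioner (illustrative only, not Hama's actual partitioning code): the 100 blocks determine how many tasks can read the input, but each record's destination among the 1,000 partitions depends only on its key, so any reading task can route any record, and the graph job can then run with 1,000 tasks.

```java
public class PartitionMath {
    // Hypothetical hash routing: any reading task computes the destination
    // partition of a record independently, from the key alone.
    static int partitionFor(String vertexId, int numPartitions) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 1000; // desired number of graph-job tasks,
                                  // independent of the 100 input blocks
        for (String id : new String[] {"1", "50", "98"}) {
            int p = partitionFor(id, numPartitions);
            // Every record lands in [0, numPartitions), no matter which of
            // the 100 blocks (and which reading task) it came from.
            System.out.println("vertex " + id + " -> partition " + p);
        }
    }
}
```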
>>> >>>> >>> >>>>>
>>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>>> >>>> surajsmenon@apache.org>
>>> >>>> >>> wrote:
>>> >>>> >>> >>>>>> I am responding on this thread because of better
>>> continuity for
>>> >>>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted
>>> every
>>> >>>> >>> time. When
>>> >>>> >>> >>>>>> the number of splits = number of partitions and
>>> partitioning is
>>> >>>> >>> switched
>>> >>>> >>> >>>>>> off by user[HAMA-561], the partitions would not be sorted.
>>> Can
>>> >>>> we
>>> >>>> >>> do this
>>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>>> storage in
>>> >>>> >>> user space
>>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and
>>> writes.
>>> >>>> >>> This way
>>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep
>>> vertices
>>> >>>> >>> sorted
>>> >>>> >>> >>>>>> with a single read and single write on every peer.
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> Just clearing confusion if any regarding superstep
>>> injection for
>>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can
>>> have
>>> >>>> a
>>> >>>> >>> dedicated
>>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>>> >>>> >>> >>>>>> Say there are x splits and y number of tasks configured by
>>> user.
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> if x > y
>>> >>>> >>> >>>>>> The y tasks are scheduled with x of them having each of
>>> the x
>>> >>>> >>> splits and
>>> >>>> >>> >>>>>> the remaining with no resource local to them. Then the
>>> >>>> partitioning
>>> >>>> >>> >>>>>> superstep redistributes the partitions among them to create
>>> >>>> local
>>> >>>> >>> >>>>>> partitions. Now the question is can we re-initialize a
>>> peer's
>>> >>>> input
>>> >>>> >>> based
>>> >>>> >>> >>>>>> on this new local part of partition?
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> if y > x
>>> >>>> >>> >>>>>> works as it works today.
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> Just putting my points in brainstorming.
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> -Suraj
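> >>> >>>> >>> >>>>>>

The x-splits/y-tasks redistribution described above can be simulated in a few lines. This is a toy sketch with simplified stand-ins (plain lists instead of Hama's BSPPeer messaging API): each peer reads whatever split it has, "sends" every record to the peer that owns it, and after the barrier each peer holds its own, locally sorted partition - including peers that started with no split at all.

```java
import java.util.*;

// Toy model of the proposed partitioning superstep; not Hama's real API.
public class PartitioningSuperstepSketch {
    static int owner(int key, int numPeers) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPeers;
    }

    // One "superstep": route all records, then each peer sorts its inbox
    // (the sorted delivery is the side effect discussed in this thread).
    static List<List<Integer>> partitioningSuperstep(List<List<Integer>> splits,
                                                     int numPeers) {
        List<List<Integer>> inbox = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) inbox.add(new ArrayList<>());
        for (List<Integer> split : splits)              // each peer reads its split
            for (int record : split)
                inbox.get(owner(record, numPeers)).add(record); // "send" a message
        for (List<Integer> in : inbox) Collections.sort(in);    // barrier + sort
        return inbox;
    }

    public static void main(String[] args) {
        // x = 2 splits, y = 3 peers: the third peer has no local split but
        // still receives its share of the data through messaging.
        List<List<Integer>> splits = Arrays.asList(
            Arrays.asList(5, 1, 9, 3), Arrays.asList(2, 8, 4, 7));
        List<List<Integer>> partitions = partitioningSuperstep(splits, 3);
        for (int p = 0; p < partitions.size(); p++)
            System.out.println("peer " + p + " now owns " + partitions.get(p));
    }
}
```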
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org>wrote:
>>> >>>> >>> >>>>>>
>>> >>>> >>> >>>>>>> I just filed here
>>> >>>> https://issues.apache.org/jira/browse/HAMA-744
>>> >>>> >>> >>>>>>>
>>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org>
>>> >>>> >>> >>>>>>> wrote:
>>> >>>> >>> >>>>>>> > Additionally,
>>> >>>> >>> >>>>>>> >
>>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
>>> inject the
>>> >>>> >>> partitioning
>>> >>>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
>>> >>>> >>> >>>>>>> >
>>> >>>> >>> >>>>>>> > Can we execute different number of tasks per superstep?
>>> >>>> >>> >>>>>>> >
>>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org>
>>> >>>> >>> >>>>>>> wrote:
>>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>>> result
>>> >>>> from
>>> >>>> >>> the
>>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>>> >>>> partition
>>> >>>> >>> files in
>>> >>>> >>> >>>>>>> >>
>>> >>>> >>> >>>>>>> >> I see.
>>> >>>> >>> >>>>>>> >>
>>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>>> superstep
>>> >>>> API,
>>> >>>> >>> Suraj's
>>> >>>> >>> >>>>>>> idea
>>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>>> partitions the
>>> >>>> >>> stuff into
>>> >>>> >>> >>>>>>> our
>>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>>> >>>> >>> >>>>>>> >>
>>> >>>> >>> >>>>>>> >> BTW, if some garbage objects can be accumulated in
>>> >>>> partitioning
>>> >>>> >>> step,
>>> >>>> >>> >>>>>>> >> separated partitioning job may not be bad idea. Is
>>> there
>>> >>>> some
>>> >>>> >>> special
>>> >>>> >>> >>>>>>> >> reason?
>>> >>>> >>> >>>>>>> >>
>>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>>> result
>>> >>>> from
>>> >>>> >>> the
>>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>>> >>>> partition
>>> >>>> >>> files in
>>> >>>> >>> >>>>>>> >>> itself are sorted, thus more tasks result in not
>>> sorted
>>> >>>> data
>>> >>>> >>> in the
>>> >>>> >>> >>>>>>> >>> completed file. This only applies for the graph
>>> processing
>>> >>>> >>> package.
>>> >>>> >>> >>>>>>> >>> So as Suraj told, it would be much more simpler to
>>> solve
>>> >>>> this
>>> >>>> >>> via
>>> >>>> >>> >>>>>>> >>> messaging, once it is scalable (it will be very very
>>> >>>> >>> scalable!). So the
>>> >>>> >>> >>>>>>> >>> GraphJobRunner can be partitioning the stuff with a
>>> single
>>> >>>> >>> superstep in
>>> >>>> >>> >>>>>>> >>> setup() as it was before ages ago. The messaging must
>>> be
>>> >>>> >>> sorted anyway
>>> >>>> >>> >>>>>>> for
>>> >>>> >>> >>>>>>> >>> the algorithm so this is a nice side effect and saves
>>> us
>>> >>>> the
>>> >>>> >>> >>>>>>> partitioning
>>> >>>> >>> >>>>>>> >>> job for graph processing.
>>> >>>> >>> >>>>>>> >>>
>>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>>> superstep
>>> >>>> API,
>>> >>>> >>> Suraj's
>>> >>>> >>> >>>>>>> idea
>>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>>> partitions the
>>> >>>> >>> stuff into
>>> >>>> >>> >>>>>>> our
>>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>>> >>>> >>> >>>>>>> >>>
>>> >>>> >>> >>>>>>> >>>
>>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>> >>>> >>> >>>>>>> >>>
>>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be
>>> sorted.
>>> >>>> Sorry
>>> >>>> >>> for the
>>> >>>> >>> >>>>>>> >>>> confusion. The Superstep injection is possible with
>>> >>>> Superstep
>>> >>>> >>> API.
>>> >>>> >>> >>>>>>> There
>>> >>>> >>> >>>>>>> >>>> are few enhancements needed to make it simpler after
>>> I
>>> >>>> last
>>> >>>> >>> worked on
>>> >>>> >>> >>>>>>> it.
>>> >>>> >>> >>>>>>> >>>> We can then look into partitioning superstep being
>>> >>>> executed
>>> >>>> >>> before the
>>> >>>> >>> >>>>>>> >>>> setup of first superstep of submitted job. I think
>>> it is
>>> >>>> >>> feasible.
>>> >>>> >>> >>>>>>> >>>>
>>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>>> >>>> >>> edwardyoon@apache.org
>>> >>>> >>> >>>>>>> >>>> >wrote:
>>> >>>> >>> >>>>>>> >>>>
>>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>>> >>>> inject
>>> >>>> >>> the
>>> >>>> >>> >>>>>>> >>>> partitioning
>>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
>>> memory?
>>> >>>> >>> >>>>>>> >>>> >
>>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling
>>> >>>> >>> BSP.setup()
>>> >>>> >>> >>>>>>> method
>>> >>>> >>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my
>>> opinion,
>>> >>>> >>> current is
>>> >>>> >>> >>>>>>> >>>> > enough. I think, we need to collect more
>>> experiences of
>>> >>>> >>> input
>>> >>>> >>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>>> >>>> >>> >>>>>>> >>>> >
>>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?!
>>> >>>> MR-like?
>>> >>>> >>> >>>>>>> >>>> >
>>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>> >>>> >>> >>>>>>> surajsmenon@apache.org>
>>> >>>> >>> >>>>>>> >>>> > wrote:
>>> >>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside
>>> graph
>>> >>>> >>> module.
>>> >>>> >>> >>>>>>> When we
>>> >>>> >>> >>>>>>> >>>> > have
>>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>>> >>>> inject
>>> >>>> >>> the
>>> >>>> >>> >>>>>>> >>>> partitioning
>>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
>>> memory?
>>> >>>> >>> >>>>>>> >>>> > > Today we have partitioning job within a job and
>>> are
>>> >>>> >>> creating two
>>> >>>> >>> >>>>>>> copies
>>> >>>> >>> >>>>>>> >>>> > of
>>> >>>> >>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it
>>> >>>> possible
>>> >>>> >>> to
>>> >>>> >>> >>>>>>> create or
>>> >>>> >>> >>>>>>> >>>> > > redistribute the partitions on local memory and
>>> >>>> >>> initialize the
>>> >>>> >>> >>>>>>> record
>>> >>>> >>> >>>>>>> >>>> > > reader there?
>>> >>>> >>> >>>>>>> >>>> > > The user can run a separate job give in examples
>>> area
>>> >>>> to
>>> >>>> >>> >>>>>>> explicitly
>>> >>>> >>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment
>>> question
>>> >>>> is
>>> >>>> >>> how much
>>> >>>> >>> >>>>>>> of
>>> >>>> >>> >>>>>>> >>>> disk
>>> >>>> >>> >>>>>>> >>>> > > space gets allocated for local memory usage?
>>> Would it
>>> >>>> be
>>> >>>> >>> a safe
>>> >>>> >>> >>>>>>> >>>> approach
>>> >>>> >>> >>>>>>> >>>> > > with the limitations?
>>> >>>> >>> >>>>>>> >>>> > >
>>> >>>> >>> >>>>>>> >>>> > > -Suraj
>>> >>>> >>> >>>>>>> >>>> > >
>>> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>>> >>>> >>> >>>>>>> >>>> > >
>>> >>>> >>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files
>>> we can
>>> >>>> add
>>> >>>> >>> this to
>>> >>>> >>> >>>>>>> the
>>> >>>> >>> >>>>>>> >>>> > >> partitioner pretty easily.
>>> >>>> >>> >>>>>>> >>>> > >>
>>> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> >>>> >>> >>>>>>> >>>> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really
>>> >>>> necessary
>>> >>>> >>> to be
>>> >>>> >>> >>>>>>> Sorted?
>>> >>>> >>> >>>>>>> >>>> > >> >
>>> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
>>> Jungblut
>>> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works,
>>> obviously
>>> >>>> if
>>> >>>> >>> you merge
>>> >>>> >>> >>>>>>> n
>>> >>>> >>> >>>>>>> >>>> > sorted
>>> >>>> >>> >>>>>>> >>>> > >> > files
>>> >>>> >>> >>>>>>> >>>> > >> > > by just appending to each other, this will
>>> >>>> result in
>>> >>>> >>> totally
>>> >>>> >>> >>>>>>> >>>> > unsorted
>>> >>>> >>> >>>>>>> >>>> > >> > data
>>> >>>> >>> >>>>>>> >>>> > >> > > ;-)
>>> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>> >>>> >>> >>>>>>> >>>> > >> > >
>>> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
>>> >>>> thomas.jungblut@gmail.com
>>> >>>> >>> >
>>> >>>> >>> >>>>>>> >>>> > >> > >
>>> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>> >>>> >>> >>>>>>> >>>> > >> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>>> >>>> >>> >>>>>>> >>>> > >> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
>>> >>>> >>> >>>>>>> >>>> > >> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>>> >>>> >>> thomas.jungblut@gmail.com>
>>> >>>> >>> >>>>>>> >>>> > >> > >>
>>> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>> >>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
>>> >>>> edwardyoon@apache.org>
>>> >>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please
>>> do.
>>> >>>> >>> March 1 is
>>> >>>> >>> >>>>>>> >>>> > holiday[1]
>>> >>>> >>> >>>>>>> >>>> > >> so
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
>>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> 1.
>>> >>>> >>> >>>>>>>
>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
>>> >>>> Jungblut
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file,
>>> didn't
>>> >>>> >>> observe if all
>>> >>>> >>> >>>>>>> >>>> items
>>> >>>> >>> >>>>>>> >>>> > >> were
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> added.
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the
>>> >>>> logic
>>> >>>> >>> of the ID
>>> >>>> >>> >>>>>>> into
>>> >>>> >>> >>>>>>> >>>> > the
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> fastgen,
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
>> > > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >
>> > >> I guess it's a bug of fastgen, when it generates an adjacency matrix
>> > >> into multiple files.
>> > >>
>> > >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> > >> <th...@gmail.com> wrote:
>> > >> > You have two files, are they partitioned correctly?
>> > >> >
>> > >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >> >
>> > >> >> It looks like a bug.
>> > >> >>
>> > >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>> > >> >> total 44
>> > >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> > >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> > >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>> > >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>> > >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>> > >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>> > >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>> > >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>> > >> >> total 24
>> > >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> > >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> > >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>> > >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>> > >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>> > >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>> > >> >> edward@udanax:~/workspace/hama-trunk$
>> > >> >>
>> > >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
>> > >> >> > yes i'll check again
>> > >> >> >
>> > >> >> > Sent from my iPhone
>> > >> >> >
>> > >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut
>> > >> >> > <thomas.jungblut@gmail.com> wrote:
>> > >> >> >
>> > >> >> >> Can you verify an observation for me please?
>> > >> >> >>
>> > >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both
>> > >> >> >> ~2.2kb sized.
>> > >> >> >> In the partition directory below, there is only a single 5.56kb
>> > >> >> >> file.
>> > >> >> >>
>> > >> >> >> Is it intended for the partitioner to write a single file if you
>> > >> >> >> configured two?
>> > >> >> >> It even reads it as two files, strange huh?
>> > >> >> >>
>> > >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>> > >> >> >>
>> > >> >> >>> Will have a look into it.
>> > >> >> >>>
>> > >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> > >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> > >> >> >>>
>> > >> >> >>> did work for me the last time I profiled; maybe the partitioning
>> > >> >> >>> doesn't partition correctly with the input, or something else.
>> > >> >> >>>
>> > >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >> >> >>>
>> > >> >> >>>> Fastgen input seems not to work for graph examples.
>> > >> >> >>>>
>> > >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>> > >> >> >>>> /tmp/randomgraph 2
>> > >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>> > >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> > >> >> >>>> where applicable
>> > >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier
>> > >> >> >>>> for 2 tasks!
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>> > >> >> >>>> Job Finished in 3.212 seconds
>> > >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> > >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> > >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> > >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>> > >> >> >>>> /tmp/randomgraph /tmp/pageour
>> > >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>> > >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> > >> >> >>>> where applicable
>> > >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier
>> > >> >> >>>> for 2 tasks!
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier
>> > >> >> >>>> for 2 tasks!
>> > >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded
>> > >> >> >>>> into local:1
>> > >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded
>> > >> >> >>>> into local:0
>> > >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP
>> > >> >> >>>> execution!
>> > >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the
>> > >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>> > >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> > >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> > >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> > >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> > >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> > >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> > >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>> > >> >> >>>>
>> > >> >> >>>> --
>> > >> >> >>>> Best Regards, Edward J. Yoon
>> > >> >> >>>> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
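The failure in the quoted log above ("Messages must never be behind the vertex in ID!") ties back to the observation earlier in the thread: if you merge n sorted partition files by just appending them to each other, the result is totally unsorted, and the vertex iteration then sees IDs out of order. The following sketch (hypothetical class and method names, not Hama's actual partitioner code) contrasts naive concatenation of sorted runs with a k-way min-heap merge that preserves global order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortedParts {

    // Appending sorted runs one after another: each run stays sorted
    // internally, but the concatenation is generally not sorted.
    static List<Integer> concat(List<int[]> runs) {
        List<Integer> out = new ArrayList<>();
        for (int[] run : runs) {
            for (int v : run) {
                out.add(v);
            }
        }
        return out;
    }

    // k-way merge with a min-heap: always emit the globally smallest
    // head element among all runs, so the output is totally ordered.
    static List<Integer> merge(List<int[]> runs) {
        // heap entries: {value, runIndex, offsetInRun}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).length > 0) {
                heap.add(new int[] {runs.get(i)[0], i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out.add(e[0]);
            int next = e[2] + 1;
            if (next < runs.get(e[1]).length) {
                heap.add(new int[] {runs.get(e[1])[next], e[1], next});
            }
        }
        return out;
    }

    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // two sorted "partition files", like part-00000 and part-00001
        List<int[]> runs = List.of(new int[] {1, 3, 50}, new int[] {2, 4, 5});
        System.out.println(isSorted(concat(runs))); // false: 50 precedes 2
        System.out.println(isSorted(merge(runs)));  // true
    }
}
```

The same idea applies to a disk-based vertex store: write sorted sub-sets into multiple files and merge them on read, rather than concatenating them.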

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
In my opinion, our best course of action is to 1) explain the plans and edit
them together on the Wiki, and then 2) break the implementation down into
tasks as small as possible, so that available people can try them in
parallel. That way we can make use of whoever is available. Do you remember
that I asked you to write down your plan here? -
http://wiki.apache.org/hama/SpillingQueue If you have some time, please do.
I'll help you in my free time.

Regarding branches, maybe we are all just not used to collaborating online
(or don't want to collaborate anymore). If we each want to walk our own way,
why do we need to be here together?

On Thu, Mar 14, 2013 at 7:13 PM, Suraj Menon <su...@apache.org> wrote:
> Three points:
>
> Firstly, apologies, because this conversation partly stems from the delay
> in providing the set of patches. I was not able to carve out as much time
> as I was hoping.
>
> Second, I think I/we can work on separate branches. Since most of these
> concerns can only be answered by future patches, a decision could be made
> then. We can decide whether an svn revert is needed during the process on trunk.
> (This is a general comment and not related to particular JIRA)
>
> Third, please feel free to slice a release if it is really important.
>
> Thanks,
> Suraj
>
> On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> To reduce arguing, I'm appending my opinions.
>>
>> In HAMA-704, I wanted to remove only the message map to reduce memory
>> consumption. I still don't want to talk about disk-based vertices and
>> the Spilling Queue at the moment. With this, I wanted to release 0.6.1 as a
>> 'partitioning issue fixed and quickly executable examples' version ASAP.
>> That's why I scheduled the Spilling Queue in the 0.7 roadmap.
>>
>> As you can see, issues keep happening one right after another. I don't
>> think we have to clean up every never-ending issue; we can improve
>> step-by-step.
>>
>> 1. http://wiki.apache.org/hama/RoadMap
>>
>> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Typos ;)
>> >
>> >> except YARN integration tasks. If you leave here, I have to take cover
>> >> YARN tasks. Should I wait someone? Am I touching core module
>> >
>> > I have to cover YARN tasks instead of you.
>> >
>> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> >> Hmm, here's my opinions:
>> >>
>> >> As you know, we have a problem with a lack of team members and
>> >> contributors. So we should break every task down as small as
>> >> possible. Our best action is to improve step-by-step. And every
>> >> Hama-x.x.x should run well, even if it's at a baby-cart level.
>> >>
>> >> And tech should be developed out of necessity. So I think we need
>> >> to cut releases as often as possible. Therefore I volunteered to manage
>> >> releases. Actually, I wanted to work only on QA (quality assurance)
>> >> related tasks, because your code is better than mine and I have a
>> >> cluster.
>> >>
>> >> However, we are currently not working like that. I guess there are many
>> >> reasons. None of us is a full-time open sourcer (except me).
>> >>
>> >>> You have 23 issues assigned.  Why do you need to work on that?
>> >>
>> >> I don't know what you mean exactly. But 23 issues are almost examples
>> >> except YARN integration tasks. If you leave here, I have to take cover
>> >> YARN tasks. Should I wait someone? Am I touching core module
>> >> aggressively?
>> >>
>> >>>> Otherwise Suraj and I will branch those issues away and you can play
>> >>>> around in trunk how you like.
>> >>
>> >> I also don't know exactly what you mean, but if you want, please do.
>> >>
>> >> By the way, can you answer this question: is this really a technical
>> >> conflict, or an emotional one?
>> >>
>> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
>> >> <th...@gmail.com> wrote:
>> >>> You have 23 issues assigned. Why do you need to work on that?
>> >>> Otherwise Suraj and I will branch those issues away and you can play
>> >>> around in trunk how you like.
>> >>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:
>> >>>
>> >>>> P.S., please don't say it like that.
>> >>>>
>> >>>> No decisions have been made yet. And if someone has a question or missed
>> >>>> something, you have to try to explain it here, because this is open
>> >>>> source. Nobody can say "don't touch trunk because I'm working on it".
>> >>>>
>> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>>> wrote:
>> >>>> > Sorry for my quick and dirty style small patches.
>> >>>> >
>> >>>> > However, we should work together in parallel. Please share any
>> >>>> > progress here.
>> >>>> >
>> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>> >>>> > <th...@gmail.com> wrote:
>> >>>> >> Hi Edward,
>> >>>> >>
>> >>>> >> before you run riot all over the codebase: Suraj is currently working
>> >>>> >> on that stuff. Don't make it more difficult for him by making him
>> >>>> >> rebase all his patches the whole time. He has the plan that we made to
>> >>>> >> get this stuff working; his part is currently the missing piece. So
>> >>>> >> don't muddle around there, it will only make this take longer than it
>> >>>> >> already needs to.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>> >>>> >>
>> >>>> >>> Personally, I would like to solve this issue by touching
>> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>> >>>> >>> multiple files, we can avoid huge memory consumption.
>> >>>> >>>
>> >>>> >>> If we want to sort partitioned data using messaging system, idea
>> >>>> >>> should be collected.
>> >>>> >>>
>> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>> >>>> edwardyoon@apache.org>
>> >>>> >>> wrote:
>> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>> >>>> >>> >
>> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>> >>>> edwardyoon@apache.org>
>> >>>> >>> wrote:
>> >>>> >>> >> I'm reading changes of HAMA-704 again. As a result of adding
>> >>>> >>> >> DiskVerticesInfo, vertices list is needed to be sorted. I'm
>> not sure
>> >>>> >>> >> but I think this approach will bring more disadvantages than
>> >>>> >>> >> advantages.
>> >>>> >>> >>
>> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>> >>>> edwardyoon@apache.org>
>> >>>> >>> wrote:
>> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
>> storage in
>> >>>> >>> user space
>> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and
>> writes.
>> >>>> >>> This way
>> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep
>> vertices
>> >>>> >>> sorted
>> >>>> >>> >>>>>> with a single read and single write on every peer.
>> >>>> >>> >>>
>> >>>> >>> >>> And, as I commented JIRA ticket, I think we can't use
>> messaging
>> >>>> system
>> >>>> >>> >>> for sorting vertices within partition files.
>> >>>> >>> >>>
>> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org> wrote:
>> >>>> >>> >>>> P.S., (number of splits = number of partitions) is really
>> confuse
>> >>>> to
>> >>>> >>> >>>> me. Even though blocks number is equal to desired tasks
>> number,
>> >>>> data
>> >>>> >>> >>>> should be re-partitioned again.
>> >>>> >>> >>>>
>> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org> wrote:
>> >>>> >>> >>>>> Indeed. If there are already partitioned input files
>> (unsorted)
>> >>>> and
>> >>>> >>> so
>> >>>> >>> >>>>> user want to skip pre-partitioning phase, it should be
>> handled in
>> >>>> >>> >>>>> GraphJobRunner BSP program. Actually, I still don't know why
>> >>>> >>> >>>>> re-partitioned files need to be Sorted. It's only about
>> >>>> >>> >>>>> GraphJobRunner.
>> >>>> >>> >>>>>
>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can
>> have
>> >>>> a
>> >>>> >>> dedicated
>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>> >>>> >>> >>>>> partitioning job based on the superstep API?
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> By default, 100 tasks will be assigned to the partitioning
>> >>>> >>> >>>>> job. The partitioning job will create 1,000 partitions. Thus,
>> >>>> >>> >>>>> we can execute the graph job with 1,000 tasks.
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks).
>> >>>> >>> >>>>> If I want to run with 1,000 tasks, what happens?
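The 20GB scenario above works out as follows; a back-of-envelope sketch using only the figures from the question (hypothetical arithmetic, not actual Hama scheduling behavior):

```java
// Back-of-envelope numbers for Edward's scenario: a 20GB input in 100
// blocks, with 1,000 graph tasks requested. Hypothetical illustration only.
public class PartitionMath {
    public static void main(String[] args) {
        long inputGb = 20;
        int blocks = 100;
        int requestedTasks = 1000;
        long partitionMb = inputGb * 1024 / requestedTasks; // size of each final partition
        int partitionsPerBlockTask = requestedTasks / blocks; // partitions each partitioning task feeds
        System.out.println(partitionMb + " MB per partition, "
                + partitionsPerBlockTask + " partitions written per partitioning task");
    }
}
```

So each of the 100 block-local partitioning tasks would have to fan its data out into 10 of the 1,000 roughly 20MB partitions, which is exactly the re-partitioning cost being discussed.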
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>> >>>> surajsmenon@apache.org>
>> >>>> >>> wrote:
>> >>>> >>> >>>>>> I am responding on this thread for better continuity of the
>> >>>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted
>> >>>> >>> >>>>>> every time. When the number of splits = number of partitions
>> >>>> >>> >>>>>> and partitioning is switched off by the user [HAMA-561], the
>> >>>> >>> >>>>>> partitions would not be sorted. Can we do this in
>> >>>> >>> >>>>>> loadVertices? Maybe a feature coupling storage in user space
>> >>>> >>> >>>>>> with BSP Messaging [HAMA-734] can avoid double reads and
>> >>>> >>> >>>>>> writes. This way, whether or not the input is partitioned by
>> >>>> >>> >>>>>> the partitioner, we can keep vertices sorted with a single
>> >>>> >>> >>>>>> read and a single write on every peer.
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> Just clearing confusion if any regarding superstep
>> injection for
>> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can
>> have
>> >>>> a
>> >>>> >>> dedicated
>> >>>> >>> >>>>>> partitioning superstep for graph applications).
>> >>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> if y > x
>> >>>> >>> >>>>>> The y tasks are scheduled with x of them having one of the x
>> >>>> >>> >>>>>> splits each and the remaining with no resource local to them.
>> >>>> >>> >>>>>> Then the partitioning superstep redistributes the partitions
>> >>>> >>> >>>>>> among them to create local partitions. Now the question is:
>> >>>> >>> >>>>>> can we re-initialize a peer's input based on this new local
>> >>>> >>> >>>>>> part of the partition?
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> if x > y
>> >>>> >>> >>>>>> works as it works today.
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> Just putting my points in brainstorming.
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> -Suraj
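The redistribution sketched in the message above can be pictured as a single hash-partitioning superstep. This is a hypothetical illustration only, not Hama's GraphJobRunner code; the `peers.get(target).add(id)` call stands in for a `peer.send(...)` over the messaging system, and the sync barrier between sending and receiving is implicit:

```java
import java.util.*;

// Hypothetical sketch of the proposed partitioning superstep: every peer
// hashes each vertex ID to a target peer; after the sync barrier each peer
// holds its own local partition, kept sorted by the receiving structure.
public class PartitionSketch {
    static Map<Integer, SortedSet<Integer>> partition(List<Integer> vertexIds, int numPeers) {
        Map<Integer, SortedSet<Integer>> peers = new HashMap<>();
        for (int p = 0; p < numPeers; p++) {
            peers.put(p, new TreeSet<>()); // TreeSet keeps each partition sorted
        }
        for (int id : vertexIds) {
            int target = Math.floorMod(Integer.hashCode(id), numPeers); // hash partitioning
            peers.get(target).add(id); // stands in for peer.send(target, msg)
        }
        return peers;
    }

    public static void main(String[] args) {
        Map<Integer, SortedSet<Integer>> parts =
                partition(Arrays.asList(56, 3, 18, 41, 7, 22), 2);
        System.out.println(parts);
    }
}
```

Whether the redistributed data then lives in local memory or spills to local disk is exactly the open deployment question raised above.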
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org>wrote:
>> >>>> >>> >>>>>>
>> >>>> >>> >>>>>>> I just filed here
>> >>>> https://issues.apache.org/jira/browse/HAMA-744
>> >>>> >>> >>>>>>>
>> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org>
>> >>>> >>> >>>>>>> wrote:
>> >>>> >>> >>>>>>> > Additionally,
>> >>>> >>> >>>>>>> >
>> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
>> inject the
>> >>>> >>> partitioning
>> >>>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
>> >>>> >>> >>>>>>> >
>> >>>> >>> >>>>>>> > Can we execute a different number of tasks per superstep?
>> >>>> >>> >>>>>>> >
>> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org>
>> >>>> >>> >>>>>>> wrote:
>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>> result
>> >>>> from
>> >>>> >>> the
>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>> >>>> partition
>> >>>> >>> files in
>> >>>> >>> >>>>>>> >>
>> >>>> >>> >>>>>>> >> I see.
>> >>>> >>> >>>>>>> >>
>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>> superstep
>> >>>> API,
>> >>>> >>> Suraj's
>> >>>> >>> >>>>>>> idea
>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>> partitions the
>> >>>> >>> stuff into
>> >>>> >>> >>>>>>> our
>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>> >>>> >>> >>>>>>> >>
>> >>>> >>> >>>>>>> >> BTW, if garbage objects can accumulate in the
>> >>>> >>> >>>>>>> >> partitioning step, a separate partitioning job may not be
>> >>>> >>> >>>>>>> >> a bad idea. Is there some special reason?
>> >>>> >>> >>>>>>> >>
>> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
>> result
>> >>>> from
>> >>>> >>> the
>> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>> >>>> partition
>> >>>> >>> files in
>> >>>> >>> >>>>>>> >>> itself are sorted, thus more tasks result in not
>> sorted
>> >>>> data
>> >>>> >>> in the
>> >>>> >>> >>>>>>> >>> completed file. This only applies for the graph
>> processing
>> >>>> >>> package.
>> >>>> >>> >>>>>>> >>> So as Suraj said, it would be much simpler to solve this
>> >>>> >>> >>>>>>> >>> via messaging, once it is scalable (it will be very,
>> >>>> >>> >>>>>>> >>> very scalable!). The GraphJobRunner can then partition
>> >>>> >>> >>>>>>> >>> the data with a single superstep in setup(), as it did
>> >>>> >>> >>>>>>> >>> ages ago. The messaging must be sorted for the algorithm
>> >>>> >>> >>>>>>> >>> anyway, so this is a nice side effect and saves us the
>> >>>> >>> >>>>>>> >>> partitioning job for graph processing.
>> >>>> >>> >>>>>>> >>>
>> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
>> superstep
>> >>>> API,
>> >>>> >>> Suraj's
>> >>>> >>> >>>>>>> idea
>> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
>> partitions the
>> >>>> >>> stuff into
>> >>>> >>> >>>>>>> our
>> >>>> >>> >>>>>>> >>> messaging system is actually the best.
>> >>>> >>> >>>>>>> >>>
>> >>>> >>> >>>>>>> >>>
>> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >>>> >>> >>>>>>> >>>
>> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be
>> sorted.
>> >>>> Sorry
>> >>>> >>> for the
>> >>>> >>> >>>>>>> >>>> confusion. The Superstep injection is possible with
>> >>>> Superstep
>> >>>> >>> API.
>> >>>> >>> >>>>>>> There
>> >>>> >>> >>>>>>> >>>> are few enhancements needed to make it simpler after
>> I
>> >>>> last
>> >>>> >>> worked on
>> >>>> >>> >>>>>>> it.
>> >>>> >>> >>>>>>> >>>> We can then look into partitioning superstep being
>> >>>> executed
>> >>>> >>> before the
>> >>>> >>> >>>>>>> >>>> setup of first superstep of submitted job. I think
>> it is
>> >>>> >>> feasible.
>> >>>> >>> >>>>>>> >>>>
>> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org
>> >>>> >>> >>>>>>> >>>> >wrote:
>> >>>> >>> >>>>>>> >>>>
>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>> >>>> inject
>> >>>> >>> the
>> >>>> >>> >>>>>>> >>>> partitioning
>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
>> memory?
>> >>>> >>> >>>>>>> >>>> >
>> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before the
>> >>>> >>> >>>>>>> >>>> > BSP.setup() method is called, to avoid executing an
>> >>>> >>> >>>>>>> >>>> > additional BSP job. But, in my opinion, the current
>> >>>> >>> >>>>>>> >>>> > approach is enough. I think we need to collect more
>> >>>> >>> >>>>>>> >>>> > experience with input partitioning in large
>> >>>> >>> >>>>>>> >>>> > environments. I'll do that.
>> >>>> >>> >>>>>>> >>>> >
>> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be sorted?!
>> >>>> >>> >>>>>>> >>>> > MR-like?
>> >>>> >>> >>>>>>> >>>> >
>> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>> >>>> >>> >>>>>>> surajsmenon@apache.org>
>> >>>> >>> >>>>>>> >>>> > wrote:
>> >>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside
>> graph
>> >>>> >>> module.
>> >>>> >>> >>>>>>> When we
>> >>>> >>> >>>>>>> >>>> > have
>> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>> >>>> inject
>> >>>> >>> the
>> >>>> >>> >>>>>>> >>>> partitioning
>> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
>> memory?
>> >>>> >>> >>>>>>> >>>> > > Today we have partitioning job within a job and
>> are
>> >>>> >>> creating two
>> >>>> >>> >>>>>>> copies
>> >>>> >>> >>>>>>> >>>> > of
>> >>>> >>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it
>> >>>> possible
>> >>>> >>> to
>> >>>> >>> >>>>>>> create or
>> >>>> >>> >>>>>>> >>>> > > redistribute the partitions on local memory and
>> >>>> >>> initialize the
>> >>>> >>> >>>>>>> record
>> >>>> >>> >>>>>>> >>>> > > reader there?
>> >>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the
>> >>>> >>> >>>>>>> >>>> > > examples area, to explicitly
>> >>>> >>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment
>> question
>> >>>> is
>> >>>> >>> how much
>> >>>> >>> >>>>>>> of
>> >>>> >>> >>>>>>> >>>> disk
>> >>>> >>> >>>>>>> >>>> > > space gets allocated for local memory usage?
>> Would it
>> >>>> be
>> >>>> >>> a safe
>> >>>> >>> >>>>>>> >>>> approach
>> >>>> >>> >>>>>>> >>>> > > with the limitations?
>> >>>> >>> >>>>>>> >>>> > >
>> >>>> >>> >>>>>>> >>>> > > -Suraj
>> >>>> >>> >>>>>>> >>>> > >
>> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>> >>>> >>> >>>>>>> >>>> > >
>> >>>> >>> >>>>>>> >>>> > >> Yes. Once Suraj has added merging of sorted files,
>> >>>> >>> >>>>>>> >>>> > >> we can add this to the partitioner pretty easily.
>> >>>> >>> >>>>>>> >>>> > >>
>> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>> >
>> >>>> >>> >>>>>>> >>>> > >>
>> >>>> >>> >>>>>>> >>>> > >> > Eh... btw, does re-partitioned data really need
>> >>>> >>> >>>>>>> >>>> > >> > to be sorted?
>> >>>> >>> >>>>>>> >>>> > >> >
>> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
>> Jungblut
>> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works:
>> >>>> >>> >>>>>>> >>>> > >> > > obviously, if you merge n sorted files by just
>> >>>> >>> >>>>>>> >>>> > >> > > appending them to each other, this will result
>> >>>> >>> >>>>>>> >>>> > >> > > in totally unsorted data
>> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
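The failure described above, concatenating n individually sorted partition files and getting globally unsorted output, is exactly what a heap-based k-way merge avoids. A minimal sketch (hypothetical illustration, not the actual merge code later added to Hama):

```java
import java.util.*;

// k-way merge of sorted runs: repeatedly pop the globally smallest head
// element from a min-heap, instead of appending the runs to each other.
public class MergeSketch {
    static List<Integer> kWayMerge(List<List<Integer>> sortedRuns) {
        // heap entries: {value, runIndex, offsetWithinRun}
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) {
                heap.add(new int[] { sortedRuns.get(r).get(0), r, 0 });
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            merged.add(head[0]);
            int next = head[2] + 1; // advance within the run the head came from
            if (next < sortedRuns.get(head[1]).size()) {
                heap.add(new int[] { sortedRuns.get(head[1]).get(next), head[1], next });
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(50, 52, 54),  // run 1, sorted
                Arrays.asList(1, 10, 94)); // run 2, sorted
        System.out.println(kWayMerge(runs)); // globally sorted, unlike concatenation
    }
}
```

With concatenation the second run's IDs (1, 10, ...) land after the first run's (50, 52, ...), which is precisely the unsorted pattern shown in the vertexID dump below.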
>> >>>> >>> >>>>>>> >>>> > >> > >
>> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
>> >>>> thomas.jungblut@gmail.com
>> >>>> >>> >
>> >>>> >>> >>>>>>> >>>> > >> > >
>> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>> >>>> >>> >>>>>>> >>>> > >> > >> ...
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>> >>>> >>> thomas.jungblut@gmail.com>
>> >>>> >>> >>>>>>> >>>> > >> > >>
>> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
>> >>>> edwardyoon@apache.org>
>> >>>> >>> >>>>>>> >>>> > >> > >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it quickly, please
>> >>>> >>> >>>>>>> >>>> > >> > >>>> do. March 1 is a holiday [1], so I'll
>> >>>> >>> >>>>>>> >>>> > >> > >>>> appear next week.
>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> 1.
>> >>>> >>> >>>>>>>
>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> >>>> >>> >>>>>>> >>>> > >> > >>>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
>> >>>> Jungblut
>> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file,
>> didn't
>> >>>> >>> observe if all
>> >>>> >>> >>>>>>> >>>> items
>> >>>> >>> >>>>>>> >>>> > >> were
>> >>>> >>> >>>>>>> >>>> > >> > >>>> added.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the
>> >>>> logic
>> >>>> >>> of the ID
>> >>>> >>> >>>>>>> into
>> >>>> >>> >>>>>>> >>>> > the
>> >>>> >>> >>>>>>> >>>> > >> > >>>> fastgen,
>> >>>> >>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
>> >>>> edwardyoon@apache.org
>> >>>> >>> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when it
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> generates the adjacency matrix into
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> multiple files.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM,
>> Thomas
>> >>>> >>> Jungblut
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they
>> partitioned
>> >>>> >>> correctly?
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> :~/workspace/hama-trunk$
>> >>>> ls
>> >>>> >>> -al
>> >>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096
>>  2월 28
>> >>>> >>> 18:03 .
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480
>>  2월 28
>> >>>> >>> 18:04 ..
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243
>>  2월 28
>> >>>> >>> 18:01
>> >>>> >>> >>>>>>> part-00000
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28
>>  2월 28
>> >>>> >>> 18:01
>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251
>>  2월 28
>> >>>> >>> 18:01
>> >>>> >>> >>>>>>> part-00001
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28
>>  2월 28
>> >>>> >>> 18:01
>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096
>>  2월 28
>> >>>> >>> 18:03
>> >>>> >>> >>>>>>> partitions
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> :~/workspace/hama-trunk$
>> >>>> ls
>> >>>> >>> -al
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096
>>  2월 28
>> >>>> >>> 18:03 .
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096
>>  2월 28
>> >>>> >>> 18:03 ..
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932
>>  2월 28
>> >>>> 18:03
>> >>>> >>> >>>>>>> part-00000
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
>>  2월 28
>> >>>> 18:03
>> >>>> >>> >>>>>>> >>>> > .part-00000.crc
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955
>>  2월 28
>> >>>> 18:03
>> >>>> >>> >>>>>>> part-00001
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32
>>  2월 28
>> >>>> 18:03
>> >>>> >>> >>>>>>> >>>> > .part-00001.crc
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
>> :~/workspace/hama-trunk$
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM,
>> Edward
>> >>>> <
>> >>>> >>> >>>>>>> >>>> edward@udanax.org
>> >>>> >>> >>>>>>> >>>> > >
>> >>>> >>> >>>>>>> >>>> > >> > wrote:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM,
>> Thomas
>> >>>> >>> Jungblut <
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation
>> for me
>> >>>> >>> please?
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from
>> fastgen,
>> >>>> >>> part-00000 and
>> >>>> >>> >>>>>>> >>>> > >> part-00001,
>> >>>> >>> >>>>>>> >>>> > >> > >>>> both
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition
>> directory,
>> >>>> there
>> >>>> >>> is only a
>> >>>> >>> >>>>>>> >>>> single
>> >>>> >>> >>>>>>> >>>> > >> > 5.56kb
>> >>>> >>> >>>>>>> >>>> > >> > >>>> file.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the
>> partitioner to
>> >>>> >>> write a
>> >>>> >>> >>>>>>> single
>> >>>> >>> >>>>>>> >>>> > file
>> >>>> >>> >>>>>>> >>>> > >> if
>> >>>> >>> >>>>>>> >>>> > >> > you
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as two files,
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> strange, huh?
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>> >>>> >>> >>>>>>> thomas.jungblut@gmail.com>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10
>> /tmp/randomgraph
>> >>>> 1
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
>> >>>> /tmp/pageout
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
>> >>>> >>> profiled, maybe
>> >>>> >>> >>>>>>> the
>> >>>> >>> >>>>>>> >>>> > >> > >>>> partitioning
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the
>> input
>> >>>> or
>> >>>> >>> something
>> >>>> >>> >>>>>>> else.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>> >>>> >>> edwardyoon@apache.org
>> >>>> >>> >>>>>>> >
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not to work
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> for graph examples.
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>> >>>> >>> >>>>>>> >>>> > >> > fastgen
>> >>>> >>> >>>>>>> >>>> > >> > >>>> 100 10
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
>> >>>> >>> util.NativeCodeLoader:
>> >>>> >>> >>>>>>> Unable
>> >>>> >>> >>>>>>> >>>> > to
>> >>>> >>> >>>>>>> >>>> > >> > load
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>> >>>> >>> platform...
>> >>>> >>> >>>>>>> using
>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Running
>> >>>> >>> >>>>>>> >>>> job:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> >>>> >>> bsp.LocalBSPRunner:
>> >>>> >>> >>>>>>> Setting
>> >>>> >>> >>>>>>> >>>> up
>> >>>> >>> >>>>>>> >>>> > a
>> >>>> >>> >>>>>>> >>>> > >> new
>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Current
>> >>>> >>> >>>>>>> >>>> > >> supersteps
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient: The
>> >>>> >>> >>>>>>> total
>> >>>> >>> >>>>>>> >>>> > number
>> >>>> >>> >>>>>>> >>>> > >> > of
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Counters: 3
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=0
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> hama-examples-0.7.0-SNAPSHOT.jar
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>> >>> >>>>>>> :~/workspace/hama-trunk$
>> >>>> >>> >>>>>>> >>>> > >> bin/hama
>> >>>> >>> >>>>>>> >>>> > >> > jar
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>> >>>> >>> >>>>>>> >>>> > pagerank
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
>> >>>> >>> util.NativeCodeLoader:
>> >>>> >>> >>>>>>> Unable
>> >>>> >>> >>>>>>> >>>> > to
>> >>>> >>> >>>>>>> >>>> > >> > load
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>> >>>> >>> platform...
>> >>>> >>> >>>>>>> using
>> >>>> >>> >>>>>>> >>>> > >> > builtin-java
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> >>>> >>> bsp.FileInputFormat:
>> >>>> >>> >>>>>>> Total
>> >>>> >>> >>>>>>> >>>> > input
>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> >>>> >>> bsp.FileInputFormat:
>> >>>> >>> >>>>>>> Total
>> >>>> >>> >>>>>>> >>>> > input
>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Running
>> >>>> >>> >>>>>>> >>>> job:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> >>>> >>> bsp.LocalBSPRunner:
>> >>>> >>> >>>>>>> Setting
>> >>>> >>> >>>>>>> >>>> up
>> >>>> >>> >>>>>>> >>>> > a
>> >>>> >>> >>>>>>> >>>> > >> new
>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Current
>> >>>> >>> >>>>>>> >>>> > >> supersteps
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient: The
>> >>>> >>> >>>>>>> total
>> >>>> >>> >>>>>>> >>>> > number
>> >>>> >>> >>>>>>> >>>> > >> > of
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Counters: 6
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > SUPERSTEPS=1
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.FileInputFormat:
>> >>>> >>> >>>>>>> Total
>> >>>> >>> >>>>>>> >>>> > input
>> >>>> >>> >>>>>>> >>>> > >> > paths
>> >>>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.BSPJobClient:
>> >>>> >>> >>>>>>> Running
>> >>>> >>> >>>>>>> >>>> job:
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> bsp.LocalBSPRunner:
>> >>>> >>> >>>>>>> Setting
>> >>>> >>> >>>>>>> >>>> up
>> >>>> >>> >>>>>>> >>>> > a
>> >>>> >>> >>>>>>> >>>> > >> new
>> >>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> graph.GraphJobRunner: 50
>> >>>> >>> >>>>>>> >>>> > vertices
>> >>>> >>> >>>>>>> >>>> > >> > are
>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>>> >>> graph.GraphJobRunner: 50
>> >>>> >>> >>>>>>> >>>> > vertices
>> >>>> >>> >>>>>>> >>>> > >> > are
>> >>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
>> >>>> >>> bsp.LocalBSPRunner:
>> >>>> >>> >>>>>>> >>>> Exception
>> >>>> >>> >>>>>>> >>>> > >> > during
>> >>>> >>> >>>>>>> >>>> > >> > >>>> BSP
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> java.lang.IllegalArgumentException:
>> >>>> >>> Messages
>> >>>> >>> >>>>>>> must
>> >>>> >>> >>>>>>> >>>> > never
>> >>>> >>> >>>>>>> >>>> > >> be
>> >>>> >>> >>>>>>> >>>> > >> > >>>> behind
>> >>>> >>> >>>>>>> >>>> > >> > >>>> >> the
>> vertex in ID! Current Message ID: 1 vs. 50
>>         at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         at java.lang.Thread.run(Thread.java:722)
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
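The exception quoted above ("vertex in ID! Current Message ID: 1 vs. 50") is the symptom discussed in this thread: each partition file is sorted on its own, but concatenating the files produces a globally unsorted stream. The sketch below illustrates the difference between concatenating and k-way merging sorted runs; it is a minimal illustration with made-up class and method names, not Hama's actual merge code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Minimal sketch (not Hama's actual code): why concatenating sorted
// partition files yields a globally unsorted stream, and how a k-way
// merge with a min-heap fixes it.
public class KWayMergeSketch {

  // Concatenation: each run is sorted, but the result usually is not.
  public static List<Integer> concat(List<List<Integer>> runs) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> run : runs) {
      out.addAll(run);
    }
    return out;
  }

  // K-way merge: heap entries are {value, run index, offset within run}.
  public static List<Integer> merge(List<List<Integer>> runs) {
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int i = 0; i < runs.size(); i++) {
      if (!runs.get(i).isEmpty()) {
        heap.add(new int[] { runs.get(i).get(0), i, 0 });
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);
      int next = top[2] + 1;
      if (next < runs.get(top[1]).size()) {
        heap.add(new int[] { runs.get(top[1]).get(next), top[1], next });
      }
    }
    return out;
  }

  public static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two sorted "partition files", like the vertex IDs in the trace above.
    List<List<Integer>> runs =
        List.of(List.of(50, 52, 54), List.of(1, 3, 61));
    System.out.println("concat sorted? " + isSorted(concat(runs))); // false
    System.out.println("merge  sorted? " + isSorted(merge(runs)));  // true
  }
}
```

A real implementation would stream records from SequenceFiles instead of in-memory lists, but the heap-based merge logic would be the same.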

Re: Error with fastgen input

Posted by Suraj Menon <su...@apache.org>.
Three points:

Firstly, apologies: this conversation partly stems from the delay in
providing the set of patches. I was not able to set aside as much time as I
was hoping.

Second, I think I/we can work on separate branches. Since most of these
concerns can only be answered by future patches, a decision could be made
then. We can decide whether an svn revert is needed on trunk during the
process. (This is a general comment and not related to a particular JIRA.)

Third, please feel free to cut a release if it is really important.

Thanks,
Suraj

On Thu, Mar 14, 2013 at 5:39 AM, Edward J. Yoon <ed...@apache.org>wrote:

> To reduce arguing, I'm appending my opinions.
>
> In HAMA-704, I wanted to remove only the message map, to reduce memory
> consumption. I still don't want to talk about disk-based vertices and the
> Spilling Queue at the moment. With this, I wanted to release 0.6.1, a
> 'partitioning issue fixed and quickly executable examples' version, ASAP.
> That's why I scheduled the Spilling Queue for the 0.7 roadmap.
>
> As you can see, issues are popping up one right after another. I don't
> think we have to clean up every never-ending issue now. We can improve
> step by step.
>
> 1. http://wiki.apache.org/hama/RoadMap
>
> On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Typos ;)
> >
> >> except YARN integration tasks. If you leave here, I have to take cover
> >> YARN tasks. Should I wait someone? Am I touching core module
> >
> > I have to cover YARN tasks instead of you.
> >
> > On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> >> Hmm, here's my opinions:
> >>
> >> As you know, we have a problem with a lack of team members and
> >> contributors. So we should break every task down as small as
> >> possible. Our best course is to improve step by step. And every
> >> Hama-x.x.x should run well, even if it's at a baby-cart level.
> >>
> >> And, technology should be developed out of necessity. So I think we need
> >> to cut releases as often as possible. That is why I volunteered to manage
> >> releases. Actually, I wanted to work only on QA (quality assurance)
> >> related tasks, because your code is better than mine and I have a
> >> cluster.
> >>
> >> However, we are currently not working like that. I guess there are many
> >> reasons. None of us is a full-time open sourcer (except me).
> >>
> >>> You have 23 issues assigned.  Why do you need to work on that?
> >>
> >> I don't know what you mean exactly. But 23 issues are almost examples
> >> except YARN integration tasks. If you leave here, I have to take cover
> >> YARN tasks. Should I wait someone? Am I touching core module
> >> aggressively?
> >>
> >>> Otherwise Suraj and I will branch those issues away and you can play
> >>> around in trunk however you like.
> >>
> >> I also don't know what you mean exactly, but if you want, please do.
> >>
> >> By the way, can you answer this question: is it really a
> >> technical conflict, or an emotional conflict?
> >>
> >> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
> >> <th...@gmail.com> wrote:
> >>> You have 23 issues assigned.  Why do you need to work on that?
> >>> Otherwise Suraj and I will branch those issues away and you can play
> >>> around in trunk however you like.
> >>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:
> >>>
> >>>> P.S. Please don't say it like that.
> >>>>
> >>>> No decisions have been made yet. And if someone has a question or missed
> >>>> something, you have to try to explain it here, because this is open
> >>>> source. No one can say "don't touch trunk because I'm working on it".
> >>>>
> >>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>>> wrote:
> >>>> > Sorry for my quick-and-dirty small patches.
> >>>> >
> >>>> > However, we should work together in parallel. Please share any
> >>>> > progress here.
> >>>> >
> >>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
> >>>> > <th...@gmail.com> wrote:
> >>>> >> Hi Edward,
> >>>> >>
> >>>> >> before you run riot all over the codebase: Suraj is currently working
> >>>> >> on that stuff - don't make it more difficult for him by making him
> >>>> >> rebase all his patches the whole time.
> >>>> >> He has the plan that we made to get this working; his part is
> >>>> >> currently missing. So don't muddle around in there, it will make this
> >>>> >> take longer than it already needs to.
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
> >>>> >>
> >>>> >>> Personally, I would like to solve this issue by touching
> >>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
> >>>> >>> multiple files, we can avoid huge memory consumption.
> >>>> >>>
> >>>> >>> If we want to sort the partitioned data using the messaging system,
> >>>> >>> ideas should be collected.
> >>>> >>>
> >>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
> >>>> edwardyoon@apache.org>
> >>>> >>> wrote:
> >>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
> >>>> >>> >
> >>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
> >>>> edwardyoon@apache.org>
> >>>> >>> wrote:
> >>>> >>> >> I'm reading the changes of HAMA-704 again. As a result of adding
> >>>> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not
> >>>> >>> >> sure, but I think this approach will bring more disadvantages than
> >>>> >>> >> advantages.
> >>>> >>> >>
> >>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
> >>>> edwardyoon@apache.org>
> >>>> >>> wrote:
> >>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling
> storage in
> >>>> >>> user space
> >>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and
> writes.
> >>>> >>> This way
> >>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep
> vertices
> >>>> >>> sorted
> >>>> >>> >>>>>> with a single read and single write on every peer.
> >>>> >>> >>>
> >>>> >>> >>> And, as I commented on the JIRA ticket, I think we can't use the
> >>>> >>> >>> messaging system for sorting vertices within partition files.
> >>>> >>> >>>
> >>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org> wrote:
> >>>> >>> >>>> P.S., (number of splits = number of partitions) is really
> >>>> >>> >>>> confusing to me. Even when the number of blocks is equal to the
> >>>> >>> >>>> desired number of tasks, the data still has to be re-partitioned.
> >>>> >>> >>>> should be re-partitioned again.
> >>>> >>> >>>>
> >>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org> wrote:
> >>>> >>> >>>>> Indeed. If there are already partitioned input files
> >>>> >>> >>>>> (unsorted) and so the user wants to skip the pre-partitioning
> >>>> >>> >>>>> phase, it should be handled in the GraphJobRunner BSP program.
> >>>> >>> >>>>> Actually, I still don't know why the re-partitioned files need
> >>>> >>> >>>>> to be sorted. It's only about GraphJobRunner.
> >>>> >>> >>>>>
> >>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can
> have
> >>>> a
> >>>> >>> dedicated
> >>>> >>> >>>>>> partitioning superstep for graph applications).
> >>>> >>> >>>>>
> >>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
> >>>> >>> >>>>> partitioning job based on the superstep API?
> >>>> >>> >>>>>
> >>>> >>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
> >>>> >>> >>>>> The partitioning job will create 1,000 partitions. Thus, we can
> >>>> >>> >>>>> execute the graph job with 1,000 tasks.
> >>>> >>> >>>>>
> >>>> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks).
> >>>> >>> >>>>> If I want to run with 1,000 tasks, what happens?
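As a side note on the question above (100 blocks fanned out to 1,000 partitions): with a hash partitioner, the target partition of a record depends only on its key and the configured partition count, not on which input block it came from. A toy sketch with a hypothetical class name and method signature (this is not Hama's Partitioner API):

```java
// Toy hash partitioner (hypothetical; not Hama's Partitioner API): the
// target partition of a record depends only on its key and the configured
// partition count, never on which input block the record came from.
public class PartitionSketch {

  public static int partitionFor(String vertexId, int numPartitions) {
    // Mask the sign bit so negative hashCodes still map into [0, numPartitions).
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    // 100 input blocks can be fanned out to 1,000 partitions: every record
    // with the same ID lands in the same partition on every run.
    for (String id : new String[] { "1", "50", "98" }) {
      System.out.println(id + " -> " + partitionFor(id, 1000));
    }
  }
}
```

Because the mapping is deterministic, a re-partitioning pass can route the records of a 100-block file into any configured number of partitions.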
> >>>> >>> >>>>>
> >>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
> >>>> surajsmenon@apache.org>
> >>>> >>> wrote:
> >>>> >>> >>>>>> I am responding on this thread for better continuity of the
> >>>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted
> >>>> >>> >>>>>> every time. When the number of splits = the number of
> >>>> >>> >>>>>> partitions and partitioning is switched off by the user
> >>>> >>> >>>>>> [HAMA-561], the partitions would not be sorted. Can we do this
> >>>> >>> >>>>>> in loadVertices? Maybe the feature for coupling storage in
> >>>> >>> >>>>>> user space with BSP messaging [HAMA-734] can avoid double
> >>>> >>> >>>>>> reads and writes. This way, partitioned or not by the
> >>>> >>> >>>>>> partitioner, we can keep vertices sorted with a single read
> >>>> >>> >>>>>> and a single write on every peer.
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> Just clearing up confusion, if any, regarding superstep
> >>>> >>> >>>>>> injection for partitioning. (This is outside the scope of
> >>>> >>> >>>>>> graphs. We can have a dedicated partitioning superstep for
> >>>> >>> >>>>>> graph applications.)
> >>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> if x > y
> >>>> >>> >>>>>> The y tasks are scheduled with x of them having each of the x
> >>>> >>> >>>>>> splits and the remaining with no resource local to them. Then
> >>>> >>> >>>>>> the partitioning superstep redistributes the partitions among
> >>>> >>> >>>>>> them to create local partitions. Now the question is: can we
> >>>> >>> >>>>>> re-initialize a peer's input based on this new local part of
> >>>> >>> >>>>>> the partition?
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> if y > x
> >>>> >>> >>>>>> works as it works today.
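The x-splits / y-tasks cases above can be sketched with a toy round-robin assignment; the class name and the scheduling policy here are hypothetical, not Hama's actual scheduler:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the x-splits / y-tasks cases: assign splits to
// tasks round-robin. Purely hypothetical scheduling, not Hama's scheduler.
public class SplitAssignmentSketch {

  // Returns, for each of the numTasks tasks, the list of split indices it gets.
  public static List<List<Integer>> assign(int numSplits, int numTasks) {
    List<List<Integer>> perTask = new ArrayList<>();
    for (int t = 0; t < numTasks; t++) {
      perTask.add(new ArrayList<>());
    }
    for (int s = 0; s < numSplits; s++) {
      perTask.get(s % numTasks).add(s);
    }
    return perTask;
  }

  public static void main(String[] args) {
    // x > y: 5 splits over 3 tasks -> some tasks hold several splits.
    System.out.println(assign(5, 3)); // [[0, 3], [1, 4], [2]]
    // y > x: 3 splits over 5 tasks -> two tasks start with no local split.
    System.out.println(assign(3, 5));
  }
}
```

In the x > y case some tasks start with several splits, and in the y > x case some start with none; either way, that imbalance is exactly what the proposed partitioning superstep would have to smooth out by redistributing records.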
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> Just putting my points in brainstorming.
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> -Suraj
> >>>> >>> >>>>>>
> >>>> >>> >>>>>>
> >>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org>wrote:
> >>>> >>> >>>>>>
> >>>> >>> >>>>>>> I just filed here
> >>>> https://issues.apache.org/jira/browse/HAMA-744
> >>>> >>> >>>>>>>
> >>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org>
> >>>> >>> >>>>>>> wrote:
> >>>> >>> >>>>>>> > Additionally,
> >>>> >>> >>>>>>> >
> >>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we
> inject the
> >>>> >>> partitioning
> >>>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
> >>>> >>> >>>>>>> >
> >>>> >>> >>>>>>> > Can we execute different number of tasks per superstep?
> >>>> >>> >>>>>>> >
> >>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org>
> >>>> >>> >>>>>>> wrote:
> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
> result
> >>>> from
> >>>> >>> the
> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
> >>>> partition
> >>>> >>> files in
> >>>> >>> >>>>>>> >>
> >>>> >>> >>>>>>> >> I see.
> >>>> >>> >>>>>>> >>
> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
> superstep
> >>>> API,
> >>>> >>> Suraj's
> >>>> >>> >>>>>>> idea
> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
> partitions the
> >>>> >>> stuff into
> >>>> >>> >>>>>>> our
> >>>> >>> >>>>>>> >>> messaging system is actually the best.
> >>>> >>> >>>>>>> >>
> >>>> >>> >>>>>>> >> BTW, if some garbage objects can be accumulated in
> >>>> partitioning
> >>>> >>> step,
> >>>> >>> >>>>>>> >> separated partitioning job may not be bad idea. Is
> there
> >>>> some
> >>>> >>> special
> >>>> >>> >>>>>>> >> reason?
> >>>> >>> >>>>>>> >>
> >>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
> >>>> >>> >>>>>>> >>> For graph processing, the partitioned files that
> result
> >>>> from
> >>>> >>> the
> >>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
> >>>> partition
> >>>> >>> files in
> >>>> >>> >>>>>>> >>> itself are sorted, thus more tasks result in not
> sorted
> >>>> data
> >>>> >>> in the
> >>>> >>> >>>>>>> >>> completed file. This only applies for the graph
> processing
> >>>> >>> package.
> >>>> >>> >>>>>>> >>> So as Suraj said, it would be much simpler to solve
> >>>> >>> >>>>>>> >>> this via messaging, once it is scalable (it will be very,
> >>>> >>> >>>>>>> >>> very scalable!). So the GraphJobRunner can partition the
> >>>> >>> >>>>>>> >>> stuff with a single superstep in setup(), as it did ages
> >>>> >>> >>>>>>> >>> ago. The messaging must be sorted anyway for the
> >>>> >>> >>>>>>> >>> algorithm, so this is a nice side effect and saves us the
> >>>> >>> >>>>>>> >>> partitioning job for graph processing.
> >>>> >>> >>>>>>> >>>
> >>>> >>> >>>>>>> >>> For other partitionings and with regard to our
> superstep
> >>>> API,
> >>>> >>> Suraj's
> >>>> >>> >>>>>>> idea
> >>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that
> partitions the
> >>>> >>> stuff into
> >>>> >>> >>>>>>> our
> >>>> >>> >>>>>>> >>> messaging system is actually the best.
> >>>> >>> >>>>>>> >>>
> >>>> >>> >>>>>>> >>>
> >>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >>>> >>> >>>>>>> >>>
> >>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be
> sorted.
> >>>> Sorry
> >>>> >>> for the
> >>>> >>> >>>>>>> >>>> confusion. The Superstep injection is possible with
> >>>> Superstep
> >>>> >>> API.
> >>>> >>> >>>>>>> There
> >>>> >>> >>>>>>> >>>> are few enhancements needed to make it simpler after
> I
> >>>> last
> >>>> >>> worked on
> >>>> >>> >>>>>>> it.
> >>>> >>> >>>>>>> >>>> We can then look into partitioning superstep being
> >>>> executed
> >>>> >>> before the
> >>>> >>> >>>>>>> >>>> setup of first superstep of submitted job. I think
> it is
> >>>> >>> feasible.
> >>>> >>> >>>>>>> >>>>
> >>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org
> >>>> >>> >>>>>>> >>>> >wrote:
> >>>> >>> >>>>>>> >>>>
> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
> >>>> inject
> >>>> >>> the
> >>>> >>> >>>>>>> >>>> partitioning
> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
> memory?
> >>>> >>> >>>>>>> >>>> >
> >>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling
> >>>> >>> BSP.setup()
> >>>> >>> >>>>>>> method
> >>>> >>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my
> opinion,
> >>>> >>> current is
> >>>> >>> >>>>>>> >>>> > enough. I think, we need to collect more
> experiences of
> >>>> >>> input
> >>>> >>> >>>>>>> >>>> > partitioning on large environments. I'll do.
> >>>> >>> >>>>>>> >>>> >
> >>>> >>> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?!
> >>>> MR-like?
> >>>> >>> >>>>>>> >>>> >
> >>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
> >>>> >>> >>>>>>> surajsmenon@apache.org>
> >>>> >>> >>>>>>> >>>> > wrote:
> >>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside
> graph
> >>>> >>> module.
> >>>> >>> >>>>>>> When we
> >>>> >>> >>>>>>> >>>> > have
> >>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
> >>>> inject
> >>>> >>> the
> >>>> >>> >>>>>>> >>>> partitioning
> >>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local
> memory?
> >>>> >>> >>>>>>> >>>> > > Today we have partitioning job within a job and
> are
> >>>> >>> creating two
> >>>> >>> >>>>>>> copies
> >>>> >>> >>>>>>> >>>> > of
> >>>> >>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it
> >>>> possible
> >>>> >>> to
> >>>> >>> >>>>>>> create or
> >>>> >>> >>>>>>> >>>> > > redistribute the partitions on local memory and
> >>>> >>> initialize the
> >>>> >>> >>>>>>> record
> >>>> >>> >>>>>>> >>>> > > reader there?
> >>>> >>> >>>>>>> >>>> > > The user can run a separate job give in examples
> area
> >>>> to
> >>>> >>> >>>>>>> explicitly
> >>>> >>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment
> question
> >>>> is
> >>>> >>> how much
> >>>> >>> >>>>>>> of
> >>>> >>> >>>>>>> >>>> disk
> >>>> >>> >>>>>>> >>>> > > space gets allocated for local memory usage?
> Would it
> >>>> be
> >>>> >>> a safe
> >>>> >>> >>>>>>> >>>> approach
> >>>> >>> >>>>>>> >>>> > > with the limitations?
> >>>> >>> >>>>>>> >>>> > >
> >>>> >>> >>>>>>> >>>> > > -Suraj
> >>>> >>> >>>>>>> >>>> > >
> >>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> >>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
> >>>> >>> >>>>>>> >>>> > >
> >>>> >>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files
> we can
> >>>> add
> >>>> >>> this to
> >>>> >>> >>>>>>> the
> >>>> >>> >>>>>>> >>>> > >> partitioner pretty easily.
> >>>> >>> >>>>>>> >>>> > >>
> >>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
> >
> >>>> >>> >>>>>>> >>>> > >>
> >>>> >>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really
> >>>> necessary
> >>>> >>> to be
> >>>> >>> >>>>>>> Sorted?
> >>>> >>> >>>>>>> >>>> > >> >
> >>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas
> Jungblut
> >>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
> >>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works,
> obviously
> >>>> if
> >>>> >>> you merge
> >>>> >>> >>>>>>> n
> >>>> >>> >>>>>>> >>>> > sorted
> >>>> >>> >>>>>>> >>>> > >> > files
> >>>> >>> >>>>>>> >>>> > >> > > by just appending to each other, this will
> >>>> result in
> >>>> >>> totally
> >>>> >>> >>>>>>> >>>> > unsorted
> >>>> >>> >>>>>>> >>>> > >> > data
> >>>> >>> >>>>>>> >>>> > >> > > ;-)
> >>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
> >>>> >>> >>>>>>> >>>> > >> > >
> >>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
> >>>> thomas.jungblut@gmail.com
> >>>> >>> >
> >>>> >>> >>>>>>> >>>> > >> > >
> >>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
> >>>> >>> >>>>>>> >>>> > >> > >>
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
> >>>> >>> >>>>>>> >>>> > >> > >> ...
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
> >>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
> >>>> >>> >>>>>>> >>>> > >> > >>
> >>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
> >>>> >>> >>>>>>> >>>> > >> > >>
> >>>> >>> >>>>>>> >>>> > >> > >>
> >>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
> >>>> >>> thomas.jungblut@gmail.com>
> >>>> >>> >>>>>>> >>>> > >> > >>
> >>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
> >>>> edwardyoon@apache.org>
> >>>> >>> >>>>>>> >>>> > >> > >>>
> >>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please
> do.
> >>>> >>> March 1 is
> >>>> >>> >>>>>>> >>>> > holiday[1]
> >>>> >>> >>>>>>> >>>> > >> so
> >>>> >>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >>>> >>> >>>>>>> >>>> > >> > >>>> 1.
> >>>> >>> >>>>>>>
> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>>> >>> >>>>>>> >>>> > >> > >>>>
> >>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
> >>>> Jungblut
> >>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
> >>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file,
> didn't
> >>>> >>> observe if all
> >>>> >>> >>>>>>> >>>> items
> >>>> >>> >>>>>>> >>>> > >> were
> >>>> >>> >>>>>>> >>>> > >> > >>>> added.
> >>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the
> >>>> logic
> >>>> >>> of the ID
> >>>> >>> >>>>>>> into
> >>>> >>> >>>>>>> >>>> > the
> >>>> >>> >>>>>>> >>>> > >> > >>>> fastgen,
> >>>> >>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
> >>>> edwardyoon@apache.org
> >>>> >>> >
> >>>> >>> >>>>>>> >>>> > >> > >>>> >
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when
> >>>> generate
> >>>> >>> adjacency
> >>>> >>> >>>>>>> >>>> matrix
> >>>> >>> >>>>>>> >>>> > >> into
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> multiple files.
> >>>> >>> >>>>>>> >>>> > >> > >>>> >>
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM,
> Thomas
> >>>> >>> Jungblut
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they
> partitioned
> >>>> >>> correctly?
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
> >>>> >>> edwardyoon@apache.org>
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> :~/workspace/hama-trunk$
> >>>> ls
> >>>> >>> -al
> >>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096
>  2월 28
> >>>> >>> 18:03 .
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480
>  2월 28
> >>>> >>> 18:04 ..
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243
>  2월 28
> >>>> >>> 18:01
> >>>> >>> >>>>>>> part-00000
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28
>  2월 28
> >>>> >>> 18:01
> >>>> >>> >>>>>>> >>>> > .part-00000.crc
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251
>  2월 28
> >>>> >>> 18:01
> >>>> >>> >>>>>>> part-00001
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28
>  2월 28
> >>>> >>> 18:01
> >>>> >>> >>>>>>> >>>> > .part-00001.crc
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096
>  2월 28
> >>>> >>> 18:03
> >>>> >>> >>>>>>> partitions
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax
> :~/workspace/hama-trunk$
> >>>> ls
> >>>> >>> -al
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096
>  2월 28
> >>>> >>> 18:03 .
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096
>  2월 28
> >>>> >>> 18:03 ..
> >>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932
>  2월 28
> >>>> 18:03
> >>>> >>> >>>>>>> part-00000
> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
> edward@udanax:~/workspace/hama-trunk$
>
> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
>> yes i'll check again
>>
>> Sent from my iPhone
>>
>> On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
>>
>>> Can you verify an observation for me please?
>>>
>>> 2 files are created from fastgen, part-00000 and part-00001, both ~2.2kb sized.
>>> In the below partition directory, there is only a single 5.56kb file.
>>>
>>> Is it intended for the partitioner to write a single file if you configured two?
>>> It even reads it as two files, strange huh?
>>>
>>> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>>>
>>>> Will have a look into it.
>>>>
>>>> gen fastgen 100 10 /tmp/randomgraph 1
>>>> pagerank /tmp/randomgraph /tmp/pageout
>>>>
>>>> did work for me the last time I profiled, maybe the partitioning doesn't
>>>> partition correctly with the input or something else.
>>>>
>>>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>>>
>>>>> Fastgen input seems not work for graph examples.
>>>>>
>>>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>>>>> /tmp/randomgraph 2
>>>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>>>>> native-hadoop library for your platform... using builtin-java classes
>>>>> where applicable
>>>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>>>>> Job Finished in 3.212 seconds
>>>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>>>> hama-examples-0.7.0-SNAPSHOT.jar
>>>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>>>>> /tmp/randomgraph /tmp/pageour
>>>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>>>>> native-hadoop library for your platform... using builtin-java classes
>>>>> where applicable
>>>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>>>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>>>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>>>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>>>> java.lang.IllegalArgumentException: Messages must never be behind the
>>>>> vertex in ID! Current Message ID: 1 vs. 50
>>>>>         at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>>>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>         at java.lang.Thread.run(Thread.java:722)
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
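The IllegalArgumentException in the log above is a symptom of the sorting problem discussed in this thread: the graph runner walks vertices and incoming messages in ascending vertex-ID order, so a message whose ID sorts before the current vertex can never be delivered. A schematic reconstruction of that invariant (illustrative only; `AlignmentCheck` and this `iterate` are made-up names, not Hama's actual GraphJobRunner code):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Schematic version of the invariant behind "Messages must never be behind
// the vertex in ID": vertices and messages are both consumed in ascending ID
// order, so a message whose ID is smaller than the current vertex's ID can
// never find its target vertex anymore.
public class AlignmentCheck {
  static void iterate(List<Integer> sortedVertexIds, List<Integer> sortedMessageIds) {
    Iterator<Integer> msgs = sortedMessageIds.iterator();
    Integer msg = msgs.hasNext() ? msgs.next() : null;
    for (int vertexId : sortedVertexIds) {
      while (msg != null && msg <= vertexId) {
        if (msg < vertexId) {
          throw new IllegalArgumentException(
              "Messages must never be behind the vertex in ID! Current Message ID: "
                  + msg + " vs. " + vertexId);
        }
        // msg == vertexId: the message is delivered, then we advance.
        msg = msgs.hasNext() ? msgs.next() : null;
      }
    }
  }

  public static void main(String[] args) {
    // An unsorted partition puts vertex 50 first while a message targets 1:
    iterate(Arrays.asList(50, 51), Arrays.asList(1)); // throws, like the log above
  }
}
```

With unsorted partition files, the first local vertex can have a large ID (50 here) while messages target smaller IDs, which reproduces the "Current Message ID: 1 vs. 50" failure.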

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
To reduce arguing, I'm appending my opinions.

In HAMA-704, I wanted to remove only the message map to reduce memory
consumption. I still don't want to talk about disk-based vertices and
the Spilling Queue at the moment. With this, I wanted to release 0.6.1
ASAP as a 'partitioning issue fixed and quickly executable examples'
version. That's why I scheduled the Spilling Queue in the 0.7 roadmap.

As you can see, issues are popping up one right after another. I don't
think we have to clear every never-ending issue at once. We can improve
step-by-step.

1. http://wiki.apache.org/hama/RoadMap
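The sorted spilling queue mentioned in the roadmap, and the "merging of sorted files" referenced earlier in the thread, both come down to a k-way merge of already-sorted runs. A minimal sketch in plain Java (illustrative only, not Hama's DiskVerticesInfo or spilling-queue code; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge of already-sorted runs, as a sorted spilling
// queue would do when re-reading its spilled files.
public class KWayMerge {

  // One cursor per sorted run; ordered by the run's current head element.
  static final class Cursor implements Comparable<Cursor> {
    final List<Integer> run;
    int pos = 0;
    Cursor(List<Integer> run) { this.run = run; }
    int head() { return run.get(pos); }
    public int compareTo(Cursor o) { return Integer.compare(head(), o.head()); }
  }

  public static List<Integer> merge(List<List<Integer>> runs) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>();
    for (List<Integer> r : runs) {
      if (!r.isEmpty()) heap.add(new Cursor(r));
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();                   // run with the smallest head
      out.add(c.head());
      if (++c.pos < c.run.size()) heap.add(c);  // re-insert if not exhausted
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> runs = Arrays.asList(
        Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 10), Arrays.asList(5, 6, 7));
    System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 6, 7, 9, 10]
  }
}
```

Unlike simple concatenation, the heap-based merge produces a globally sorted sequence in O(n log k) for n elements across k runs.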

On Thu, Mar 14, 2013 at 6:22 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Typos ;)
>
>> except YARN integration tasks. If you leave here, I have to take cover
>> YARN tasks. Should I wait someone? Am I touching core module
>
> I have to cover YARN tasks instead of you.
>
> On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> Hmm, here are my opinions:
>>
>> As you know, we have a problem with a lack of team members and
>> contributors. So we should break every task down as small as
>> possible. Our best approach is to improve step-by-step. And every
>> Hama-x.x.x should run well, even if it's at a baby-cart level.
>>
>> And tech should be developed out of necessity. So I think we need
>> to cut releases as often as possible. Therefore I volunteered to manage
>> releases. Actually, I wanted to work only on QA (quality assurance)
>> related tasks, because your code is better than mine and I have a
>> cluster.
>>
>> However, we are currently not working like that. I guess there are many
>> reasons. None of us is a full-time open sourcer (except me).
>>
>>> You have 23 issues assigned.  Why do you need to work on that?
>>
>> I don't know what you mean exactly. But 23 issues are almost examples
>> except YARN integration tasks. If you leave here, I have to take cover
>> YARN tasks. Should I wait someone? Am I touching core module
>> aggressively?
>>
>>> Otherwise Suraj and I will branch those issues away and you can play
>>> around in trunk however you like.
>>
>> I also don't know what you mean exactly, but if you want, please do.
>>
>> By the way, can you answer this question - is it really a technical
>> conflict, or an emotional one?
>>
>> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut <th...@gmail.com> wrote:
>>> You have 23 issues assigned. Why do you need to work on that?
>>> Otherwise Suraj and I will branch those issues away and you can play
>>> around in trunk however you like.
>>> On 14.03.2013 09:04, "Edward J. Yoon" <ed...@apache.org> wrote:
>>>
>>>> P.S. Please don't say things like that.
>>>>
>>>> No decisions have been made yet. And if someone has a question or missed
>>>> something, you have to try to explain it here, because this is open
>>>> source. Nobody can say "don't touch trunk because I'm working on it".
>>>>
>>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>> > Sorry for my quick-and-dirty small patches.
>>>> >
>>>> > However, we should work together in parallel. Please share here if
>>>> > there is any progress.
>>>> >
>>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut <th...@gmail.com> wrote:
>>>> >> Hi Edward,
>>>> >>
>>>> >> before you run riot all over the codebase: Suraj is currently working
>>>> >> on that stuff - don't make it more difficult for him by making him
>>>> >> rebase all his patches the whole time.
>>>> >> He has the plan that we made to get the stuff working; his part is
>>>> >> currently missing. So don't muddle around there, it will make this
>>>> >> take longer than already needed.
>>>> >>
>>>> >>
>>>> >>
>>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>>>> >>
>>>> >>> Personally, I would like to solve this issue by touching
>>>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>>>> >>> multiple files, we can avoid huge memory consumption.
>>>> >>>
>>>> >>> If we want to sort partitioned data using the messaging system, ideas
>>>> >>> should be collected.
>>>> >>>
>>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>>>> >>> >
>>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >> I'm reading the changes of HAMA-704 again. As a result of adding
>>>> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not
>>>> >>> >> sure, but I think this approach will bring more disadvantages than
>>>> >>> >> advantages.
>>>> >>> >>
>>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>>>> in loadVertices? Maybe the feature for coupling storage in user
>>>> >>> >>>>>> space with BSP Messaging [HAMA-734] can avoid double reads and
>>>> >>> >>>>>> writes. This way, partitioned or non-partitioned by the
>>>> >>> >>>>>> partitioner, we can keep vertices sorted with a single read and
>>>> >>> >>>>>> a single write on every peer.
>>>> >>> >>>
>>>> >>> >>> And, as I commented on the JIRA ticket, I think we can't use the
>>>> >>> >>> messaging system for sorting vertices within partition files.
>>>> >>> >>>
>>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>> P.S., (number of splits = number of partitions) is really
>>>> >>> >>>> confusing to me. Even when the number of blocks equals the
>>>> >>> >>>> desired number of tasks, the data has to be re-partitioned again.
>>>> >>> >>>>
>>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>>> Indeed. If there are already partitioned input files (unsorted)
>>>> >>> >>>>> and the user wants to skip the pre-partitioning phase, it should
>>>> >>> >>>>> be handled in the GraphJobRunner BSP program. Actually, I still
>>>> >>> >>>>> don't know why re-partitioned files need to be sorted. It's only
>>>> >>> >>>>> about GraphJobRunner.
>>>> >>> >>>>>
>>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have
>>>> >>> >>>>>> a dedicated partitioning superstep for graph applications).
>>>> >>> >>>>>
>>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>>>> >>> >>>>> partitioning job based on the superstep API?
>>>> >>> >>>>>
>>>> >>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
>>>> >>> >>>>> The partitioning job will create 1,000 partitions. Thus, we can
>>>> >>> >>>>> execute the graph job with 1,000 tasks.
>>>> >>> >>>>>
>>>> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks).
>>>> >>> >>>>> If I want to run with 1,000 tasks, what happens?
>>>> >>> >>>>>
>>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <surajsmenon@apache.org> wrote:
>>>> >>> >>>>>> I am responding on this thread for better continuity of the
>>>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted
>>>> >>> >>>>>> every time. When the number of splits = number of partitions
>>>> >>> >>>>>> and partitioning is switched off by the user [HAMA-561], the
>>>> >>> >>>>>> partitions would not be sorted. Can we do this in loadVertices?
>>>> >>> >>>>>> Maybe the feature for coupling storage in user space with BSP
>>>> >>> >>>>>> Messaging [HAMA-734] can avoid double reads and writes. This
>>>> >>> >>>>>> way, partitioned or non-partitioned by the partitioner, we can
>>>> >>> >>>>>> keep vertices sorted with a single read and a single write on
>>>> >>> >>>>>> every peer.
>>>> >>> >>>>>>
>>>> >>> >>>>>> Just clearing any confusion regarding superstep injection for
>>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have
>>>> >>> >>>>>> a dedicated partitioning superstep for graph applications.)
>>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
>>>> >>> >>>>>>
>>>> >>> >>>>>> If x > y:
>>>> >>> >>>>>> the y tasks are scheduled, with some of them having each of the
>>>> >>> >>>>>> x splits and the remaining with no resource local to them. Then
>>>> >>> >>>>>> the partitioning superstep redistributes the partitions among
>>>> >>> >>>>>> them to create local partitions. Now the question is: can we
>>>> >>> >>>>>> re-initialize a peer's input based on this new local part of
>>>> >>> >>>>>> the partition?
>>>> >>> >>>>>>
>>>> >>> >>>>>> If y > x:
>>>> >>> >>>>>> it works as it works today.
>>>> >>> >>>>>>
>>>> >>> >>>>>> Just putting my points in, brainstorming.
>>>> >>> >>>>>>
>>>> >>> >>>>>> -Suraj
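Thomas's observation earlier in the thread - that merging n sorted files by simply appending them yields unsorted data - is the crux of the sorting debate above. A tiny self-contained illustration (hypothetical vertex IDs, not Hama code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Each task writes its own sorted chunk of a partition; appending the chunks
// does not give a globally sorted file, while a merge (or re-sort) does.
public class AppendVsMerge {

  static List<Integer> append(List<Integer> a, List<Integer> b) {
    List<Integer> out = new ArrayList<>(a);
    out.addAll(b);
    return out;
  }

  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Sorted chunks of the same partition, written by two different tasks.
    List<Integer> fromTask0 = Arrays.asList(2, 40, 90);
    List<Integer> fromTask1 = Arrays.asList(1, 35, 77);

    List<Integer> appended = append(fromTask0, fromTask1);
    System.out.println(isSorted(appended)); // false: [2, 40, 90, 1, 35, 77]

    List<Integer> merged = new ArrayList<>(appended);
    Collections.sort(merged); // merging restores the global order
    System.out.println(isSorted(merged)); // true
  }
}
```

This is why more tasks mean more independently sorted chunks per partition, and why either a merge step or sorting through the messaging system is being discussed.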
>>>> >>> >>>>>>
>>>> >>> >>>>>>
>>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>>>>> I just filed it here: https://issues.apache.org/jira/browse/HAMA-744
>>>> >>> >>>>>>>
>>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>>>>> > Additionally,
>>>> >>> >>>>>>> >
>>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
>>>> >>> >>>>>>> >
>>>> >>> >>>>>>> > Can we execute a different number of tasks per superstep?
>>>> >>> >>>>>>> >
>>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
>>>> >>> >>>>>>> >>> from the partitioning job must be sorted. Currently only
>>>> >>> >>>>>>> >>> the partition files in
>>>> >>> >>>>>>> >>
>>>> >>> >>>>>>> >> I see.
>>>> >>> >>>>>>> >>
>>>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>>>> >>> >>>>>>> >>> API, Suraj's idea of injecting a preprocessing superstep
>>>> >>> >>>>>>> >>> that partitions the stuff into our messaging system is
>>>> >>> >>>>>>> >>> actually the best.
>>>> >>> >>>>>>> >>
>>>> >>> >>>>>>> >> BTW, if some garbage objects can accumulate in the
>>>> >>> >>>>>>> >> partitioning step, a separate partitioning job may not be a
>>>> >>> >>>>>>> >> bad idea. Is there some special reason?
>>>> >>> >>>>>>> >>
>>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut <th...@gmail.com> wrote:
>>>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
>>>> >>> >>>>>>> >>> from the partitioning job must be sorted. Currently only
>>>> >>> >>>>>>> >>> the partition files in themselves are sorted, thus more
>>>> >>> >>>>>>> >>> tasks result in unsorted data in the completed file. This
>>>> >>> >>>>>>> >>> only applies to the graph processing package.
>>>> >>> >>>>>>> >>> So, as Suraj said, it would be much simpler to solve this
>>>> >>> >>>>>>> >>> via messaging, once it is scalable (it will be very, very
>>>> >>> >>>>>>> >>> scalable!). So the GraphJobRunner can partition the stuff
>>>> >>> >>>>>>> >>> with a single superstep in setup(), as it did ages ago.
>>>> >>> >>>>>>> >>> The messaging must be sorted anyway for the algorithm, so
>>>> >>> >>>>>>> >>> this is a nice side effect and saves us the partitioning
>>>> >>> >>>>>>> >>> job for graph processing.
>>>> >>> >>>>>>> >>>
>>>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>>>> >>> >>>>>>> >>> API, Suraj's idea of injecting a preprocessing superstep
>>>> >>> >>>>>>> >>> that partitions the stuff into our messaging system is
>>>> >>> >>>>>>> >>> actually the best.
>>>> >>> >>>>>>> >>>
>>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>> >>> >>>>>>> >>>
>>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be sorted.
>>>> >>> >>>>>>> >>>> Sorry for the confusion. The superstep injection is
>>>> >>> >>>>>>> >>>> possible with the Superstep API. A few enhancements are
>>>> >>> >>>>>>> >>>> needed to make it simpler since I last worked on it. We
>>>> >>> >>>>>>> >>>> can then look into the partitioning superstep being
>>>> >>> >>>>>>> >>>> executed before the setup of the first superstep of the
>>>> >>> >>>>>>> >>>> submitted job. I think it is feasible.
>>>> >>> >>>>>>> >>>>
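The partitioning superstep discussed above boils down to computing a destination peer for each vertex and sending it there. A minimal, self-contained sketch of the usual hash partitioning in plain Java (a hypothetical helper for illustration, not Hama's Partitioner API):

```java
import java.util.HashMap;
import java.util.Map;

public class HashPartitionSketch {

    // Destination peer for a vertex ID, as a hash partitioner would
    // compute it; Math.abs guards against negative hash codes.
    static int partitionFor(String vertexId, int numPeers) {
        return Math.abs(vertexId.hashCode() % numPeers);
    }

    public static void main(String[] args) {
        String[] vertexIds = {"1", "3", "50", "98"};
        int numPeers = 2;
        // Group vertex IDs by the peer they would be sent to.
        Map<Integer, StringBuilder> byPeer = new HashMap<>();
        for (String id : vertexIds) {
            int p = partitionFor(id, numPeers);
            byPeer.computeIfAbsent(p, k -> new StringBuilder()).append(id).append(' ');
        }
        for (Map.Entry<Integer, StringBuilder> e : byPeer.entrySet()) {
            System.out.println("peer " + e.getKey() + ": " + e.getValue().toString().trim());
        }
    }
}
```

The same ID always hashes to the same peer, which is what lets each peer initialize its record reader over exactly its own partition.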
>>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>>>> >>> edwardyoon@apache.org
>>>> >>> >>>>>>> >>>> >wrote:
>>>> >>> >>>>>>> >>>>
>>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>>>> >>> >>>>>>> >>>> > > inject the partitioning superstep as the first
>>>> >>> >>>>>>> >>>> > > superstep and use local memory?
>>>> >>> >>>>>>> >>>> >
>>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling the
>>>> >>> >>>>>>> >>>> > BSP.setup() method to avoid executing an additional BSP
>>>> >>> >>>>>>> >>>> > job. But, in my opinion, the current approach is enough.
>>>> >>> >>>>>>> >>>> > I think we need to collect more experience with input
>>>> >>> >>>>>>> >>>> > partitioning in large environments. I'll do that.
>>>> >>> >>>>>>> >>>> >
>>>> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be sorted.
>>>> >>> >>>>>>> >>>> > MR-like?
>>>> >>> >>>>>>> >>>> >
>>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>>> >>> >>>>>>> surajsmenon@apache.org>
>>>> >>> >>>>>>> >>>> > wrote:
>>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here beyond the
>>>> >>> >>>>>>> >>>> > > graph module. When we have the spilling queue and
>>>> >>> >>>>>>> >>>> > > sorted spilling queue, can we inject the partitioning
>>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>> >>> >>>>>>> >>>> > > Today we have a partitioning job within a job and are
>>>> >>> >>>>>>> >>>> > > creating two copies of the data on HDFS. This could be
>>>> >>> >>>>>>> >>>> > > really costly. Is it possible to create or
>>>> >>> >>>>>>> >>>> > > redistribute the partitions in local memory and
>>>> >>> >>>>>>> >>>> > > initialize the record reader there?
>>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the examples
>>>> >>> >>>>>>> >>>> > > area, to explicitly repartition the data on HDFS. The
>>>> >>> >>>>>>> >>>> > > deployment question is how much disk space gets
>>>> >>> >>>>>>> >>>> > > allocated for local memory usage? Would it be a safe
>>>> >>> >>>>>>> >>>> > > approach with the limitations?
>>>> >>> >>>>>>> >>>> > >
>>>> >>> >>>>>>> >>>> > > -Suraj
>>>> >>> >>>>>>> >>>> > >
>>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>>>> >>> >>>>>>> >>>> > >
>>>> >>> >>>>>>> >>>> > >> yes. Once Suraj has added merging of sorted files,
>>>> >>> >>>>>>> >>>> > >> we can add this to the partitioner pretty easily.
>>>> >>> >>>>>>> >>>> > >>
>>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> >>> >>>>>>> >>>> > >>
>>>> >>> >>>>>>> >>>> > >> > Eh... btw, does re-partitioned data really need
>>>> >>> >>>>>>> >>>> > >> > to be sorted?
>>>> >>> >>>>>>> >>>> > >> >
>>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works: obviously,
>>>> >>> >>>>>>> >>>> > >> > > if you merge n sorted files by just appending
>>>> >>> >>>>>>> >>>> > >> > > them to each other, the result is totally
>>>> >>> >>>>>>> >>>> > >> > > unsorted data ;-)
>>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>>> >>> >>>>>>> >>>> > >> > >
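The point made above is that concatenating n individually sorted files does not yield sorted output; a k-way merge that always takes the smallest current head does. A minimal in-memory sketch, where lists stand in for the sorted partition files (this is not Hama's actual merge code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {

    // Merge already-sorted runs into one globally sorted list by always
    // taking the smallest current head among all runs.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {value, runIndex}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        List<Iterator<Integer>> its = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            Iterator<Integer> it = runs.get(i).iterator();
            its.add(it);
            if (it.hasNext()) heap.add(new int[] {it.next(), i});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();          // smallest head wins
            out.add(head[0]);
            Iterator<Integer> it = its.get(head[1]);
            if (it.hasNext()) heap.add(new int[] {it.next(), head[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(50, 52, 98);   // "file" 1, sorted
        List<Integer> b = Arrays.asList(1, 3, 51);     // "file" 2, sorted
        // Appending would give 50 52 98 1 3 51; merging keeps global order.
        System.out.println(merge(Arrays.asList(a, b)));  // [1, 3, 50, 51, 52, 98]
    }
}
```

The same idea extends to on-disk runs by replacing the iterators with file readers, which is presumably what the planned merger of sorted files would do.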
>>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>>>> >>> >>>>>>> >>>> > >> > >
>>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>>> >>> >>>>>>> >>>> > >> > >>
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>>>> >>> >>>>>>> >>>> > >> > >> ...
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>>>> >>> >>>>>>> >>>> > >> > >> ...
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>>>> >>> >>>>>>> >>>> > >> > >>
>>>> >>> >>>>>>> >>>> > >> > >> So this won't work correctly, then...
>>>> >>> >>>>>>> >>>> > >> > >>
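Note that each run in the ID dump above is in lexicographic (text) order, not numeric order, which is why "3" follows "29" and the file restarts at "1" where the second sorted run was appended. A short sketch demonstrating String-based sorting of IDs (assuming the IDs are compared as text):

```java
import java.util.Arrays;

public class TextSortSketch {

    // Sort IDs as Strings, the way text-keyed records would be ordered.
    static String sortAsText(String... ids) {
        String[] copy = ids.clone();
        Arrays.sort(copy);                 // character-by-character comparison
        return String.join(" ", copy);
    }

    public static void main(String[] args) {
        // "29" < "3" because '2' < '3'; numeric order would differ.
        System.out.println(sortAsText("50", "3", "29", "1", "10", "98"));
        // prints: 1 10 29 3 50 98
    }
}
```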
>>>> >>> >>>>>>> >>>> > >> > >>
>>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>>> >>> >>>>>>> >>>> > >> > >>>
>>>> >>> >>>>>>> >>>> > >> > >>>
>>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>>> >>> >>>>>>> >>>> > >> > >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it quickly, please do.
>>>> >>> >>>>>>> >>>> > >> > >>>> March 1 is a holiday[1], so I'll appear next week.
>>>> >>> >>>>>>> >>>> > >> > >>>>
>>>> >>> >>>>>>> >>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>> >>> >>>>>>> >>>> > >> > >>>>
>>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file; I didn't
>>>> >>> >>>>>>> >>>> > >> > >>>> > observe whether all items were added.
>>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the ID logic
>>>> >>> >>>>>>> >>>> > >> > >>>> > into fastgen; want to have a look into it?
>>>> >>> >>>>>>> >>>> > >> > >>>> >
>>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>>> >>> >>>>>>> >>>> > >> > >>>> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when generating
>>>> >>> >>>>>>> >>>> > >> > >>>> >> an adjacency matrix into multiple files.
>>>> >>> >>>>>>> >>>> > >> > >>>> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
>>>> >>> >>>>>>> >>>> > >> > >>>> >> > correctly?
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward
>>>> <
>>>> >>> >>>>>>> >>>> edward@udanax.org
>>>> >>> >>>>>>> >>>> > >
>>>> >>> >>>>>>> >>>> > >> > wrote:
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > <thomas.jungblut@gmail.com> wrote:
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> and part-00001, both ~2.2kb in size.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the partition directory below, there is
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> only a single 5.56kb file.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> single file if you configured two?
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as two files, strange huh?
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled;
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> maybe the partitioning doesn't partition
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> correctly with the input, or something else.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Fastgen input seems not to work for graph examples.
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
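The exception in the log reflects an invariant of iterate(): vertices and incoming messages are both consumed in ascending ID order, so a message whose ID is smaller than the current vertex ID can never be delivered. A rough, hypothetical sketch of that lock-step check (not the actual GraphJobRunner code):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterateInvariantSketch {

    // Walk sorted vertices and sorted messages in lock-step; a message
    // ID smaller than the current vertex ID can never be delivered,
    // which is exactly what the runner complains about.
    static void iterate(List<Integer> sortedVertexIds, List<Integer> sortedMessageIds) {
        Iterator<Integer> msgs = sortedMessageIds.iterator();
        Integer msg = msgs.hasNext() ? msgs.next() : null;
        for (int vertexId : sortedVertexIds) {
            while (msg != null && msg == vertexId) {
                msg = msgs.hasNext() ? msgs.next() : null;  // deliver to this vertex
            }
            if (msg != null && msg < vertexId) {
                throw new IllegalArgumentException(
                    "Messages must never be behind the vertex in ID! "
                        + "Current Message ID: " + msg + " vs. " + vertexId);
            }
        }
    }

    public static void main(String[] args) {
        iterate(Arrays.asList(1, 3, 50), Arrays.asList(1, 3));   // sorted split: fine
        try {
            // A split starting at 50 (unsorted overall) makes message 1 "behind".
            iterate(Arrays.asList(50, 52), Arrays.asList(1, 50));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the appended-not-merged partition files trigger the error: the second sorted run restarts at a low ID while messages for it were already passed over.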



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Typos ;)

> except YARN integration tasks. If you leave here, I have to take cover
> YARN tasks. Should I wait someone? Am I touching core module

I have to cover YARN tasks instead of you.

On Thu, Mar 14, 2013 at 6:12 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Hmm, here's my opinions:
>
> As you know, we have a shortage of team members and contributors, so
> we should break every task down as small as possible. Our best course
> is to improve step by step, and every Hama-x.x.x should run well even
> if it's at a baby-cart level.
>
> And tech should be developed out of necessity, so I think we need to
> cut releases as often as possible. That's why I volunteered to manage
> releases. Actually, I wanted to work only on QA (quality assurance)
> related tasks, because your code is better than mine and I have a
> cluster.
>
> However, we are currently not working like that. I guess there are
> many reasons. None of us is a full-time open sourcer (except me).
>
>> You have 23 issues assigned.  Why do you need to work on that?
>
> I don't know what you mean exactly. But 23 issues are almost examples
> except YARN integration tasks. If you leave here, I have to take cover
> YARN tasks. Should I wait someone? Am I touching core module
> aggressively?
>
>> Otherwise Suraj and I will branch those issues away and you can play
>> around in trunk however you like.
>
> I also don't know what you mean exactly but if you want, Please do.
>
> By the way, can you answer this question: is this really a technical
> conflict, or an emotional one?
>
> On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
> <th...@gmail.com> wrote:
>> You have 23 issues assigned. Why do you need to work on that?
>> Otherwise Suraj and I will branch those issues away and you can play
>> around in trunk however you like.
>> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:
>>
>>> P.S. Please don't say it like that.
>>>
>>> No decisions have been made yet. And if someone has a question or
>>> missed something, you have to try to explain it here, because this is
>>> open source. No one can say "don't touch trunk because I'm working on
>>> it".
>>>
>>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Sorry for my quick-and-dirty small patches.
>>> >
>>> > However, we should work together in parallel. Please share any
>>> > progress here.
>>> >
>>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>>> > <th...@gmail.com> wrote:
>>> >> Hi Edward,
>>> >>
>>> >> before you run riot all over the codebase: Suraj is currently working
>>> >> on that stuff. Don't make it more difficult for him by making him
>>> >> rebase all his patches the whole time.
>>> >> He has the plan that we made to get this stuff working; his part is
>>> >> currently missing. So don't try to muddle around there, it will make
>>> >> this take longer than already needed.
>>> >>
>>> >>
>>> >>
>>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>>> >>
>>> >>> Personally, I would like to solve this issue by touching
>>> >>> DiskVerticesInfo. If we write sorted sub-sets of the vertices into
>>> >>> multiple files, we can avoid huge memory consumption.
>>> >>>
>>> >>> If we want to sort the partitioned data using the messaging system,
>>> >>> we should collect ideas first.
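Edward's idea of writing sorted sub-sets to multiple files works if the runs are then k-way merged rather than concatenated (appending sorted runs is exactly what produced unsorted data earlier in this thread). Below is a minimal, self-contained sketch of such a merge over integer vertex IDs using a priority queue; it is illustrative only, not Hama's DiskVerticesInfo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    /** Merges n individually sorted runs into one globally sorted list. */
    public static List<Integer> merge(List<List<Integer>> runs) {
        // heap entries: {value, runIndex, offsetWithinRun}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);                 // smallest head across all runs
            int next = top[2] + 1;
            List<Integer> run = runs.get(top[1]);
            if (next < run.size()) {         // refill from the same run
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two sorted runs, like two partition files written by separate tasks
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 10, 12, 50),
            Arrays.asList(3, 21, 23, 98));
        // appending would give 1,10,12,50,3,... but merging stays sorted
        System.out.println(merge(runs)); // [1, 3, 10, 12, 21, 23, 50, 98]
    }
}
```

The same pattern applies to on-disk runs: keep one open reader per file instead of a list offset, and the memory footprint stays at one record per run.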
>>> >>>
>>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>> wrote:
>>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>>> >>> >
>>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>> wrote:
>>> >>> >> I'm reading the changes from HAMA-704 again. As a result of adding
>>> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not
>>> >>> >> sure, but I think this approach will bring more disadvantages than
>>> >>> >> advantages.
>>> >>> >>
>>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>> wrote:
>>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
>>> >>> user space
>>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
>>> >>> This way
>>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
>>> >>> sorted
>>> >>> >>>>>> with a single read and single write on every peer.
>>> >>> >>>
>>> >>> >>> And, as I commented on the JIRA ticket, I think we can't use the
>>> >>> >>> messaging system for sorting vertices within partition files.
>>> >>> >>>
>>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org> wrote:
>>> >>> >>>> P.S., (number of splits = number of partitions) is really confusing
>>> >>> >>>> to me. Even when the number of blocks equals the desired number of
>>> >>> >>>> tasks, the data still has to be re-partitioned.
>>> >>> >>>>
>>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org> wrote:
>>> >>> >>>>> Indeed. If there are already partitioned input files (unsorted)
>>> >>> >>>>> and the user wants to skip the pre-partitioning phase, it should
>>> >>> >>>>> be handled in the GraphJobRunner BSP program. Actually, I still
>>> >>> >>>>> don't know why re-partitioned files need to be sorted. It only
>>> >>> >>>>> matters to GraphJobRunner.
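For context on why GraphJobRunner wants sorted input: it walks the vertex list and the incoming message stream in a single pass, like a merge join, so both must be ordered by vertex ID. The sketch below (simplified types, not Hama's actual API) shows that invariant: a message whose ID falls behind the current vertex would force a rewind, which is the "Messages must never be behind the vertex in ID" error quoted later in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeJoinSketch {
    /** Single-pass delivery: counts messages per vertex, both lists sorted by ID. */
    static Map<Integer, Integer> deliver(List<Integer> sortedVertexIds,
                                         List<Integer> sortedMessageIds) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        int m = 0; // cursor into the message stream, never moves backwards
        for (int vertexId : sortedVertexIds) {
            int n = 0;
            while (m < sortedMessageIds.size()
                   && sortedMessageIds.get(m) == vertexId) {
                n++;
                m++;
            }
            // a message with a smaller ID than the current vertex can no
            // longer be delivered without rewinding the pass
            if (m < sortedMessageIds.size()
                && sortedMessageIds.get(m) < vertexId) {
                throw new IllegalArgumentException(
                    "Message is behind the vertex in ID!");
            }
            counts.put(vertexId, n);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(deliver(Arrays.asList(1, 3, 10),
                                   Arrays.asList(1, 1, 10))); // {1=2, 3=0, 10=1}
    }
}
```

If either side is unsorted, the single-pass join is impossible, which is why the thread keeps circling back to sorting either the partition files or the message stream.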
>>> >>> >>>>>
>>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have
>>> a
>>> >>> dedicated
>>> >>> >>>>>> partitioning superstep for graph applications).
>>> >>> >>>>>
>>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>>> >>> >>>>> partitioning job based on the superstep API?
>>> >>> >>>>>
>>> >>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
>>> >>> >>>>> The partitioning job will create 1,000 partitions. Thus, we can
>>> >>> >>>>> execute the graph job with 1,000 tasks.
>>> >>> >>>>>
>>> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks). If
>>> >>> >>>>> I want to run with 1,000 tasks, what happens?
>>> >>> >>>>>
>>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>>> surajsmenon@apache.org>
>>> >>> wrote:
>>> >>> >>>>>> I am responding on this thread for better continuity of the
>>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted every
>>> >>> >>>>>> time. When the number of splits = number of partitions and
>>> >>> >>>>>> partitioning is switched off by the user [HAMA-561], the
>>> >>> >>>>>> partitions would not be sorted. Can we do this in loadVertices?
>>> >>> >>>>>> Maybe the feature for coupling storage in user space with BSP
>>> >>> >>>>>> messaging [HAMA-734] can avoid double reads and writes. This way,
>>> >>> >>>>>> whether partitioned by the partitioner or not, we can keep
>>> >>> >>>>>> vertices sorted with a single read and a single write on every
>>> >>> >>>>>> peer.
>>> >>> >>>>>>
>>> >>> >>>>>> Just clearing confusion, if any, regarding superstep injection
>>> >>> >>>>>> for partitioning. (This is outside the scope of graphs. We can
>>> >>> >>>>>> have a dedicated partitioning superstep for graph applications.)
>>> >>> >>>>>> Say there are x splits and y tasks configured by the user.
>>> >>> >>>>>>
>>> >>> >>>>>> if x > y
>>> >>> >>>>>> The y tasks are scheduled with x of them having each of the x
>>> >>> >>>>>> splits and the remaining with no resource local to them. Then the
>>> >>> >>>>>> partitioning superstep redistributes the partitions among them to
>>> >>> >>>>>> create local partitions. Now the question is: can we re-initialize
>>> >>> >>>>>> a peer's input based on this new local part of the partition?
>>> >>> >>>>>>
>>> >>> >>>>>> if y > x
>>> >>> >>>>>> works as it works today.
>>> >>> >>>>>>
>>> >>> >>>>>> Just putting my points in as brainstorming.
>>> >>> >>>>>>
>>> >>> >>>>>> -Suraj
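Suraj's redistribution step amounts to each peer computing, per record, which peer owns its partition and sending the record there. Below is a minimal sketch of the usual Hadoop/Hama-style hash assignment; the names and types are illustrative, not Hama's actual Partitioner interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    /** Hadoop/Hama-style hash partitioning of an int vertex ID. */
    static int partitionFor(int vertexId, int numPeers) {
        // mask off the sign bit so the modulo result is never negative
        return (Integer.hashCode(vertexId) & Integer.MAX_VALUE) % numPeers;
    }

    public static void main(String[] args) {
        int numPeers = 2;
        // one outgoing bucket per peer, standing in for BSP messages
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int id : Arrays.asList(1, 3, 10, 50, 98)) {
            buckets.get(partitionFor(id, numPeers)).add(id);
        }
        System.out.println(buckets); // [[10, 50, 98], [1, 3]]
    }
}
```

Running this as a first superstep would send each bucket to its owning peer, after which each peer holds exactly its local partition and can re-initialize its record reader over it.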
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org>wrote:
>>> >>> >>>>>>
>>> >>> >>>>>>> I just filed here
>>> https://issues.apache.org/jira/browse/HAMA-744
>>> >>> >>>>>>>
>>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org>
>>> >>> >>>>>>> wrote:
>>> >>> >>>>>>> > Additionally,
>>> >>> >>>>>>> >
>>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
>>> >>> partitioning
>>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
>>> >>> >>>>>>> >
>>> >>> >>>>>>> > Can we execute different number of tasks per superstep?
>>> >>> >>>>>>> >
>>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org>
>>> >>> >>>>>>> wrote:
>>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
>>> from
>>> >>> the
>>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>>> partition
>>> >>> files in
>>> >>> >>>>>>> >>
>>> >>> >>>>>>> >> I see.
>>> >>> >>>>>>> >>
>>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>>> API,
>>> >>> Suraj's
>>> >>> >>>>>>> idea
>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
>>> >>> stuff into
>>> >>> >>>>>>> our
>>> >>> >>>>>>> >>> messaging system is actually the best.
>>> >>> >>>>>>> >>
>>> >>> >>>>>>> >> BTW, if some garbage objects can accumulate in the
>>> >>> >>>>>>> >> partitioning step, a separate partitioning job may not be a
>>> >>> >>>>>>> >> bad idea. Is there some special reason?
>>> >>> >>>>>>> >>
>>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>>> >>> >>>>>>> >>> For graph processing, the partitioned files that result from
>>> >>> >>>>>>> >>> the partitioning job must be sorted. Currently only the
>>> >>> >>>>>>> >>> partition files themselves are sorted, thus more tasks result
>>> >>> >>>>>>> >>> in unsorted data in the completed file. This only applies to
>>> >>> >>>>>>> >>> the graph processing package.
>>> >>> >>>>>>> >>> So as Suraj said, it would be much simpler to solve this via
>>> >>> >>>>>>> >>> messaging, once it is scalable (it will be very, very
>>> >>> >>>>>>> >>> scalable!). The GraphJobRunner could then partition the data
>>> >>> >>>>>>> >>> in a single superstep in setup(), as it did ages ago. The
>>> >>> >>>>>>> >>> messaging must be sorted anyway for the algorithm, so this is
>>> >>> >>>>>>> >>> a nice side effect and saves us the partitioning job for
>>> >>> >>>>>>> >>> graph processing.
>>> >>> >>>>>>> >>>
>>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>>> API,
>>> >>> Suraj's
>>> >>> >>>>>>> idea
>>> >>> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
>>> >>> stuff into
>>> >>> >>>>>>> our
>>> >>> >>>>>>> >>> messaging system is actually the best.
>>> >>> >>>>>>> >>>
>>> >>> >>>>>>> >>>
>>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>> >>> >>>>>>> >>>
>>> >>> >>>>>>> >>>> No, the partitions we write locally need not be sorted.
>>> >>> >>>>>>> >>>> Sorry for the confusion. Superstep injection is possible
>>> >>> >>>>>>> >>>> with the Superstep API. There are a few enhancements needed
>>> >>> >>>>>>> >>>> to make it simpler since I last worked on it. We can then
>>> >>> >>>>>>> >>>> look into the partitioning superstep being executed before
>>> >>> >>>>>>> >>>> the setup of the first superstep of the submitted job. I
>>> >>> >>>>>>> >>>> think it is feasible.
>>> >>> >>>>>>> >>>>
>>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org
>>> >>> >>>>>>> >>>> >wrote:
>>> >>> >>>>>>> >>>>
>>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>>> inject
>>> >>> the
>>> >>> >>>>>>> >>>> partitioning
>>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>> >>> >>>>>>> >>>> >
>>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling the
>>> >>> >>>>>>> >>>> > BSP.setup() method, to avoid executing an additional BSP
>>> >>> >>>>>>> >>>> > job. But, in my opinion, the current approach is enough.
>>> >>> >>>>>>> >>>> > I think we need to collect more experience with input
>>> >>> >>>>>>> >>>> > partitioning in large environments. I'll do that.
>>> >>> >>>>>>> >>>> >
>>> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be sorted?!
>>> >>> >>>>>>> >>>> > MR-like?
>>> >>> >>>>>>> >>>> >
>>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>> >>> >>>>>>> surajsmenon@apache.org>
>>> >>> >>>>>>> >>>> > wrote:
>>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here beyond the graph
>>> >>> >>>>>>> >>>> > > module. When we have the spilling queue and sorted
>>> >>> >>>>>>> >>>> > > spilling queue, can we inject the partitioning superstep
>>> >>> >>>>>>> >>>> > > as the first superstep and use local memory?
>>> >>> >>>>>>> >>>> > > Today we have a partitioning job within a job and are
>>> >>> >>>>>>> >>>> > > creating two copies of the data on HDFS. This could be
>>> >>> >>>>>>> >>>> > > really costly. Is it possible to create or redistribute
>>> >>> >>>>>>> >>>> > > the partitions in local memory and initialize the record
>>> >>> >>>>>>> >>>> > > reader there?
>>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the examples
>>> >>> >>>>>>> >>>> > > area, to explicitly repartition the data on HDFS. The
>>> >>> >>>>>>> >>>> > > deployment question is how much disk space gets
>>> >>> >>>>>>> >>>> > > allocated for local memory usage? Would it be a safe
>>> >>> >>>>>>> >>>> > > approach with the limitations?
>>> >>> >>>>>>> >>>> > >
>>> >>> >>>>>>> >>>> > > -Suraj
>>> >>> >>>>>>> >>>> > >
>>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>>> >>> >>>>>>> >>>> > >
>>> >>> >>>>>>> >>>> > >> yes. Once Suraj has added merging of sorted files, we
>>> >>> >>>>>>> >>>> > >> can add this to the partitioner pretty easily.
>>> >>> >>>>>>> >>>> > >>
>>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>> >>>>>>> >>>> > >>
>>> >>> >>>>>>> >>>> > >> > Eh... btw, does re-partitioned data really need to
>>> >>> >>>>>>> >>>> > >> > be sorted?
>>> >>> >>>>>>> >>>> > >> >
>>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works: obviously,
>>> >>> >>>>>>> >>>> > >> > > if you merge n sorted files by just appending them
>>> >>> >>>>>>> >>>> > >> > > to each other, the result is totally unsorted data
>>> >>> >>>>>>> >>>> > >> > > ;-)
>>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>> >>> >>>>>>> >>>> > >> > >
>>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
>>> thomas.jungblut@gmail.com
>>> >>> >
>>> >>> >>>>>>> >>>> > >> > >
>>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>> >>> >>>>>>> >>>> > >> > >>
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>>> >>> >>>>>>> >>>> > >> > >> ...
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>>> >>> >>>>>>> >>>> > >> > >> ...
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>>> >>> >>>>>>> >>>> > >> > >>
>>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
>>> >>> >>>>>>> >>>> > >> > >>
>>> >>> >>>>>>> >>>> > >> > >>
>>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>>> >>> thomas.jungblut@gmail.com>
>>> >>> >>>>>>> >>>> > >> > >>
>>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>> >>>>>>> >>>> > >> > >>>
>>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it quickly, please do.
>>> >>> >>>>>>> >>>> > >> > >>>> March 1 is a holiday [1], so I'll be back next
>>> >>> >>>>>>> >>>> > >> > >>>> week.
>>> >>> >>>>>>> >>>> > >> > >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> 1.
>>> >>> >>>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>> >>> >>>>>>> >>>> > >> > >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
>>> Jungblut
>>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file; I didn't
>>> >>> >>>>>>> >>>> > >> > >>>> > observe whether all items were added.
>>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the ID
>>> >>> >>>>>>> >>>> > >> > >>>> > logic into the fastgen; want to have a look
>>> >>> >>>>>>> >>>> > >> > >>>> > into it?
>>> >>> >>>>>>> >>>> > >> > >>>> >
>>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
>>> edwardyoon@apache.org
>>> >>> >
>>> >>> >>>>>>> >>>> > >> > >>>> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when it
>>> >>> >>>>>>> >>>> > >> > >>>> >> generates the adjacency matrix into multiple
>>> >>> >>>>>>> >>>> > >> > >>>> >> files.
>>> >>> >>>>>>> >>>> > >> > >>>> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas
>>> >>> Jungblut
>>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
>>> >>> correctly?
>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>>> >>> edwardyoon@apache.org>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> ls
>>> >>> -al
>>> >>> >>>>>>> >>>> > >> /tmp/randomgraph/
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28
>>> >>> 18:03 .
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28
>>> >>> 18:04 ..
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28
>>> >>> 18:01
>>> >>> >>>>>>> part-00000
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>>> >>> 18:01
>>> >>> >>>>>>> >>>> > .part-00000.crc
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28
>>> >>> 18:01
>>> >>> >>>>>>> part-00001
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>>> >>> 18:01
>>> >>> >>>>>>> >>>> > .part-00001.crc
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28
>>> >>> 18:03
>>> >>> >>>>>>> partitions
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> ls
>>> >>> -al
>>> >>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28
>>> >>> 18:03 .
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28
>>> >>> 18:03 ..
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28
>>> 18:03
>>> >>> >>>>>>> part-00000
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28
>>> 18:03
>>> >>> >>>>>>> >>>> > .part-00000.crc
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28
>>> 18:03
>>> >>> >>>>>>> part-00001
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28
>>> 18:03
>>> >>> >>>>>>> >>>> > .part-00001.crc
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward
>>> <
>>> >>> >>>>>>> >>>> edward@udanax.org
>>> >>> >>>>>>> >>>> > >
>>> >>> >>>>>>> >>>> > >> > wrote:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas
>>> >>> Jungblut <
>>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me
>>> >>> please?
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
>>> >>> part-00000 and
>>> >>> >>>>>>> >>>> > >> part-00001,
>>> >>> >>>>>>> >>>> > >> > >>>> both
>>> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory,
>>> there
>>> >>> is only a
>>> >>> >>>>>>> >>>> single
>>> >>> >>>>>>> >>>> > >> > 5.56kb
>>> >>> >>>>>>> >>>> > >> > >>>> file.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to
>>> >>> write a
>>> >>> >>>>>>> single
>>> >>> >>>>>>> >>>> > file
>>> >>> >>>>>>> >>>> > >> if
>>> >>> >>>>>>> >>>> > >> > you
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files,
>>> strange
>>> >>> huh?
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>>> >>> >>>>>>> thomas.jungblut@gmail.com>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph
>>> 1
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
>>> /tmp/pageout
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
>>> >>> profiled, maybe
>>> >>> >>>>>>> the
>>> >>> >>>>>>> >>>> > >> > >>>> partitioning
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input
>>> or
>>> >>> something
>>> >>> >>>>>>> else.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>>> >>> edwardyoon@apache.org
>>> >>> >>>>>>> >
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for
>>> graph
>>> >>> examples.
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>> >>>>>>> :~/workspace/hama-trunk$
>>> >>> >>>>>>> >>>> > >> bin/hama
>>> >>> >>>>>>> >>>> > >> > jar
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>> >>> >>>>>>> >>>> > >> > fastgen
>>> >>> >>>>>>> >>>> > >> > >>>> 100 10
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
>>> >>> util.NativeCodeLoader:
>>> >>> >>>>>>> Unable
>>> >>> >>>>>>> >>>> > to
>>> >>> >>>>>>> >>>> > >> > load
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>>> >>> platform...
>>> >>> >>>>>>> using
>>> >>> >>>>>>> >>>> > >> > builtin-java
>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Running
>>> >>> >>>>>>> >>>> job:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>>> >>> bsp.LocalBSPRunner:
>>> >>> >>>>>>> Setting
>>> >>> >>>>>>> >>>> up
>>> >>> >>>>>>> >>>> > a
>>> >>> >>>>>>> >>>> > >> new
>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Current
>>> >>> >>>>>>> >>>> > >> supersteps
>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient: The
>>> >>> >>>>>>> total
>>> >>> >>>>>>> >>>> > number
>>> >>> >>>>>>> >>>> > >> > of
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Counters: 3
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > SUPERSTEPS=0
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>> >>>>>>> :~/workspace/hama-trunk$
>>> >>> >>>>>>> >>>> > >> bin/hama
>>> >>> >>>>>>> >>>> > >> > jar
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>> >>>>>>> :~/workspace/hama-trunk$
>>> >>> >>>>>>> >>>> > >> bin/hama
>>> >>> >>>>>>> >>>> > >> > jar
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>> >>> >>>>>>> >>>> > pagerank
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
>>> >>> util.NativeCodeLoader:
>>> >>> >>>>>>> Unable
>>> >>> >>>>>>> >>>> > to
>>> >>> >>>>>>> >>>> > >> > load
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>>> >>> platform...
>>> >>> >>>>>>> using
>>> >>> >>>>>>> >>>> > >> > builtin-java
>>> >>> >>>>>>> >>>> > >> > >>>> >> classes
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>>> >>> bsp.FileInputFormat:
>>> >>> >>>>>>> Total
>>> >>> >>>>>>> >>>> > input
>>> >>> >>>>>>> >>>> > >> > paths
>>> >>> >>>>>>> >>>> > >> > >>>> to
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>>> >>> bsp.FileInputFormat:
>>> >>> >>>>>>> Total
>>> >>> >>>>>>> >>>> > input
>>> >>> >>>>>>> >>>> > >> > paths
>>> >>> >>>>>>> >>>> > >> > >>>> to
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Running
>>> >>> >>>>>>> >>>> job:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>>> >>> bsp.LocalBSPRunner:
>>> >>> >>>>>>> Setting
>>> >>> >>>>>>> >>>> up
>>> >>> >>>>>>> >>>> > a
>>> >>> >>>>>>> >>>> > >> new
>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Current
>>> >>> >>>>>>> >>>> > >> supersteps
>>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient: The
>>> >>> >>>>>>> total
>>> >>> >>>>>>> >>>> > number
>>> >>> >>>>>>> >>>> > >> > of
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Counters: 6
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > SUPERSTEPS=1
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.FileInputFormat:
>>> >>> >>>>>>> Total
>>> >>> >>>>>>> >>>> > input
>>> >>> >>>>>>> >>>> > >> > paths
>>> >>> >>>>>>> >>>> > >> > >>>> to
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.BSPJobClient:
>>> >>> >>>>>>> Running
>>> >>> >>>>>>> >>>> job:
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> bsp.LocalBSPRunner:
>>> >>> >>>>>>> Setting
>>> >>> >>>>>>> >>>> up
>>> >>> >>>>>>> >>>> > a
>>> >>> >>>>>>> >>>> > >> new
>>> >>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> graph.GraphJobRunner: 50
>>> >>> >>>>>>> >>>> > vertices
>>> >>> >>>>>>> >>>> > >> > are
>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> >>> graph.GraphJobRunner: 50
>>> >>> >>>>>>> >>>> > vertices
>>> >>> >>>>>>> >>>> > >> > are
>>> >>> >>>>>>> >>>> > >> > >>>> loaded
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> the vertex in ID! Current Message ID: 1 vs. 50
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>> >>> >>>>>>> >>>> > >> >
>>> >>> >>>>>>> >>>> > >>
>>> >>> >>>>>>> >>>> >
>>> >>> >>>>>>> >>>>
>>> >>> >>>>>>>
>>> >>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>> >>>>>>> >>>> > >> > >>>> >>
>>> >>> >>>>>>> >>>> > >> > >>>>
>>> >>> >>>>>>> >>>> > >> >
>>> >>> >>>>>>> >>>> > >>
>>> >>> >>>>>>> >>>> >
>>> >>> >>>>>>> >>>>
>>> >>> >>>>>>>
>>> >>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>> >>> java.lang.Thread.run(Thread.java:722)
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
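The invariant behind the exception in the quoted stack trace can be modeled as follows. This is an illustrative sketch, not Hama's actual GraphJobRunner code: it assumes vertices are stored sorted by ID and messages are consumed in the same sorted order, so a message whose ID sorts before the current vertex can never be delivered, which is exactly what the "Messages must never be behind the vertex in ID" error reports.

```java
import java.util.*;

public class OrderCheckSketch {
  // Deliver sorted messages to sorted vertices; throws if a message's ID
  // is already behind the vertex currently being iterated.
  static List<String> iterate(List<String> sortedVertexIds,
                              List<String> sortedMessageIds) {
    List<String> delivered = new ArrayList<>();
    int m = 0;
    for (String vertexId : sortedVertexIds) {
      while (m < sortedMessageIds.size()) {
        int cmp = sortedMessageIds.get(m).compareTo(vertexId);
        if (cmp < 0) {
          throw new IllegalArgumentException(
              "Messages must never be behind the vertex in ID! Current Message ID: "
                  + sortedMessageIds.get(m) + " vs. " + vertexId);
        }
        if (cmp > 0) break;                        // message is for a later vertex
        delivered.add(sortedMessageIds.get(m++));  // deliver to current vertex
      }
    }
    return delivered;
  }

  public static void main(String[] args) {
    // Works when both sides are consistently sorted.
    System.out.println(iterate(Arrays.asList("1", "3", "5"),
                               Arrays.asList("1", "5")));
    // Fails when partition files were concatenated instead of merged,
    // reproducing the "1 vs. 50" message from the log above.
    try {
      iterate(Arrays.asList("50", "52"), Arrays.asList("1"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

This illustrates why unsorted partition output (discussed below in the thread) surfaces as this exception rather than a partitioning error.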



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hmm, here are my opinions:

As you know, we suffer from a lack of team members and contributors,
so we should break every task down as small as possible. Our best
course of action is to improve step by step, and every Hama-x.x.x
release should run well, even if it is still at a baby-cart level.

And technology should be developed out of necessity, so I think we
need to cut releases as often as possible. That is why I volunteered
to manage releases. Actually, I wanted to work only on QA (quality
assurance) related tasks, because your code is better than mine and I
have a cluster.

However, we are currently not working that way. I guess there are many
reasons; none of us is a full-time open-source developer (except me).

> You have 23 issues assigned.  Why do you need to work on that?

I don't know exactly what you mean. But those 23 issues are almost all
examples, apart from the YARN integration tasks. If you leave, I have
to take over the YARN tasks. Should I wait for someone? Am I touching
the core module too aggressively?

> Otherwise Suraj and I will branch those issues away, and you can play
> around in trunk however you like.

I also don't know exactly what you mean, but if you want to, please do.

By the way, can you answer this question: is this really a technical
conflict, or an emotional one?

On Thu, Mar 14, 2013 at 5:32 PM, Thomas Jungblut
<th...@gmail.com> wrote:
> You have 23 issues assigned.  Why do you need to work on that?
> Otherwise Suraj and I will branch those issues away, and you can play
> around in trunk however you like.
> Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:
>
>> P.S. Please don't say it like that.
>>
>> No decisions have been made yet. If someone has a question or missed
>> something, you have to try to explain it here, because this is open
>> source. No one can say "don't touch trunk because I'm working on it".
>>
>> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Sorry for my quick-and-dirty small patches.
>> >
>> > However, we should work together in parallel. Please share any
>> > progress here.
>> >
>> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
>> > <th...@gmail.com> wrote:
>> >> Hi Edward,
>> >>
>> >> before you run riot all over the codebase: Suraj is currently working
>> >> on that stuff - don't make it more difficult for him by making him
>> >> rebase all his patches the whole time.
>> >> He has the plan that we made to make this stuff work; his part is
>> >> currently missing. So don't try to muddle around there, it will make
>> >> this take longer than already needed.
>> >>
>> >>
>> >>
>> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>> >>
>> >>> Personally, I would like to solve this issue by touching
>> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>> >>> multiple files, we can avoid huge memory consumption.
>> >>>
>> >>> If we want to sort the partitioned data using the messaging system,
>> >>> ideas should be collected.
>> >>>
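The idea quoted above - writing sorted sub-sets of vertices into multiple files to avoid holding everything in memory - is the classic "sorted spill runs" pattern. A minimal sketch, with lists standing in for on-disk files and all names purely illustrative (this is not the real DiskVerticesInfo API):

```java
import java.util.*;

public class SpillSketch {
  // Buffer at most maxInMemory vertex IDs, sort the buffer, and "spill"
  // it as one sorted run; runs can later be merged into one sorted stream.
  static List<List<Integer>> spillSortedRuns(Iterator<Integer> vertexIds,
                                             int maxInMemory) {
    List<List<Integer>> runs = new ArrayList<>(); // stands in for spill files
    List<Integer> buffer = new ArrayList<>();
    while (vertexIds.hasNext()) {
      buffer.add(vertexIds.next());
      if (buffer.size() == maxInMemory) {
        Collections.sort(buffer);
        runs.add(new ArrayList<>(buffer)); // spill one sorted run
        buffer.clear();
      }
    }
    if (!buffer.isEmpty()) {
      Collections.sort(buffer);
      runs.add(buffer); // final partial run
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println(
        spillSortedRuns(Arrays.asList(5, 2, 9, 1, 7).iterator(), 2));
  }
}
```

Each run is individually sorted, which bounds memory at `maxInMemory` entries; producing a globally sorted view then requires a merge of the runs, not concatenation.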
>> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>> >>> >
>> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> >> I'm reading the changes of HAMA-704 again. As a result of adding
>> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not
>> >>> >> sure, but I think this approach will bring more disadvantages than
>> >>> >> advantages.
>> >>> >>
>> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
>> >>> user space
>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
>> >>> This way
>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
>> >>> sorted
>> >>> >>>>>> with a single read and single write on every peer.
>> >>> >>>
>> >>> >>> And, as I commented JIRA ticket, I think we can't use messaging
>> system
>> >>> >>> for sorting vertices within partition files.
>> >>> >>>
>> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org> wrote:
>> >>> >>>> P.S., (number of splits = number of partitions) really confuses
>> >>> >>>> me. Even when the number of blocks is equal to the desired number
>> >>> >>>> of tasks, the data has to be re-partitioned again.
>> >>> >>>>
>> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org> wrote:
>> >>> >>>>> Indeed. If there are already partitioned (but unsorted) input
>> >>> >>>>> files and the user wants to skip the pre-partitioning phase, that
>> >>> >>>>> should be handled in the GraphJobRunner BSP program. Actually, I
>> >>> >>>>> still don't know why re-partitioned files need to be sorted. It
>> >>> >>>>> only concerns GraphJobRunner.
>> >>> >>>>>
>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have
>> a
>> >>> dedicated
>> >>> >>>>>> partitioning superstep for graph applications).
>> >>> >>>>>
>> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>> >>> >>>>> partitioning job based on the superstep API?
>> >>> >>>>>
>> >>> >>>>> By default, 100 tasks will be assigned for partitioning job.
>> >>> >>>>> Partitioning job will create 1,000 partitions. Thus, we can
>> execute
>> >>> >>>>> the Graph job with 1,000 tasks.
>> >>> >>>>>
>> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks).
>> >>> >>>>> If I want to run with 1,000 tasks, what happens?
>> >>> >>>>>
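A back-of-the-envelope calculation for the scenario described above (a 20 GB input in 100 blocks, a 100-task partitioning job producing 1,000 partitions). The numbers come from the email; the arithmetic and its framing are ours:

```java
public class PartitionMath {
  public static void main(String[] args) {
    long inputBytes = 20L * 1024 * 1024 * 1024; // 20 GB input
    int partitionTasks = 100;                   // tasks in the partitioning job
    int partitions = 1000;                      // partitions it must produce

    // Each partitioning task must emit partitions/partitionTasks outputs.
    System.out.println("partitions per task: " + partitions / partitionTasks);

    // Average partition size the 1,000 graph tasks would each read.
    System.out.println("avg partition size (MB): "
        + inputBytes / partitions / (1024 * 1024));

    // A partitioning job within a job writes a second copy of the input to
    // HDFS, roughly doubling input I/O and storage - Suraj's cost concern.
    System.out.println("extra HDFS bytes written (GB): "
        + inputBytes / (1024 * 1024 * 1024));
  }
}
```

This makes concrete why a job-within-a-job is "really costly": the whole 20 GB is read and rewritten once before the graph job even starts.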
>> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
>> surajsmenon@apache.org>
>> >>> wrote:
>> >>> >>>>>> I am responding on this thread because of better continuity for
>> >>> >>>>>> conversation. We cannot expect the partitions to be sorted every
>> >>> time. When
>> >>> >>>>>> the number of splits = number of partitions and partitioning is
>> >>> switched
>> >>> >>>>>> off by user[HAMA-561], the partitions would not be sorted. Can
>> we
>> >>> do this
>> >>> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
>> >>> user space
>> >>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
>> >>> This way
>> >>> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
>> >>> sorted
>> >>> >>>>>> with a single read and single write on every peer.
>> >>> >>>>>>
>> >>> >>>>>> Just clearing confusion if any regarding superstep injection for
>> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have
>> a
>> >>> dedicated
>> >>> >>>>>> partitioning superstep for graph applications).
>> >>> >>>>>> Say there are x splits and y number of tasks configured by user.
>> >>> >>>>>>
>> >>> >>>>>> if x > y
>> >>> >>>>>> The y tasks are scheduled with x of them having each of the x
>> >>> splits and
>> >>> >>>>>> the remaining with no resource local to them. Then the
>> partitioning
>> >>> >>>>>> superstep redistributes the partitions among them to create
>> local
>> >>> >>>>>> partitions. Now the question is can we re-initialize a peer's
>> input
>> >>> based
>> >>> >>>>>> on this new local part of partition?
>> >>> >>>>>>
>> >>> >>>>>> if y > x
>> >>> >>>>>> works as it works today.
>> >>> >>>>>>
>> >>> >>>>>> Just putting my points in brainstorming.
>> >>> >>>>>>
>> >>> >>>>>> -Suraj
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>> >>> edwardyoon@apache.org>wrote:
>> >>> >>>>>>
>> >>> >>>>>>> I just filed here
>> https://issues.apache.org/jira/browse/HAMA-744
>> >>> >>>>>>>
>> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org>
>> >>> >>>>>>> wrote:
>> >>> >>>>>>> > Additionally,
>> >>> >>>>>>> >
>> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
>> >>> partitioning
>> >>> >>>>>>> >> superstep as the first superstep and use local memory?
>> >>> >>>>>>> >
>> >>> >>>>>>> > Can we execute different number of tasks per superstep?
>> >>> >>>>>>> >
>> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org>
>> >>> >>>>>>> wrote:
>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
>> from
>> >>> the
>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>> partition
>> >>> files in
>> >>> >>>>>>> >>
>> >>> >>>>>>> >> I see.
>> >>> >>>>>>> >>
>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>> API,
>> >>> Suraj's
>> >>> >>>>>>> idea
>> >>> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
>> >>> stuff into
>> >>> >>>>>>> our
>> >>> >>>>>>> >>> messaging system is actually the best.
>> >>> >>>>>>> >>
>> >>> >>>>>>> >> BTW, if garbage objects can accumulate during the
>> >>> >>>>>>> >> partitioning step, a separate partitioning job may not be a
>> >>> >>>>>>> >> bad idea. Is there some special reason?
>> >>> >>>>>>> >>
>> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >>> >>>>>>> >> <th...@gmail.com> wrote:
>> >>> >>>>>>> >>> For graph processing, the partitioned files that result
>> from
>> >>> the
>> >>> >>>>>>> >>> partitioning job must be sorted. Currently only the
>> partition
>> >>> files in
>> >>> >>>>>>> >>> itself are sorted, thus more tasks result in not sorted
>> data
>> >>> in the
>> >>> >>>>>>> >>> completed file. This only applies for the graph processing
>> >>> package.
>> >>> >>>>>>> >>> So as Suraj told, it would be much more simpler to solve
>> this
>> >>> via
>> >>> >>>>>>> >>> messaging, once it is scalable (it will be very very
>> >>> scalable!). So the
>> >>> >>>>>>> >>> GraphJobRunner can be partitioning the stuff with a single
>> >>> superstep in
>> >>> >>>>>>> >>> setup() as it was before ages ago. The messaging must be
>> >>> sorted anyway
>> >>> >>>>>>> for
>> >>> >>>>>>> >>> the algorithm so this is a nice side effect and saves us
>> the
>> >>> >>>>>>> partitioning
>> >>> >>>>>>> >>> job for graph processing.
>> >>> >>>>>>> >>>
>> >>> >>>>>>> >>> For other partitionings and with regard to our superstep
>> API,
>> >>> Suraj's
>> >>> >>>>>>> idea
>> >>> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
>> >>> stuff into
>> >>> >>>>>>> our
>> >>> >>>>>>> >>> messaging system is actually the best.
>> >>> >>>>>>> >>>
>> >>> >>>>>>> >>>
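The messaging-based partitioning that Thomas describes above - a single superstep in setup() that routes each vertex to its owning peer, with sorted message delivery giving sorted partitions "as a nice side effect" - can be sketched like this. The method and class names are stand-ins, not Hama's real BSPPeer API:

```java
import java.util.*;

public class PartitioningSuperstepSketch {
  // Simulate one partitioning superstep: every vertex ID read locally is
  // "sent" to the peer chosen by hash partitioning; if the messaging system
  // delivers messages sorted, each peer receives its partition in ID order.
  static List<List<Integer>> partitionByMessaging(List<Integer> allVertexIds,
                                                  int numPeers) {
    List<List<Integer>> inbox = new ArrayList<>();
    for (int i = 0; i < numPeers; i++) inbox.add(new ArrayList<>());
    for (int id : allVertexIds) {
      int owner = Math.abs(Integer.hashCode(id)) % numPeers; // hash partitioner
      inbox.get(owner).add(id);                              // "send" message
    }
    // Stand-in for sorted message delivery (e.g. a sorted spilling queue).
    for (List<Integer> msgs : inbox) Collections.sort(msgs);
    return inbox;
  }

  public static void main(String[] args) {
    System.out.println(partitionByMessaging(Arrays.asList(7, 2, 9, 4, 1), 2));
  }
}
```

The design point in the thread is that if messaging must sort anyway for the graph algorithm, doing partitioning through it removes the separate partitioning job and its extra HDFS copy.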
>> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >>> >>>>>>> >>>
>> >>> >>>>>>> >>>> No, the partitions we write locally need not be sorted.
>> Sorry
>> >>> for the
>> >>> >>>>>>> >>>> confusion. The Superstep injection is possible with
>> Superstep
>> >>> API.
>> >>> >>>>>>> There
>> >>> >>>>>>> >>>> are few enhancements needed to make it simpler after I
>> last
>> >>> worked on
>> >>> >>>>>>> it.
>> >>> >>>>>>> >>>> We can then look into partitioning superstep being
>> executed
>> >>> before the
>> >>> >>>>>>> >>>> setup of first superstep of submitted job. I think it is
>> >>> feasible.
>> >>> >>>>>>> >>>>
>> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>> >>> edwardyoon@apache.org
>> >>> >>>>>>> >>>> >wrote:
>> >>> >>>>>>> >>>>
>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>> inject
>> >>> the
>> >>> >>>>>>> >>>> partitioning
>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> > Actually, I wanted to add something before calling
>> >>> BSP.setup()
>> >>> >>>>>>> method
>> >>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my opinion,
>> >>> current is
>> >>> >>>>>>> >>>> > enough. I think, we need to collect more experiences of
>> >>> input
>> >>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?!
>> MR-like?
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>> >>> >>>>>>> surajsmenon@apache.org>
>> >>> >>>>>>> >>>> > wrote:
>> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph
>> >>> module.
>> >>> >>>>>>> When we
>> >>> >>>>>>> >>>> > have
>> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
>> inject
>> >>> the
>> >>> >>>>>>> >>>> partitioning
>> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>> >>> >>>>>>> >>>> > > Today we have partitioning job within a job and are
>> >>> creating two
>> >>> >>>>>>> copies
>> >>> >>>>>>> >>>> > of
>> >>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it
>> possible
>> >>> to
>> >>> >>>>>>> create or
>> >>> >>>>>>> >>>> > > redistribute the partitions on local memory and
>> >>> initialize the
>> >>> >>>>>>> record
>> >>> >>>>>>> >>>> > > reader there?
>> >>> >>>>>>> >>>> > > The user can run a separate job, given in the examples
>> >>> >>>>>>> >>>> > > area, to explicitly
>> >>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment question
>> is
>> >>> how much
>> >>> >>>>>>> of
>> >>> >>>>>>> >>>> disk
>> >>> >>>>>>> >>>> > > space gets allocated for local memory usage? Would it
>> be
>> >>> a safe
>> >>> >>>>>>> >>>> approach
>> >>> >>>>>>> >>>> > > with the limitations?
>> >>> >>>>>>> >>>> > >
>> >>> >>>>>>> >>>> > > -Suraj
>> >>> >>>>>>> >>>> > >
>> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>> >>> >>>>>>> >>>> > >
>> >>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can
>> add
>> >>> this to
>> >>> >>>>>>> the
>> >>> >>>>>>> >>>> > >> partitioner pretty easily.
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really
>> necessary
>> >>> to be
>> >>> >>>>>>> Sorted?
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works; obviously,
>> >>> >>>>>>> >>>> > >> > > if you merge n sorted files by just appending them
>> >>> >>>>>>> >>>> > >> > > to each other, this will result in totally
>> >>> >>>>>>> >>>> > >> > > unsorted data ;-)
>> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>> >>> >>>>>>> >>>> > >> > >
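Thomas's point above - that concatenating n individually sorted partition files yields globally unsorted data, whereas a k-way merge keeps order - can be illustrated with a small sketch (lists stand in for the partition files; this is not the actual Hama merge code Suraj was working on):

```java
import java.util.*;

public class MergeSketch {
  // Naive concatenation: what the partitioner effectively did.
  static List<Integer> append(List<List<Integer>> sortedRuns) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> run : sortedRuns) out.addAll(run);
    return out;
  }

  // k-way merge with a priority queue keyed on each run's head element.
  static List<Integer> merge(List<List<Integer>> sortedRuns) {
    // queue entries: {value, runIndex, offsetInRun}
    PriorityQueue<int[]> pq =
        new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
    for (int i = 0; i < sortedRuns.size(); i++)
      if (!sortedRuns.get(i).isEmpty())
        pq.add(new int[] {sortedRuns.get(i).get(0), i, 0});
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] top = pq.poll();
      out.add(top[0]);
      int next = top[2] + 1; // advance within the run the element came from
      if (next < sortedRuns.get(top[1]).size())
        pq.add(new int[] {sortedRuns.get(top[1]).get(next), top[1], next});
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> runs =
        Arrays.asList(Arrays.asList(50, 52, 54), Arrays.asList(1, 3, 51));
    System.out.println(append(runs)); // concatenation breaks global order
    System.out.println(merge(runs));  // merge preserves it
  }
}
```

The appended output starts at 50 and later drops back to 1 - exactly the "Current Message ID: 1 vs. 50" ordering violation seen in the stack trace earlier in the thread.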
>> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
>> thomas.jungblut@gmail.com
>> >>> >
>> >>> >>>>>>> >>>> > >> > >
>> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> > >> vertexID: 50
>> >>> >>>>>>> >>>> > >> > >> vertexID: 52
>> >>> >>>>>>> >>>> > >> > >> vertexID: 54
>> >>> >>>>>>> >>>> > >> > >> vertexID: 56
>> >>> >>>>>>> >>>> > >> > >> vertexID: 58
>> >>> >>>>>>> >>>> > >> > >> vertexID: 61
>> >>> >>>>>>> >>>> > >> > >> ...
>> >>> >>>>>>> >>>> > >> > >> vertexID: 78
>> >>> >>>>>>> >>>> > >> > >> vertexID: 81
>> >>> >>>>>>> >>>> > >> > >> vertexID: 83
>> >>> >>>>>>> >>>> > >> > >> vertexID: 85
>> >>> >>>>>>> >>>> > >> > >> ...
>> >>> >>>>>>> >>>> > >> > >> vertexID: 94
>> >>> >>>>>>> >>>> > >> > >> vertexID: 96
>> >>> >>>>>>> >>>> > >> > >> vertexID: 98
>> >>> >>>>>>> >>>> > >> > >> vertexID: 1
>> >>> >>>>>>> >>>> > >> > >> vertexID: 10
>> >>> >>>>>>> >>>> > >> > >> vertexID: 12
>> >>> >>>>>>> >>>> > >> > >> vertexID: 14
>> >>> >>>>>>> >>>> > >> > >> vertexID: 16
>> >>> >>>>>>> >>>> > >> > >> vertexID: 18
>> >>> >>>>>>> >>>> > >> > >> vertexID: 21
>> >>> >>>>>>> >>>> > >> > >> vertexID: 23
>> >>> >>>>>>> >>>> > >> > >> vertexID: 25
>> >>> >>>>>>> >>>> > >> > >> vertexID: 27
>> >>> >>>>>>> >>>> > >> > >> vertexID: 29
>> >>> >>>>>>> >>>> > >> > >> vertexID: 3
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>> >>> thomas.jungblut@gmail.com>
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do.
>> >>> March 1 is
>> >>> >>>>>>> >>>> > holiday[1]
>> >>> >>>>>>> >>>> > >> so
>> >>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>> 1.
>> >>> >>>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
>> Jungblut
>> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't
>> >>> observe if all
>> >>> >>>>>>> >>>> items
>> >>> >>>>>>> >>>> > >> were
>> >>> >>>>>>> >>>> > >> > >>>> added.
>> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the
>> logic
>> >>> of the ID
>> >>> >>>>>>> into
>> >>> >>>>>>> >>>> > the
>> >>> >>>>>>> >>>> > >> > >>>> fastgen,
>> >>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
>> edwardyoon@apache.org
>> >>> >
>> >>> >>>>>>> >>>> > >> > >>>> >
>> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when it
>> >>> >>>>>>> >>>> > >> > >>>> >> generates the adjacency matrix into
>> >>> >>>>>>> >>>> > >> > >>>> >> multiple files.
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas
>> >>> Jungblut
>> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
>> >>> correctly?
>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>> >>> edwardyoon@apache.org>
>> >>> >>>>>>> >>>> > >> > >>>> >> >
>> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward
>> <
>> >>> >>>>>>> >>>> edward@udanax.org
>> >>> >>>>>>> >>>> > >
>> >>> >>>>>>> >>>> > >> > wrote:
>> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas
>> >>> Jungblut <
>> >>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me
>> >>> please?
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
>> >>> part-00000 and
>> >>> >>>>>>> >>>> > >> part-00001,
>> >>> >>>>>>> >>>> > >> > >>>> both
>> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory,
>> there
>> >>> is only a
>> >>> >>>>>>> >>>> single
>> >>> >>>>>>> >>>> > >> > 5.56kb
>> >>> >>>>>>> >>>> > >> > >>>> file.
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to
>> >>> write a
>> >>> >>>>>>> single
>> >>> >>>>>>> >>>> > file
>> >>> >>>>>>> >>>> > >> if
>> >>> >>>>>>> >>>> > >> > you
>> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files,
>> strange
>> >>> huh?
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>> >>> >>>>>>> thomas.jungblut@gmail.com>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph
>> 1
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
>> /tmp/pageout
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
>> >>> profiled, maybe
>> >>> >>>>>>> the
>> >>> >>>>>>> >>>> > >> > >>>> partitioning
>> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input
>> or
>> >>> something
>> >>> >>>>>>> else.
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>> >>> edwardyoon@apache.org
>> >>> >>>>>>> >
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not to work for graph examples.
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
>> >>> >>>>>>> Setting
>> >>> >>>>>>> >>>> up
>> >>> >>>>>>> >>>> > a
>> >>> >>>>>>> >>>> > >> new
>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> Current
>> >>> >>>>>>> >>>> > >> supersteps
>> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient: The
>> >>> >>>>>>> total
>> >>> >>>>>>> >>>> > number
>> >>> >>>>>>> >>>> > >> > of
>> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> Counters: 6
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > SUPERSTEPS=1
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.FileInputFormat:
>> >>> >>>>>>> Total
>> >>> >>>>>>> >>>> > input
>> >>> >>>>>>> >>>> > >> > paths
>> >>> >>>>>>> >>>> > >> > >>>> to
>> >>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.BSPJobClient:
>> >>> >>>>>>> Running
>> >>> >>>>>>> >>>> job:
>> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> bsp.LocalBSPRunner:
>> >>> >>>>>>> Setting
>> >>> >>>>>>> >>>> up
>> >>> >>>>>>> >>>> > a
>> >>> >>>>>>> >>>> > >> new
>> >>> >>>>>>> >>>> > >> > >>>> barrier
>> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> graph.GraphJobRunner: 50
>> >>> >>>>>>> >>>> > vertices
>> >>> >>>>>>> >>>> > >> > are
>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> >>> graph.GraphJobRunner: 50
>> >>> >>>>>>> >>>> > vertices
>> >>> >>>>>>> >>>> > >> > are
>> >>> >>>>>>> >>>> > >> > >>>> loaded
>> >>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
>> >>> bsp.LocalBSPRunner:
>> >>> >>>>>>> >>>> Exception
>> >>> >>>>>>> >>>> > >> > during
>> >>> >>>>>>> >>>> > >> > >>>> BSP
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException:
>> >>> Messages
>> >>> >>>>>>> must
>> >>> >>>>>>> >>>> > never
>> >>> >>>>>>> >>>> > >> be
>> >>> >>>>>>> >>>> > >> > >>>> behind
>> >>> >>>>>>> >>>> > >> > >>>> >> the
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1
>> >>> vs. 50
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>>
>> >>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>>
>> >>>
>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>>
>> >>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>>
>> >>> >>>>>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>>
>> >>> >>>>>>>
>> >>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> >
>> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>>
>> >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> >
>> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>>
>> >>> >>>>>>>
>> >>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>>
>> >>> >>>>>>>
>> >>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>> java.lang.Thread.run(Thread.java:722)
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> >> --
>> >>> >>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
>> >>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>> >> --
>> >>> >>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > >> > >>>> >> @eddieyoon
>> >>> >>>>>>> >>>> > >> > >>>> >>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>> --
>> >>> >>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > >> > >>>> @eddieyoon
>> >>> >>>>>>> >>>> > >> > >>>>
>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>>>>> >>>> > >> > >>>
>> >>> >>>>>>> >>>> > >> > >>
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >> > --
>> >>> >>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > >> > @eddieyoon
>> >>> >>>>>>> >>>> > >> >
>> >>> >>>>>>> >>>> > >>
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>> > --
>> >>> >>>>>>> >>>> > Best Regards, Edward J. Yoon
>> >>> >>>>>>> >>>> > @eddieyoon
>> >>> >>>>>>> >>>> >
>> >>> >>>>>>> >>>>
>> >>> >>>>>>> >>
>> >>> >>>>>>> >>
>> >>> >>>>>>> >>
>> >>> >>>>>>> >> --
>> >>> >>>>>>> >> Best Regards, Edward J. Yoon
>> >>> >>>>>>> >> @eddieyoon
>> >>> >>>>>>> >
>> >>> >>>>>>> >
>> >>> >>>>>>> >
>> >>> >>>>>>> > --
>> >>> >>>>>>> > Best Regards, Edward J. Yoon
>> >>> >>>>>>> > @eddieyoon
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> --
>> >>> >>>>>>> Best Regards, Edward J. Yoon
>> >>> >>>>>>> @eddieyoon
>> >>> >>>>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> --
>> >>> >>>>> Best Regards, Edward J. Yoon
>> >>> >>>>> @eddieyoon
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> --
>> >>> >>>> Best Regards, Edward J. Yoon
>> >>> >>>> @eddieyoon
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Best Regards, Edward J. Yoon
>> >>> >>> @eddieyoon
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Best Regards, Edward J. Yoon
>> >>> >> @eddieyoon
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J. Yoon
>> >>> > @eddieyoon
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Thomas Jungblut <th...@gmail.com>.
You have 23 issues assigned. Why do you need to work on that one?
Otherwise, Suraj and I will branch those issues away and you can play around
in trunk however you like.
Am 14.03.2013 09:04 schrieb "Edward J. Yoon" <ed...@apache.org>:

> P.S., please don't say it like that.
>
> No decisions have been made yet. And if someone has a question or missed
> something, you have to try to explain it here, because this is an open
> source project. No one can say "don't touch trunk because I'm working on it".
>
> On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Sorry for my quick-and-dirty small patches.
> >
> > However, we should work together in parallel. Please share it here if
> > there is any progress.
> >
> > On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
> > <th...@gmail.com> wrote:
> >> Hi Edward,
> >>
> >> before you run riot all over the codebase: Suraj is currently working
> >> on that stuff, so don't make it more difficult for him by making him
> >> rebase all his patches the whole time.
> >> He has the plan that we made to get this working; his part is
> >> currently missing. So don't muddle around in there, it will make this
> >> take longer than it already needs to.
> >>
> >>
> >>
> >> 2013/3/14 Edward J. Yoon <ed...@apache.org>
> >>
> >>> Personally, I would like to solve this issue by touching
> >>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
> >>> multiple files, we can avoid huge memory consumption.
> >>>
> >>> If we want to sort the partitioned data using the messaging system,
> >>> ideas should be collected first.
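[Editor's note] The DiskVerticesInfo idea discussed above, writing sorted sub-sets (runs) of vertices to multiple files and then merging them, is essentially an external merge sort. Below is a minimal, self-contained sketch of that pattern in plain Java; the class and method names (`SortedRuns`, `writeSortedRuns`, `mergeRuns`) are hypothetical and not part of Hama's API, and in-memory lists stand in for spill files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedRuns {

  // Split the input into fixed-size chunks and sort each chunk, as if each
  // run were spilled to its own file on disk.
  static List<List<Integer>> writeSortedRuns(List<Integer> ids, int runSize) {
    List<List<Integer>> runs = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += runSize) {
      List<Integer> run =
          new ArrayList<>(ids.subList(i, Math.min(i + runSize, ids.size())));
      Collections.sort(run); // each run is sorted; the whole set is not
      runs.add(run);
    }
    return runs;
  }

  // K-way merge of the sorted runs; only one element per run is held at the
  // front of the heap at a time, which is what keeps memory consumption low.
  static List<Integer> mergeRuns(List<List<Integer>> runs) {
    // heap entry: {currentValue, runIndex, positionInRun}
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new int[] {runs.get(r).get(0), r, 0});
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] e = heap.poll();
      merged.add(e[0]);
      int next = e[2] + 1;
      if (next < runs.get(e[1]).size()) {
        heap.add(new int[] {runs.get(e[1]).get(next), e[1], next});
      }
    }
    return merged;
  }
}
```

Note that the merge step interleaves the runs; simply appending the runs to each other, which is what the thread observed, would not produce sorted output.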
> >>>
> >>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>> wrote:
> >>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
> >>> >
> >>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>> wrote:
> >>> >> I'm reading the changes of HAMA-704 again. As a result of adding
> >>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not sure,
> >>> >> but I think this approach will bring more disadvantages than
> >>> >> advantages.
> >>> >>
> >>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>> wrote:
> >>> >>>>>> in loadVertices? Maybe the feature for coupling storage in user
> >>> >>>>>> space with BSP messaging [HAMA-734] can avoid double reads and
> >>> >>>>>> writes. This way, whether the data is partitioned by the
> >>> >>>>>> partitioner or not, we can keep vertices sorted with a single read
> >>> >>>>>> and a single write on every peer.
> >>> >>>
> >>> >>> And, as I commented on the JIRA ticket, I think we can't use the
> >>> >>> messaging system for sorting vertices within partition files.
> >>> >>>
> >>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org> wrote:
> >>> >>>> P.S., (number of splits = number of partitions) is really confusing
> >>> >>>> to me. Even when the number of blocks is equal to the desired number
> >>> >>>> of tasks, the data has to be re-partitioned again.
> >>> >>>>
> >>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org> wrote:
> >>> >>>>> Indeed. If there are already partitioned input files (unsorted) and
> >>> >>>>> so the user wants to skip the pre-partitioning phase, it should be
> >>> >>>>> handled in the GraphJobRunner BSP program. Actually, I still don't
> >>> >>>>> know why the re-partitioned files need to be sorted. It only
> >>> >>>>> concerns GraphJobRunner.
> >>> >>>>>
> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have a
> >>> >>>>>> dedicated partitioning superstep for graph applications.)
> >>> >>>>>
> >>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
> >>> >>>>> partitioning job based on the superstep API?
> >>> >>>>>
> >>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
> >>> >>>>> The partitioning job will create 1,000 partitions. Thus, we can
> >>> >>>>> execute the graph job with 1,000 tasks.
> >>> >>>>>
> >>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks). If I
> >>> >>>>> want to run with 1,000 tasks, what happens?
> >>> >>>>>
> >>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <
> surajsmenon@apache.org>
> >>> wrote:
> >>> >>>>>> I am responding on this thread for better continuity of the
> >>> >>>>>> conversation. We cannot expect the partitions to be sorted every
> >>> >>>>>> time. When the number of splits equals the number of partitions and
> >>> >>>>>> partitioning is switched off by the user [HAMA-561], the partitions
> >>> >>>>>> would not be sorted. Can we do this in loadVertices? Maybe the
> >>> >>>>>> feature for coupling storage in user space with BSP messaging
> >>> >>>>>> [HAMA-734] can avoid double reads and writes. This way, whether the
> >>> >>>>>> data is partitioned by the partitioner or not, we can keep vertices
> >>> >>>>>> sorted with a single read and a single write on every peer.
> >>> >>>>>>
> >>> >>>>>> Just clearing up any confusion regarding superstep injection for
> >>> >>>>>> partitioning. (This is outside the scope of graphs. We can have a
> >>> >>>>>> dedicated partitioning superstep for graph applications.)
> >>> >>>>>> Say there are x splits and y tasks configured by the user.
> >>> >>>>>>
> >>> >>>>>> if x > y
> >>> >>>>>> The y tasks are scheduled, with x of them having one of the x
> >>> >>>>>> splits each and the remaining ones with no resource local to them.
> >>> >>>>>> Then the partitioning superstep redistributes the partitions among
> >>> >>>>>> them to create local partitions. Now the question is: can we
> >>> >>>>>> re-initialize a peer's input based on this new local part of the
> >>> >>>>>> partition?
> >>> >>>>>>
> >>> >>>>>> if y > x
> >>> >>>>>> It works as it works today.
> >>> >>>>>>
> >>> >>>>>> Just putting my points in brainstorming.
> >>> >>>>>>
> >>> >>>>>> -Suraj
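[Editor's note] The redistribution step sketched above, where each peer reads its local split and sends every record to the peer that owns it, boils down to hash partitioning plus messaging. Here is a toy, self-contained illustration in plain Java; the names (`PartitionSketch`, `ownerOf`, `redistribute`) and the modulo ownership scheme are assumptions for illustration only, not Hama's API, and in-memory lists stand in for the BSP message queues.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartitionSketch {

  // Owner of a vertex: the peer whose index matches the ID's hash bucket.
  // Math.abs is a toy guard against negative hash codes.
  static int ownerOf(String vertexId, int numPeers) {
    return Math.abs(vertexId.hashCode() % numPeers);
  }

  // Simulate the partitioning superstep: every record from every split is
  // "sent" to its owner's inbox; in a real BSP job this would be a message.
  static List<List<String>> redistribute(List<List<String>> splits, int numPeers) {
    List<List<String>> inboxes = new ArrayList<>();
    for (int i = 0; i < numPeers; i++) {
      inboxes.add(new ArrayList<>());
    }
    for (List<String> split : splits) {
      for (String id : split) {
        inboxes.get(ownerOf(id, numPeers)).add(id);
      }
    }
    // Sorting each inbox afterwards gives the graph runner the local order
    // it needs, without a separate partitioning job on HDFS.
    for (List<String> inbox : inboxes) {
      Collections.sort(inbox);
    }
    return inboxes;
  }
}
```

The point of the sketch is that the number of input splits (x) and the number of peers (y) are decoupled: every record ends up local to exactly one peer regardless of how the splits were laid out.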
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
> >>> edwardyoon@apache.org>wrote:
> >>> >>>>>>
> >>> >>>>>>> I just filed it here: https://issues.apache.org/jira/browse/HAMA-744
> >>> >>>>>>>
> >>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org>
> >>> >>>>>>> wrote:
> >>> >>>>>>> > Additionally,
> >>> >>>>>>> >
> >>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
> >>> >>>>>>> >> partitioning superstep as the first superstep and use local
> >>> >>>>>>> >> memory?
> >>> >>>>>>> >
> >>> >>>>>>> > Can we execute a different number of tasks per superstep?
> >>> >>>>>>> >
> >>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org>
> >>> >>>>>>> wrote:
> >>> >>>>>>> >>> For graph processing, the partitioned files that result from
> >>> >>>>>>> >>> the partitioning job must be sorted. Currently only the
> >>> >>>>>>> >>> partition files in
> >>> >>>>>>> >>
> >>> >>>>>>> >> I see.
> >>> >>>>>>> >>
> >>> >>>>>>> >>> For other partitionings and with regard to our superstep API,
> >>> >>>>>>> >>> Suraj's idea of injecting a preprocessing superstep that
> >>> >>>>>>> >>> partitions the stuff into our messaging system is actually the
> >>> >>>>>>> >>> best.
> >>> >>>>>>> >>
> >>> >>>>>>> >> BTW, if garbage objects can accumulate in the partitioning
> >>> >>>>>>> >> step, a separate partitioning job may not be a bad idea. Is
> >>> >>>>>>> >> there some special reason?
> >>> >>>>>>> >>
> >>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >>> >>>>>>> >> <th...@gmail.com> wrote:
> >>> >>>>>>> >>> For graph processing, the partitioned files that result from
> >>> >>>>>>> >>> the partitioning job must be sorted. Currently only the
> >>> >>>>>>> >>> partition files themselves are sorted, thus more tasks result
> >>> >>>>>>> >>> in unsorted data in the combined file. This only applies to
> >>> >>>>>>> >>> the graph processing package.
> >>> >>>>>>> >>> So, as Suraj said, it would be much simpler to solve this via
> >>> >>>>>>> >>> messaging, once it is scalable (it will be very, very
> >>> >>>>>>> >>> scalable!). The GraphJobRunner can then partition the data in
> >>> >>>>>>> >>> a single superstep in setup(), as it did ages ago. The
> >>> >>>>>>> >>> messaging must be sorted anyway for the algorithm, so this is
> >>> >>>>>>> >>> a nice side effect and saves us the partitioning job for graph
> >>> >>>>>>> >>> processing.
> >>> >>>>>>> >>>
> >>> >>>>>>> >>> For other partitionings and with regard to our superstep API,
> >>> >>>>>>> >>> Suraj's idea of injecting a preprocessing superstep that
> >>> >>>>>>> >>> partitions the stuff into our messaging system is actually the
> >>> >>>>>>> >>> best.
> >>> >>>>>>> >>>
> >>> >>>>>>> >>>
> >>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >>> >>>>>>> >>>
> >>> >>>>>>> >>>> No, the partitions we write locally need not be sorted. Sorry
> >>> >>>>>>> >>>> for the confusion. Superstep injection is possible with the
> >>> >>>>>>> >>>> Superstep API. There are a few enhancements needed to make it
> >>> >>>>>>> >>>> simpler since I last worked on it. We can then look into the
> >>> >>>>>>> >>>> partitioning superstep being executed before the setup of the
> >>> >>>>>>> >>>> first superstep of the submitted job. I think it is feasible.
> >>> >>>>>>> >>>>
> >>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
> >>> edwardyoon@apache.org
> >>> >>>>>>> >>>> >wrote:
> >>> >>>>>>> >>>>
> >>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we
> inject
> >>> the
> >>> >>>>>>> >>>> partitioning
> >>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> > Actually, I wanted to add something before the BSP.setup()
> >>> >>>>>>> >>>> > method is called, to avoid executing an additional BSP job.
> >>> >>>>>>> >>>> > But, in my opinion, the current approach is enough. I think
> >>> >>>>>>> >>>> > we need to collect more experience with input partitioning
> >>> >>>>>>> >>>> > in large environments. I'll do that.
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> > BTW, I still don't know why it needs to be sorted?! MR-like?
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
> >>> >>>>>>> surajsmenon@apache.org>
> >>> >>>>>>> >>>> > wrote:
> >>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside the
> >>> >>>>>>> >>>> > > graph module. When we have the spilling queue and sorted
> >>> >>>>>>> >>>> > > spilling queue, can we inject the partitioning superstep
> >>> >>>>>>> >>>> > > as the first superstep and use local memory?
> >>> >>>>>>> >>>> > > Today we have a partitioning job within a job and are
> >>> >>>>>>> >>>> > > creating two copies of the data on HDFS. This could be
> >>> >>>>>>> >>>> > > really costly. Is it possible to create or redistribute
> >>> >>>>>>> >>>> > > the partitions in local memory and initialize the record
> >>> >>>>>>> >>>> > > reader there?
> >>> >>>>>>> >>>> > > The user can run a separate job, given in the examples
> >>> >>>>>>> >>>> > > area, to explicitly repartition the data on HDFS. The
> >>> >>>>>>> >>>> > > deployment question is: how much disk space gets
> >>> >>>>>>> >>>> > > allocated for local memory usage? Would it be a safe
> >>> >>>>>>> >>>> > > approach with the limitations?
> >>> >>>>>>> >>>> > >
> >>> >>>>>>> >>>> > > -Suraj
> >>> >>>>>>> >>>> > >
> >>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> >>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
> >>> >>>>>>> >>>> > >
> >>> >>>>>>> >>>> > >> yes. Once Suraj has added merging of sorted files, we
> >>> >>>>>>> >>>> > >> can add this to the partitioner pretty easily.
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> > >> > Eh... btw, does re-partitioned data really need to be
> >>> >>>>>>> >>>> > >> > sorted?
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> >>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
> >>> >>>>>>> >>>> > >> > > Now I get how the partitioning works: obviously, if
> >>> >>>>>>> >>>> > >> > > you merge n sorted files by just appending them to
> >>> >>>>>>> >>>> > >> > > each other, this will result in totally unsorted
> >>> >>>>>>> >>>> > >> > > data ;-)
> >>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
> >>> >>>>>>> >>>> > >> > >
> >>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <
> thomas.jungblut@gmail.com
> >>> >
> >>> >>>>>>> >>>> > >> > >
> >>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
> >>> >>>>>>> >>>> > >> > >>
> >>> >>>>>>> >>>> > >> > >> vertexID: 50
> >>> >>>>>>> >>>> > >> > >> vertexID: 52
> >>> >>>>>>> >>>> > >> > >> vertexID: 54
> >>> >>>>>>> >>>> > >> > >> vertexID: 56
> >>> >>>>>>> >>>> > >> > >> vertexID: 58
> >>> >>>>>>> >>>> > >> > >> vertexID: 61
> >>> >>>>>>> >>>> > >> > >> ...
> >>> >>>>>>> >>>> > >> > >> vertexID: 78
> >>> >>>>>>> >>>> > >> > >> vertexID: 81
> >>> >>>>>>> >>>> > >> > >> vertexID: 83
> >>> >>>>>>> >>>> > >> > >> vertexID: 85
> >>> >>>>>>> >>>> > >> > >> ...
> >>> >>>>>>> >>>> > >> > >> vertexID: 94
> >>> >>>>>>> >>>> > >> > >> vertexID: 96
> >>> >>>>>>> >>>> > >> > >> vertexID: 98
> >>> >>>>>>> >>>> > >> > >> vertexID: 1
> >>> >>>>>>> >>>> > >> > >> vertexID: 10
> >>> >>>>>>> >>>> > >> > >> vertexID: 12
> >>> >>>>>>> >>>> > >> > >> vertexID: 14
> >>> >>>>>>> >>>> > >> > >> vertexID: 16
> >>> >>>>>>> >>>> > >> > >> vertexID: 18
> >>> >>>>>>> >>>> > >> > >> vertexID: 21
> >>> >>>>>>> >>>> > >> > >> vertexID: 23
> >>> >>>>>>> >>>> > >> > >> vertexID: 25
> >>> >>>>>>> >>>> > >> > >> vertexID: 27
> >>> >>>>>>> >>>> > >> > >> vertexID: 29
> >>> >>>>>>> >>>> > >> > >> vertexID: 3
> >>> >>>>>>> >>>> > >> > >>
> >>> >>>>>>> >>>> > >> > >> So this won't work correctly then...
> >>> >>>>>>> >>>> > >> > >>
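[Editor's note] The ID sequence quoted above (..., 96, 98, 1, 10, 12, ..., 29, 3, ...) shows two effects at once: two sorted runs concatenated back to back, and within each run the numeric IDs comparing as text, so "10" sorts before "3". The second effect is easy to demonstrate in isolation; the class below (`TextOrder`, a hypothetical name, not Hama code) just sorts numeric IDs as strings, which is the order a text-keyed comparator would produce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TextOrder {

  // Sorting numeric IDs as strings yields lexicographic order: "1" < "10"
  // < "2" < "3", matching the vertex ID sequence observed in the thread.
  static List<String> sortAsText(List<String> ids) {
    List<String> copy = new ArrayList<>(ids);
    Collections.sort(copy);
    return copy;
  }
}
```

So even a "sorted" partition file is only sorted in the key type's own ordering, and appending several such files then breaks the order entirely.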
> >>> >>>>>>> >>>> > >> > >>
> >>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
> >>> thomas.jungblut@gmail.com>
> >>> >>>>>>> >>>> > >> > >>
> >>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <
> edwardyoon@apache.org>
> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix it quickly, please do.
> >>> >>>>>>> >>>> > >> > >>>> March 1 is a holiday [1], so I'll be back next
> >>> >>>>>>> >>>> > >> > >>>> week.
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas
> Jungblut
> >>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
> >>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file; I didn't
> >>> >>>>>>> >>>> > >> > >>>> > check whether all items were added.
> >>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the ID
> >>> >>>>>>> >>>> > >> > >>>> > logic into fastgen; want to have a look at it?
> >>> >>>>>>> >>>> > >> > >>>> >
> >>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <
> edwardyoon@apache.org
> >>> >
> >>> >>>>>>> >>>> > >> > >>>> >
> >>> >>>>>>> >>>> > >> > >>>> >> I guess it's a bug in fastgen when generating
> >>> >>>>>>> >>>> > >> > >>>> >> an adjacency matrix into multiple files.
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
> >>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
> >>> >>>>>>> >>>> > >> > >>>> >> > correctly?
> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
> >>> edwardyoon@apache.org>
> >>> >>>>>>> >>>> > >> > >>>> >> >
> >>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> >>> >>>>>>> >>>> > >> > >>>> >> >> total 44
> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> >>> >>>>>>> >>>> > >> > >>>> >> >> total 24
> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> >>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> >>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
> >>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
> >>> >>>>>>> >>>> > >> > >>>> >> >> > yes, I'll check again
> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
> >>> >>>>>>> >>>> > >> > >>>> >> >> >
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me
> >>> please?
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
> >>> part-00000 and
> >>> >>>>>>> >>>> > >> part-00001,
> >>> >>>>>>> >>>> > >> > >>>> both
> >>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory,
> there
> >>> is only a
> >>> >>>>>>> >>>> single
> >>> >>>>>>> >>>> > >> > 5.56kb
> >>> >>>>>>> >>>> > >> > >>>> file.
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to
> >>> write a
> >>> >>>>>>> single
> >>> >>>>>>> >>>> > file
> >>> >>>>>>> >>>> > >> if
> >>> >>>>>>> >>>> > >> > you
> >>> >>>>>>> >>>> > >> > >>>> >> >> configured
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files,
> strange
> >>> huh?
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
> >>> >>>>>>> thomas.jungblut@gmail.com>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph
> 1
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph
> /tmp/pageout
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
> >>> profiled, maybe
> >>> >>>>>>> the
> >>> >>>>>>> >>>> > >> > >>>> partitioning
> >>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input
> or
> >>> something
> >>> >>>>>>> else.
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
> >>> edwardyoon@apache.org
> >>> >>>>>>> >
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for
> graph
> >>> examples.
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>>>>> >>>> > >> > jar
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
> >>> >>>>>>> >>>> > >> > fastgen
> >>> >>>>>>> >>>> > >> > >>>> 100 10
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
> >>> util.NativeCodeLoader:
> >>> >>>>>>> Unable
> >>> >>>>>>> >>>> > to
> >>> >>>>>>> >>>> > >> > load
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
> >>> platform...
> >>> >>>>>>> using
> >>> >>>>>>> >>>> > >> > builtin-java
> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Running
> >>> >>>>>>> >>>> job:
> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
> >>> bsp.LocalBSPRunner:
> >>> >>>>>>> Setting
> >>> >>>>>>> >>>> up
> >>> >>>>>>> >>>> > a
> >>> >>>>>>> >>>> > >> new
> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Current
> >>> >>>>>>> >>>> > >> supersteps
> >>> >>>>>>> >>>> > >> > >>>> >> number: 0
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient: The
> >>> >>>>>>> total
> >>> >>>>>>> >>>> > number
> >>> >>>>>>> >>>> > >> > of
> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Counters: 3
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > SUPERSTEPS=0
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>>>>> >>>> > >> > jar
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> >>> >>>>>>> :~/workspace/hama-trunk$
> >>> >>>>>>> >>>> > >> bin/hama
> >>> >>>>>>> >>>> > >> > jar
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> >>> >>>>>>> >>>> > pagerank
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
> >>> util.NativeCodeLoader:
> >>> >>>>>>> Unable
> >>> >>>>>>> >>>> > to
> >>> >>>>>>> >>>> > >> > load
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
> >>> platform...
> >>> >>>>>>> using
> >>> >>>>>>> >>>> > >> > builtin-java
> >>> >>>>>>> >>>> > >> > >>>> >> classes
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >>> bsp.FileInputFormat:
> >>> >>>>>>> Total
> >>> >>>>>>> >>>> > input
> >>> >>>>>>> >>>> > >> > paths
> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
> >>> bsp.FileInputFormat:
> >>> >>>>>>> Total
> >>> >>>>>>> >>>> > input
> >>> >>>>>>> >>>> > >> > paths
> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Running
> >>> >>>>>>> >>>> job:
> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
> >>> bsp.LocalBSPRunner:
> >>> >>>>>>> Setting
> >>> >>>>>>> >>>> up
> >>> >>>>>>> >>>> > a
> >>> >>>>>>> >>>> > >> new
> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Current
> >>> >>>>>>> >>>> > >> supersteps
> >>> >>>>>>> >>>> > >> > >>>> >> number: 1
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient: The
> >>> >>>>>>> total
> >>> >>>>>>> >>>> > number
> >>> >>>>>>> >>>> > >> > of
> >>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Counters: 6
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > SUPERSTEPS=1
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.FileInputFormat:
> >>> >>>>>>> Total
> >>> >>>>>>> >>>> > input
> >>> >>>>>>> >>>> > >> > paths
> >>> >>>>>>> >>>> > >> > >>>> to
> >>> >>>>>>> >>>> > >> > >>>> >> >> process
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.BSPJobClient:
> >>> >>>>>>> Running
> >>> >>>>>>> >>>> job:
> >>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> bsp.LocalBSPRunner:
> >>> >>>>>>> Setting
> >>> >>>>>>> >>>> up
> >>> >>>>>>> >>>> > a
> >>> >>>>>>> >>>> > >> new
> >>> >>>>>>> >>>> > >> > >>>> barrier
> >>> >>>>>>> >>>> > >> > >>>> >> >> for 2
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> graph.GraphJobRunner: 50
> >>> >>>>>>> >>>> > vertices
> >>> >>>>>>> >>>> > >> > are
> >>> >>>>>>> >>>> > >> > >>>> loaded
> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
> >>> graph.GraphJobRunner: 50
> >>> >>>>>>> >>>> > vertices
> >>> >>>>>>> >>>> > >> > are
> >>> >>>>>>> >>>> > >> > >>>> loaded
> >>> >>>>>>> >>>> > >> > >>>> >> >> into
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
> >>> bsp.LocalBSPRunner:
> >>> >>>>>>> >>>> Exception
> >>> >>>>>>> >>>> > >> > during
> >>> >>>>>>> >>>> > >> > >>>> BSP
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException:
> >>> Messages
> >>> >>>>>>> must
> >>> >>>>>>> >>>> > never
> >>> >>>>>>> >>>> > >> be
> >>> >>>>>>> >>>> > >> > >>>> behind
> >>> >>>>>>> >>>> > >> > >>>> >> the
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1
> >>> vs. 50
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>>
> >>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> >
> >>> >>>>>>>
> >>>
> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>>
> >>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> >
> >>> >>>>>>>
> >>>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>>
> >>> >>>>>>>
> >>>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>>
> >>> >>>>>>>
> >>>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> >
> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>>
> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> >
> >>> >>>>>>>
> >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> >
> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>>
> >>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>>
> >>> >>>>>>>
> >>>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>>
> >>> >>>>>>>
> >>>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
> >>> java.lang.Thread.run(Thread.java:722)
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >> >> --
> >>> >>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
> >>> >>>>>>> >>>> > >> > >>>> >> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>> >> --
> >>> >>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > >> > >>>> >> @eddieyoon
> >>> >>>>>>> >>>> > >> > >>>> >>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>> --
> >>> >>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > >> > >>>> @eddieyoon
> >>> >>>>>>> >>>> > >> > >>>>
> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>>>>> >>>> > >> > >>>
> >>> >>>>>>> >>>> > >> > >>
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >> > --
> >>> >>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > >> > @eddieyoon
> >>> >>>>>>> >>>> > >> >
> >>> >>>>>>> >>>> > >>
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>> > --
> >>> >>>>>>> >>>> > Best Regards, Edward J. Yoon
> >>> >>>>>>> >>>> > @eddieyoon
> >>> >>>>>>> >>>> >
> >>> >>>>>>> >>>>
> >>> >>>>>>> >>
> >>> >>>>>>> >>
> >>> >>>>>>> >>
> >>> >>>>>>> >> --
> >>> >>>>>>> >> Best Regards, Edward J. Yoon
> >>> >>>>>>> >> @eddieyoon
> >>> >>>>>>> >
> >>> >>>>>>> >
> >>> >>>>>>> >
> >>> >>>>>>> > --
> >>> >>>>>>> > Best Regards, Edward J. Yoon
> >>> >>>>>>> > @eddieyoon
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> --
> >>> >>>>>>> Best Regards, Edward J. Yoon
> >>> >>>>>>> @eddieyoon
> >>> >>>>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> --
> >>> >>>>> Best Regards, Edward J. Yoon
> >>> >>>>> @eddieyoon
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> --
> >>> >>>> Best Regards, Edward J. Yoon
> >>> >>>> @eddieyoon
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> Best Regards, Edward J. Yoon
> >>> >>> @eddieyoon
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Best Regards, Edward J. Yoon
> >>> >> @eddieyoon
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Best Regards, Edward J. Yoon
> >>> > @eddieyoon
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
P.S., please don't put it like that.

No decisions have been made yet. And if someone has a question or has
missed something, you should try to explain it here, because this is an
open-source project. No one can say "don't touch trunk because I'm
working on it".

On Thu, Mar 14, 2013 at 4:37 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Sorry for my quick-and-dirty small patches.
>
> However, we should work together in parallel. Please share any
> progress here.
>
> On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
> <th...@gmail.com> wrote:
>> Hi Edward,
>>
>> before you run riot all over the codebase: Suraj is currently working
>> on that stuff, so don't make it more difficult for him by making him
>> rebase all his patches the whole time.
>> He has the plan that we made to get this working; his part is
>> currently missing. So don't muddle around in there, or it will take
>> longer than it already needs to.
>>
>>
>>
>> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>>
>>> Personally, I would like to solve this issue by touching
>>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>>> multiple files, we can avoid huge memory consumption.
>>>
>>> If we want to sort the partitioned data using the messaging system,
>>> we should collect ideas first.
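Writing sorted sub-sets (runs) of vertices to multiple files and later combining them is the classic external-sort pattern; the catch, noted elsewhere in this thread, is that the runs must be combined with a k-way merge rather than simple concatenation. A minimal in-memory sketch of the merge step (hypothetical class name, not the Hama API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Merges several individually sorted runs (e.g. sorted vertex-ID
    // sub-sets spilled to separate files) into one globally sorted
    // sequence. Only the head of each run is held in the heap, which is
    // what keeps the memory footprint small in a real external sort.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {currentValue, runIndex, offsetInRun}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] { runs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out.add(e[0]);
            int next = e[2] + 1;
            if (next < runs.get(e[1]).size()) {
                heap.add(new int[] { runs.get(e[1]).get(next), e[1], next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 5, 9), Arrays.asList(2, 3, 8),
            Arrays.asList(4, 6, 7));
        System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Concatenating the same runs would give 1, 5, 9, 2, 3, 8, ... which is only locally sorted, matching the behavior discussed below.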
>>>
>>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>>> >
>>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> >> I'm reading the HAMA-704 changes again. As a result of adding
>>> >> DiskVerticesInfo, the vertices list needs to be sorted. I'm not
>>> >> sure, but I think this approach will bring more disadvantages than
>>> >> advantages.
>>> >>
>>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
>>> user space
>>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
>>> This way
>>> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
>>> sorted
>>> >>>>>> with a single read and single write on every peer.
>>> >>>
>>> >>> And, as I commented on the JIRA ticket, I think we can't use the
>>> >>> messaging system for sorting vertices within partition files.
>>> >>>
>>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>>> edwardyoon@apache.org> wrote:
>>> >>>> P.S., (number of splits = number of partitions) is really
>>> >>>> confusing to me. Even when the number of blocks equals the
>>> >>>> desired number of tasks, the data still has to be re-partitioned.
>>> >>>>
>>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>>> edwardyoon@apache.org> wrote:
>>> >>>>> Indeed. If there are already partitioned (but unsorted) input
>>> >>>>> files and the user wants to skip the pre-partitioning phase, it
>>> >>>>> should be handled in the GraphJobRunner BSP program. Actually, I
>>> >>>>> still don't know why the re-partitioned files need to be sorted.
>>> >>>>> It only matters to GraphJobRunner.
>>> >>>>>
>>> >>>>>> partitioning. (This is outside the scope of graphs. We can
>>> >>>>>> have a dedicated partitioning superstep for graph applications).
>>> >>>>>
>>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>>> >>>>> partitioning job based on the superstep API?
>>> >>>>>
>>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
>>> >>>>> The partitioning job will create 1,000 partitions, so we can
>>> >>>>> then execute the graph job with 1,000 tasks.
>>> >>>>>
>>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks).
>>> >>>>> If I want to run with 1,000 tasks, what happens?
>>> >>>>>
>>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org>
>>> wrote:
>>> >>>>>> I am responding on this thread for better continuity of the
>>> >>>>>> conversation. We cannot expect the partitions to be sorted
>>> >>>>>> every time: when the number of splits equals the number of
>>> >>>>>> partitions and partitioning is switched off by the
>>> >>>>>> user [HAMA-561], the partitions will not be sorted. Can we do
>>> >>>>>> this in loadVertices? Maybe the proposed feature for coupling
>>> >>>>>> storage in user space with BSP messaging [HAMA-734] can avoid
>>> >>>>>> the double reads and writes. That way, whether or not the data
>>> >>>>>> was partitioned by the partitioner, we can keep the vertices
>>> >>>>>> sorted with a single read and a single write on every peer.
>>> >>>>>>
>>> >>>>>> Just to clear up any confusion regarding superstep injection
>>> >>>>>> for partitioning. (This is outside the scope of graphs; we can
>>> >>>>>> have a dedicated partitioning superstep for graph applications.)
>>> >>>>>> Say there are x splits and y tasks configured by the user.
>>> >>>>>>
>>> >>>>>> if x > y
>>> >>>>>> The y tasks are scheduled, with x of them holding each of the x
>>> >>>>>> splits and the remaining ones having no resource local to them.
>>> >>>>>> Then the partitioning superstep redistributes the partitions
>>> >>>>>> among them to create local partitions. Now the question is: can
>>> >>>>>> we re-initialize a peer's input based on this new local part of
>>> >>>>>> the partition?
>>> >>>>>>
>>> >>>>>> if y > x
>>> >>>>>> It works as it works today.
>>> >>>>>>
>>> >>>>>> Just putting in my points as brainstorming.
>>> >>>>>>
>>> >>>>>> -Suraj
>>> >>>>>>
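The split-vs-task assignment discussed above hinges on how a record is mapped to a partition. A HashPartitioner-style mapping is the common approach; the following is a hedged sketch of that idea (hypothetical class name, not the exact Hama implementation):

```java
public class SimpleHashPartitioner {
    // Maps a record key to one of numTasks partitions. Masking off the
    // sign bit keeps hashCode() non-negative, so the modulo always
    // yields a valid partition index in [0, numTasks).
    static int getPartition(String key, int numTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition, which is what
        // makes redistribution deterministic across peers.
        System.out.println(getPartition("vertex-42", 4));
        System.out.println(getPartition("vertex-42", 4));
    }
}
```

Note that such a mapping fixes *which* peer owns a record, but says nothing about ordering within a partition, which is the sorting problem debated in this thread.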
>>> >>>>>>
>>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>>> edwardyoon@apache.org>wrote:
>>> >>>>>>
>>> >>>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>> >>>>>>>
>>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>>>>>> wrote:
>>> >>>>>>> > Additionally,
>>> >>>>>>> >
>>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
>>> partitioning
>>> >>>>>>> >> superstep as the first superstep and use local memory?
>>> >>>>>>> >
>>> >>>>>>> > Can we execute different number of tasks per superstep?
>>> >>>>>>> >
>>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>>>>>> wrote:
>>> >>>>>>> >>> For graph processing, the partitioned files that result
>>> >>>>>>> >>> from the partitioning job must be sorted. Currently only
>>> >>>>>>> >>> the partition files in
>>> >>>>>>> >>
>>> >>>>>>> >> I see.
>>> >>>>>>> >>
>>> >>>>>>> >>> For other partitionings, and with regard to our superstep
>>> >>>>>>> >>> API, Suraj's idea of injecting a preprocessing superstep
>>> >>>>>>> >>> that partitions the data into our messaging system is
>>> >>>>>>> >>> actually the best.
>>> >>>>>>> >>
>>> >>>>>>> >> BTW, if garbage objects can accumulate in the partitioning
>>> >>>>>>> >> step, a separate partitioning job may not be a bad idea. Is
>>> >>>>>>> >> there some special reason?
>>> >>>>>>> >>
>>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>> >>>>>>> >> <th...@gmail.com> wrote:
>>> >>>>>>> >>> For graph processing, the partitioned files that result
>>> >>>>>>> >>> from the partitioning job must be sorted. Currently only
>>> >>>>>>> >>> the partition files themselves are sorted, so more tasks
>>> >>>>>>> >>> result in unsorted data in the merged file. This only
>>> >>>>>>> >>> applies to the graph processing package.
>>> >>>>>>> >>> So, as Suraj said, it would be much simpler to solve this
>>> >>>>>>> >>> via messaging once it is scalable (it will be very, very
>>> >>>>>>> >>> scalable!). The GraphJobRunner could then partition the
>>> >>>>>>> >>> data with a single superstep in setup(), as it did ages
>>> >>>>>>> >>> ago. The messaging must be sorted for the algorithm anyway,
>>> >>>>>>> >>> so this is a nice side effect and saves us the partitioning
>>> >>>>>>> >>> job for graph processing.
>>> >>>>>>> >>>
>>> >>>>>>> >>> For other partitionings, and with regard to our superstep
>>> >>>>>>> >>> API, Suraj's idea of injecting a preprocessing superstep
>>> >>>>>>> >>> that partitions the data into our messaging system is
>>> >>>>>>> >>> actually the best.
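Partitioning through the messaging system, as proposed above, means each peer simply sends every vertex to the peer that owns it; if incoming messages are delivered sorted, each peer ends up with a sorted local partition as a side effect. A toy in-memory simulation of that idea (not the Hama messaging API; the sorted inbox stands in for a sorted message queue):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class MessagingPartitionSketch {
    // Simulates the proposed partitioning superstep: every vertex ID is
    // "sent" to the peer that owns it (here: id modulo the peer count),
    // and each peer's inbox is a sorted set. After one exchange, every
    // peer holds a sorted local partition without a separate sort pass.
    static List<TreeSet<Integer>> partition(int[] vertexIds, int numPeers) {
        List<TreeSet<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new TreeSet<>());
        }
        for (int id : vertexIds) {
            inboxes.get(id % numPeers).add(id); // "send" to owning peer
        }
        return inboxes;
    }

    public static void main(String[] args) {
        int[] ids = { 7, 2, 9, 4, 1, 8 };
        System.out.println(partition(ids, 2)); // [[2, 4, 8], [1, 7, 9]]
    }
}
```

This also sidesteps the double HDFS copy that Suraj raises earlier, since the data moves through messaging rather than through a second on-disk job.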
>>> >>>>>>> >>>
>>> >>>>>>> >>>
>>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>> >>>>>>> >>>
>>> >>>>>>> >>>> No, the partitions we write locally need not be sorted.
>>> >>>>>>> >>>> Sorry for the confusion. The superstep injection is
>>> >>>>>>> >>>> possible with the Superstep API; a few enhancements are
>>> >>>>>>> >>>> needed to make it simpler since I last worked on it. We
>>> >>>>>>> >>>> can then look into having the partitioning superstep
>>> >>>>>>> >>>> executed before the setup of the submitted job's first
>>> >>>>>>> >>>> superstep. I think it is feasible.
>>> >>>>>>> >>>>
>>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>>> edwardyoon@apache.org
>>> >>>>>>> >>>> >wrote:
>>> >>>>>>> >>>>
>>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
>>> the
>>> >>>>>>> >>>> partitioning
>>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>> >>>>>>> >>>> >
>>> >>>>>>> >>>> > Actually, I wanted to add something before calling
>>> BSP.setup()
>>> >>>>>>> method
>>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my opinion,
>>> current is
>>> >>>>>>> >>>> > enough. I think, we need to collect more experiences of
>>> input
>>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>>> >>>>>>> >>>> >
>>> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>>> >>>>>>> >>>> >
>>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>> >>>>>>> surajsmenon@apache.org>
>>> >>>>>>> >>>> > wrote:
>>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph
>>> module.
>>> >>>>>>> When we
>>> >>>>>>> >>>> > have
>>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
>>> the
>>> >>>>>>> >>>> partitioning
>>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>> >>>>>>> >>>> > > Today we have partitioning job within a job and are
>>> creating two
>>> >>>>>>> copies
>>> >>>>>>> >>>> > of
>>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it possible
>>> to
>>> >>>>>>> create or
>>> >>>>>>> >>>> > > redistribute the partitions on local memory and
>>> initialize the
>>> >>>>>>> record
>>> >>>>>>> >>>> > > reader there?
>>> >>>>>>> >>>> > > The user can run a separate job give in examples area to
>>> >>>>>>> explicitly
>>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment question is
>>> how much
>>> >>>>>>> of
>>> >>>>>>> >>>> disk
>>> >>>>>>> >>>> > > space gets allocated for local memory usage? Would it be
>>> a safe
>>> >>>>>>> >>>> approach
>>> >>>>>>> >>>> > > with the limitations?
>>> >>>>>>> >>>> > >
>>> >>>>>>> >>>> > > -Suraj
>>> >>>>>>> >>>> > >
>>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>>> >>>>>>> >>>> > >
>>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add
>>> this to
>>> >>>>>>> the
>>> >>>>>>> >>>> > >> partitioner pretty easily.
>>> >>>>>>> >>>> > >>
>>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>>>>>> >>>> > >>
>>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary
>>> to be
>>> >>>>>>> Sorted?
>>> >>>>>>> >>>> > >> >
>>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>> >>>>>>> >>>> > >> > > Now I get how the partitioning works, obviously if
>>> you merge
>>> >>>>>>> n
>>> >>>>>>> >>>> > sorted
>>> >>>>>>> >>>> > >> > files
>>> >>>>>>> >>>> > >> > > by just appending to each other, this will result in
>>> totally
>>> >>>>>>> >>>> > unsorted
>>> >>>>>>> >>>> > >> > data
>>> >>>>>>> >>>> > >> > > ;-)
>>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>> >>>>>>> >>>> > >> > >
>>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com
>>> >
>>> >>>>>>> >>>> > >> > >
>>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>> >>>>>>> >>>> > >> > >>
>>> >>>>>>> >>>> > >> > >> vertexID: 50
>>> >>>>>>> >>>> > >> > >> vertexID: 52
>>> >>>>>>> >>>> > >> > >> vertexID: 54
>>> >>>>>>> >>>> > >> > >> vertexID: 56
>>> >>>>>>> >>>> > >> > >> vertexID: 58
>>> >>>>>>> >>>> > >> > >> vertexID: 61
>>> >>>>>>> >>>> > >> > >> ...
>>> >>>>>>> >>>> > >> > >> vertexID: 78
>>> >>>>>>> >>>> > >> > >> vertexID: 81
>>> >>>>>>> >>>> > >> > >> vertexID: 83
>>> >>>>>>> >>>> > >> > >> vertexID: 85
>>> >>>>>>> >>>> > >> > >> ...
>>> >>>>>>> >>>> > >> > >> vertexID: 94
>>> >>>>>>> >>>> > >> > >> vertexID: 96
>>> >>>>>>> >>>> > >> > >> vertexID: 98
>>> >>>>>>> >>>> > >> > >> vertexID: 1
>>> >>>>>>> >>>> > >> > >> vertexID: 10
>>> >>>>>>> >>>> > >> > >> vertexID: 12
>>> >>>>>>> >>>> > >> > >> vertexID: 14
>>> >>>>>>> >>>> > >> > >> vertexID: 16
>>> >>>>>>> >>>> > >> > >> vertexID: 18
>>> >>>>>>> >>>> > >> > >> vertexID: 21
>>> >>>>>>> >>>> > >> > >> vertexID: 23
>>> >>>>>>> >>>> > >> > >> vertexID: 25
>>> >>>>>>> >>>> > >> > >> vertexID: 27
>>> >>>>>>> >>>> > >> > >> vertexID: 29
>>> >>>>>>> >>>> > >> > >> vertexID: 3
>>> >>>>>>> >>>> > >> > >>
>>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
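The quoted ID sequence above (..., 96, 98, 1, 10, 12, ..., 3) looks exactly like what lexicographic ordering of textual IDs produces, which suggests the runs were sorted as strings rather than as numbers. A quick illustration of the effect:

```java
import java.util.Arrays;

public class LexicographicIds {
    public static void main(String[] args) {
        // Sorting numeric IDs as text: "3" sorts after "21" because the
        // comparison is character by character, not by numeric value.
        String[] ids = { "50", "98", "1", "10", "3", "21" };
        Arrays.sort(ids);
        System.out.println(Arrays.toString(ids));
        // [1, 10, 21, 3, 50, 98]
    }
}
```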
>>> >>>>>>> >>>> > >> > >>
>>> >>>>>>> >>>> > >> > >>
>>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>>> thomas.jungblut@gmail.com>
>>> >>>>>>> >>>> > >> > >>
>>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>> >>>>>>> >>>> > >> > >>>
>>> >>>>>>> >>>> > >> > >>>
>>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>>>>>> >>>> > >> > >>>
>>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do.
>>> March 1 is
>>> >>>>>>> >>>> > holiday[1]
>>> >>>>>>> >>>> > >> so
>>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
>>> >>>>>>> >>>> > >> > >>>>
>>> >>>>>>> >>>> > >> > >>>> 1.
>>> >>>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>> >>>>>>> >>>> > >> > >>>>
>>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't
>>> observe if all
>>> >>>>>>> >>>> items
>>> >>>>>>> >>>> > >> were
>>> >>>>>>> >>>> > >> > >>>> added.
>>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic
>>> of the ID
>>> >>>>>>> into
>>> >>>>>>> >>>> > the
>>> >>>>>>> >>>> > >> > >>>> fastgen,
>>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>>> >>>>>>> >>>> > >> > >>>> >
>>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> >>>>>>> >>>> > >> > >>>> >
>>> >>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate
>>> adjacency
>>> >>>>>>> >>>> matrix
>>> >>>>>>> >>>> > >> into
>>> >>>>>>> >>>> > >> > >>>> >> multiple files.
>>> >>>>>>> >>>> > >> > >>>> >>
>>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas
>>> Jungblut
>>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
>>> correctly?
>>> >>>>>>> >>>> > >> > >>>> >> >
>>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> >>>>>>> >>>> > >> > >>>> >> >
>>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls
>>> -al
>>> >>>>>>> >>>> > >> /tmp/randomgraph/
>>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28
>>> 18:03 .
>>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28
>>> 18:04 ..
>>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28
>>> 18:01
>>> >>>>>>> part-00000
>>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>>> 18:01
>>> >>>>>>> >>>> > .part-00000.crc
>>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28
>>> 18:01
>>> >>>>>>> part-00001
>>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>>> 18:01
>>> >>>>>>> >>>> > .part-00001.crc
>>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28
>>> 18:03
>>> >>>>>>> partitions
>>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls
>>> -al
>>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28
>>> 18:03 .
>>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28
>>> 18:03 ..
>>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03
>>> >>>>>>> part-00000
>>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>> >>>>>>> >>>> > .part-00000.crc
>>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03
>>> >>>>>>> part-00001
>>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>> >>>>>>> >>>> > .part-00001.crc
>>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>>> >>>>>>> >>>> edward@udanax.org
>>> >>>>>>> >>>> > >
>>> >>>>>>> >>>> > >> > wrote:
>>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas
>>> Jungblut <
>>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>>> >>>>>>> >>>> > >> > >>>> >> >> >
>>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me
>>> please?
>>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
>>> part-00000 and
>>> >>>>>>> >>>> > >> part-00001,
>>> >>>>>>> >>>> > >> > >>>> both
>>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory, there
>>> is only a
>>> >>>>>>> >>>> single
>>> >>>>>>> >>>> > >> > 5.56kb
>>> >>>>>>> >>>> > >> > >>>> file.
>>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to
>>> write a
>>> >>>>>>> single
>>> >>>>>>> >>>> > file
>>> >>>>>>> >>>> > >> if
>>> >>>>>>> >>>> > >> > you
>>> >>>>>>> >>>> > >> > >>>> >> >> configured
>>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange
>>> huh?
>>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>>> >>>>>>> thomas.jungblut@gmail.com>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
>>> profiled, maybe
>>> >>>>>>> the
>>> >>>>>>> >>>> > >> > >>>> partitioning
>>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or
>>> something
>>> >>>>>>> else.
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>>> edwardyoon@apache.org
>>> >>>>>>> >
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph
>>> examples.
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>>>>>> :~/workspace/hama-trunk$
>>> >>>>>>> >>>> > >> bin/hama
>>> >>>>>>> >>>> > >> > jar
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>> >>>>>>> >>>> > >> > fastgen
>>> >>>>>>> >>>> > >> > >>>> 100 10
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
>>> util.NativeCodeLoader:
>>> >>>>>>> Unable
>>> >>>>>>> >>>> > to
>>> >>>>>>> >>>> > >> > load
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>>> platform...
>>> >>>>>>> using
>>> >>>>>>> >>>> > >> > builtin-java
>>> >>>>>>> >>>> > >> > >>>> >> classes
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Running
>>> >>>>>>> >>>> job:
>>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>>> bsp.LocalBSPRunner:
>>> >>>>>>> Setting
>>> >>>>>>> >>>> up
>>> >>>>>>> >>>> > a
>>> >>>>>>> >>>> > >> new
>>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Current
>>> >>>>>>> >>>> > >> supersteps
>>> >>>>>>> >>>> > >> > >>>> >> number: 0
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient: The
>>> >>>>>>> total
>>> >>>>>>> >>>> > number
>>> >>>>>>> >>>> > >> > of
>>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Counters: 3
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > SUPERSTEPS=0
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>>>>>> :~/workspace/hama-trunk$
>>> >>>>>>> >>>> > >> bin/hama
>>> >>>>>>> >>>> > >> > jar
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>> >>>>>>> :~/workspace/hama-trunk$
>>> >>>>>>> >>>> > >> bin/hama
>>> >>>>>>> >>>> > >> > jar
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>> >>>>>>> >>>> > pagerank
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
>>> util.NativeCodeLoader:
>>> >>>>>>> Unable
>>> >>>>>>> >>>> > to
>>> >>>>>>> >>>> > >> > load
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>>> platform...
>>> >>>>>>> using
>>> >>>>>>> >>>> > >> > builtin-java
>>> >>>>>>> >>>> > >> > >>>> >> classes
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>>> bsp.FileInputFormat:
>>> >>>>>>> Total
>>> >>>>>>> >>>> > input
>>> >>>>>>> >>>> > >> > paths
>>> >>>>>>> >>>> > >> > >>>> to
>>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>>> bsp.FileInputFormat:
>>> >>>>>>> Total
>>> >>>>>>> >>>> > input
>>> >>>>>>> >>>> > >> > paths
>>> >>>>>>> >>>> > >> > >>>> to
>>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Running
>>> >>>>>>> >>>> job:
>>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>>> bsp.LocalBSPRunner:
>>> >>>>>>> Setting
>>> >>>>>>> >>>> up
>>> >>>>>>> >>>> > a
>>> >>>>>>> >>>> > >> new
>>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Current
>>> >>>>>>> >>>> > >> supersteps
>>> >>>>>>> >>>> > >> > >>>> >> number: 1
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient: The
>>> >>>>>>> total
>>> >>>>>>> >>>> > number
>>> >>>>>>> >>>> > >> > of
>>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Counters: 6
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > SUPERSTEPS=1
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.FileInputFormat:
>>> >>>>>>> Total
>>> >>>>>>> >>>> > input
>>> >>>>>>> >>>> > >> > paths
>>> >>>>>>> >>>> > >> > >>>> to
>>> >>>>>>> >>>> > >> > >>>> >> >> process
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.BSPJobClient:
>>> >>>>>>> Running
>>> >>>>>>> >>>> job:
>>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>>> bsp.LocalBSPRunner:
>>> >>>>>>> Setting
>>> >>>>>>> >>>> up
>>> >>>>>>> >>>> > a
>>> >>>>>>> >>>> > >> new
>>> >>>>>>> >>>> > >> > >>>> barrier
>>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>         at java.lang.Thread.run(Thread.java:722)
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
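The invariant that this exception reports comes from the graph runner walking the sorted vertex list and the sorted incoming message stream in lockstep. Below is a minimal standalone sketch of such a lockstep walk; it is simplified and hypothetical, not Hama's actual GraphJobRunner code.

```java
import java.util.ArrayList;
import java.util.List;

public class LockstepIterate {
  // Walk sorted vertices and sorted messages together. Every pending message
  // ID must be >= the current vertex ID; a smaller one belongs to a vertex
  // already passed -- the invariant the exception above reports as violated.
  static List<String> deliver(int[] sortedVertexIds, int[] sortedMessageIds) {
    List<String> delivered = new ArrayList<>();
    int m = 0;
    for (int v : sortedVertexIds) {
      while (m < sortedMessageIds.length && sortedMessageIds[m] == v) {
        delivered.add(sortedMessageIds[m] + "->" + v);
        m++;
      }
      if (m < sortedMessageIds.length && sortedMessageIds[m] < v) {
        throw new IllegalArgumentException(
            "Messages must never be behind the vertex in ID! Current Message ID: "
                + sortedMessageIds[m] + " vs. " + v);
      }
    }
    return delivered;
  }

  public static void main(String[] args) {
    // Works when both sides are globally sorted.
    System.out.println(deliver(new int[] {1, 3, 50}, new int[] {1, 3}));
    // Fails when the vertex list is a concatenation of sorted runs
    // (50.. followed by 1..), as in the merged-by-append partition file.
    try {
      deliver(new int[] {50, 52, 1, 3}, new int[] {1, 50});
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a partition file made of concatenated sorted runs (50..98 followed by 1..49), the first message for vertex 1 arrives while the iterator still sits at vertex 50, which is exactly the "1 vs. 50" failure in the log.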
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Sorry for my quick-and-dirty small patches.

However, we should work together in parallel. Please share any
progress here.

On Thu, Mar 14, 2013 at 3:46 PM, Thomas Jungblut
<th...@gmail.com> wrote:
> Hi Edward,
>
> before you run riot all over the codebase, Suraj is currently working
> on that stuff - don't make it more difficult for him by forcing him to
> rebase all his patches the whole time.
> He has the plan that we made to make this work; his part is currently
> missing. So don't try to muddle around in there - it will make this
> take longer than it already needs to.
>
>
>
> 2013/3/14 Edward J. Yoon <ed...@apache.org>
>
>> Personally, I would like to solve this issue by touching
>> DiskVerticesInfo. If we write sorted sub-sets of vertices into
>> multiple files, we can avoid huge memory consumption.
>>
>> If we want to sort the partitioned data using the messaging system,
>> ideas should be collected.
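Writing sorted sub-sets of vertices into multiple files, as suggested above, is the run-generation half of an external merge sort: sort only a bounded buffer, spill it, repeat. A minimal in-memory sketch follows; the class and method names are hypothetical, not the actual DiskVerticesInfo API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedRuns {
  // Buffer vertex IDs up to a memory limit; when the buffer is full, sort it
  // and "spill" it as one sorted run (on disk this would be one file). The
  // runs can later be k-way merged into a single globally sorted stream, so
  // memory use stays bounded by maxInMemory instead of the whole vertex set.
  static List<List<Integer>> spillSortedRuns(int[] vertexIds, int maxInMemory) {
    List<List<Integer>> runs = new ArrayList<>();
    List<Integer> buffer = new ArrayList<>();
    for (int id : vertexIds) {
      buffer.add(id);
      if (buffer.size() == maxInMemory) {
        Collections.sort(buffer);   // sort only the bounded buffer
        runs.add(buffer);           // spill as one sorted run
        buffer = new ArrayList<>();
      }
    }
    if (!buffer.isEmpty()) {
      Collections.sort(buffer);
      runs.add(buffer);
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println(spillSortedRuns(new int[] {9, 1, 7, 3, 5, 2}, 3));
  }
}
```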
>>
>> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
>> >
>> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> >> I'm reading changes of HAMA-704 again. As a result of adding
>> >> DiskVerticesInfo, the vertex list needs to be sorted. I'm not sure,
>> >> but I think this approach will bring more disadvantages than
>> >> advantages.
>> >>
>> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
>> user space
>> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
>> This way
>> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
>> sorted
>> >>>>>> with a single read and single write on every peer.
>> >>>
>> >>> And, as I commented on the JIRA ticket, I think we can't use the
>> >>> messaging system for sorting vertices within the partition files.
>> >>>
>> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
>> edwardyoon@apache.org> wrote:
>> >>>> P.S., (number of splits = number of partitions) is really confusing
>> >>>> to me. Even though the number of blocks equals the desired number of
>> >>>> tasks, the data has to be re-partitioned again.
>> >>>>
>> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
>> edwardyoon@apache.org> wrote:
>> >>>>> Indeed. If there are already partitioned (unsorted) input files and
>> >>>>> the user wants to skip the pre-partitioning phase, it should be
>> >>>>> handled in the GraphJobRunner BSP program. Actually, I still don't
>> >>>>> know why the re-partitioned files need to be sorted. It only
>> >>>>> concerns GraphJobRunner.
>> >>>>>
>> >>>>>> partitioning. (This is outside the scope of graphs. We can have a
>> >>>>>> dedicated partitioning superstep for graph applications).
>> >>>>>
>> >>>>> Sorry, I don't understand exactly yet. Do you mean just a
>> >>>>> partitioning job based on the superstep API?
>> >>>>>
>> >>>>> By default, 100 tasks will be assigned to the partitioning job.
>> >>>>> The partitioning job will create 1,000 partitions. Thus, we can
>> >>>>> execute the graph job with 1,000 tasks.
>> >>>>>
>> >>>>> Let's assume that an input sequence file is 20GB (100 blocks). If I
>> >>>>> want to run with 1,000 tasks, what happens?
>> >>>>>
>> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org>
>> wrote:
>> >>>>>> I am responding on this thread for better continuity of the
>> >>>>>> conversation. We cannot expect the partitions to be sorted every
>> >>>>>> time. When the number of splits = the number of partitions and
>> >>>>>> partitioning is switched off by the user [HAMA-561], the partitions
>> >>>>>> would not be sorted. Can we do this in loadVertices? Maybe the
>> >>>>>> feature for coupling storage in user space with BSP Messaging
>> >>>>>> [HAMA-734] can avoid the double reads and writes. This way,
>> >>>>>> partitioned or not by the partitioner, we can keep the vertices
>> >>>>>> sorted with a single read and a single write on every peer.
>> >>>>>>
>> >>>>>> Just clearing up any confusion regarding superstep injection for
>> >>>>>> partitioning. (This is outside the scope of graphs; we can have a
>> >>>>>> dedicated partitioning superstep for graph applications.)
>> >>>>>> Say there are x splits and y tasks configured by the user.
>> >>>>>>
>> >>>>>> if x > y
>> >>>>>> The y tasks are scheduled with x of them having each of the x
>> >>>>>> splits and the remaining with no resource local to them. Then the
>> >>>>>> partitioning superstep redistributes the partitions among them to
>> >>>>>> create local partitions. Now the question is: can we re-initialize
>> >>>>>> a peer's input based on this new local part of the partition?
>> >>>>>>
>> >>>>>> if y > x
>> >>>>>> works as it works today.
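The redistribution step described above can be pictured as each peer hashing every vertex it loaded to its owning peer and sending it there during the partitioning superstep. A minimal standalone sketch, with hypothetical names rather than the Hama superstep API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HashPartition {
  // Owner of a vertex: its ID hashed over the number of peers. Every peer
  // applies the same function, so all copies of a vertex ID agree on one
  // destination without any coordination.
  static int ownerOf(int vertexId, int numPeers) {
    return (vertexId & Integer.MAX_VALUE) % numPeers;
  }

  // Bucket the locally loaded vertices per destination peer. In a real
  // partitioning superstep each bucket would be sent as messages; after the
  // sync, every peer holds exactly its own partition.
  static Map<Integer, List<Integer>> redistribute(int[] vertexIds, int numPeers) {
    Map<Integer, List<Integer>> outbox = new TreeMap<>();
    for (int id : vertexIds) {
      outbox.computeIfAbsent(ownerOf(id, numPeers), k -> new ArrayList<>()).add(id);
    }
    return outbox;
  }

  public static void main(String[] args) {
    System.out.println(redistribute(new int[] {0, 1, 2, 3, 4, 5}, 2));
  }
}
```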
>> >>>>>>
>> >>>>>> Just putting my points in brainstorming.
>> >>>>>>
>> >>>>>> -Suraj
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
>> edwardyoon@apache.org>wrote:
>> >>>>>>
>> >>>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>> >>>>>>>
>> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>>>>>> wrote:
>> >>>>>>> > Additionally,
>> >>>>>>> >
>> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
>> partitioning
>> >>>>>>> >> superstep as the first superstep and use local memory?
>> >>>>>>> >
>> >>>>>>> > Can we execute different number of tasks per superstep?
>> >>>>>>> >
>> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>>>>>> wrote:
>> >>>>>>> >>> For graph processing, the partitioned files that result from
>> the
>> >>>>>>> >>> partitioning job must be sorted. Currently only the partition
>> files in
>> >>>>>>> >>
>> >>>>>>> >> I see.
>> >>>>>>> >>
>> >>>>>>> >>> For other partitionings and with regard to our superstep API,
>> Suraj's
>> >>>>>>> idea
>> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
>> stuff into
>> >>>>>>> our
>> >>>>>>> >>> messaging system is actually the best.
>> >>>>>>> >>
>> >>>>>>> >> BTW, if garbage objects can accumulate in the partitioning
>> >>>>>>> >> step, a separate partitioning job may not be a bad idea. Is
>> >>>>>>> >> there some special reason?
>> >>>>>>> >>
>> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >>>>>>> >> <th...@gmail.com> wrote:
>> >>>>>>> >>> For graph processing, the partitioned files that result from
>> >>>>>>> >>> the partitioning job must be sorted. Currently only the
>> >>>>>>> >>> partition files themselves are sorted; thus more tasks result
>> >>>>>>> >>> in unsorted data in the completed file. This only applies to
>> >>>>>>> >>> the graph processing package.
>> >>>>>>> >>> So, as Suraj said, it would be much simpler to solve this via
>> >>>>>>> >>> messaging, once it is scalable (it will be very, very
>> >>>>>>> >>> scalable!). The GraphJobRunner could then partition the data
>> >>>>>>> >>> with a single superstep in setup(), as it did ages ago. The
>> >>>>>>> >>> messaging must be sorted anyway for the algorithm, so this is
>> >>>>>>> >>> a nice side effect and saves us the partitioning job for graph
>> >>>>>>> >>> processing.
>> >>>>>>> >>>
>> >>>>>>> >>> For other partitionings, and with regard to our superstep API,
>> >>>>>>> >>> Suraj's idea of injecting a preprocessing superstep that
>> >>>>>>> >>> partitions the data into our messaging system is actually the
>> >>>>>>> >>> best.
>> >>>>>>> >>>
>> >>>>>>> >>>
>> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >>>>>>> >>>
>> >>>>>>> >>>> No, the partitions we write locally need not be sorted; sorry
>> >>>>>>> >>>> for the confusion. The superstep injection is possible with
>> >>>>>>> >>>> the Superstep API. There are a few enhancements needed to
>> >>>>>>> >>>> make it simpler since I last worked on it. We can then look
>> >>>>>>> >>>> into the partitioning superstep being executed before the
>> >>>>>>> >>>> setup of the first superstep of the submitted job. I think it
>> >>>>>>> >>>> is feasible.
>> >>>>>>> >>>>
>> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
>> edwardyoon@apache.org
>> >>>>>>> >>>> >wrote:
>> >>>>>>> >>>>
>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
>> the
>> >>>>>>> >>>> partitioning
>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>> >>>>>>> >>>> >
>> >>>>>>> >>>> > Actually, I wanted to add something before calling
>> BSP.setup()
>> >>>>>>> method
>> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my opinion,
>> current is
>> >>>>>>> >>>> > enough. I think, we need to collect more experiences of
>> input
>> >>>>>>> >>>> > partitioning on large environments. I'll do.
>> >>>>>>> >>>> >
>> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>> >>>>>>> >>>> >
>> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>> >>>>>>> surajsmenon@apache.org>
>> >>>>>>> >>>> > wrote:
>> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph
>> module.
>> >>>>>>> When we
>> >>>>>>> >>>> > have
>> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
>> the
>> >>>>>>> >>>> partitioning
>> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
>> >>>>>>> >>>> > > Today we have partitioning job within a job and are
>> creating two
>> >>>>>>> copies
>> >>>>>>> >>>> > of
>> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it possible
>> to
>> >>>>>>> create or
>> >>>>>>> >>>> > > redistribute the partitions on local memory and
>> initialize the
>> >>>>>>> record
>> >>>>>>> >>>> > > reader there?
>> >>>>>>> >>>> > > The user can run a separate job give in examples area to
>> >>>>>>> explicitly
>> >>>>>>> >>>> > > repartition the data on HDFS. The deployment question is
>> how much
>> >>>>>>> of
>> >>>>>>> >>>> disk
>> >>>>>>> >>>> > > space gets allocated for local memory usage? Would it be
>> a safe
>> >>>>>>> >>>> approach
>> >>>>>>> >>>> > > with the limitations?
>> >>>>>>> >>>> > >
>> >>>>>>> >>>> > > -Suraj
>> >>>>>>> >>>> > >
>> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>> >>>>>>> >>>> > > <th...@gmail.com>wrote:
>> >>>>>>> >>>> > >
>> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add
>> this to
>> >>>>>>> the
>> >>>>>>> >>>> > >> partitioner pretty easily.
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary
>> to be
>> >>>>>>> Sorted?
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>> >>>>>>> >>>> > >> > > Now I get how the partitioning works, obviously if
>> you merge
>> >>>>>>> n
>> >>>>>>> >>>> > sorted
>> >>>>>>> >>>> > >> > files
>> >>>>>>> >>>> > >> > > by just appending to each other, this will result in
>> totally
>> >>>>>>> >>>> > unsorted
>> >>>>>>> >>>> > >> > data
>> >>>>>>> >>>> > >> > > ;-)
>> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>> >>>>>>> >>>> > >> > >
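Merging n sorted part files correctly, rather than appending them, needs a k-way merge that always takes the smallest current head across the files. A minimal sketch, with in-memory lists standing in for the part files:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
  // Merge k individually sorted lists into one globally sorted list by
  // keeping one head element per list in a priority queue. Appending the
  // lists instead yields runs like 50..98 followed by 1..49, which is what
  // broke the partitioned input discussed above.
  static List<Integer> merge(List<List<Integer>> sortedParts) {
    // queue entries: {value, partIndex, offsetInPart}
    PriorityQueue<int[]> heads =
        new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
    for (int p = 0; p < sortedParts.size(); p++) {
      if (!sortedParts.get(p).isEmpty()) {
        heads.add(new int[] {sortedParts.get(p).get(0), p, 0});
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heads.isEmpty()) {
      int[] e = heads.poll();          // smallest head across all parts
      merged.add(e[0]);
      List<Integer> part = sortedParts.get(e[1]);
      if (e[2] + 1 < part.size()) {    // refill from the same part
        heads.add(new int[] {part.get(e[2] + 1), e[1], e[2] + 1});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<Integer>> parts = Arrays.asList(
        Arrays.asList(1, 10, 12), Arrays.asList(50, 52, 54), Arrays.asList(3, 61));
    System.out.println(merge(parts));
  }
}
```

On disk the same pattern works with one open reader per part file, holding only k records in memory at a time.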
>> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com
>> >
>> >>>>>>> >>>> > >> > >
>> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> > >> vertexID: 50
>> >>>>>>> >>>> > >> > >> vertexID: 52
>> >>>>>>> >>>> > >> > >> vertexID: 54
>> >>>>>>> >>>> > >> > >> vertexID: 56
>> >>>>>>> >>>> > >> > >> vertexID: 58
>> >>>>>>> >>>> > >> > >> vertexID: 61
>> >>>>>>> >>>> > >> > >> ...
>> >>>>>>> >>>> > >> > >> vertexID: 78
>> >>>>>>> >>>> > >> > >> vertexID: 81
>> >>>>>>> >>>> > >> > >> vertexID: 83
>> >>>>>>> >>>> > >> > >> vertexID: 85
>> >>>>>>> >>>> > >> > >> ...
>> >>>>>>> >>>> > >> > >> vertexID: 94
>> >>>>>>> >>>> > >> > >> vertexID: 96
>> >>>>>>> >>>> > >> > >> vertexID: 98
>> >>>>>>> >>>> > >> > >> vertexID: 1
>> >>>>>>> >>>> > >> > >> vertexID: 10
>> >>>>>>> >>>> > >> > >> vertexID: 12
>> >>>>>>> >>>> > >> > >> vertexID: 14
>> >>>>>>> >>>> > >> > >> vertexID: 16
>> >>>>>>> >>>> > >> > >> vertexID: 18
>> >>>>>>> >>>> > >> > >> vertexID: 21
>> >>>>>>> >>>> > >> > >> vertexID: 23
>> >>>>>>> >>>> > >> > >> vertexID: 25
>> >>>>>>> >>>> > >> > >> vertexID: 27
>> >>>>>>> >>>> > >> > >> vertexID: 29
>> >>>>>>> >>>> > >> > >> vertexID: 3
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> > >> So this won't work then correctly...
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <
>> thomas.jungblut@gmail.com>
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>> >>>>>>> >>>> > >> > >>>
>> >>>>>>> >>>> > >> > >>>
>> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>>>>> >>>> > >> > >>>
>> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do.
>> March 1 is
>> >>>>>>> >>>> > holiday[1]
>> >>>>>>> >>>> > >> so
>> >>>>>>> >>>> > >> > >>>> I'll appear next week.
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>> 1.
>> >>>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't
>> observe if all
>> >>>>>>> >>>> items
>> >>>>>>> >>>> > >> were
>> >>>>>>> >>>> > >> > >>>> added.
>> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic
>> of the ID
>> >>>>>>> into
>> >>>>>>> >>>> > the
>> >>>>>>> >>>> > >> > >>>> fastgen,
>> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
>> >>>>>>> >>>> > >> > >>>> >
>> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>> >
>> >>>>>>> >>>> > >> > >>>> >
>> >>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate
>> adjacency
>> >>>>>>> >>>> matrix
>> >>>>>>> >>>> > >> into
>> >>>>>>> >>>> > >> > >>>> >> multiple files.
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas
>> Jungblut
>> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned
>> correctly?
>> >>>>>>> >>>> > >> > >>>> >> >
>> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>>>>>> >>>> > >> > >>>> >> >
>> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls
>> -al
>> >>>>>>> >>>> > >> /tmp/randomgraph/
>> >>>>>>> >>>> > >> > >>>> >> >> total 44
>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28
>> 18:03 .
>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28
>> 18:04 ..
>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28
>> 18:01
>> >>>>>>> part-00000
>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>> 18:01
>> >>>>>>> >>>> > .part-00000.crc
>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28
>> 18:01
>> >>>>>>> part-00001
>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28
>> 18:01
>> >>>>>>> >>>> > .part-00001.crc
>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28
>> 18:03
>> >>>>>>> partitions
>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls
>> -al
>> >>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>> >>>>>>> >>>> > >> > >>>> >> >> total 24
>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28
>> 18:03 .
>> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28
>> 18:03 ..
>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03
>> >>>>>>> part-00000
>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> >>>>>>> >>>> > .part-00000.crc
>> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03
>> >>>>>>> part-00001
>> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> >>>>>>> >>>> > .part-00001.crc
>> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>> >>>>>>> >>>> edward@udanax.org
>> >>>>>>> >>>> > >
>> >>>>>>> >>>> > >> > wrote:
>> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas
>> Jungblut <
>> >>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> >>>>>>> >>>> > >> > >>>> >> >> wrote:
>> >>>>>>> >>>> > >> > >>>> >> >> >
>> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me
>> please?
>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen,
>> part-00000 and
>> >>>>>>> >>>> > >> part-00001,
>> >>>>>>> >>>> > >> > >>>> both
>> >>>>>>> >>>> > >> > >>>> >> ~2.2kb
>> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
>> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory, there
>> is only a
>> >>>>>>> >>>> single
>> >>>>>>> >>>> > >> > 5.56kb
>> >>>>>>> >>>> > >> > >>>> file.
>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to
>> write a
>> >>>>>>> single
>> >>>>>>> >>>> > file
>> >>>>>>> >>>> > >> if
>> >>>>>>> >>>> > >> > you
>> >>>>>>> >>>> > >> > >>>> >> >> configured
>> >>>>>>> >>>> > >> > >>>> >> >> >> two?
>> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange
>> huh?
>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>> >>>>>>> thomas.jungblut@gmail.com>
>> >>>>>>> >>>> > >> > >>>> >> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I
>> profiled, maybe
>> >>>>>>> the
>> >>>>>>> >>>> > >> > >>>> partitioning
>> >>>>>>> >>>> > >> > >>>> >> >> doesn't
>> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or
>> something
>> >>>>>>> else.
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <
>> edwardyoon@apache.org
>> >>>>>>> >
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph
>> examples.
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>>>>> :~/workspace/hama-trunk$
>> >>>>>>> >>>> > >> bin/hama
>> >>>>>>> >>>> > >> > jar
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>> >>>>>>> >>>> > >> > fastgen
>> >>>>>>> >>>> > >> > >>>> 100 10
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN
>> util.NativeCodeLoader:
>> >>>>>>> Unable
>> >>>>>>> >>>> > to
>> >>>>>>> >>>> > >> > load
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>> platform...
>> >>>>>>> using
>> >>>>>>> >>>> > >> > builtin-java
>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Running
>> >>>>>>> >>>> job:
>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO
>> bsp.LocalBSPRunner:
>> >>>>>>> Setting
>> >>>>>>> >>>> up
>> >>>>>>> >>>> > a
>> >>>>>>> >>>> > >> new
>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Current
>> >>>>>>> >>>> > >> supersteps
>> >>>>>>> >>>> > >> > >>>> >> number: 0
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient: The
>> >>>>>>> total
>> >>>>>>> >>>> > number
>> >>>>>>> >>>> > >> > of
>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Counters: 3
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > SUPERSTEPS=0
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>>>>> :~/workspace/hama-trunk$
>> >>>>>>> >>>> > >> bin/hama
>> >>>>>>> >>>> > >> > jar
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> >>>>>>> :~/workspace/hama-trunk$
>> >>>>>>> >>>> > >> bin/hama
>> >>>>>>> >>>> > >> > jar
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>> >>>>>>> >>>> > pagerank
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN
>> util.NativeCodeLoader:
>> >>>>>>> Unable
>> >>>>>>> >>>> > to
>> >>>>>>> >>>> > >> > load
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your
>> platform...
>> >>>>>>> using
>> >>>>>>> >>>> > >> > builtin-java
>> >>>>>>> >>>> > >> > >>>> >> classes
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> bsp.FileInputFormat:
>> >>>>>>> Total
>> >>>>>>> >>>> > input
>> >>>>>>> >>>> > >> > paths
>> >>>>>>> >>>> > >> > >>>> to
>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO
>> bsp.FileInputFormat:
>> >>>>>>> Total
>> >>>>>>> >>>> > input
>> >>>>>>> >>>> > >> > paths
>> >>>>>>> >>>> > >> > >>>> to
>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Running
>> >>>>>>> >>>> job:
>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO
>> bsp.LocalBSPRunner:
>> >>>>>>> Setting
>> >>>>>>> >>>> up
>> >>>>>>> >>>> > a
>> >>>>>>> >>>> > >> new
>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Current
>> >>>>>>> >>>> > >> supersteps
>> >>>>>>> >>>> > >> > >>>> >> number: 1
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient: The
>> >>>>>>> total
>> >>>>>>> >>>> > number
>> >>>>>>> >>>> > >> > of
>> >>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Counters: 6
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > SUPERSTEPS=1
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.FileInputFormat:
>> >>>>>>> Total
>> >>>>>>> >>>> > input
>> >>>>>>> >>>> > >> > paths
>> >>>>>>> >>>> > >> > >>>> to
>> >>>>>>> >>>> > >> > >>>> >> >> process
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.BSPJobClient:
>> >>>>>>> Running
>> >>>>>>> >>>> job:
>> >>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> bsp.LocalBSPRunner:
>> >>>>>>> Setting
>> >>>>>>> >>>> up
>> >>>>>>> >>>> > a
>> >>>>>>> >>>> > >> new
>> >>>>>>> >>>> > >> > >>>> barrier
>> >>>>>>> >>>> > >> > >>>> >> >> for 2
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> graph.GraphJobRunner: 50
>> >>>>>>> >>>> > vertices
>> >>>>>>> >>>> > >> > are
>> >>>>>>> >>>> > >> > >>>> loaded
>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO
>> graph.GraphJobRunner: 50
>> >>>>>>> >>>> > vertices
>> >>>>>>> >>>> > >> > are
>> >>>>>>> >>>> > >> > >>>> loaded
>> >>>>>>> >>>> > >> > >>>> >> >> into
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR
>> bsp.LocalBSPRunner:
>> >>>>>>> >>>> Exception
>> >>>>>>> >>>> > >> > during
>> >>>>>>> >>>> > >> > >>>> BSP
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException:
>> Messages
>> >>>>>>> must
>> >>>>>>> >>>> > never
>> >>>>>>> >>>> > >> be
>> >>>>>>> >>>> > >> > >>>> behind
>> >>>>>>> >>>> > >> > >>>> >> the
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1
>> vs. 50
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>>
>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> >
>> >>>>>>>
>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>>
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> >
>> >>>>>>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> >
>> >>>>>>> >>>>
>> >>>>>>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> >
>> >>>>>>> >>>>
>> >>>>>>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> >
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>>
>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> >
>> >>>>>>>
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> >
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>>
>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> >
>> >>>>>>> >>>>
>> >>>>>>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> >
>> >>>>>>> >>>>
>> >>>>>>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>> java.lang.Thread.run(Thread.java:722)
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >> >>>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >> >> --
>> >>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
>> >>>>>>> >>>> > >> > >>>> >> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>> >> --
>> >>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > >> > >>>> >> @eddieyoon
>> >>>>>>> >>>> > >> > >>>> >>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>> --
>> >>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > >> > >>>> @eddieyoon
>> >>>>>>> >>>> > >> > >>>>
>> >>>>>>> >>>> > >> > >>>
>> >>>>>>> >>>> > >> > >>>
>> >>>>>>> >>>> > >> > >>
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >> > --
>> >>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > >> > @eddieyoon
>> >>>>>>> >>>> > >> >
>> >>>>>>> >>>> > >>
>> >>>>>>> >>>> >
>> >>>>>>> >>>> >
>> >>>>>>> >>>> >
>> >>>>>>> >>>> > --
>> >>>>>>> >>>> > Best Regards, Edward J. Yoon
>> >>>>>>> >>>> > @eddieyoon
>> >>>>>>> >>>> >
>> >>>>>>> >>>>
>> >>>>>>> >>
>> >>>>>>> >>
>> >>>>>>> >>
>> >>>>>>> >> --
>> >>>>>>> >> Best Regards, Edward J. Yoon
>> >>>>>>> >> @eddieyoon
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> > --
>> >>>>>>> > Best Regards, Edward J. Yoon
>> >>>>>>> > @eddieyoon
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Best Regards, Edward J. Yoon
>> >>>>>>> @eddieyoon
>> >>>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best Regards, Edward J. Yoon
>> >>>>> @eddieyoon
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>> @eddieyoon
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Thomas Jungblut <th...@gmail.com>.
Hi Edward,

before you run riot all over the codebase, Suraj is currently working
on that stuff - don't make it more difficult for him by forcing him to
rebase all his patches the whole time.
He has the plan that we made to get this working; his part is
currently missing. So don't try to muddle around there, it will make this
take longer than it already needs to.
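[Editorial note: the merge problem discussed throughout this thread - that concatenating n individually sorted partition files produces globally unsorted data, while a k-way merge preserves order - can be sketched as follows. This is an illustrative example with made-up vertex IDs, not Hama code.]

```python
# Why appending sorted partition files breaks global order, and how a
# k-way merge fixes it. The partition contents below are hypothetical.
import heapq

# Three "partition files", each sorted by vertex ID on its own.
parts = [[1, 10, 12], [3, 50, 52], [14, 61, 98]]

# Naive concatenation, as described in the thread: each part is locally
# sorted, but the combined sequence is not.
concatenated = [v for part in parts for v in part]
print(concatenated == sorted(concatenated))  # False

# A k-way merge reads all parts in one sequential pass and keeps the
# global order - the approach Suraj's sorted-file merging provides.
merged = list(heapq.merge(*parts))
print(merged == sorted(merged))  # True
```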



2013/3/14 Edward J. Yoon <ed...@apache.org>

> Personally, I would like to solve this issue by touching
> DiskVerticesInfo. If we write sorted sub-sets of vertices into
> multiple files, we can avoid huge memory consumption.
>
> If we want to sort partitioned data using messaging system, idea
> should be collected.
>
> On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Oh, now I get how iterate() works. HAMA-704 is nicely written.
> >
> > On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
> >> I'm reading changes of HAMA-704 again. As a result of adding
> >> DiskVerticesInfo, vertices list is needed to be sorted. I'm not sure
> >> but I think this approach will bring more disadvantages than
> >> advantages.
> >>
> >> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
> user space
> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
> This way
> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
> sorted
> >>>>>> with a single read and single write on every peer.
> >>>
> >>> And, as I commented JIRA ticket, I think we can't use messaging system
> >>> for sorting vertices within partition files.
> >>>
> >>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <
> edwardyoon@apache.org> wrote:
> >>>> P.S., (number of splits = number of partitions) is really confuse to
> >>>> me. Even though blocks number is equal to desired tasks number, data
> >>>> should be re-partitioned again.
> >>>>
> >>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <
> edwardyoon@apache.org> wrote:
> >>>>> Indeed. If there are already partitioned input files (unsorted) and
> so
> >>>>> user want to skip pre-partitioning phase, it should be handled in
> >>>>> GraphJobRunner BSP program. Actually, I still don't know why
> >>>>> re-partitioned files need to be Sorted. It's only about
> >>>>> GraphJobRunner.
> >>>>>
> >>>>>> partitioning. (This is outside the scope of graphs. We can have a
> dedicated
> >>>>>> partitioning superstep for graph applications).
> >>>>>
> >>>>> Sorry. I don't understand exactly yet. Do you mean just a
> partitioning
> >>>>> job based on superstep API?
> >>>>>
> >>>>> By default, 100 tasks will be assigned for partitioning job.
> >>>>> Partitioning job will create 1,000 partitions. Thus, we can execute
> >>>>> the Graph job with 1,000 tasks.
> >>>>>
> >>>>> Let's assume that a input sequence file is 20GB (100 blocks). If I
> >>>>> want to run with 1,000 tasks, what happens?
> >>>>>
> >>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org>
> wrote:
> >>>>>> I am responding on this thread because of better continuity for
> >>>>>> conversation. We cannot expect the partitions to be sorted every
> time. When
> >>>>>> the number of splits = number of partitions and partitioning is
> switched
> >>>>>> off by user[HAMA-561], the partitions would not be sorted. Can we
> do this
> >>>>>> in loadVertices? Maybe consider feature for coupling storage in
> user space
> >>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes.
> This way
> >>>>>> partitioned or non-partitioned by partitioner, can keep vertices
> sorted
> >>>>>> with a single read and single write on every peer.
> >>>>>>
> >>>>>> Just clearing confusion if any regarding superstep injection for
> >>>>>> partitioning. (This is outside the scope of graphs. We can have a
> dedicated
> >>>>>> partitioning superstep for graph applications).
> >>>>>> Say there are x splits and y number of tasks configured by user.
> >>>>>>
> >>>>>> if x > y
> >>>>>> The y tasks are scheduled with x of them having each of the x
> splits and
> >>>>>> the remaining with no resource local to them. Then the partitioning
> >>>>>> superstep redistributes the partitions among them to create local
> >>>>>> partitions. Now the question is can we re-initialize a peer's input
> based
> >>>>>> on this new local part of partition?
> >>>>>>
> >>>>>> if y > x
> >>>>>> works as it works today.
> >>>>>>
> >>>>>> Just putting my points in brainstorming.
> >>>>>>
> >>>>>> -Suraj
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <
> edwardyoon@apache.org>wrote:
> >>>>>>
> >>>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
> >>>>>>>
> >>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>>>>>> wrote:
> >>>>>>> > Additionally,
> >>>>>>> >
> >>>>>>> >> spilling queue and sorted spilling queue, can we inject the
> partitioning
> >>>>>>> >> superstep as the first superstep and use local memory?
> >>>>>>> >
> >>>>>>> > Can we execute different number of tasks per superstep?
> >>>>>>> >
> >>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>>>>>> wrote:
> >>>>>>> >>> For graph processing, the partitioned files that result from
> the
> >>>>>>> >>> partitioning job must be sorted. Currently only the partition
> files in
> >>>>>>> >>
> >>>>>>> >> I see.
> >>>>>>> >>
> >>>>>>> >>> For other partitionings and with regard to our superstep API,
> Suraj's
> >>>>>>> idea
> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
> stuff into
> >>>>>>> our
> >>>>>>> >>> messaging system is actually the best.
> >>>>>>> >>
> >>>>>>> >> BTW, if some garbage objects can be accumulated in partitioning
> step,
> >>>>>>> >> separated partitioning job may not be bad idea. Is there some
> special
> >>>>>>> >> reason?
> >>>>>>> >>
> >>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >>>>>>> >> <th...@gmail.com> wrote:
> >>>>>>> >>> For graph processing, the partitioned files that result from
> the
> >>>>>>> >>> partitioning job must be sorted. Currently only the partition
> files in
> >>>>>>> >>> itself are sorted, thus more tasks result in not sorted data
> in the
> >>>>>>> >>> completed file. This only applies for the graph processing
> package.
> >>>>>>> >>> So as Suraj told, it would be much more simpler to solve this
> via
> >>>>>>> >>> messaging, once it is scalable (it will be very very
> scalable!). So the
> >>>>>>> >>> GraphJobRunner can be partitioning the stuff with a single
> superstep in
> >>>>>>> >>> setup() as it was before ages ago. The messaging must be
> sorted anyway
> >>>>>>> for
> >>>>>>> >>> the algorithm so this is a nice side effect and saves us the
> >>>>>>> partitioning
> >>>>>>> >>> job for graph processing.
> >>>>>>> >>>
> >>>>>>> >>> For other partitionings and with regard to our superstep API,
> Suraj's
> >>>>>>> idea
> >>>>>>> >>> of injecting a preprocessing superstep that partitions the
> stuff into
> >>>>>>> our
> >>>>>>> >>> messaging system is actually the best.
> >>>>>>> >>>
> >>>>>>> >>>
> >>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >>>>>>> >>>
> >>>>>>> >>>> No, the partitions we write locally need not be sorted. Sorry
> for the
> >>>>>>> >>>> confusion. The Superstep injection is possible with Superstep
> API.
> >>>>>>> There
> >>>>>>> >>>> are few enhancements needed to make it simpler after I last
> worked on
> >>>>>>> it.
> >>>>>>> >>>> We can then look into partitioning superstep being executed
> before the
> >>>>>>> >>>> setup of first superstep of submitted job. I think it is
> feasible.
> >>>>>>> >>>>
> >>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >>>>>>> >>>> >wrote:
> >>>>>>> >>>>
> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
> the
> >>>>>>> >>>> partitioning
> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
> >>>>>>> >>>> >
> >>>>>>> >>>> > Actually, I wanted to add something before calling
> BSP.setup()
> >>>>>>> method
> >>>>>>> >>>> > to avoid execute additional BSP job. But, in my opinion,
> current is
> >>>>>>> >>>> > enough. I think, we need to collect more experiences of
> input
> >>>>>>> >>>> > partitioning on large environments. I'll do.
> >>>>>>> >>>> >
> >>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
> >>>>>>> >>>> >
> >>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
> >>>>>>> surajsmenon@apache.org>
> >>>>>>> >>>> > wrote:
> >>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph
> module.
> >>>>>>> When we
> >>>>>>> >>>> > have
> >>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject
> the
> >>>>>>> >>>> partitioning
> >>>>>>> >>>> > > superstep as the first superstep and use local memory?
> >>>>>>> >>>> > > Today we have partitioning job within a job and are creating two copies
> >>>>>>> >>>> > of
> >>>>>>> >>>> > > data on HDFS. This could be really costly. Is it possible to create or
> >>>>>>> >>>> > > redistribute the partitions on local memory and initialize the record
> >>>>>>> >>>> > > reader there?
> >>>>>>> >>>> > > The user can run a separate job give in examples area to explicitly
> >>>>>>> >>>> > > repartition the data on HDFS. The deployment question is how much of
> >>>>>>> >>>> disk
> >>>>>>> >>>> > > space gets allocated for local memory usage? Would it be a safe
> >>>>>>> >>>> approach
> >>>>>>> >>>> > > with the limitations?
> >>>>>>> >>>> > >
> >>>>>>> >>>> > > -Suraj
> >>>>>>> >>>> > >
> >>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> >>>>>>> >>>> > > <th...@gmail.com>wrote:
> >>>>>>> >>>> > >
> >>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to the
> >>>>>>> >>>> > >> partitioner pretty easily.
> >>>>>>> >>>> > >>
> >>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>>>>> >>>> > >>
> >>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
> >>>>>>> >>>> > >> >
> >>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> >>>>>>> >>>> > >> > <th...@gmail.com> wrote:
> >>>>>>> >>>> > >> > > Now I get how the partitioning works, obviously if you merge n
> >>>>>>> >>>> > sorted
> >>>>>>> >>>> > >> > files
> >>>>>>> >>>> > >> > > by just appending to each other, this will result in totally
> >>>>>>> >>>> > unsorted
> >>>>>>> >>>> > >> > data
> >>>>>>> >>>> > >> > > ;-)
> >>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
> >>>>>>> >>>> > >> > >
> >>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >>>>>>> >>>> > >> > >
> >>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> > >> vertexID: 50
> >>>>>>> >>>> > >> > >> vertexID: 52
> >>>>>>> >>>> > >> > >> vertexID: 54
> >>>>>>> >>>> > >> > >> vertexID: 56
> >>>>>>> >>>> > >> > >> vertexID: 58
> >>>>>>> >>>> > >> > >> vertexID: 61
> >>>>>>> >>>> > >> > >> ...
> >>>>>>> >>>> > >> > >> vertexID: 78
> >>>>>>> >>>> > >> > >> vertexID: 81
> >>>>>>> >>>> > >> > >> vertexID: 83
> >>>>>>> >>>> > >> > >> vertexID: 85
> >>>>>>> >>>> > >> > >> ...
> >>>>>>> >>>> > >> > >> vertexID: 94
> >>>>>>> >>>> > >> > >> vertexID: 96
> >>>>>>> >>>> > >> > >> vertexID: 98
> >>>>>>> >>>> > >> > >> vertexID: 1
> >>>>>>> >>>> > >> > >> vertexID: 10
> >>>>>>> >>>> > >> > >> vertexID: 12
> >>>>>>> >>>> > >> > >> vertexID: 14
> >>>>>>> >>>> > >> > >> vertexID: 16
> >>>>>>> >>>> > >> > >> vertexID: 18
> >>>>>>> >>>> > >> > >> vertexID: 21
> >>>>>>> >>>> > >> > >> vertexID: 23
> >>>>>>> >>>> > >> > >> vertexID: 25
> >>>>>>> >>>> > >> > >> vertexID: 27
> >>>>>>> >>>> > >> > >> vertexID: 29
> >>>>>>> >>>> > >> > >> vertexID: 3
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> > >> So this won't work then correctly...
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
> >>>>>>> >>>> > >> > >>>
> >>>>>>> >>>> > >> > >>>
> >>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>>>>> >>>> > >> > >>>
> >>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
> >>>>>>> >>>> > holiday[1]
> >>>>>>> >>>> > >> so
> >>>>>>> >>>> > >> > >>>> I'll appear next week.
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> >>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
> >>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
> >>>>>>> >>>> items
> >>>>>>> >>>> > >> were
> >>>>>>> >>>> > >> > >>>> added.
> >>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
> >>>>>>> >>>> > the
> >>>>>>> >>>> > >> > >>>> fastgen,
> >>>>>>> >>>> > >> > >>>> > want to have a look into it?
> >>>>>>> >>>> > >> > >>>> >
> >>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >>>>>>> >>>> > >> > >>>> >
> >>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
> >>>>>>> >>>> matrix
> >>>>>>> >>>> > >> into
> >>>>>>> >>>> > >> > >>>> >> multiple files.
> >>>>>>> >>>> > >> > >>>> >>
> >>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
> >>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
> >>>>>>> >>>> > >> > >>>> >> >
> >>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >>>>>>> >>>> > >> > >>>> >> >
> >>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> >>>>>>> >>>> > >> > >>>> >> >> total 44
> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> >>>>>>> >>>> > >> > >>>> >> >> total 24
> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> >>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
> >>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> >>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
> >>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
> >>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
> >>>>>>> >>>> > >> > >>>> >> >> >
> >>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
> >>>>>>> >>>> > >> > >>>> >> >> >
> >>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
> >>>>>>> >>>> > >> > >>>> >> >> >
> >>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both ~2.2kb
> >>>>>>> >>>> > >> > >>>> >> >> >> sized.
> >>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory, there is only a single 5.56kb file.
> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single file if you configured
> >>>>>>> >>>> > >> > >>>> >> >> >> two?
> >>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >>>>>>> >>>> > >> > >>>> >> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> >>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe the partitioning doesn't
> >>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or something else.
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: SUPERSTEPS=0
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=100
> >>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> >>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
> >>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
> >>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java classes
> >>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEPS=1
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEP_SUM=4
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: IO_BYTES_READ=4332
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=14
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=100
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into
> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into
> >>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
> >>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP
> >>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
> >>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the
> >>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>>>>>> >>>> > >> > >>>> >> >> >>>>
> >>>>>>> >>>> > >> > >>>> >> >> >>>> --
> >>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >> >>>
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >> >> --
> >>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
> >>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
> >>>>>>> >>>> > >> > >>>> >> >>
> >>>>>>> >>>> > >> > >>>> >>
> >>>>>>> >>>> > >> > >>>> >>
> >>>>>>> >>>> > >> > >>>> >>
> >>>>>>> >>>> > >> > >>>> >> --
> >>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
> >>>>>>> >>>> > >> > >>>> >> @eddieyoon
> >>>>>>> >>>> > >> > >>>> >>
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>> --
> >>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
> >>>>>>> >>>> > >> > >>>> @eddieyoon
> >>>>>>> >>>> > >> > >>>>
> >>>>>>> >>>> > >> > >>>
> >>>>>>> >>>> > >> > >>>
> >>>>>>> >>>> > >> > >>
> >>>>>>> >>>> > >> >
> >>>>>>> >>>> > >> >
> >>>>>>> >>>> > >> >
> >>>>>>> >>>> > >> > --
> >>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
> >>>>>>> >>>> > >> > @eddieyoon
> >>>>>>> >>>> > >> >
> >>>>>>> >>>> > >>
> >>>>>>> >>>> >
> >>>>>>> >>>> >
> >>>>>>> >>>> >
> >>>>>>> >>>> > --
> >>>>>>> >>>> > Best Regards, Edward J. Yoon
> >>>>>>> >>>> > @eddieyoon
> >>>>>>> >>>> >
> >>>>>>> >>>>
> >>>>>>> >>
> >>>>>>> >>
> >>>>>>> >>
> >>>>>>> >> --
> >>>>>>> >> Best Regards, Edward J. Yoon
> >>>>>>> >> @eddieyoon
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > --
> >>>>>>> > Best Regards, Edward J. Yoon
> >>>>>>> > @eddieyoon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best Regards, Edward J. Yoon
> >>>>>>> @eddieyoon
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards, Edward J. Yoon
> >>>>> @eddieyoon
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Personally, I would like to solve this issue by touching
DiskVerticesInfo. If we write sorted sub-sets of vertices into
multiple files, we can avoid huge memory consumption.
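The thread established that simply appending n sorted partition files yields unsorted data; spilled sorted runs have to be combined with a k-way merge, which keeps only one element per run in memory. A minimal sketch of that merge, using plain long vertex IDs as a stand-in for Hama's actual Writable vertex types (an illustration of the technique, not the real DiskVerticesInfo code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: merge k individually sorted runs of vertex IDs with a min-heap.
// Each heap entry is { value, runIndex, offsetWithinRun }, so at most one
// element per run is resident at a time -- the property that avoids huge
// memory consumption when runs live in spill files.
public class RunMerge {

  public static List<Long> merge(List<List<Long>> runs) {
    PriorityQueue<long[]> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    for (int i = 0; i < runs.size(); i++) {
      if (!runs.get(i).isEmpty()) {
        heap.add(new long[] { runs.get(i).get(0), i, 0 });
      }
    }
    List<Long> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      long[] top = heap.poll();
      out.add(top[0]); // emit the globally smallest remaining ID
      int run = (int) top[1];
      int next = (int) top[2] + 1;
      if (next < runs.get(run).size()) {
        heap.add(new long[] { runs.get(run).get(next), run, next });
      }
    }
    return out;
  }
}
```

Concatenating the same runs instead of heap-merging them reproduces exactly the out-of-order vertexID listing reported earlier in this thread.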

If we want to sort the partitioned data using the messaging system
instead, we should collect ideas first.
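The messaging-based alternative can be pictured like this: every peer sends each vertex it reads to the peer that owns it, and the owning peer's incoming queue keeps messages ordered, so draining the queue yields that peer's vertices already sorted, with no separate merge job. A sketch under stated assumptions: the per-peer PriorityQueue stands in for a sorted message queue, and the ownership rule (id mod numPeers) is a simplification of Hama's hash partitioning, not its exact implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of "sorting via messaging": redistribute vertex IDs to their
// owning peers through sorted inboxes, then drain each inbox in order.
public class MessagingSortSketch {

  public static List<List<Long>> redistribute(List<Long> vertexIds, int numPeers) {
    // one sorted "inbox" per peer, standing in for a sorted message queue
    List<PriorityQueue<Long>> inbox = new ArrayList<>();
    for (int i = 0; i < numPeers; i++) {
      inbox.add(new PriorityQueue<>());
    }
    // "send" each vertex to its owning peer (non-negative modulo)
    for (long id : vertexIds) {
      int owner = (int) ((id % numPeers + numPeers) % numPeers);
      inbox.get(owner).add(id);
    }
    // drain each inbox; the per-peer result comes out sorted by ID
    List<List<Long>> perPeer = new ArrayList<>();
    for (PriorityQueue<Long> q : inbox) {
      List<Long> sorted = new ArrayList<>();
      while (!q.isEmpty()) {
        sorted.add(q.poll());
      }
      perPeer.add(sorted);
    }
    return perPeer;
  }
}
```

The appeal of this shape is that sorting falls out of the delivery order of the queue, which is why a sorted spilling message queue would make the dedicated partitioning job unnecessary for graph jobs.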

On Thu, Mar 14, 2013 at 10:31 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Oh, now I get how iterate() works. HAMA-704 is nicely written.
>
> On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
>> I'm reading changes of HAMA-704 again. As a result of adding
>> DiskVerticesInfo, vertices list is needed to be sorted. I'm not sure
>> but I think this approach will bring more disadvantages than
>> advantages.
>>
>> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>>>> in loadVertices? Maybe consider feature for coupling storage in user space
>>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes. This way
>>>>>> partitioned or non-partitioned by partitioner, can keep vertices sorted
>>>>>> with a single read and single write on every peer.
>>>
>>> And, as I commented JIRA ticket, I think we can't use messaging system
>>> for sorting vertices within partition files.
>>>
>>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>> P.S., (number of splits = number of partitions) is really confusing to
>>>> me. Even though the number of blocks is equal to the desired number of
>>>> tasks, the data should be re-partitioned again.
>>>>
>>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>>> Indeed. If there are already partitioned input files (unsorted) and the
>>>>> user wants to skip the pre-partitioning phase, it should be handled in
>>>>> the GraphJobRunner BSP program. Actually, I still don't know why the
>>>>> re-partitioned files need to be sorted. It's only about
>>>>> GraphJobRunner.
>>>>>
>>>>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>>>>> partitioning superstep for graph applications).
>>>>>
>>>>> Sorry. I don't understand exactly yet. Do you mean just a partitioning
>>>>> job based on the superstep API?
>>>>>
>>>>> By default, 100 tasks will be assigned for partitioning job.
>>>>> Partitioning job will create 1,000 partitions. Thus, we can execute
>>>>> the Graph job with 1,000 tasks.
>>>>>
>>>>> Let's assume that a input sequence file is 20GB (100 blocks). If I
>>>>> want to run with 1,000 tasks, what happens?
>>>>>
>>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
>>>>>> I am responding on this thread because of better continuity for
>>>>>> conversation. We cannot expect the partitions to be sorted every time. When
>>>>>> the number of splits = number of partitions and partitioning is switched
>>>>>> off by user[HAMA-561], the partitions would not be sorted. Can we do this
>>>>>> in loadVertices? Maybe consider feature for coupling storage in user space
>>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes. This way
>>>>>> partitioned or non-partitioned by partitioner, can keep vertices sorted
>>>>>> with a single read and single write on every peer.
>>>>>>
>>>>>> Just clearing confusion if any regarding superstep injection for
>>>>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>>>>> partitioning superstep for graph applications).
>>>>>> Say there are x splits and y number of tasks configured by user.
>>>>>>
>>>>>> if x > y
>>>>>> The y tasks are scheduled with x of them having each of the x splits and
>>>>>> the remaining with no resource local to them. Then the partitioning
>>>>>> superstep redistributes the partitions among them to create local
>>>>>> partitions. Now the question is can we re-initialize a peer's input based
>>>>>> on this new local part of partition?
>>>>>>
>>>>>> if y > x
>>>>>> works as it works today.
>>>>>>
>>>>>> Just putting my points in brainstorming.
>>>>>>
>>>>>> -Suraj
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>>>>
>>>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>>>>>>
>>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>>>>>>> wrote:
>>>>>>> > Additionally,
>>>>>>> >
>>>>>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>>>>>> >> superstep as the first superstep and use local memory?
>>>>>>> >
>>>>>>> > Can we execute different number of tasks per superstep?
>>>>>>> >
>>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>>>>>>> wrote:
>>>>>>> >>> For graph processing, the partitioned files that result from the
>>>>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>>>>> >>
>>>>>>> >> I see.
>>>>>>> >>
>>>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>>>> idea
>>>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>>>> our
>>>>>>> >>> messaging system is actually the best.
>>>>>>> >>
>>>>>>> >> BTW, if some garbage objects can be accumulated in partitioning step,
>>>>>>> >> separated partitioning job may not be bad idea. Is there some special
>>>>>>> >> reason?
>>>>>>> >>
>>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>>>>>> >> <th...@gmail.com> wrote:
>>>>>>> >>> For graph processing, the partitioned files that result from the
>>>>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>>>>> >>> itself are sorted, thus more tasks result in not sorted data in the
>>>>>>> >>> completed file. This only applies for the graph processing package.
>>>>>>> >>> So as Suraj told, it would be much more simpler to solve this via
>>>>>>> >>> messaging, once it is scalable (it will be very very scalable!). So the
>>>>>>> >>> GraphJobRunner can be partitioning the stuff with a single superstep in
>>>>>>> >>> setup() as it was before ages ago. The messaging must be sorted anyway
>>>>>>> for
>>>>>>> >>> the algorithm so this is a nice side effect and saves us the
>>>>>>> partitioning
>>>>>>> >>> job for graph processing.
>>>>>>> >>>
>>>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>>>> idea
>>>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>>>> our
>>>>>>> >>> messaging system is actually the best.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>>>>> >>>
>>>>>>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>>>>>>> >>>> confusion. The Superstep injection is possible with Superstep API.
>>>>>>> There
>>>>>>> >>>> are few enhancements needed to make it simpler after I last worked on
>>>>>>> it.
>>>>>>> >>>> We can then look into partitioning superstep being executed before the
>>>>>>> >>>> setup of first superstep of submitted job. I think it is feasible.
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>>>>>> >>>> >wrote:
>>>>>>> >>>>
>>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>>>>> >>>> partitioning
>>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>>>>> >>>> >
>>>>>>> >>>> > Actually, I wanted to add something before calling BSP.setup()
>>>>>>> method
>>>>>>> >>>> > to avoid execute additional BSP job. But, in my opinion, current is
>>>>>>> >>>> > enough. I think, we need to collect more experiences of input
>>>>>>> >>>> > partitioning on large environments. I'll do.
>>>>>>> >>>> >
>>>>>>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>>>>>>> >>>> >
>>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>>>>>> surajsmenon@apache.org>
>>>>>>> >>>> > wrote:
>>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph module.
>>>>>>> When we
>>>>>>> >>>> > have
>>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>>>>> >>>> partitioning
>>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>>>>> >>>> > > Today we have partitioning job within a job and are creating two
>>>>>>> copies
>>>>>>> >>>> > of
>>>>>>> >>>> > > data on HDFS. This could be really costly. Is it possible to
>>>>>>> create or
>>>>>>> >>>> > > redistribute the partitions on local memory and initialize the
>>>>>>> record
>>>>>>> >>>> > > reader there?
>>>>>>> >>>> > > The user can run a separate job give in examples area to
>>>>>>> explicitly
>>>>>>> >>>> > > repartition the data on HDFS. The deployment question is how much
>>>>>>> of
>>>>>>> >>>> disk
>>>>>>> >>>> > > space gets allocated for local memory usage? Would it be a safe
>>>>>>> >>>> approach
>>>>>>> >>>> > > with the limitations?
>>>>>>> >>>> > >
>>>>>>> >>>> > > -Suraj
>>>>>>> >>>> > >
>>>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>>>>>> >>>> > > <th...@gmail.com>wrote:
>>>>>>> >>>> > >
>>>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to
>>>>>>> the
>>>>>>> >>>> > >> partitioner pretty easily.
>>>>>>> >>>> > >>
>>>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>>>> >>>> > >>
>>>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be
>>>>>>> Sorted?
>>>>>>> >>>> > >> >
>>>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>>>>>> >>>> > >> > > Now I get how the partitioning works, obviously if you merge
>>>>>>> n
>>>>>>> >>>> > sorted
>>>>>>> >>>> > >> > files
>>>>>>> >>>> > >> > > by just appending to each other, this will result in totally
>>>>>>> >>>> > unsorted
>>>>>>> >>>> > >> > data
>>>>>>> >>>> > >> > > ;-)
>>>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>>>>>> >>>> > >> > >
>>>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>>>>> >>>> > >> > >
>>>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> > >> vertexID: 50
>>>>>>> >>>> > >> > >> vertexID: 52
>>>>>>> >>>> > >> > >> vertexID: 54
>>>>>>> >>>> > >> > >> vertexID: 56
>>>>>>> >>>> > >> > >> vertexID: 58
>>>>>>> >>>> > >> > >> vertexID: 61
>>>>>>> >>>> > >> > >> ...
>>>>>>> >>>> > >> > >> vertexID: 78
>>>>>>> >>>> > >> > >> vertexID: 81
>>>>>>> >>>> > >> > >> vertexID: 83
>>>>>>> >>>> > >> > >> vertexID: 85
>>>>>>> >>>> > >> > >> ...
>>>>>>> >>>> > >> > >> vertexID: 94
>>>>>>> >>>> > >> > >> vertexID: 96
>>>>>>> >>>> > >> > >> vertexID: 98
>>>>>>> >>>> > >> > >> vertexID: 1
>>>>>>> >>>> > >> > >> vertexID: 10
>>>>>>> >>>> > >> > >> vertexID: 12
>>>>>>> >>>> > >> > >> vertexID: 14
>>>>>>> >>>> > >> > >> vertexID: 16
>>>>>>> >>>> > >> > >> vertexID: 18
>>>>>>> >>>> > >> > >> vertexID: 21
>>>>>>> >>>> > >> > >> vertexID: 23
>>>>>>> >>>> > >> > >> vertexID: 25
>>>>>>> >>>> > >> > >> vertexID: 27
>>>>>>> >>>> > >> > >> vertexID: 29
>>>>>>> >>>> > >> > >> vertexID: 3
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> > >> So this won't work then correctly...
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>>>>>> >>>> > >> > >>>
>>>>>>> >>>> > >> > >>>
>>>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>>>> >>>> > >> > >>>
>>>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>>>>>>> >>>> > holiday[1]
>>>>>>> >>>> > >> so
>>>>>>> >>>> > >> > >>>> I'll appear next week.
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>> 1.
>>>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>>>>>>> >>>> items
>>>>>>> >>>> > >> were
>>>>>>> >>>> > >> > >>>> added.
>>>>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID
>>>>>>> into
>>>>>>> >>>> > the
>>>>>>> >>>> > >> > >>>> fastgen,
>>>>>>> >>>> > >> > >>>> > want to have a look into it?
>>>>>>> >>>> > >> > >>>> >
>>>>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>>>> >>>> > >> > >>>> >
>>>>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
>>>>>>> >>>> matrix
>>>>>>> >>>> > >> into
>>>>>>> >>>> > >> > >>>> >> multiple files.
>>>>>>> >>>> > >> > >>>> >>
>>>>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>>>>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>>>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>>>>>>> >>>> > >> > >>>> >> >
>>>>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>>>> >>>> > >> > >>>> >> >
>>>>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>>>>>> >>>> > >> /tmp/randomgraph/
>>>>>>> >>>> > >> > >>>> >> >> total 44
>>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>>>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01
>>>>>>> part-00000
>>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>>>>>> >>>> > .part-00000.crc
>>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01
>>>>>>> part-00001
>>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>>>>>> >>>> > .part-00001.crc
>>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03
>>>>>>> partitions
>>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>>>>>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>>>>>>> >>>> > >> > >>>> >> >> total 24
>>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>>>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03
>>>>>>> part-00000
>>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>>>>>> >>>> > .part-00000.crc
>>>>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03
>>>>>>> part-00001
>>>>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>>>>>> >>>> > .part-00001.crc
>>>>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>>>>>>> >>>> edward@udanax.org
>>>>>>> >>>> > >
>>>>>>> >>>> > >> > wrote:
>>>>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>>>>>> >>>> > >> > >>>> >> >> >
>>>>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>>>>>> >>>> > >> > >>>> >> >> >
>>>>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>>>>>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>>>>>> >>>> > >> > >>>> >> >> wrote:
>>>>>>> >>>> > >> > >>>> >> >> >
>>>>>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>>>>>>> >>>> > >> > >>>> >> >> >>
>>>>>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>>>>>>> >>>> > >> part-00001,
>>>>>>> >>>> > >> > >>>> both
>>>>>>> >>>> > >> > >>>> >> ~2.2kb
>>>>>>> >>>> > >> > >>>> >> >> >> sized.
>>>>>>> >>>> > >> > >>>> >> >> >> In the below partition directory, there is only a
>>>>>>> >>>> single
>>>>>>> >>>> > >> > 5.56kb
>>>>>>> >>>> > >> > >>>> file.
>>>>>>> >>>> > >> > >>>> >> >> >>
>>>>>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a
>>>>>>> single
>>>>>>> >>>> > file
>>>>>>> >>>> > >> if
>>>>>>> >>>> > >> > you
>>>>>>> >>>> > >> > >>>> >> >> configured
>>>>>>> >>>> > >> > >>>> >> >> >> two?
>>>>>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
>>>>>>> >>>> > >> > >>>> >> >> >>
>>>>>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>>>>>>> thomas.jungblut@gmail.com>
>>>>>>> >>>> > >> > >>>> >> >> >>
>>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe
>>>>>>> the
>>>>>>> >>>> > >> > >>>> partitioning
>>>>>>> >>>> > >> > >>>> >> >> doesn't
>>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or something
>>>>>>> else.
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>>>>>>> >
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
>>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>>> :~/workspace/hama-trunk$
>>>>>>> >>>> > >> bin/hama
>>>>>>> >>>> > >> > jar
>>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>>>>>> >>>> > >> > fastgen
>>>>>>> >>>> > >> > >>>> 100 10
>>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader:
>>>>>>> Unable
>>>>>>> >>>> > to
>>>>>>> >>>> > >> > load
>>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>>>>>>> using
>>>>>>> >>>> > >> > builtin-java
>>>>>>> >>>> > >> > >>>> >> classes
>>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient:
>>>>>>> Running
>>>>>>> >>>> job:
>>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner:
>>>>>>> Setting
>>>>>>> >>>> up
>>>>>>> >>>> > a
>>>>>>> >>>> > >> new
>>>>>>> >>>> > >> > >>>> barrier
>>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> Current
>>>>>>> >>>> > >> supersteps
>>>>>>> >>>> > >> > >>>> >> number: 0
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The
>>>>>>> total
>>>>>>> >>>> > number
>>>>>>> >>>> > >> > of
>>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> Counters: 3
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> >>>> > SUPERSTEPS=0
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>>> :~/workspace/hama-trunk$
>>>>>>> >>>> > >> bin/hama
>>>>>>> >>>> > >> > jar
>>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>>> :~/workspace/hama-trunk$
>>>>>>> >>>> > >> bin/hama
>>>>>>> >>>> > >> > jar
>>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>>>>>> >>>> > pagerank
>>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader:
>>>>>>> Unable
>>>>>>> >>>> > to
>>>>>>> >>>> > >> > load
>>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>>>>>>> using
>>>>>>> >>>> > >> > builtin-java
>>>>>>> >>>> > >> > >>>> >> classes
>>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>>>> Total
>>>>>>> >>>> > input
>>>>>>> >>>> > >> > paths
>>>>>>> >>>> > >> > >>>> to
>>>>>>> >>>> > >> > >>>> >> >> process
>>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>>>> Total
>>>>>>> >>>> > input
>>>>>>> >>>> > >> > paths
>>>>>>> >>>> > >> > >>>> to
>>>>>>> >>>> > >> > >>>> >> >> process
>>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient:
>>>>>>> Running
>>>>>>> >>>> job:
>>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
>>>>>>> Setting
>>>>>>> >>>> up
>>>>>>> >>>> > a
>>>>>>> >>>> > >> new
>>>>>>> >>>> > >> > >>>> barrier
>>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> Current
>>>>>>> >>>> > >> supersteps
>>>>>>> >>>> > >> > >>>> >> number: 1
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The
>>>>>>> total
>>>>>>> >>>> > number
>>>>>>> >>>> > >> > of
>>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> Counters: 6
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > SUPERSTEPS=1
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat:
>>>>>>> Total
>>>>>>> >>>> > input
>>>>>>> >>>> > >> > paths
>>>>>>> >>>> > >> > >>>> to
>>>>>>> >>>> > >> > >>>> >> >> process
>>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>>> Running
>>>>>>> >>>> job:
>>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner:
>>>>>>> Setting
>>>>>>> >>>> up
>>>>>>> >>>> > a
>>>>>>> >>>> > >> new
>>>>>>> >>>> > >> > >>>> barrier
>>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>>>> >>>> > vertices
>>>>>>> >>>> > >> > are
>>>>>>> >>>> > >> > >>>> loaded
>>>>>>> >>>> > >> > >>>> >> >> into
>>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>>>> >>>> > vertices
>>>>>>> >>>> > >> > are
>>>>>>> >>>> > >> > >>>> loaded
>>>>>>> >>>> > >> > >>>> >> >> into
>>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>>> >>>> > >> > >>>> >> >> >>>> --
>>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >> >> --
>>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
>>>>>>> >>>> > >> > >>>> >> >>
>>>>>>> >>>> > >> > >>>> >>
>>>>>>> >>>> > >> > >>>> >>
>>>>>>> >>>> > >> > >>>> >>
>>>>>>> >>>> > >> > >>>> >> --
>>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>>>>>>> >>>> > >> > >>>> >> @eddieyoon
>>>>>>> >>>> > >> > >>>> >>
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>> --
>>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
>>>>>>> >>>> > >> > >>>> @eddieyoon
>>>>>>> >>>> > >> > >>>>
>>>>>>> >>>> > >> > >>>
>>>>>>> >>>> > >> > >>>
>>>>>>> >>>> > >> > >>
>>>>>>> >>>> > >> >
>>>>>>> >>>> > >> >
>>>>>>> >>>> > >> >
>>>>>>> >>>> > >> > --
>>>>>>> >>>> > >> > Best Regards, Edward J. Yoon
>>>>>>> >>>> > >> > @eddieyoon
>>>>>>> >>>> > >> >
>>>>>>> >>>> > >>
>>>>>>> >>>> >
>>>>>>> >>>> >
>>>>>>> >>>> >
>>>>>>> >>>> > --
>>>>>>> >>>> > Best Regards, Edward J. Yoon
>>>>>>> >>>> > @eddieyoon
>>>>>>> >>>> >
>>>>>>> >>>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Best Regards, Edward J. Yoon
>>>>>>> >> @eddieyoon
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Best Regards, Edward J. Yoon
>>>>>>> > @eddieyoon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, now I get how iterate() works. HAMA-704 is nicely written.
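The ordering invariant behind the "Messages must never be behind the vertex in ID" exception quoted earlier in the thread can be sketched as follows. This is a simplified, hypothetical rendering of the check, not the actual GraphJobRunner code; vertex IDs compare lexicographically, as Text keys do:

```java
import java.util.*;

public class IterateInvariant {
    // Vertices and incoming messages are walked in parallel in sorted
    // order, so a message whose ID is smaller than the current vertex ID
    // can never be matched again and must be an error.
    static void iterate(List<String> sortedVertexIds, List<String> sortedMessageIds) {
        int v = 0;
        for (String msgId : sortedMessageIds) {
            // skip vertices that received no message
            while (v < sortedVertexIds.size()
                    && sortedVertexIds.get(v).compareTo(msgId) < 0) {
                v++;
            }
            if (v == sortedVertexIds.size()
                    || sortedVertexIds.get(v).compareTo(msgId) > 0) {
                throw new IllegalArgumentException(
                    "Messages must never be behind the vertex in ID! "
                        + "Current Message ID: " + msgId);
            }
        }
    }

    public static void main(String[] args) {
        // Works when both sides are sorted the same way:
        iterate(Arrays.asList("1", "10", "3"), Arrays.asList("1", "3"));
        // Fails when the vertex list is a concatenation of sorted runs
        // ("50".."98" followed by "1"..), as in the log in this thread:
        try {
            iterate(Arrays.asList("50", "98", "1"), Arrays.asList("1", "50"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This makes it clear why appending individually sorted partition files breaks iterate(): the vertex side is no longer globally sorted, so the first message already looks "behind" the current vertex.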

On Thu, Mar 14, 2013 at 12:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
> I'm reading the changes of HAMA-704 again. As a result of adding
> DiskVerticesInfo, the vertices list needs to be sorted. I'm not sure,
> but I think this approach will bring more disadvantages than
> advantages.
>
> On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>>> in loadVertices? Maybe consider feature for coupling storage in user space
>>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes. This way
>>>>> partitioned or non-partitioned by partitioner, can keep vertices sorted
>>>>> with a single read and single write on every peer.
>>
>> And, as I commented on the JIRA ticket, I think we can't use the
>> messaging system for sorting vertices within partition files.
>>
>> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>> P.S., (number of splits = number of partitions) is really confusing to
>>> me. Even when the number of blocks is equal to the desired number of
>>> tasks, the data should be re-partitioned again.
>>>
>>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>> Indeed. If there are already partitioned (but unsorted) input files and
>>>> the user wants to skip the pre-partitioning phase, that case should be
>>>> handled in the GraphJobRunner BSP program. Actually, I still don't know
>>>> why the re-partitioned files need to be sorted; it only concerns
>>>> GraphJobRunner.
>>>>
>>>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>>>> partitioning superstep for graph applications).
>>>>
>>>> Sorry, I don't understand exactly yet. Do you mean just a partitioning
>>>> job based on the Superstep API?
>>>>
>>>> By default, 100 tasks will be assigned to the partitioning job. The
>>>> partitioning job will create 1,000 partitions. Thus, we can execute
>>>> the graph job with 1,000 tasks.
>>>>
>>>> Let's assume that an input sequence file is 20GB (100 blocks). If I
>>>> want to run with 1,000 tasks, what happens?
>>>>
>>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
>>>>> I am responding on this thread for better continuity of the
>>>>> conversation. We cannot expect the partitions to be sorted every time.
>>>>> When the number of splits = the number of partitions and partitioning is
>>>>> switched off by the user [HAMA-561], the partitions would not be sorted.
>>>>> Can we do this in loadVertices? Maybe the feature for coupling storage in
>>>>> user space with BSP Messaging [HAMA-734] can avoid the double reads and
>>>>> writes. This way, whether or not the data was partitioned by the
>>>>> partitioner, we can keep vertices sorted with a single read and a single
>>>>> write on every peer.
>>>>>
>>>>> Just clearing up any confusion regarding superstep injection for
>>>>> partitioning. (This is outside the scope of graphs; we can have a
>>>>> dedicated partitioning superstep for graph applications.)
>>>>> Say there are x splits and y tasks configured by the user.
>>>>>
>>>>> if x > y
>>>>> The y tasks are scheduled with the x splits divided among them, with
>>>>> some splits having no task local to them. Then the partitioning
>>>>> superstep redistributes the partitions among the tasks to create local
>>>>> partitions. Now the question is: can we re-initialize a peer's input
>>>>> based on this new local part of the partition?
>>>>>
>>>>> if y > x
>>>>> works as it works today.
>>>>>
>>>>> Just putting my points in brainstorming.
>>>>>
>>>>> -Suraj
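The redistribution step Suraj describes could be simulated in memory as a plain hash-partition followed by a per-peer sort, so every peer ends up holding a sorted local partition. The helper names below are hypothetical and this is only a sketch of the idea, not Hama's messaging API:

```java
import java.util.*;

public class PartitionSketch {
    // Hypothetical partition function: route a vertex ID to a peer.
    static int peerFor(String vertexId, int numPeers) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numPeers;
    }

    // Simulate the partitioning superstep: every vertex is "sent" to the
    // peer chosen by peerFor(); each peer then sorts what it received,
    // so each local partition is sorted without any extra HDFS copy.
    static List<List<String>> redistribute(List<String> ids, int numPeers) {
        List<List<String>> peers = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) peers.add(new ArrayList<>());
        for (String id : ids) peers.get(peerFor(id, numPeers)).add(id);
        for (List<String> p : peers) Collections.sort(p); // local sort per peer
        return peers;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("50", "1", "98", "10", "3", "52");
        for (List<String> p : redistribute(ids, 2)) System.out.println(p);
    }
}
```

With sorted spilling queues for messaging, the per-peer sort would come for free on receipt instead of being a separate Collections.sort pass.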
>>>>>
>>>>>
>>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>>>
>>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>>>>>
>>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>>>>>> wrote:
>>>>>> > Additionally,
>>>>>> >
>>>>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>>>>> >> superstep as the first superstep and use local memory?
>>>>>> >
> Can we execute a different number of tasks per superstep?
>>>>>> >
>>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>>>>>> wrote:
>>>>>> >>> For graph processing, the partitioned files that result from the
>>>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>>>> >>
>>>>>> >> I see.
>>>>>> >>
>>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>>> idea
>>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>>> our
>>>>>> >>> messaging system is actually the best.
>>>>>> >>
>> BTW, if garbage objects can accumulate in the partitioning step, a
>> separate partitioning job may not be a bad idea. Is there some special
>> reason?
>>>>>> >>
>>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>>>>> >> <th...@gmail.com> wrote:
>>>>>> >>> For graph processing, the partitioned files that result from the
>>> partitioning job must be sorted. Currently only the partition files in
>>> themselves are sorted, thus more tasks result in unsorted data in the
>>> completed file. This only applies to the graph processing package.
>>> So as Suraj said, it would be much simpler to solve this via
>>> messaging, once it is scalable (it will be very, very scalable!). The
>>> GraphJobRunner could then partition the data in a single superstep in
>>> setup(), as it did ages ago. The messaging must be sorted anyway for
>>> the algorithm, so this is a nice side effect and saves us the
>>> partitioning job for graph processing.
>>>>>> >>>
>>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>>> idea
>>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>>> our
>>>>>> >>> messaging system is actually the best.
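The merge discussed here, as opposed to plain appending, is a standard k-way merge: each run is already sorted, so always taking the smallest head element across all runs yields globally sorted output. A minimal sketch in plain Java follows; it is not Hama's actual merger (which would operate on SequenceFiles), and IDs compare lexicographically as Text keys do:

```java
import java.util.*;

public class KWayMerge {
    // Merge n individually sorted runs into one globally sorted list by
    // repeatedly polling the smallest head element from a priority queue.
    static List<String> merge(List<List<String>> runs) {
        // queue entries: {runIndex, offsetWithinRun}, ordered by head value
        PriorityQueue<int[]> pq = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) pq.add(new int[] {i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            out.add(runs.get(e[0]).get(e[1]));
            if (e[1] + 1 < runs.get(e[0]).size()) {
                pq.add(new int[] {e[0], e[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("1", "10", "3");   // sorted as Text
        List<String> b = Arrays.asList("50", "52", "98"); // sorted as Text
        System.out.println(merge(Arrays.asList(a, b)));
        // → [1, 10, 3, 50, 52, 98] (lexicographically sorted)
    }
}
```

Appending b after a would also happen to be sorted here, but appending a after b (as the partitioner's concatenation effectively did in the log above) would not be, which is exactly the failure Thomas observed.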
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>>>> >>>
>>>>>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>>>>>> >>>> confusion. The Superstep injection is possible with the Superstep API.
>>>>>> >>>> There are a few enhancements needed to make it simpler since I last
>>>>>> >>>> worked on it. We can then look into the partitioning superstep being
>>>>>> >>>> executed before the setup of the first superstep of the submitted job.
>>>>>> >>>> I think it is feasible.
>>>>>> >>>>
>>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>>>>> >>>> >wrote:
>>>>>> >>>>
>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>>>> >>>> partitioning
>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>>>> >>>> >
>>>>>> >>>> > Actually, I wanted to add something before calling the BSP.setup()
>>>>>> >>>> > method, to avoid executing an additional BSP job. But, in my opinion,
>>>>>> >>>> > the current behavior is enough. I think we need to collect more
>>>>>> >>>> > experience with input partitioning in large environments. I'll do that.
>>>>>> >>>> >
>>>>>> >>>> > BTW, I still don't know why it needs to be sorted?! MR-like?
>>>>>> >>>> >
>>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>>>>> surajsmenon@apache.org>
>>>>>> >>>> > wrote:
>>>>>> >>>> > > Sorry, I am increasing the scope here to outside graph module.
>>>>>> When we
>>>>>> >>>> > have
>>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>>>> >>>> partitioning
>>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>>>> >>>> > > Today we have partitioning job within a job and are creating two
>>>>>> copies
>>>>>> >>>> > of
>>>>>> >>>> > > data on HDFS. This could be really costly. Is it possible to
>>>>>> create or
>>>>>> >>>> > > redistribute the partitions on local memory and initialize the
>>>>>> record
>>>>>> >>>> > > reader there?
>>>>>> >>>> > > The user can run a separate job, given in the examples area, to
>>>>>> >>>> > > explicitly repartition the data on HDFS. The deployment question is
>>>>>> >>>> > > how much disk space gets allocated for local memory usage? Would it
>>>>>> >>>> > > be a safe approach with those limitations?
>>>>>> >>>> > >
>>>>>> >>>> > > -Suraj
>>>>>> >>>> > >
>>>>>> thomas.jungblut@gmail.com>
>>>>>> >>>> > >> > >>>> >> >> >>
>>>>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>>>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe
>>>>>> the
>>>>>> >>>> > >> > >>>> partitioning
>>>>>> >>>> > >> > >>>> >> >> doesn't
>>>>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or something
>>>>>> else.
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>>>>>> >
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not to work for graph examples.
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>> :~/workspace/hama-trunk$
>>>>>> >>>> > >> bin/hama
>>>>>> >>>> > >> > jar
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>>>>> >>>> > >> > fastgen
>>>>>> >>>> > >> > >>>> 100 10
>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader:
>>>>>> Unable
>>>>>> >>>> > to
>>>>>> >>>> > >> > load
>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>>>>>> using
>>>>>> >>>> > >> > builtin-java
>>>>>> >>>> > >> > >>>> >> classes
>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient:
>>>>>> Running
>>>>>> >>>> job:
>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner:
>>>>>> Setting
>>>>>> >>>> up
>>>>>> >>>> > a
>>>>>> >>>> > >> new
>>>>>> >>>> > >> > >>>> barrier
>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> Current
>>>>>> >>>> > >> supersteps
>>>>>> >>>> > >> > >>>> >> number: 0
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The
>>>>>> total
>>>>>> >>>> > number
>>>>>> >>>> > >> > of
>>>>>> >>>> > >> > >>>> >> >> supersteps: 0
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> Counters: 3
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> >>>> > SUPERSTEPS=0
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>>>>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>> :~/workspace/hama-trunk$
>>>>>> >>>> > >> bin/hama
>>>>>> >>>> > >> > jar
>>>>>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>>>>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>>>>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>>>>>> :~/workspace/hama-trunk$
>>>>>> >>>> > >> bin/hama
>>>>>> >>>> > >> > jar
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>>>>> >>>> > pagerank
>>>>>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader:
>>>>>> Unable
>>>>>> >>>> > to
>>>>>> >>>> > >> > load
>>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>>>>>> using
>>>>>> >>>> > >> > builtin-java
>>>>>> >>>> > >> > >>>> >> classes
>>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>>> Total
>>>>>> >>>> > input
>>>>>> >>>> > >> > paths
>>>>>> >>>> > >> > >>>> to
>>>>>> >>>> > >> > >>>> >> >> process
>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>>> Total
>>>>>> >>>> > input
>>>>>> >>>> > >> > paths
>>>>>> >>>> > >> > >>>> to
>>>>>> >>>> > >> > >>>> >> >> process
>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient:
>>>>>> Running
>>>>>> >>>> job:
>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
>>>>>> Setting
>>>>>> >>>> up
>>>>>> >>>> > a
>>>>>> >>>> > >> new
>>>>>> >>>> > >> > >>>> barrier
>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> Current
>>>>>> >>>> > >> supersteps
>>>>>> >>>> > >> > >>>> >> number: 1
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The
>>>>>> total
>>>>>> >>>> > number
>>>>>> >>>> > >> > of
>>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> Counters: 6
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > SUPERSTEPS=1
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > SUPERSTEP_SUM=4
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat:
>>>>>> Total
>>>>>> >>>> > input
>>>>>> >>>> > >> > paths
>>>>>> >>>> > >> > >>>> to
>>>>>> >>>> > >> > >>>> >> >> process
>>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>>> Running
>>>>>> >>>> job:
>>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner:
>>>>>> Setting
>>>>>> >>>> up
>>>>>> >>>> > a
>>>>>> >>>> > >> new
>>>>>> >>>> > >> > >>>> barrier
>>>>>> >>>> > >> > >>>> >> >> for 2
>>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>>> >>>> > vertices
>>>>>> >>>> > >> > are
>>>>>> >>>> > >> > >>>> loaded
>>>>>> >>>> > >> > >>>> >> >> into
>>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>>> >>>> > vertices
>>>>>> >>>> > >> > are
>>>>>> >>>> > >> > >>>> loaded
>>>>>> >>>> > >> > >>>> >> >> into
>>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>>>>>> >>>> Exception
>>>>>> >>>> > >> > during
>>>>>> >>>> > >> > >>>> BSP
>>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages
>>>>>> must
>>>>>> >>>> > never
>>>>>> >>>> > >> be
>>>>>> >>>> > >> > >>>> behind
>>>>>> >>>> > >> > >>>> >> the
>>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> >
>>>>>> >>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> >
>>>>>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> >
>>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> > >>
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> > >>
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> >
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> > >>
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> >
>>>>>> >>>> > >>
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>>> >>>> > >> > >>>> >> >> >>>> --
>>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >> >>>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >> >> --
>>>>>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>>>>>> >>>> > >> > >>>> >> >> @eddieyoon
>>>>>> >>>> > >> > >>>> >> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>> >> --
>>>>>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>>>>>> >>>> > >> > >>>> >> @eddieyoon
>>>>>> >>>> > >> > >>>> >>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> > >>>>
>>>>>> >>>> > >> > >>>> --
>>>>>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
>>>>>> >>>> > >> > >>>> @eddieyoon

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
I'm reading the changes of HAMA-704 again. As a result of adding
DiskVerticesInfo, the vertex list needs to be sorted. I'm not sure,
but I think this approach will bring more disadvantages than
advantages.
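A note on why sorting matters here: a disk-backed vertex store is read sequentially, so message delivery becomes a single merge pass over two sorted streams. The sketch below is plain Java, not Hama code; `deliver` and the integer IDs are hypothetical stand-ins, but the failing check mirrors the "Messages must never be behind the vertex in ID!" exception quoted in the thread.

```java
import java.util.*;

// With vertex IDs and incoming message IDs both sorted, delivering
// messages to vertices is one sequential merge-join pass.
public class MergeJoin {
  // Returns how many messages each vertex received.
  static Map<Integer, Integer> deliver(List<Integer> sortedVertexIds,
                                       List<Integer> sortedMessageIds) {
    Map<Integer, Integer> delivered = new LinkedHashMap<>();
    int m = 0;
    for (int vertexId : sortedVertexIds) {
      if (m < sortedMessageIds.size() && sortedMessageIds.get(m) < vertexId) {
        // The situation behind the exception in the thread: a message is
        // addressed to a vertex the merge cursor has already passed, which
        // can only happen if one of the streams is not globally sorted.
        throw new IllegalArgumentException("Messages must never be behind the "
            + "vertex in ID! " + sortedMessageIds.get(m) + " vs. " + vertexId);
      }
      int count = 0;
      while (m < sortedMessageIds.size()
          && sortedMessageIds.get(m).intValue() == vertexId) {
        count++;
        m++;
      }
      delivered.put(vertexId, count);
    }
    return delivered;
  }
}
```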

On Wed, Mar 13, 2013 at 11:09 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>>> in loadVertices? Maybe consider feature for coupling storage in user space
>>>> with BSP Messaging[HAMA-734] can avoid double reads and writes. This way
>>>> partitioned or non-partitioned by partitioner, can keep vertices sorted
>>>> with a single read and single write on every peer.
>
> And, as I commented on the JIRA ticket, I think we can't use the
> messaging system for sorting vertices within partition files.
>
> On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> P.S., (number of splits = number of partitions) is really confusing
>> to me. Even when the number of blocks equals the desired number of
>> tasks, the data still has to be re-partitioned.
>>
>> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>> Indeed. If there are already partitioned (but unsorted) input files
>>> and the user wants to skip the pre-partitioning phase, that case
>>> should be handled in the GraphJobRunner BSP program. Actually, I
>>> still don't know why re-partitioned files need to be sorted. It only
>>> matters to GraphJobRunner.
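For context on skipping the pre-partitioning phase: which partition a vertex belongs to is typically just a hash of its ID, so input files written with the same function are already correctly partitioned, merely unsorted. A minimal sketch of the usual Hadoop-style idiom (an assumption for illustration, not the actual Hama partitioner):

```java
public class HashPartition {
  // Hadoop-style hash partitioning: mask the sign bit, then take the modulus,
  // so the result is always in [0, numPartitions).
  static int partitionFor(String vertexId, int numPartitions) {
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```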
>>>
>>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>>> partitioning superstep for graph applications).
>>>
>>> Sorry, I don't understand exactly yet. Do you mean just a
>>> partitioning job based on the superstep API?
>>>
>>> By default, 100 tasks will be assigned to the partitioning job.
>>> The partitioning job will create 1,000 partitions. Thus, we can
>>> execute the graph job with 1,000 tasks.
>>>
>>> Let's assume that an input sequence file is 20GB (100 blocks). If I
>>> want to run with 1,000 tasks, what happens?
>>>
>>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
>>>> I am responding on this thread for better continuity of the
>>>> conversation. We cannot expect the partitions to be sorted every
>>>> time. When the number of splits equals the number of partitions and
>>>> partitioning is switched off by the user [HAMA-561], the partitions
>>>> would not be sorted. Can we do this in loadVertices? Maybe the
>>>> feature for coupling storage in user space with BSP Messaging
>>>> [HAMA-734] can avoid the double reads and writes. That way, whether
>>>> or not the data was partitioned by the partitioner, every peer can
>>>> keep its vertices sorted with a single read and a single write.
>>>>
>>>> Just clearing confusion if any regarding superstep injection for
>>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>>> partitioning superstep for graph applications).
>>>> Say there are x splits and y number of tasks configured by user.
>>>>
>>>> if x > y
>>>> The y tasks are scheduled with x of them having each of the x splits and
>>>> the remaining with no resource local to them. Then the partitioning
>>>> superstep redistributes the partitions among them to create local
>>>> partitions. Now the question is can we re-initialize a peer's input based
>>>> on this new local part of partition?
>>>>
>>>> if y > x
>>>> works as it works today.
>>>>
>>>> Just putting my points in brainstorming.
>>>>
>>>> -Suraj
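Suraj's two cases can be sketched as a round-robin assignment of x splits to y tasks (a hypothetical helper for illustration, not Hama scheduler code):

```java
import java.util.*;

public class SplitAssignment {
  // Assign x splits to y tasks. If x <= y, each split gets its own task and
  // the remaining tasks start with no local data; if x > y, some tasks carry
  // several splits and a partitioning superstep must redistribute them.
  static List<List<Integer>> assign(int numSplits, int numTasks) {
    List<List<Integer>> perTask = new ArrayList<>();
    for (int t = 0; t < numTasks; t++) {
      perTask.add(new ArrayList<>());
    }
    for (int s = 0; s < numSplits; s++) {
      perTask.get(s % numTasks).add(s);
    }
    return perTask;
  }
}
```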
>>>>
>>>>
>>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>>
>>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>>>>
>>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>>>>> wrote:
>>>>> > Additionally,
>>>>> >
>>>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>>>> >> superstep as the first superstep and use local memory?
>>>>> >
>>>>> > Can we execute different number of tasks per superstep?
>>>>> >
>>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>>>>> wrote:
>>>>> >>> For graph processing, the partitioned files that result from the
>>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>>> >>
>>>>> >> I see.
>>>>> >>
>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>> idea
>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>> our
>>>>> >>> messaging system is actually the best.
>>>>> >>
>>>>> >> BTW, if garbage objects can accumulate during the partitioning step,
>>>>> >> a separate partitioning job may not be a bad idea. Is there some
>>>>> >> special reason for it?
>>>>> >>
>>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>>>> >> <th...@gmail.com> wrote:
>>>>> >>> For graph processing, the partitioned files that result from the
>>>>> >>> partitioning job must be sorted. Currently only the partition files
>>>>> >>> themselves are sorted, so more tasks result in unsorted data in the
>>>>> >>> merged file. This only applies to the graph processing package.
>>>>> >>> So, as Suraj said, it would be much simpler to solve this via
>>>>> >>> messaging once it is scalable (it will be very, very scalable!). The
>>>>> >>> GraphJobRunner can then partition the data with a single superstep in
>>>>> >>> setup(), as it did ages ago. The messaging must be sorted anyway for
>>>>> >>> the algorithm, so this is a nice side effect and saves us the
>>>>> >>> partitioning job for graph processing.
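The merge referred to here has to be a k-way merge rather than concatenation; appending individually sorted partition files is exactly what produces the unsorted result. A minimal priority-queue sketch (plain Java, assuming each run is already sorted):

```java
import java.util.*;

public class KWayMerge {
  // Merge n individually sorted runs into one globally sorted list.
  static List<Integer> merge(List<List<Integer>> runs) {
    // Heap entries: {value, runIndex, offsetInRun}, ordered by value.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new int[] { runs.get(r).get(0), r, 0 });
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);
      int next = top[2] + 1; // advance within the run we just consumed from
      if (next < runs.get(top[1]).size()) {
        heap.add(new int[] { runs.get(top[1]).get(next), top[1], next });
      }
    }
    return out;
  }
}
```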
>>>>> >>>
>>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>>> idea
>>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>>> our
>>>>> >>> messaging system is actually the best.
>>>>> >>>
>>>>> >>>
>>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>>> >>>
>>>>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>>>>> >>>> confusion. Superstep injection is possible with the Superstep API;
>>>>> >>>> there are a few enhancements needed to make it simpler since I last
>>>>> >>>> worked on it. We can then look into having a partitioning superstep
>>>>> >>>> executed before the setup of the submitted job's first superstep. I
>>>>> >>>> think it is feasible.
>>>>> >>>>
>>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>>>> >>>> >wrote:
>>>>> >>>>
>>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>>> >>>> partitioning
>>>>> >>>> > > superstep as the first superstep and use local memory?
>>>>> >>>> >
>>>>> >>>> > Actually, I wanted to add something before the BSP.setup() method
>>>>> >>>> > is called, to avoid executing an additional BSP job. But, in my
>>>>> >>>> > opinion, the current approach is enough. I think we need to collect
>>>>> >>>> > more experience with input partitioning in large environments. I'll
>>>>> >>>> > do that.
>>>>> >>>> >
>>>>> >>>> > BTW, I still don't know why it needs to be sorted?! To be MR-like?
>>>>> >>>> >
>>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>>>> surajsmenon@apache.org>
>>>>> >>>> > wrote:
>>>>> >>>> > > Sorry, I am increasing the scope here to outside the graph module.
>>>>> >>>> > > When we have the spilling queue and sorted spilling queue, can we
>>>>> >>>> > > inject the partitioning superstep as the first superstep and use
>>>>> >>>> > > local memory?
>>>>> >>>> > > Today we have a partitioning job within a job and are creating two
>>>>> >>>> > > copies of the data on HDFS. This could be really costly. Is it
>>>>> >>>> > > possible to create or redistribute the partitions in local memory
>>>>> >>>> > > and initialize the record reader there?
>>>>> >>>> > > The user can run a separate job, given in the examples area, to
>>>>> >>>> > > explicitly repartition the data on HDFS. The deployment question
>>>>> >>>> > > is: how much disk space gets allocated for local usage? Would it
>>>>> >>>> > > be a safe approach, given the limitations?
>>>>> >>>> > >
>>>>> >>>> > > -Suraj
>>>>> >>>> > >
>>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>>>> >>>> > > <th...@gmail.com>wrote:
>>>>> >>>> > >
>>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to
>>>>> the
>>>>> >>>> > >> partitioner pretty easily.
>>>>> >>>> > >>
>>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>> >>>> > >>
>>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be
>>>>> Sorted?
>>>>> >>>> > >> >
>>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>>>> >>>> > >> > <th...@gmail.com> wrote:
>>>>> >>>> > >> > > Now I get how the partitioning works: obviously, if you merge
>>>>> >>>> > >> > > n sorted files by just appending them to each other, the
>>>>> >>>> > >> > > result is totally unsorted data ;-)
>>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>>>> >>>> > >> > >
>>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>>> >>>> > >> > >
>>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>>>> >>>> > >> > >>
>>>>> >>>> > >> > >> vertexID: 50
>>>>> >>>> > >> > >> vertexID: 52
>>>>> >>>> > >> > >> vertexID: 54
>>>>> >>>> > >> > >> vertexID: 56
>>>>> >>>> > >> > >> vertexID: 58
>>>>> >>>> > >> > >> vertexID: 61
>>>>> >>>> > >> > >> ...
>>>>> >>>> > >> > >> vertexID: 78
>>>>> >>>> > >> > >> vertexID: 81
>>>>> >>>> > >> > >> vertexID: 83
>>>>> >>>> > >> > >> vertexID: 85
>>>>> >>>> > >> > >> ...
>>>>> >>>> > >> > >> vertexID: 94
>>>>> >>>> > >> > >> vertexID: 96
>>>>> >>>> > >> > >> vertexID: 98
>>>>> >>>> > >> > >> vertexID: 1
>>>>> >>>> > >> > >> vertexID: 10
>>>>> >>>> > >> > >> vertexID: 12
>>>>> >>>> > >> > >> vertexID: 14
>>>>> >>>> > >> > >> vertexID: 16
>>>>> >>>> > >> > >> vertexID: 18
>>>>> >>>> > >> > >> vertexID: 21
>>>>> >>>> > >> > >> vertexID: 23
>>>>> >>>> > >> > >> vertexID: 25
>>>>> >>>> > >> > >> vertexID: 27
>>>>> >>>> > >> > >> vertexID: 29
>>>>> >>>> > >> > >> vertexID: 3
>>>>> >>>> > >> > >>
>>>>> >>>> > >> > >> So this won't work then correctly...
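One plausible reading of the listing above (an assumption, not confirmed in the thread): the file consists of two individually sorted runs appended together, and within each run the vertex IDs sort lexicographically because they are compared as text, which is why "29" can precede "3". A quick illustration:

```java
import java.util.*;

public class IdOrdering {
  // Sort IDs the way textual keys sort: lexicographically, not numerically.
  static List<String> sortAsText(List<String> ids) {
    List<String> sorted = new ArrayList<>(ids);
    Collections.sort(sorted); // natural String ordering
    return sorted;
  }
}
```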
>>>>> >>>> > >> > >>
>>>>> >>>> > >> > >>
>>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>>> >>>> > >> > >>
>>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>>>> >>>> > >> > >>>
>>>>> >>>> > >> > >>>
>>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>>> >>>> > >> > >>>
>>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>>>>> >>>> > holiday[1]
>>>>> >>>> > >> so
>>>>> >>>> > >> > >>>> I'll appear next week.
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> > >>>> 1.
>>>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>>>>> >>>> items
>>>>> >>>> > >> were
>>>>> Unable
>>>>> >>>> > to
>>>>> >>>> > >> > load
>>>>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>>>>> using
>>>>> >>>> > >> > builtin-java
>>>>> >>>> > >> > >>>> >> classes
>>>>> >>>> > >> > >>>> >> >> >>>> where applicable
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>> Total
>>>>> >>>> > input
>>>>> >>>> > >> > paths
>>>>> >>>> > >> > >>>> to
>>>>> >>>> > >> > >>>> >> >> process
>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>>>>> Total
>>>>> >>>> > input
>>>>> >>>> > >> > paths
>>>>> >>>> > >> > >>>> to
>>>>> >>>> > >> > >>>> >> >> process
>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient:
>>>>> Running
>>>>> >>>> job:
>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
>>>>> Setting
>>>>> >>>> up
>>>>> >>>> > a
>>>>> >>>> > >> new
>>>>> >>>> > >> > >>>> barrier
>>>>> >>>> > >> > >>>> >> >> for 2
>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> Current
>>>>> >>>> > >> supersteps
>>>>> >>>> > >> > >>>> >> number: 1
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The
>>>>> total
>>>>> >>>> > number
>>>>> >>>> > >> > of
>>>>> >>>> > >> > >>>> >> >> supersteps: 1
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> Counters: 6
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > SUPERSTEPS=1
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > LAUNCHED_TASKS=2
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > SUPERSTEP_SUM=4
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > >>>> IO_BYTES_READ=4332
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat:
>>>>> Total
>>>>> >>>> > input
>>>>> >>>> > >> > paths
>>>>> >>>> > >> > >>>> to
>>>>> >>>> > >> > >>>> >> >> process
>>>>> >>>> > >> > >>>> >> >> >>>> : 2
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>>> Running
>>>>> >>>> job:
>>>>> >>>> > >> > >>>> >> >> job_localrunner_0001
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner:
>>>>> Setting
>>>>> >>>> up
>>>>> >>>> > a
>>>>> >>>> > >> new
>>>>> >>>> > >> > >>>> barrier
>>>>> >>>> > >> > >>>> >> >> for 2
>>>>> >>>> > >> > >>>> >> >> >>>> tasks!
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>> >>>> > vertices
>>>>> >>>> > >> > are
>>>>> >>>> > >> > >>>> loaded
>>>>> >>>> > >> > >>>> >> >> into
>>>>> >>>> > >> > >>>> >> >> >>>> local:1
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>>> >>>> > vertices
>>>>> >>>> > >> > are
>>>>> >>>> > >> > >>>> loaded
>>>>> >>>> > >> > >>>> >> >> into
>>>>> >>>> > >> > >>>> >> >> >>>> local:0
>>>>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>>>>> >>>> Exception
>>>>> >>>> > >> > during
>>>>> >>>> > >> > >>>> BSP
>>>>> >>>> > >> > >>>> >> >> >>>> execution!
>>>>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages
>>>>> must
>>>>> >>>> > never
>>>>> >>>> > >> be
>>>>> >>>> > >> > >>>> behind
>>>>> >>>> > >> > >>>> >> the
>>>>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> >
>>>>> >>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> >
>>>>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>>
>>>>> >>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> >
>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> > >>
>>>>> >>>> >
>>>>> >>>>
>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> > >>
>>>>> >>>> >
>>>>> >>>>
>>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> >
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> > >>
>>>>> >>>> >
>>>>> >>>>
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >>
>>>>> >>>> > >> > >>>> >>
>>>>> >>>> > >> > >>>>
>>>>> >>>> > >> >
>>>>> >>>> > >>
>>>>> >>>> >
>>>>> >>>>
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >> >>>>
>>>>> >>>> > >> > >>>> >> >> >>>> --
>>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
>>> in loadVertices? Maybe the proposed feature for coupling storage in user space
>>> with BSP messaging [HAMA-734] can avoid the double reads and writes. That way,
>>> whether partitioned or not by the partitioner, we can keep vertices sorted
>>> with a single read and a single write on every peer.

And, as I commented on the JIRA ticket, I don't think we can use the
messaging system to sort vertices within the partition files.
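For reference, the appending-versus-merging issue that keeps coming up in this thread is the classic k-way merge problem: concatenating n individually sorted partition files loses the global vertex-ID order, while a min-heap merge keeps it. A minimal, self-contained sketch (plain Java, in-memory lists standing in for the sorted partition files; this is illustrative only, not Hama code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Merge n individually sorted runs into one globally sorted list using a
    // min-heap, instead of concatenating them (which loses the global order).
    public static List<Integer> merge(List<List<Integer>> runs) {
        // heap entries: {value, runIndex, offsetInRun}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] { runs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int run = top[1], next = top[2] + 1;
            // refill the heap from the run the smallest element came from
            if (next < runs.get(run).size()) {
                heap.add(new int[] { runs.get(run).get(next), run, next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two sorted runs, like part-00000 and part-00001 in the thread
        List<Integer> merged = merge(Arrays.asList(
            Arrays.asList(1, 3, 25, 27), Arrays.asList(50, 52, 96, 98)));
        System.out.println(merged); // prints [1, 3, 25, 27, 50, 52, 96, 98]
    }
}
```

The same idea works with streaming readers over the on-disk files, so only one record per run needs to be in memory at a time.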

On Wed, Mar 13, 2013 at 11:00 PM, Edward J. Yoon <ed...@apache.org> wrote:
> P.S., (number of splits = number of partitions) is really confusing to
> me. Even though the number of blocks equals the desired number of tasks,
> the data should be re-partitioned again.
>
> On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> Indeed. If there are already-partitioned (but unsorted) input files and the
>> user wants to skip the pre-partitioning phase, that case should be handled in
>> the GraphJobRunner BSP program. Actually, I still don't know why the
>> re-partitioned files need to be sorted. It's only about
>> GraphJobRunner.
>>
>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>> partitioning superstep for graph applications).
>>
>> Sorry, I don't understand exactly yet. Do you mean just a partitioning
>> job based on the Superstep API?
>>
>> By default, 100 tasks will be assigned to the partitioning job.
>> The partitioning job will create 1,000 partitions. Thus, we can execute
>> the Graph job with 1,000 tasks.
>>
>> Let's assume that an input sequence file is 20GB (100 blocks). If I
>> want to run with 1,000 tasks, what happens?
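On the 100-blocks-versus-1,000-tasks question above, a rough illustration of why split count and partition count are independent (plain Java; `partitionFor` is a made-up helper mirroring the common `(hash & Integer.MAX_VALUE) % numTasks` idiom, not Hama's actual Partitioner API):

```java
public class PartitionSketch {
    // Hypothetical helper: route a key to one of numTasks partitions the way
    // a typical hash partitioner does. Illustrative, not Hama's interface.
    static int partitionFor(String key, int numTasks) {
        // mask the sign bit so the result is always in [0, numTasks)
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        // 100 input blocks vs. 1,000 desired tasks: the target partition
        // depends only on the key and numTasks, so a record from any block
        // can land in any of the 1,000 partitions -- the data has to be
        // rewritten once either way.
        int tasks = 1000;
        int p = partitionFor("vertex-42", tasks);
        System.out.println(p >= 0 && p < tasks); // prints true
    }
}
```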
>>
>> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
>>> I am responding on this thread for better continuity of the
>>> conversation. We cannot expect the partitions to be sorted every time. When
>>> the number of splits equals the number of partitions and partitioning is switched
>>> off by the user [HAMA-561], the partitions would not be sorted. Can we do this
>>> in loadVertices? Maybe the proposed feature for coupling storage in user space
>>> with BSP messaging [HAMA-734] can avoid the double reads and writes. That way,
>>> whether partitioned or not by the partitioner, we can keep vertices sorted
>>> with a single read and a single write on every peer.
>>>
>>> Just clearing up any confusion regarding superstep injection for
>>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>>> partitioning superstep for graph applications.)
>>> Say there are x splits and y number of tasks configured by user.
>>>
>>> if x > y
>>> The y tasks are scheduled with x of them having each of the x splits and
>>> the remaining with no resource local to them. Then the partitioning
>>> superstep redistributes the partitions among them to create local
>>> partitions. Now the question is: can we re-initialize a peer's input based
>>> on this new local part of the partition?
>>>
>>> if y > x
>>> works as it works today.
>>>
>>> Just putting my points in brainstorming.
>>>
>>> -Suraj
>>>
>>>
>>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>>
>>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>>>
>>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>>>> wrote:
>>>> > Additionally,
>>>> >
>>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>>> >> superstep as the first superstep and use local memory?
>>>> >
>>>> > Can we execute different number of tasks per superstep?
>>>> >
>>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>>>> wrote:
>>>> >>> For graph processing, the partitioned files that result from the
>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>> >>
>>>> >> I see.
>>>> >>
>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>> idea
>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>> our
>>>> >>> messaging system is actually the best.
>>>> >>
>>>> >> BTW, if garbage objects can accumulate during the partitioning step, a
>>>> >> separate partitioning job may not be a bad idea. Is there some special
>>>> >> reason?
>>>> >>
>>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>>> >> <th...@gmail.com> wrote:
>>>> >>> For graph processing, the partitioned files that result from the
>>>> >>> partitioning job must be sorted. Currently only the partition files in
>>>> >>> itself are sorted, thus more tasks result in not sorted data in the
>>>> >>> completed file. This only applies for the graph processing package.
>>>> >>> So as Suraj told, it would be much more simpler to solve this via
>>>> >>> messaging, once it is scalable (it will be very very scalable!). So the
>>>> >>> GraphJobRunner can be partitioning the stuff with a single superstep in
>>>> >>> setup() as it was before ages ago. The messaging must be sorted anyway
>>>> for
>>>> >>> the algorithm so this is a nice side effect and saves us the
>>>> partitioning
>>>> >>> job for graph processing.
>>>> >>>
>>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>>> idea
>>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>>> our
>>>> >>> messaging system is actually the best.
>>>> >>>
>>>> >>>
>>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>> >>>
>>>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>>>> >>>> confusion. The Superstep injection is possible with Superstep API.
>>>> There
>>>> >>>> are few enhancements needed to make it simpler after I last worked on
>>>> it.
>>>> >>>> We can then look into partitioning superstep being executed before the
>>>> >>>> setup of first superstep of submitted job. I think it is feasible.
>>>> >>>>
>>>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>>> >>>> >wrote:
>>>> >>>>
>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>> >>>> partitioning
>>>> >>>> > > superstep as the first superstep and use local memory?
>>>> >>>> >
>>>> >>>> > Actually, I wanted to add something before calling BSP.setup()
>>>> method
>>>> >>>> > to avoid execute additional BSP job. But, in my opinion, current is
>>>> >>>> > enough. I think, we need to collect more experiences of input
>>>> >>>> > partitioning on large environments. I'll do.
>>>> >>>> >
>>>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>>>> >>>> >
>>>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>>>> surajsmenon@apache.org>
>>>> >>>> > wrote:
>>>> >>>> > > Sorry, I am increasing the scope here to outside graph module.
>>>> When we
>>>> >>>> > have
>>>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>>>> >>>> partitioning
>>>> >>>> > > superstep as the first superstep and use local memory?
>>>> >>>> > > Today we have partitioning job within a job and are creating two
>>>> copies
>>>> >>>> > of
>>>> >>>> > > data on HDFS. This could be really costly. Is it possible to
>>>> create or
>>>> >>>> > > redistribute the partitions on local memory and initialize the
>>>> record
>>>> >>>> > > reader there?
>>>> >>>> > > The user can run a separate job give in examples area to
>>>> explicitly
>>>> >>>> > > repartition the data on HDFS. The deployment question is how much
>>>> of
>>>> >>>> disk
>>>> >>>> > > space gets allocated for local memory usage? Would it be a safe
>>>> >>>> approach
>>>> >>>> > > with the limitations?
>>>> >>>> > >
>>>> >>>> > > -Suraj
>>>> >>>> > >
>>>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>>> >>>> > > <th...@gmail.com>wrote:
>>>> >>>> > >
>>>> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to
>>>> the
>>>> >>>> > >> partitioner pretty easily.
>>>> >>>> > >>
>>>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> >>>> > >>
>>>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be
>>>> Sorted?
>>>> >>>> > >> >
>>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>>> >>>> > >> > <th...@gmail.com> wrote:
>>>> >>>> > >> > > Now I get how the partitioning works, obviously if you merge
>>>> n
>>>> >>>> > sorted
>>>> >>>> > >> > files
>>>> >>>> > >> > > by just appending to each other, this will result in totally
>>>> >>>> > unsorted
>>>> >>>> > >> > data
>>>> >>>> > >> > > ;-)
>>>> >>>> > >> > > Why didn't you solve this via messaging?
>>>> >>>> > >> > >
>>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>> >>>> > >> > >
>>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>>> >>>> > >> > >>
>>>> >>>> > >> > >> vertexID: 50
>>>> >>>> > >> > >> vertexID: 52
>>>> >>>> > >> > >> vertexID: 54
>>>> >>>> > >> > >> vertexID: 56
>>>> >>>> > >> > >> vertexID: 58
>>>> >>>> > >> > >> vertexID: 61
>>>> >>>> > >> > >> ...
>>>> >>>> > >> > >> vertexID: 78
>>>> >>>> > >> > >> vertexID: 81
>>>> >>>> > >> > >> vertexID: 83
>>>> >>>> > >> > >> vertexID: 85
>>>> >>>> > >> > >> ...
>>>> >>>> > >> > >> vertexID: 94
>>>> >>>> > >> > >> vertexID: 96
>>>> >>>> > >> > >> vertexID: 98
>>>> >>>> > >> > >> vertexID: 1
>>>> >>>> > >> > >> vertexID: 10
>>>> >>>> > >> > >> vertexID: 12
>>>> >>>> > >> > >> vertexID: 14
>>>> >>>> > >> > >> vertexID: 16
>>>> >>>> > >> > >> vertexID: 18
>>>> >>>> > >> > >> vertexID: 21
>>>> >>>> > >> > >> vertexID: 23
>>>> >>>> > >> > >> vertexID: 25
>>>> >>>> > >> > >> vertexID: 27
>>>> >>>> > >> > >> vertexID: 29
>>>> >>>> > >> > >> vertexID: 3
>>>> >>>> > >> > >>
>>>> >>>> > >> > >> So this won't work then correctly...
>>>> >>>> > >> > >>
>>>> >>>> > >> > >>
>>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>> >>>> > >> > >>
>>>> >>>> > >> > >>> sure, have fun on your holidays.
>>>> >>>> > >> > >>>
>>>> >>>> > >> > >>>
>>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> >>>> > >> > >>>
>>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is holiday[1] so
>>>> >>>> > >> > >>>> I'll appear next week.
>>>> >>>> > >> > >>>>
>>>> >>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>> >>>> > >> > >>>>
>>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut <th...@gmail.com> wrote:
>>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all items were added.
>>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into the fastgen,
>>>> >>>> > >> > >>>> > want to have a look into it?
>>>> >>>> > >> > >>>> >
>>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> >>>> > >> > >>>> >
>>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency matrix into
>>>> >>>> > >> > >>>> >> multiple files.
>>>> >>>> > >> > >>>> >>
>>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut <th...@gmail.com> wrote:
>>>> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>>>> >>>> > >> > >>>> >> >
>>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> >>>> > >> > >>>> >> >
>>>> >>>> > >> > >>>> >> >> It looks like a bug.
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>>>> >>>> > >> > >>>> >> >> total 44
>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>>>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>>>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>>>> >>>> > >> > >>>> >> >> total 24
>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>>>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>>>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>>>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
>>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>>> >>>> > >> > >>>> >> >> >
>>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>>> >>>> > >> > >>>> >> >> >
>>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
>>>> >>>> > >> > >>>> >> >> >
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >>
>>>> >>>> > >> > >>>>
>>>> >>>> > >> >
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>>
>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >>
>>>> >>>> > >> > >>>>
>>>> >>>> > >> >
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>>
>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>>
>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>>
>>>> >>>> > >> >
>>>> >>>> >
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>>
>>>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >>
>>>> >>>> > >> > >>>>
>>>> >>>> > >> >
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>>
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> >>>> > >> > >>>> >> >> >>>>        at
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >>
>>>> >>>> > >> > >>>> >>
>>>> >>>> > >> > >>>>
>>>> >>>> > >> >
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >> >>>>
>>>> >>>> > >> > >>>> >> >> >>>> --
>>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
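The quoted `IllegalArgumentException` reflects an invariant inside GraphJobRunner: vertices and their incoming messages are both assumed sorted by vertex ID and consumed in lock-step, so a message whose ID sorts before the current vertex can never be delivered and signals an unsorted partition. A minimal sketch of that check (an illustration of the idea, not Hama's actual code):

```java
// Illustration (not Hama's real implementation) of the sorted-iteration
// invariant behind "Messages must never be behind the vertex in ID!".
public class SortedIterateSketch {
  public static void deliver(int[] sortedVertexIds, int[] sortedMessageIds) {
    int m = 0;
    for (int vertexId : sortedVertexIds) {
      // deliver every message addressed to the current vertex
      while (m < sortedMessageIds.length && sortedMessageIds[m] == vertexId) {
        m++;
      }
      // a leftover message with a smaller ID can never be delivered,
      // which means the vertex partition was not sorted
      if (m < sortedMessageIds.length && sortedMessageIds[m] < vertexId) {
        throw new IllegalArgumentException(
            "Messages must never be behind the vertex in ID! Current Message ID: "
                + sortedMessageIds[m] + " vs. " + vertexId);
      }
    }
  }

  public static void main(String[] args) {
    deliver(new int[] {1, 2, 3}, new int[] {1, 3}); // both sorted: fine
    try {
      deliver(new int[] {50, 52, 1}, new int[] {1, 50}); // unsorted vertices
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
```

With an unsorted partition the very first vertex can already be larger than the smallest pending message ID, which matches the "1 vs. 50" in the log.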



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
P.S., the (number of splits = number of partitions) assumption is really
confusing to me. Even when the number of blocks equals the desired number
of tasks, the data still has to be re-partitioned.
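The point can be made concrete with a HashPartitioner-style assignment (illustrative only, not Hama's exact implementation): even when each split maps one-to-one onto a task, the records inside a split are not grouped by destination peer, so a repartitioning pass still has to move them.

```java
// Illustrative hash partitioning. Even if the input already has exactly
// numTasks splits, nothing guarantees that the records in split i hash to
// peer i, so the data generally still has to be shuffled once.
public class PartitionSketch {
  public static int partitionFor(String vertexId, int numTasks) {
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
  }

  public static void main(String[] args) {
    String[] split0 = {"1", "10", "12", "3"}; // records found in the first split
    for (String id : split0) {
      // many of these land on a different peer than the split they came from
      System.out.println("vertex " + id + " -> peer " + partitionFor(id, 2));
    }
  }
}
```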

On Wed, Mar 13, 2013 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Indeed. If there are already-partitioned (but unsorted) input files and
> the user wants to skip the pre-partitioning phase, it should be handled in
> the GraphJobRunner BSP program. Actually, I still don't know why
> re-partitioned files need to be sorted. It only matters to
> GraphJobRunner.
>
>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>> partitioning superstep for graph applications).
>
> Sorry. I don't understand exactly yet. Do you mean just a partitioning
> job based on the superstep API?
>
> By default, 100 tasks will be assigned to the partitioning job. The
> partitioning job will create 1,000 partitions. Thus, we can execute
> the Graph job with 1,000 tasks.
>
> Let's assume that an input sequence file is 20GB (100 blocks). If I
> want to run with 1,000 tasks, what happens?
>
> On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
> I am responding on this thread for better continuity of the
> conversation. We cannot expect the partitions to be sorted every time. When
> the number of splits = number of partitions and partitioning is switched
> off by the user [HAMA-561], the partitions would not be sorted. Can we do this
> in loadVertices? Maybe the proposed feature coupling storage in user space
> with BSP Messaging [HAMA-734] can avoid the double reads and writes. That way,
> whether or not the data was partitioned by the partitioner, every peer can
> keep its vertices sorted with a single read and a single write.
>>
>> Just clearing confusion if any regarding superstep injection for
>> partitioning. (This is outside the scope of graphs. We can have a dedicated
>> partitioning superstep for graph applications).
>> Say there are x splits and y number of tasks configured by user.
>>
>> if x > y
>> The y tasks are scheduled with x of them having each of the x splits and
>> the remaining with no resource local to them. Then the partitioning
>> superstep redistributes the partitions among them to create local
>> partitions. Now the question is can we re-initialize a peer's input based
>> on this new local part of partition?
>>
>> if y > x
>> works as it works today.
>>
>> Just putting my points in brainstorming.
>>
>> -Suraj
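Suraj's two cases can be made concrete with a toy split-to-task assignment (a hypothetical round-robin helper, not Hama's actual scheduler):

```java
import java.util.Arrays;

// Toy model of the x-splits / y-tasks discussion: with x > y some tasks
// hold several splits, which a partitioning superstep would then have to
// redistribute; with y >= x some tasks start with no local input.
public class SplitAssignmentSketch {
  public static int[] splitsPerTask(int x, int y) {
    int[] counts = new int[y];
    for (int split = 0; split < x; split++) {
      counts[split % y]++; // assign each split to a task round-robin
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(splitsPerTask(5, 3))); // x > y: [2, 2, 1]
    System.out.println(Arrays.toString(splitsPerTask(2, 4))); // y > x: [1, 1, 0, 0]
  }
}
```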
>>
>>
>> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>>
>>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>>
>>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Additionally,
>>> >
>>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>>> >> superstep as the first superstep and use local memory?
>>> >
>>> > Can we execute a different number of tasks per superstep?
>>> >
>>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> >>> For graph processing, the partitioned files that result from the
>>> >>> partitioning job must be sorted. Currently only the partition files in
>>> >>
>>> >> I see.
>>> >>
>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>> idea
>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>> our
>>> >>> messaging system is actually the best.
>>> >>
>>> >> BTW, if garbage objects can accumulate in the partitioning step, a
>>> >> separate partitioning job may not be a bad idea. Is there some special
>>> >> reason?
>>> >>
>>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>>> >> <th...@gmail.com> wrote:
>>> >>> For graph processing, the partitioned files that result from the
>>> >>> partitioning job must be sorted. Currently only the partition files in
>>> >>> itself are sorted, thus more tasks result in not sorted data in the
>>> >>> completed file. This only applies for the graph processing package.
>>> >>> So as Suraj told, it would be much more simpler to solve this via
>>> >>> messaging, once it is scalable (it will be very very scalable!). So the
>>> >>> GraphJobRunner can be partitioning the stuff with a single superstep in
>>> >>> setup() as it was before ages ago. The messaging must be sorted anyway
>>> for
>>> >>> the algorithm so this is a nice side effect and saves us the
>>> partitioning
>>> >>> job for graph processing.
>>> >>>
>>> >>> For other partitionings and with regard to our superstep API, Suraj's
>>> idea
>>> >>> of injecting a preprocessing superstep that partitions the stuff into
>>> our
>>> >>> messaging system is actually the best.
>>> >>>
>>> >>>
>>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>>> >>>
>>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>>> >>>> confusion. The Superstep injection is possible with Superstep API.
>>> There
>>> >>>> are few enhancements needed to make it simpler after I last worked on
>>> it.
>>> >>>> We can then look into partitioning superstep being executed before the
>>> >>>> setup of first superstep of submitted job. I think it is feasible.
>>> >>>>
>>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>> >>>> > >> > <th...@gmail.com> wrote:
>>> >>>> > >> > > Now I get how the partitioning works, obviously if you merge
>>> n
>>> >>>> > sorted
>>> >>>> > >> > files
>>> >>>> > >> > > by just appending to each other, this will result in totally
>>> >>>> > unsorted
>>> >>>> > >> > data
>>> >>>> > >> > > ;-)
>>> >>>> > >> > > Why didn't you solve this via messaging?
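The fix implied here is a k-way merge rather than concatenation: n sorted runs stay globally sorted only if the merge always emits the smallest remaining ID across all runs. A minimal priority-queue sketch (illustrative, not the merger Suraj was adding to Hama):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// k-way merge of sorted runs: concatenating the runs would interleave
// their ranges out of order, while this merge keeps the output sorted.
public class KWayMergeSketch {
  public static List<Integer> merge(List<List<Integer>> sortedRuns) {
    // queue entries: {value, runIndex, positionInRun}
    PriorityQueue<int[]> pq =
        new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
    for (int r = 0; r < sortedRuns.size(); r++) {
      if (!sortedRuns.get(r).isEmpty()) {
        pq.add(new int[] {sortedRuns.get(r).get(0), r, 0});
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] e = pq.poll();
      out.add(e[0]); // always the globally smallest remaining ID
      List<Integer> run = sortedRuns.get(e[1]);
      if (e[2] + 1 < run.size()) {
        pq.add(new int[] {run.get(e[2] + 1), e[1], e[2] + 1});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> runs =
        Arrays.asList(Arrays.asList(1, 50, 52), Arrays.asList(3, 10, 98));
    System.out.println(merge(runs)); // [1, 3, 10, 50, 52, 98]
  }
}
```

Plain appending of the same two runs would give 1, 50, 52, 3, 10, 98, which is exactly the unsorted vertex-ID pattern shown above.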
>>> >>>> > >> > >
>>> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>> >>>> > >> > >
>>> >>>> > >> > >> Seems that they are not correctly sorted:
>>> >>>> > >> > >>
>>> >>>> > >> > >> vertexID: 50
>>> >>>> > >> > >> vertexID: 52
>>> >>>> > >> > >> vertexID: 54
>>> >>>> > >> > >> vertexID: 56
>>> >>>> > >> > >> vertexID: 58
>>> >>>> > >> > >> vertexID: 61
>>> >>>> > >> > >> ...
>>> >>>> > >> > >> vertexID: 78
>>> >>>> > >> > >> vertexID: 81
>>> >>>> > >> > >> vertexID: 83
>>> >>>> > >> > >> vertexID: 85
>>> >>>> > >> > >> ...
>>> >>>> > >> > >> vertexID: 94
>>> >>>> > >> > >> vertexID: 96
>>> >>>> > >> > >> vertexID: 98
>>> >>>> > >> > >> vertexID: 1
>>> >>>> > >> > >> vertexID: 10
>>> >>>> > >> > >> vertexID: 12
>>> >>>> > >> > >> vertexID: 14
>>> >>>> > >> > >> vertexID: 16
>>> >>>> > >> > >> vertexID: 18
>>> >>>> > >> > >> vertexID: 21
>>> >>>> > >> > >> vertexID: 23
>>> >>>> > >> > >> vertexID: 25
>>> >>>> > >> > >> vertexID: 27
>>> >>>> > >> > >> vertexID: 29
>>> >>>> > >> > >> vertexID: 3
>>> >>>> > >> > >>
>>> >>>> > >> > >> So this won't work then correctly...
>>> >>>> > >> > >>
>>> >>>> > >> > >>
>>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>> >>>> > >> > >>
>>> >>>> > >> > >>> sure, have fun on your holidays.
>>> >>>> > >> > >>>
>>> >>>> > >> > >>>
>>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>>> > >> > >>>
>>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>>> >>>> > holiday[1]
>>> >>>> > >> so
>>> >>>> > >> > >>>> I'll appear next week.
>>> >>>> > >> > >>>>
>>> >>>> > >> > >>>> 1.
>>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>> >>>> > >> > >>>>
>>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>>> >>>> items
>>> >>>> > >> were
>>> >>>> > >> > >>>> added.
>>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID
>>> into
>>> >>>> > the
>>> >>>> > >> > >>>> fastgen,
>>> >>>> > >> > >>>> > want to have a look into it?
>>> >>>> > >> > >>>> >
>>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>>> > >> > >>>> >
>>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
>>> >>>> matrix
>>> >>>> > >> into
>>> >>>> > >> > >>>> >> multiple files.
>>> >>>> > >> > >>>> >>
>>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>>> >>>> > >> > >>>> >> >
>>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> >>>> > >> > >>>> >> >
> It looks like a bug.
>
> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> total 44
> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> total 24
> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
> edward@udanax:~/workspace/hama-trunk$
>>> >>>> > >> > >>>> >> >>
>>> >>>> > >> > >>>> >> >>
>>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>>> >>>> edward@udanax.org
>>> >>>> > >
>>> >>>> > >> > wrote:
>>> >>>> > >> > >>>> >> >> > yes i'll check again
>>> >>>> > >> > >>>> >> >> >
>>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>>> >>>> > >> > >>>> >> >> >
>>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>> >>>> > >> > >>>> >> >> wrote:
>>> >>>> > >> > >>>> >> >> >
>>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>>> >>>> > >> > >>>> >> >> >>
>>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>>> >>>> > >> part-00001,
>>> >>>> > >> > >>>> both
>>> >>>> > >> > >>>> >> ~2.2kb
>>> >>>> > >> > >>>> >> >> >> sized.
>>> >>>> > >> > >>>> >> >> >> In the below partition directory, there is only a
>>> >>>> single
>>> >>>> > >> > 5.56kb
>>> >>>> > >> > >>>> file.
>>> >>>> > >> > >>>> >> >> >>
>>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a
>>> single
>>> >>>> > file
>>> >>>> > >> if
>>> >>>> > >> > you
>>> >>>> > >> > >>>> >> >> configured
>>> >>>> > >> > >>>> >> >> >> two?
>>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
>>> >>>> > >> > >>>> >> >> >>
>>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>>> thomas.jungblut@gmail.com>
>>> >>>> > >> > >>>> >> >> >>
>>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>>> >>>> > >> > >>>> >> >> >>>
>>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>> >>>> > >> > >>>> >> >> >>>
>>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe
>>> the
>>> >>>> > >> > >>>> partitioning
>>> >>>> > >> > >>>> >> >> doesn't
>>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or something
>>> else.
>>> >>>> > >> > >>>> >> >> >>>
>>> >>>> > >> > >>>> >> >> >>>
>>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> >>>> > >> > >>>> >> >> >>>
> Fastgen input seems not to work for graph examples.
>
> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
> Job Finished in 3.212 seconds
> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
> hama-examples-0.7.0-SNAPSHOT-javadoc.jar  hama-examples-0.7.0-SNAPSHOT.jar
> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> java.lang.IllegalArgumentException: Messages must never be behind the
> vertex in ID! Current Message ID: 1 vs. 50
>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>> > >> > >>>> >> >> >>>>
>>> >>>> > >> > >>>> >> >> >>>> --
>>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Indeed. If there are already-partitioned (but unsorted) input files and
the user wants to skip the pre-partitioning phase, it should be handled in
the GraphJobRunner BSP program. Actually, I still don't know why
re-partitioned files need to be sorted. It only matters to
GraphJobRunner.

> partitioning. (This is outside the scope of graphs. We can have a dedicated
> partitioning superstep for graph applications).

Sorry. I don't understand exactly yet. Do you mean just a partitioning
job based on superstep API?

By default, 100 tasks will be assigned for partitioning job.
Partitioning job will create 1,000 partitions. Thus, we can execute
the Graph job with 1,000 tasks.

Let's assume that an input sequence file is 20GB (100 blocks). If I
want to run with 1,000 tasks, what happens?
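
For concreteness, here is a minimal sketch of the modulo scheme a Hadoop-style hash partitioner uses to spread records from the 100 partitioning tasks across 1,000 target partitions (whether Hama's HashPartitioner matches this exactly is an assumption; PartitionSketch and partitionFor are illustrative names):

```java
// Hadoop-style hash partitioning: mask off the sign bit, then take the
// remainder by the partition count, so every record lands in [0, n).
public class PartitionSketch {
    static int partitionFor(String vertexId, int numPartitions) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 1000; // target number of graph-job tasks
        // Each of the 100 partitioning tasks reads roughly one block and
        // appends every record to one of the 1,000 partition files, so a
        // single partition file receives pieces from up to 100 writers --
        // which is why appending sorted pieces does not stay sorted.
        System.out.println(partitionFor("vertex-42", numPartitions));
    }
}
```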

On Wed, Mar 13, 2013 at 9:49 PM, Suraj Menon <su...@apache.org> wrote:
> I am responding on this thread because of better continuity for
> conversation. We cannot expect the partitions to be sorted every time. When
> the number of splits = number of partitions and partitioning is switched
> off by user[HAMA-561], the partitions would not be sorted. Can we do this
> in loadVertices? Maybe consider feature for coupling storage in user space
> with BSP Messaging[HAMA-734] can avoid double reads and writes. This way
> partitioned or non-partitioned by partitioner, can keep vertices sorted
> with a single read and single write on every peer.
>
> Just clearing confusion if any regarding superstep injection for
> partitioning. (This is outside the scope of graphs. We can have a dedicated
> partitioning superstep for graph applications).
> Say there are x splits and y number of tasks configured by user.
>
> if x > y
> The y tasks are scheduled with x of them having each of the x splits and
> the remaining with no resource local to them. Then the partitioning
> superstep redistributes the partitions among them to create local
> partitions. Now the question is can we re-initialize a peer's input based
> on this new local part of partition?
>
> if y > x
> works as it works today.
>
> Just putting my points in brainstorming.
>
> -Suraj
>
>
> On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>>
>> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Additionally,
>> >
>> >> spilling queue and sorted spilling queue, can we inject the partitioning
>> >> superstep as the first superstep and use local memory?
>> >
>> > Can we execute different number of tasks per superstep?
>> >
>> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> >>> For graph processing, the partitioned files that result from the
>> >>> partitioning job must be sorted. Currently only the partition files in
>> >>
>> >> I see.
>> >>
>> >>> For other partitionings and with regard to our superstep API, Suraj's
>> idea
>> >>> of injecting a preprocessing superstep that partitions the stuff into
>> our
>> >>> messaging system is actually the best.
>> >>
>> >> BTW, if some garbage objects can be accumulated in partitioning step,
>> >> separated partitioning job may not be bad idea. Is there some special
>> >> reason?
>> >>
>> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> >> <th...@gmail.com> wrote:
>> >>> For graph processing, the partitioned files that result from the
>> >>> partitioning job must be sorted. Currently only the partition files in
>> >>> itself are sorted, thus more tasks result in not sorted data in the
>> >>> completed file. This only applies for the graph processing package.
>> >>> So as Suraj told, it would be much more simpler to solve this via
>> >>> messaging, once it is scalable (it will be very very scalable!). So the
>> >>> GraphJobRunner can be partitioning the stuff with a single superstep in
>> >>> setup() as it was before ages ago. The messaging must be sorted anyway
>> for
>> >>> the algorithm so this is a nice side effect and saves us the
>> partitioning
>> >>> job for graph processing.
>> >>>
>> >>> For other partitionings and with regard to our superstep API, Suraj's
>> idea
>> >>> of injecting a preprocessing superstep that partitions the stuff into
>> our
>> >>> messaging system is actually the best.
>> >>>
>> >>>
>> >>> 2013/3/6 Suraj Menon <su...@apache.org>
>> >>>
>> >>>> No, the partitions we write locally need not be sorted. Sorry for the
>> >>>> confusion. The Superstep injection is possible with Superstep API.
>> There
>> >>>> are few enhancements needed to make it simpler after I last worked on
>> it.
>> >>>> We can then look into partitioning superstep being executed before the
>> >>>> setup of first superstep of submitted job. I think it is feasible.
>> >>>>
>> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>> >>>> >wrote:
>> >>>>
>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>> >>>> partitioning
>> >>>> > > superstep as the first superstep and use local memory?
>> >>>> >
>> >>>> > Actually, I wanted to add something before calling BSP.setup()
>> method
>> >>>> > to avoid execute additional BSP job. But, in my opinion, current is
>> >>>> > enough. I think, we need to collect more experiences of input
>> >>>> > partitioning on large environments. I'll do.
>> >>>> >
>> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>> >>>> >
>> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
>> surajsmenon@apache.org>
>> >>>> > wrote:
>> >>>> > > Sorry, I am increasing the scope here to outside graph module.
>> When we
>> >>>> > have
>> >>>> > > spilling queue and sorted spilling queue, can we inject the
>> >>>> partitioning
>> >>>> > > superstep as the first superstep and use local memory?
>> >>>> > > Today we have partitioning job within a job and are creating two
>> copies
>> >>>> > of
>> >>>> > > data on HDFS. This could be really costly. Is it possible to
>> create or
>> >>>> > > redistribute the partitions on local memory and initialize the
>> record
>> >>>> > > reader there?
>> >>>> > > The user can run a separate job give in examples area to
>> explicitly
>> >>>> > > repartition the data on HDFS. The deployment question is how much
>> of
>> >>>> disk
>> >>>> > > space gets allocated for local memory usage? Would it be a safe
>> >>>> approach
>> >>>> > > with the limitations?
>> >>>> > >
>> >>>> > > -Suraj
>> >>>> > >
>> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>> >>>> > > <th...@gmail.com>wrote:
>> >>>> > >
>> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to
>> the
>> >>>> > >> partitioner pretty easily.
>> >>>> > >>
>> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>> > >>
>> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be
>> Sorted?
>> >>>> > >> >
>> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> >>>> > >> > <th...@gmail.com> wrote:
>> >>>> > >> > > Now I get how the partitioning works, obviously if you merge
>> n
>> >>>> > sorted
>> >>>> > >> > files
>> >>>> > >> > > by just appending to each other, this will result in totally
>> >>>> > unsorted
>> >>>> > >> > data
>> >>>> > >> > > ;-)
>> >>>> > >> > > Why didn't you solve this via messaging?
>> >>>> > >> > >
>> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> >>>> > >> > >
>> >>>> > >> > >> Seems that they are not correctly sorted:
>> >>>> > >> > >>
>> >>>> > >> > >> vertexID: 50
>> >>>> > >> > >> vertexID: 52
>> >>>> > >> > >> vertexID: 54
>> >>>> > >> > >> vertexID: 56
>> >>>> > >> > >> vertexID: 58
>> >>>> > >> > >> vertexID: 61
>> >>>> > >> > >> ...
>> >>>> > >> > >> vertexID: 78
>> >>>> > >> > >> vertexID: 81
>> >>>> > >> > >> vertexID: 83
>> >>>> > >> > >> vertexID: 85
>> >>>> > >> > >> ...
>> >>>> > >> > >> vertexID: 94
>> >>>> > >> > >> vertexID: 96
>> >>>> > >> > >> vertexID: 98
>> >>>> > >> > >> vertexID: 1
>> >>>> > >> > >> vertexID: 10
>> >>>> > >> > >> vertexID: 12
>> >>>> > >> > >> vertexID: 14
>> >>>> > >> > >> vertexID: 16
>> >>>> > >> > >> vertexID: 18
>> >>>> > >> > >> vertexID: 21
>> >>>> > >> > >> vertexID: 23
>> >>>> > >> > >> vertexID: 25
>> >>>> > >> > >> vertexID: 27
>> >>>> > >> > >> vertexID: 29
>> >>>> > >> > >> vertexID: 3
>> >>>> > >> > >>
>> >>>> > >> > >> So this won't work then correctly...
>> >>>> > >> > >>
>> >>>> > >> > >>
>> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> >>>> > >> > >>
>> >>>> > >> > >>> sure, have fun on your holidays.
>> >>>> > >> > >>>
>> >>>> > >> > >>>
>> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>> > >> > >>>
>> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>> >>>> > holiday[1]
>> >>>> > >> so
>> >>>> > >> > >>>> I'll appear next week.
>> >>>> > >> > >>>>
>> >>>> > >> > >>>> 1.
>> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> >>>> > >> > >>>>
>> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> >>>> > >> > >>>> <th...@gmail.com> wrote:
>> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>> >>>> items
>> >>>> > >> were
>> >>>> > >> > >>>> added.
>> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID
>> into
>> >>>> > the
>> >>>> > >> > >>>> fastgen,
>> >>>> > >> > >>>> > want to have a look into it?
>> >>>> > >> > >>>> >
>> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>> > >> > >>>> >
>> >>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
>> >>>> matrix
>> >>>> > >> into
>> >>>> > >> > >>>> >> multiple files.
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
>> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>> >>>> > >> > >>>> >> >
>> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> >>>> > >> > >>>> >> >
>> >>>> > >> > >>>> >> >> It looks like a bug.
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>> >>>> > >> /tmp/randomgraph/
>> >>>> > >> > >>>> >> >> total 44
>> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01
>> part-00000
>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>> >>>> > .part-00000.crc
>> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01
>> part-00001
>> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>> >>>> > .part-00001.crc
>> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03
>> partitions
>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>> >>>> > >> > >>>> >> >> total 24
>> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03
>> part-00000
>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> >>>> > .part-00000.crc
>> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03
>> part-00001
>> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> >>>> > .part-00001.crc
>> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>> >>>> edward@udanax.org
>> >>>> > >
>> >>>> > >> > wrote:
>> >>>> > >> > >>>> >> >> > yes i'll check again
>> >>>> > >> > >>>> >> >> >
>> >>>> > >> > >>>> >> >> > Sent from my iPhone
>> >>>> > >> > >>>> >> >> >
>> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> >>>> > >> > >>>> >> >> wrote:
>> >>>> > >> > >>>> >> >> >
>> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>> >>>> > >> > >>>> >> >> >>
>> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>> >>>> > >> part-00001,
>> >>>> > >> > >>>> both
>> >>>> > >> > >>>> >> ~2.2kb
>> >>>> > >> > >>>> >> >> >> sized.
>> >>>> > >> > >>>> >> >> >> In the below partition directory, there is only a
>> >>>> single
>> >>>> > >> > 5.56kb
>> >>>> > >> > >>>> file.
>> >>>> > >> > >>>> >> >> >>
>> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a
>> single
>> >>>> > file
>> >>>> > >> if
>> >>>> > >> > you
>> >>>> > >> > >>>> >> >> configured
>> >>>> > >> > >>>> >> >> >> two?
>> >>>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
>> >>>> > >> > >>>> >> >> >>
>> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
>> thomas.jungblut@gmail.com>
>> >>>> > >> > >>>> >> >> >>
>> >>>> > >> > >>>> >> >> >>> Will have a look into it.
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe
>> the
>> >>>> > >> > >>>> partitioning
>> >>>> > >> > >>>> >> >> doesn't
>> >>>> > >> > >>>> >> >> >>> partition correctly with the input or something
>> else.
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
>> >
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> :~/workspace/hama-trunk$
>> >>>> > >> bin/hama
>> >>>> > >> > jar
>> >>>> > >> > >>>> >> >> >>>>
>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>> >>>> > >> > fastgen
>> >>>> > >> > >>>> 100 10
>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader:
>> Unable
>> >>>> > to
>> >>>> > >> > load
>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>> using
>> >>>> > >> > builtin-java
>> >>>> > >> > >>>> >> classes
>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient:
>> Running
>> >>>> job:
>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner:
>> Setting
>> >>>> up
>> >>>> > a
>> >>>> > >> new
>> >>>> > >> > >>>> barrier
>> >>>> > >> > >>>> >> >> for 2
>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> Current
>> >>>> > >> supersteps
>> >>>> > >> > >>>> >> number: 0
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The
>> total
>> >>>> > number
>> >>>> > >> > of
>> >>>> > >> > >>>> >> >> supersteps: 0
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> Counters: 3
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> >>>> > SUPERSTEPS=0
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> :~/workspace/hama-trunk$
>> >>>> > >> bin/hama
>> >>>> > >> > jar
>> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
>> :~/workspace/hama-trunk$
>> >>>> > >> bin/hama
>> >>>> > >> > jar
>> >>>> > >> > >>>> >> >> >>>>
>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>> >>>> > pagerank
>> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader:
>> Unable
>> >>>> > to
>> >>>> > >> > load
>> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
>> using
>> >>>> > >> > builtin-java
>> >>>> > >> > >>>> >> classes
>> >>>> > >> > >>>> >> >> >>>> where applicable
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>> Total
>> >>>> > input
>> >>>> > >> > paths
>> >>>> > >> > >>>> to
>> >>>> > >> > >>>> >> >> process
>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
>> Total
>> >>>> > input
>> >>>> > >> > paths
>> >>>> > >> > >>>> to
>> >>>> > >> > >>>> >> >> process
>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient:
>> Running
>> >>>> job:
>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
>> Setting
>> >>>> up
>> >>>> > a
>> >>>> > >> new
>> >>>> > >> > >>>> barrier
>> >>>> > >> > >>>> >> >> for 2
>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> Current
>> >>>> > >> supersteps
>> >>>> > >> > >>>> >> number: 1
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The
>> total
>> >>>> > number
>> >>>> > >> > of
>> >>>> > >> > >>>> >> >> supersteps: 1
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> Counters: 6
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > SUPERSTEPS=1
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > LAUNCHED_TASKS=2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > SUPERSTEP_SUM=4
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> IO_BYTES_READ=4332
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat:
>> Total
>> >>>> > input
>> >>>> > >> > paths
>> >>>> > >> > >>>> to
>> >>>> > >> > >>>> >> >> process
>> >>>> > >> > >>>> >> >> >>>> : 2
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> Running
>> >>>> job:
>> >>>> > >> > >>>> >> >> job_localrunner_0001
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner:
>> Setting
>> >>>> up
>> >>>> > a
>> >>>> > >> new
>> >>>> > >> > >>>> barrier
>> >>>> > >> > >>>> >> >> for 2
>> >>>> > >> > >>>> >> >> >>>> tasks!
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>> >>>> > vertices
>> >>>> > >> > are
>> >>>> > >> > >>>> loaded
>> >>>> > >> > >>>> >> >> into
>> >>>> > >> > >>>> >> >> >>>> local:1
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>> >>>> > vertices
>> >>>> > >> > are
>> >>>> > >> > >>>> loaded
>> >>>> > >> > >>>> >> >> into
>> >>>> > >> > >>>> >> >> >>>> local:0
>> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>> >>>> Exception
>> >>>> > >> > during
>> >>>> > >> > >>>> BSP
>> >>>> > >> > >>>> >> >> >>>> execution!
>> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages
>> must
>> >>>> > never
>> >>>> > >> be
>> >>>> > >> > >>>> behind
>> >>>> > >> > >>>> >> the
>> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >>
>> >>>> > >> >
>> >>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> >
>> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>>
>> >>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> >
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> > >>
>> >>>> >
>> >>>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> > >>
>> >>>> >
>> >>>>
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>>
>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> >
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>>
>> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> > >>
>> >>>> >
>> >>>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>>> > >> > >>>> >> >> >>>>        at
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> >
>> >>>> > >>
>> >>>> >
>> >>>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >> >>>>
>> >>>> > >> > >>>> >> >> >>>> --
>> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> >>>> > >> > >>>> >> >> >>>> @eddieyoon
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >> >>>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >> >> --
>> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>> >>>> > >> > >>>> >> >> @eddieyoon
>> >>>> > >> > >>>> >> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>> >> --
>> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>> >>>> > >> > >>>> >> @eddieyoon
>> >>>> > >> > >>>> >>
>> >>>> > >> > >>>>
>> >>>> > >> > >>>>
>> >>>> > >> > >>>>
>> >>>> > >> > >>>> --
>> >>>> > >> > >>>> Best Regards, Edward J. Yoon
>> >>>> > >> > >>>> @eddieyoon
>> >>>> > >> > >>>>
>> >>>> > >> > >>>
>> >>>> > >> > >>>
>> >>>> > >> > >>
>> >>>> > >> >
>> >>>> > >> >
>> >>>> > >> >
>> >>>> > >> > --
>> >>>> > >> > Best Regards, Edward J. Yoon
>> >>>> > >> > @eddieyoon
>> >>>> > >> >
>> >>>> > >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Best Regards, Edward J. Yoon
>> >>>> > @eddieyoon
>> >>>> >
>> >>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Suraj Menon <su...@apache.org>.
I am responding on this thread for better continuity of the conversation.
We cannot expect the partitions to be sorted every time: when the number
of splits equals the number of partitions and partitioning is switched
off by the user [HAMA-561], the partitions will not be sorted. Can we do this
in loadVertices? Maybe the feature for coupling storage in user space
with BSP Messaging [HAMA-734] can avoid the double reads and writes. That way,
whether or not the data was partitioned by the partitioner, we can keep
vertices sorted with a single read and a single write on every peer.

Just clearing up any confusion regarding superstep injection for
partitioning. (This is outside the scope of graphs; we can have a dedicated
partitioning superstep for graph applications.)
Say there are x splits and y tasks configured by the user.

if x > y
The y tasks are scheduled, with x of them holding one split each and the
remaining tasks having no split local to them. Then the partitioning
superstep redistributes the partitions among them to create local
partitions. Now the question is: can we re-initialize a peer's input based
on this new local part of the partition?

if y > x
works as it works today.

Just putting my points out for brainstorming.

-Suraj
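
The redistribution step described above can be modeled in a few lines. This is a plain-Java simulation of the proposed partitioning superstep, not Hama's actual API; the peer count, the mod-based ownership rule, and all names here are illustrative assumptions:

```java
import java.util.*;

// Model of the partitioning superstep: every peer scatters the records of
// its split to the owning peer (plain maps stand in for BSP messaging),
// so each peer ends up holding exactly its local, sorted partition.
public class PartitionSuperstepModel {
    static Map<Integer, List<Integer>> redistribute(
            List<List<Integer>> splits, int peers) {
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (int p = 0; p < peers; p++) inbox.put(p, new ArrayList<>());
        for (List<Integer> split : splits)        // "scatter" phase
            for (int vertexId : split)
                inbox.get(vertexId % peers).add(vertexId); // owner = id mod peers
        for (List<Integer> local : inbox.values())
            Collections.sort(local); // as a sorted spilling queue would on receipt
        return inbox;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(
            Arrays.asList(3, 1, 4), Arrays.asList(2, 6, 5));
        // peer 0 owns even IDs, peer 1 owns odd IDs:
        // {0=[2, 4, 6], 1=[1, 3, 5]}
        System.out.println(redistribute(splits, 2));
    }
}
```

After this step, each peer could in principle re-initialize its record reader over its inbox instead of re-reading repartitioned files from HDFS, which is the open question above.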


On Mon, Mar 11, 2013 at 7:39 AM, Edward J. Yoon <ed...@apache.org>wrote:

> I just filed here https://issues.apache.org/jira/browse/HAMA-744
>
> On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Additionally,
> >
> >> spilling queue and sorted spilling queue, can we inject the partitioning
> >> superstep as the first superstep and use local memory?
> >
> > Can we execute different number of tasks per superstep?
> >
> > On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> >>> For graph processing, the partitioned files that result from the
> >>> partitioning job must be sorted. Currently only the partition files in
> >>
> >> I see.
> >>
> >>> For other partitionings and with regard to our superstep API, Suraj's
> idea
> >>> of injecting a preprocessing superstep that partitions the stuff into
> our
> >>> messaging system is actually the best.
> >>
> >> BTW, if some garbage objects can be accumulated in partitioning step,
> >> separated partitioning job may not be bad idea. Is there some special
> >> reason?
> >>
> >> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> >> <th...@gmail.com> wrote:
> >>> For graph processing, the partitioned files that result from the
> >>> partitioning job must be sorted. Currently only the partition files in
> >>> itself are sorted, thus more tasks result in not sorted data in the
> >>> completed file. This only applies for the graph processing package.
> >>> So as Suraj told, it would be much more simpler to solve this via
> >>> messaging, once it is scalable (it will be very very scalable!). So the
> >>> GraphJobRunner can be partitioning the stuff with a single superstep in
> >>> setup() as it was before ages ago. The messaging must be sorted anyway
> for
> >>> the algorithm so this is a nice side effect and saves us the
> partitioning
> >>> job for graph processing.
> >>>
> >>> For other partitionings and with regard to our superstep API, Suraj's
> idea
> >>> of injecting a preprocessing superstep that partitions the stuff into
> our
> >>> messaging system is actually the best.
> >>>
> >>>
> >>> 2013/3/6 Suraj Menon <su...@apache.org>
> >>>
> >>>> No, the partitions we write locally need not be sorted. Sorry for the
> >>>> confusion. The Superstep injection is possible with Superstep API.
> There
> >>>> are few enhancements needed to make it simpler after I last worked on
> it.
> >>>> We can then look into partitioning superstep being executed before the
> >>>> setup of first superstep of submitted job. I think it is feasible.
> >>>>
> >>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
> >>>> >wrote:
> >>>>
> >>>> > > spilling queue and sorted spilling queue, can we inject the
> >>>> partitioning
> >>>> > > superstep as the first superstep and use local memory?
> >>>> >
> >>>> > Actually, I wanted to add something before calling BSP.setup()
> method
> >>>> > to avoid execute additional BSP job. But, in my opinion, current is
> >>>> > enough. I think, we need to collect more experiences of input
> >>>> > partitioning on large environments. I'll do.
> >>>> >
> >>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
> >>>> >
> >>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <
> surajsmenon@apache.org>
> >>>> > wrote:
> >>>> > > Sorry, I am increasing the scope here to outside graph module.
> When we
> >>>> > have
> >>>> > > spilling queue and sorted spilling queue, can we inject the
> >>>> partitioning
> >>>> > > superstep as the first superstep and use local memory?
> >>>> > > Today we have partitioning job within a job and are creating two
> copies
> >>>> > of
> >>>> > > data on HDFS. This could be really costly. Is it possible to
> create or
> >>>> > > redistribute the partitions on local memory and initialize the
> record
> >>>> > > reader there?
> >>>> > > The user can run a separate job give in examples area to
> explicitly
> >>>> > > repartition the data on HDFS. The deployment question is how much
> of
> >>>> disk
> >>>> > > space gets allocated for local memory usage? Would it be a safe
> >>>> approach
> >>>> > > with the limitations?
> >>>> > >
> >>>> > > -Suraj
> >>>> > >
> >>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> >>>> > > <th...@gmail.com>wrote:
> >>>> > >
> >>>> > >> yes. Once Suraj added merging of sorted files we can add this to
> the
> >>>> > >> partitioner pretty easily.
> >>>> > >>
> >>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>> > >>
> >>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be
> Sorted?
> >>>> > >> >
> >>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> >>>> > >> > <th...@gmail.com> wrote:
> >>>> > >> > > Now I get how the partitioning works, obviously if you merge
> n
> >>>> > sorted
> >>>> > >> > files
> >>>> > >> > > by just appending to each other, this will result in totally
> >>>> > unsorted
> >>>> > >> > data
> >>>> > >> > > ;-)
> >>>> > >> > > Why didn't you solve this via messaging?
> >>>> > >> > >
> >>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
> >>>> > >> > >
> >>>> > >> > >> Seems that they are not correctly sorted:
> >>>> > >> > >>
> >>>> > >> > >> vertexID: 50
> >>>> > >> > >> vertexID: 52
> >>>> > >> > >> vertexID: 54
> >>>> > >> > >> vertexID: 56
> >>>> > >> > >> vertexID: 58
> >>>> > >> > >> vertexID: 61
> >>>> > >> > >> ...
> >>>> > >> > >> vertexID: 78
> >>>> > >> > >> vertexID: 81
> >>>> > >> > >> vertexID: 83
> >>>> > >> > >> vertexID: 85
> >>>> > >> > >> ...
> >>>> > >> > >> vertexID: 94
> >>>> > >> > >> vertexID: 96
> >>>> > >> > >> vertexID: 98
> >>>> > >> > >> vertexID: 1
> >>>> > >> > >> vertexID: 10
> >>>> > >> > >> vertexID: 12
> >>>> > >> > >> vertexID: 14
> >>>> > >> > >> vertexID: 16
> >>>> > >> > >> vertexID: 18
> >>>> > >> > >> vertexID: 21
> >>>> > >> > >> vertexID: 23
> >>>> > >> > >> vertexID: 25
> >>>> > >> > >> vertexID: 27
> >>>> > >> > >> vertexID: 29
> >>>> > >> > >> vertexID: 3
> >>>> > >> > >>
> >>>> > >> > >> So this won't work then correctly...
> >>>> > >> > >>
> >>>> > >> > >>
> >>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
> >>>> > >> > >>
> >>>> > >> > >>> sure, have fun on your holidays.
> >>>> > >> > >>>
> >>>> > >> > >>>
> >>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>> > >> > >>>
> >>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
> >>>> > holiday[1]
> >>>> > >> so
> >>>> > >> > >>>> I'll appear next week.
> >>>> > >> > >>>>
> >>>> > >> > >>>> 1.
> http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>>> > >> > >>>>
> >>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> >>>> > >> > >>>> <th...@gmail.com> wrote:
> >>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
> >>>> items
> >>>> > >> were
> >>>> > >> > >>>> added.
> >>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID
> into
> >>>> > the
> >>>> > >> > >>>> fastgen,
> >>>> > >> > >>>> > want to have a look into it?
> >>>> > >> > >>>> >
> >>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>> > >> > >>>> >
> >>>> > >> > >>>> >> I guess it's a bug in fastgen when generating the adjacency
> >>>> > >> > >>>> >> matrix into multiple files.
> >>>> > >> > >>>> >>
> >>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >>>> > >> > >>>> >> <th...@gmail.com> wrote:
> >>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
> >>>> > >> > >>>> >> >
> >>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>>> > >> > >>>> >> >
> >>>> > >> > >>>> >> >> It looks like a bug.
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> >>>> > >> /tmp/randomgraph/
> >>>> > >> > >>>> >> >> total 44
> >>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> >>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01
> part-00000
> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> >>>> > .part-00000.crc
> >>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01
> part-00001
> >>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> >>>> > .part-00001.crc
> >>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03
> partitions
> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> >>>> > >> > >>>> >> /tmp/randomgraph/partitions/
> >>>> > >> > >>>> >> >> total 24
> >>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> >>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03
> part-00000
> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> >>>> > .part-00000.crc
> >>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03
> part-00001
> >>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> >>>> > .part-00001.crc
> >>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
> >>>> > >> > >>>> >> >> > yes i'll check again
> >>>> > >> > >>>> >> >> >
> >>>> > >> > >>>> >> >> > Sent from my iPhone
> >>>> > >> > >>>> >> >> >
> >>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
> >>>> > >> > >>>> >> thomas.jungblut@gmail.com>
> >>>> > >> > >>>> >> >> wrote:
> >>>> > >> > >>>> >> >> >
> >>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
> >>>> > >> > >>>> >> >> >>
> >>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
> >>>> > >> > >>>> >> >> >> part-00001, both ~2.2kb in size.
> >>>> > >> > >>>> >> >> >> In the partition directory below, there is only a
> >>>> > >> > >>>> >> >> >> single 5.56kb file.
> >>>> > >> > >>>> >> >> >>
> >>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single
> >>>> > >> > >>>> >> >> >> file if you configured two?
> >>>> > >> > >>>> >> >> >> It even reads it as two files, strange huh?
> >>>> > >> > >>>> >> >> >>
> >>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <
> thomas.jungblut@gmail.com>
> >>>> > >> > >>>> >> >> >>
> >>>> > >> > >>>> >> >> >>> Will have a look into it.
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> >>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>> did work for me the last time I profiled; maybe the
> >>>> > >> > >>>> >> >> >>> partitioning doesn't partition correctly with this
> >>>> > >> > >>>> >> >> >>> input, or something else.
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org
> >
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>> Fastgen input seems not to work for graph examples.
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> :~/workspace/hama-trunk$
> >>>> > >> bin/hama
> >>>> > >> > jar
> >>>> > >> > >>>> >> >> >>>>
> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
> >>>> > >> > fastgen
> >>>> > >> > >>>> 100 10
> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader:
> Unable
> >>>> > to
> >>>> > >> > load
> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
> using
> >>>> > >> > builtin-java
> >>>> > >> > >>>> >> classes
> >>>> > >> > >>>> >> >> >>>> where applicable
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient:
> Running
> >>>> job:
> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner:
> Setting
> >>>> up
> >>>> > a
> >>>> > >> new
> >>>> > >> > >>>> barrier
> >>>> > >> > >>>> >> >> for 2
> >>>> > >> > >>>> >> >> >>>> tasks!
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> Current
> >>>> > >> supersteps
> >>>> > >> > >>>> >> number: 0
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The
> total
> >>>> > number
> >>>> > >> > of
> >>>> > >> > >>>> >> >> supersteps: 0
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> Counters: 3
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >>>> > SUPERSTEPS=0
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >>>> > >> > LAUNCHED_TASKS=2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
> >>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> :~/workspace/hama-trunk$
> >>>> > >> bin/hama
> >>>> > >> > jar
> >>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox
> :~/workspace/hama-trunk$
> >>>> > >> bin/hama
> >>>> > >> > jar
> >>>> > >> > >>>> >> >> >>>>
> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> >>>> > pagerank
> >>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader:
> Unable
> >>>> > to
> >>>> > >> > load
> >>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform...
> using
> >>>> > >> > builtin-java
> >>>> > >> > >>>> >> classes
> >>>> > >> > >>>> >> >> >>>> where applicable
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
> Total
> >>>> > input
> >>>> > >> > paths
> >>>> > >> > >>>> to
> >>>> > >> > >>>> >> >> process
> >>>> > >> > >>>> >> >> >>>> : 2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat:
> Total
> >>>> > input
> >>>> > >> > paths
> >>>> > >> > >>>> to
> >>>> > >> > >>>> >> >> process
> >>>> > >> > >>>> >> >> >>>> : 2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient:
> Running
> >>>> job:
> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner:
> Setting
> >>>> up
> >>>> > a
> >>>> > >> new
> >>>> > >> > >>>> barrier
> >>>> > >> > >>>> >> >> for 2
> >>>> > >> > >>>> >> >> >>>> tasks!
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> Current
> >>>> > >> supersteps
> >>>> > >> > >>>> >> number: 1
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The
> total
> >>>> > number
> >>>> > >> > of
> >>>> > >> > >>>> >> >> supersteps: 1
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> Counters: 6
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > SUPERSTEPS=1
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > LAUNCHED_TASKS=2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > SUPERSTEP_SUM=4
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> IO_BYTES_READ=4332
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> TIME_IN_SYNC_MS=14
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >>>> > >> > >>>> TASK_INPUT_RECORDS=100
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat:
> Total
> >>>> > input
> >>>> > >> > paths
> >>>> > >> > >>>> to
> >>>> > >> > >>>> >> >> process
> >>>> > >> > >>>> >> >> >>>> : 2
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> Running
> >>>> job:
> >>>> > >> > >>>> >> >> job_localrunner_0001
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner:
> Setting
> >>>> up
> >>>> > a
> >>>> > >> new
> >>>> > >> > >>>> barrier
> >>>> > >> > >>>> >> >> for 2
> >>>> > >> > >>>> >> >> >>>> tasks!
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> >>>> > vertices
> >>>> > >> > are
> >>>> > >> > >>>> loaded
> >>>> > >> > >>>> >> >> into
> >>>> > >> > >>>> >> >> >>>> local:1
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> >>>> > vertices
> >>>> > >> > are
> >>>> > >> > >>>> loaded
> >>>> > >> > >>>> >> >> into
> >>>> > >> > >>>> >> >> >>>> local:0
> >>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
> >>>> Exception
> >>>> > >> > during
> >>>> > >> > >>>> BSP
> >>>> > >> > >>>> >> >> >>>> execution!
> >>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages
> must
> >>>> > never
> >>>> > >> be
> >>>> > >> > >>>> behind
> >>>> > >> > >>>> >> the
> >>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >>
> >>>> > >> >
> >>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> >
> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>>
> >>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> >
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> > >>
> >>>> >
> >>>>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> > >>
> >>>> >
> >>>>
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>>
> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>>
> >>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> > >>
> >>>> >
> >>>>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>> > >> > >>>> >> >> >>>>        at
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> >
> >>>> > >>
> >>>> >
> >>>>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >> >>>>
> >>>> > >> > >>>> >> >> >>>> --
> >>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >>>> > >> > >>>> >> >> >>>> @eddieyoon
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >> >>>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >> >> --
> >>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
> >>>> > >> > >>>> >> >> @eddieyoon
> >>>> > >> > >>>> >> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>> >>
> >>>> > >> > >>>> >> --
> >>>> > >> > >>>> >> Best Regards, Edward J. Yoon
> >>>> > >> > >>>> >> @eddieyoon
> >>>> > >> > >>>> >>
> >>>> > >> > >>>>
> >>>> > >> > >>>>
> >>>> > >> > >>>>
> >>>> > >> > >>>> --
> >>>> > >> > >>>> Best Regards, Edward J. Yoon
> >>>> > >> > >>>> @eddieyoon
> >>>> > >> > >>>>
> >>>> > >> > >>>
> >>>> > >> > >>>
> >>>> > >> > >>
> >>>> > >> >
> >>>> > >> >
> >>>> > >> >
> >>>> > >> > --
> >>>> > >> > Best Regards, Edward J. Yoon
> >>>> > >> > @eddieyoon
> >>>> > >> >
> >>>> > >>
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Best Regards, Edward J. Yoon
> >>>> > @eddieyoon
> >>>> >
> >>>>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
I just filed here https://issues.apache.org/jira/browse/HAMA-744
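
For context, a sketch of the fix being discussed: concatenating per-task
sorted partition files only preserves per-file order, while a k-way merge
(e.g. via a priority queue) restores a global order. This is only an
illustration with plain int IDs in memory; the real code would have to
merge Hama's Writable-typed records from files:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedPartitionMerge {

  // Merge k individually sorted runs into one globally sorted run.
  static List<Integer> kWayMerge(List<List<Integer>> runs) {
    // Heap entries: {value, runIndex, offsetInRun}, ordered by value.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
    for (int i = 0; i < runs.size(); i++) {
      if (!runs.get(i).isEmpty()) {
        heap.add(new int[] {runs.get(i).get(0), i, 0});
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] e = heap.poll();
      merged.add(e[0]);
      int next = e[2] + 1;
      if (next < runs.get(e[1]).size()) {
        heap.add(new int[] {runs.get(e[1]).get(next), e[1], next});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Two sorted "partition files", as each task would produce them.
    List<Integer> part0 = Arrays.asList(50, 52, 54, 96, 98);
    List<Integer> part1 = Arrays.asList(1, 3, 10, 27, 29);

    // Appending only preserves per-file order.
    List<Integer> appended = new ArrayList<>(part0);
    appended.addAll(part1);
    System.out.println("appended: " + appended);

    // Merging restores a single global order.
    System.out.println("merged:   " + kWayMerge(Arrays.asList(part0, part1)));
  }
}
```

The appended list keeps the per-file runs back to back (50..98 followed by
1..29), which is the same kind of broken ordering shown earlier in this
thread; the merged list is globally sorted.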

On Mon, Mar 11, 2013 at 7:35 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Additionally,
>
>> spilling queue and sorted spilling queue, can we inject the partitioning
>> superstep as the first superstep and use local memory?
>
> Can we execute a different number of tasks per superstep?
>
> On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org> wrote:
>>> For graph processing, the partitioned files that result from the
>>> partitioning job must be sorted. Currently only the partition files in
>>
>> I see.
>>
>>> For other partitionings and with regard to our superstep API, Suraj's idea
>>> of injecting a preprocessing superstep that partitions the stuff into our
>>> messaging system is actually the best.
>>
>> BTW, if garbage objects can accumulate during the partitioning step, a
>> separate partitioning job may not be a bad idea. Is there a special
>> reason?
>>
>> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
>> <th...@gmail.com> wrote:
>>> For graph processing, the partitioned files that result from the
>>> partitioning job must be sorted. Currently only the partition files
>>> themselves are sorted, so more tasks result in unsorted data in the
>>> merged file. This only applies to the graph processing package.
>>> So as Suraj said, it would be much simpler to solve this via
>>> messaging, once it is scalable (it will be very, very scalable!). The
>>> GraphJobRunner could partition the data in a single superstep in
>>> setup(), as it did ages ago. The messaging must be sorted anyway for
>>> the algorithm, so this is a nice side effect and saves us the partitioning
>>> job for graph processing.
>>>
>>> For other partitionings and with regard to our superstep API, Suraj's idea
>>> of injecting a preprocessing superstep that partitions the stuff into our
>>> messaging system is actually the best.
>>>
>>>
>>> 2013/3/6 Suraj Menon <su...@apache.org>
>>>
>>>> No, the partitions we write locally need not be sorted. Sorry for the
>>>> confusion. The superstep injection is possible with the Superstep API.
>>>> There are a few enhancements needed to make it simpler since I last
>>>> worked on it. We can then look into the partitioning superstep being
>>>> executed before the setup of the first superstep of the submitted job.
>>>> I think it is feasible.
>>>>
>>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>>> >wrote:
>>>>
>>>> > > spilling queue and sorted spilling queue, can we inject the
>>>> partitioning
>>>> > > superstep as the first superstep and use local memory?
>>>> >
>>>> > Actually, I wanted to add something before calling the BSP.setup() method
>>>> > to avoid executing an additional BSP job. But, in my opinion, the current
>>>> > approach is enough. I think we need to collect more experience with input
>>>> > partitioning in large environments. I'll do that.
>>>> >
>>>> > BTW, I still don't know why it needs to be sorted?! MR-like?
>>>> >
>>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org>
>>>> > wrote:
>>>> > > Sorry, I am increasing the scope here to outside the graph module.
>>>> > > When we have the spilling queue and sorted spilling queue, can we
>>>> > > inject the partitioning superstep as the first superstep and use
>>>> > > local memory?
>>>> > > Today we have a partitioning job within a job and are creating two
>>>> > > copies of data on HDFS. This could be really costly. Is it possible
>>>> > > to create or redistribute the partitions on local memory and
>>>> > > initialize the record reader there?
>>>> > > The user can run a separate job, given in the examples area, to
>>>> > > explicitly repartition the data on HDFS. The deployment question is
>>>> > > how much disk space gets allocated for local memory usage? Would it
>>>> > > be a safe approach with the limitations?
>>>> > >
>>>> > > -Suraj
>>>> > >
>>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>>> > > <th...@gmail.com>wrote:
>>>> > >
>>>> > >> yes. Once Suraj added merging of sorted files we can add this to the
>>>> > >> partitioner pretty easily.
>>>> > >>
>>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> > >>
>>>> > >> > Eh... btw, does re-partitioned data really need to be sorted?
>>>> > >> >
>>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>>> > >> > <th...@gmail.com> wrote:
>>>> > >> > > Now I get how the partitioning works. Obviously, if you merge n
>>>> > >> > > sorted files by just appending them to each other, this will
>>>> > >> > > result in totally unsorted data ;-)
>>>> > >> > > Why didn't you solve this via messaging?
>>>> > >> > >
>>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>> > >> > >
>>>> > >> > >> Seems that they are not correctly sorted:
>>>> > >> > >>
>>>> > >> > >> vertexID: 50
>>>> > >> > >> vertexID: 52
>>>> > >> > >> vertexID: 54
>>>> > >> > >> vertexID: 56
>>>> > >> > >> vertexID: 58
>>>> > >> > >> vertexID: 61
>>>> > >> > >> ...
>>>> > >> > >> vertexID: 78
>>>> > >> > >> vertexID: 81
>>>> > >> > >> vertexID: 83
>>>> > >> > >> vertexID: 85
>>>> > >> > >> ...
>>>> > >> > >> vertexID: 94
>>>> > >> > >> vertexID: 96
>>>> > >> > >> vertexID: 98
>>>> > >> > >> vertexID: 1
>>>> > >> > >> vertexID: 10
>>>> > >> > >> vertexID: 12
>>>> > >> > >> vertexID: 14
>>>> > >> > >> vertexID: 16
>>>> > >> > >> vertexID: 18
>>>> > >> > >> vertexID: 21
>>>> > >> > >> vertexID: 23
>>>> > >> > >> vertexID: 25
>>>> > >> > >> vertexID: 27
>>>> > >> > >> vertexID: 29
>>>> > >> > >> vertexID: 3
>>>> > >> > >>
>>>> > >> > >> So this won't work then correctly...
>>>> > >> > >>
>>>> > >> > >>
>>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>> > >> > >>
>>>> > >> > >>> sure, have fun on your holidays.
>>>> > >> > >>>
>>>> > >> > >>>
>>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> > >> > >>>
>>>> > >> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a
>>>> > >> > >>>> holiday [1], so I'll be back next week.
>>>> > >> > >>>>
>>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>> > >> > >>>>
>>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>>> > >> > >>>> <th...@gmail.com> wrote:
>>>> > >> > >>>> > Maybe 50 is missing from the file; I didn't observe whether
>>>> > >> > >>>> > all items were added.
>>>> > >> > >>>> > As far as I remember, I copy/pasted the ID logic into
>>>> > >> > >>>> > fastgen; do you want to have a look into it?
>>>> > >> > >>>> >
>>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> > >> > >>>> >
>>>> > >> > >>>> >> I guess it's a bug in fastgen when generating the adjacency
>>>> > >> > >>>> >> matrix into multiple files.
>>>> > >> > >>>> >>
>>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>>>> > >> > >>>> >> >
>>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> > >> > >>>> >> >
>>>> > >> > >>>> >> >> It looks like a bug.
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>>> > >> /tmp/randomgraph/
>>>> > >> > >>>> >> >> total 44
>>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>>> > .part-00000.crc
>>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>>> > .part-00001.crc
>>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>>>> > >> > >>>> >> >> total 24
>>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>>> > .part-00000.crc
>>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>>> > .part-00001.crc
>>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
>>>> > >> > >>>> >> >> > yes i'll check again
>>>> > >> > >>>> >> >> >
>>>> > >> > >>>> >> >> > Sent from my iPhone
>>>> > >> > >>>> >> >> >
>>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>>> > >> > >>>> >> >> wrote:
>>>> > >> > >>>> >> >> >
>>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>>>> > >> > >>>> >> >> >>
>>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>>>> > >> > >>>> >> >> >> part-00001, both ~2.2kb in size.
>>>> > >> > >>>> >> >> >> In the partition directory below, there is only a
>>>> > >> > >>>> >> >> >> single 5.56kb file.
>>>> > >> > >>>> >> >> >>
>>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single
>>>> > >> > >>>> >> >> >> file if you configured two?
>>>> > >> > >>>> >> >> >> It even reads it as two files, strange huh?
>>>> > >> > >>>> >> >> >>
>>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>>> > >> > >>>> >> >> >>
>>>> > >> > >>>> >> >> >>> Will have a look into it.
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>> did work for me the last time I profiled; maybe the
>>>> > >> > >>>> >> >> >>> partitioning doesn't partition correctly with this
>>>> > >> > >>>> >> >> >>> input, or something else.
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>> Fastgen input seems not to work for graph examples.
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>>> > >> bin/hama
>>>> > >> > jar
>>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>>> > >> > fastgen
>>>> > >> > >>>> 100 10
>>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable
>>>> > to
>>>> > >> > load
>>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>>>> > >> > builtin-java
>>>> > >> > >>>> >> classes
>>>> > >> > >>>> >> >> >>>> where applicable
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running
>>>> job:
>>>> > >> > >>>> >> >> job_localrunner_0001
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting
>>>> up
>>>> > a
>>>> > >> new
>>>> > >> > >>>> barrier
>>>> > >> > >>>> >> >> for 2
>>>> > >> > >>>> >> >> >>>> tasks!
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current
>>>> > >> supersteps
>>>> > >> > >>>> >> number: 0
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total
>>>> > number
>>>> > >> > of
>>>> > >> > >>>> >> >> supersteps: 0
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>> > SUPERSTEPS=0
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>> > >> > LAUNCHED_TASKS=2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>>> > >> bin/hama
>>>> > >> > jar
>>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>>> > >> bin/hama
>>>> > >> > jar
>>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>>> > pagerank
>>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable
>>>> > to
>>>> > >> > load
>>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>>>> > >> > builtin-java
>>>> > >> > >>>> >> classes
>>>> > >> > >>>> >> >> >>>> where applicable
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>>>> > input
>>>> > >> > paths
>>>> > >> > >>>> to
>>>> > >> > >>>> >> >> process
>>>> > >> > >>>> >> >> >>>> : 2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>>>> > input
>>>> > >> > paths
>>>> > >> > >>>> to
>>>> > >> > >>>> >> >> process
>>>> > >> > >>>> >> >> >>>> : 2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running
>>>> job:
>>>> > >> > >>>> >> >> job_localrunner_0001
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting
>>>> up
>>>> > a
>>>> > >> new
>>>> > >> > >>>> barrier
>>>> > >> > >>>> >> >> for 2
>>>> > >> > >>>> >> >> >>>> tasks!
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current
>>>> > >> supersteps
>>>> > >> > >>>> >> number: 1
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total
>>>> > number
>>>> > >> > of
>>>> > >> > >>>> >> >> supersteps: 1
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > SUPERSTEPS=1
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > LAUNCHED_TASKS=2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > SUPERSTEP_SUM=4
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > >>>> IO_BYTES_READ=4332
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total
>>>> > input
>>>> > >> > paths
>>>> > >> > >>>> to
>>>> > >> > >>>> >> >> process
>>>> > >> > >>>> >> >> >>>> : 2
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running
>>>> job:
>>>> > >> > >>>> >> >> job_localrunner_0001
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting
>>>> up
>>>> > a
>>>> > >> new
>>>> > >> > >>>> barrier
>>>> > >> > >>>> >> >> for 2
>>>> > >> > >>>> >> >> >>>> tasks!
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>> > vertices
>>>> > >> > are
>>>> > >> > >>>> loaded
>>>> > >> > >>>> >> >> into
>>>> > >> > >>>> >> >> >>>> local:1
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>>> > vertices
>>>> > >> > are
>>>> > >> > >>>> loaded
>>>> > >> > >>>> >> >> into
>>>> > >> > >>>> >> >> >>>> local:0
>>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>>>> Exception
>>>> > >> > during
>>>> > >> > >>>> BSP
>>>> > >> > >>>> >> >> >>>> execution!
>>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must
>>>> > never
>>>> > >> be
>>>> > >> > >>>> behind
>>>> > >> > >>>> >> the
>>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >>
>>>> > >> >
>>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>>
>>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > >>
>>>> >
>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > >>
>>>> >
>>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>>
>>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>>
>>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > >>
>>>> >
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> > >> > >>>> >> >> >>>>        at
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> >
>>>> > >>
>>>> >
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >> >>>>
>>>> > >> > >>>> >> >> >>>> --
>>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>>> > >> > >>>> >> >> >>>> @eddieyoon
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >> >>>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >> >> --
>>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>>>> > >> > >>>> >> >> @eddieyoon
>>>> > >> > >>>> >> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>> >>
>>>> > >> > >>>> >> --
>>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>>>> > >> > >>>> >> @eddieyoon
>>>> > >> > >>>> >>
>>>> > >> > >>>>
>>>> > >> > >>>>
>>>> > >> > >>>>
>>>> > >> > >>>> --
>>>> > >> > >>>> Best Regards, Edward J. Yoon
>>>> > >> > >>>> @eddieyoon
>>>> > >> > >>>>
>>>> > >> > >>>
>>>> > >> > >>>
>>>> > >> > >>
>>>> > >> >
>>>> > >> >
>>>> > >> >
>>>> > >> > --
>>>> > >> > Best Regards, Edward J. Yoon
>>>> > >> > @eddieyoon
>>>> > >> >
>>>> > >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best Regards, Edward J. Yoon
>>>> > @eddieyoon
>>>> >
>>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
Additionally,

> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?

Can we execute a different number of tasks per superstep?
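The idea under discussion can be sketched in plain Python (hypothetical code, not the Hama Superstep API): a first "partitioning superstep" in which every task routes each of its local records to the task that owns it via messaging, so no second on-disk copy of the input is needed. The `owner_task` and `partitioning_superstep` names are illustrative only.

```python
def owner_task(vertex_id, num_tasks):
    # Hash partitioning: deterministic assignment, no sorting required.
    return hash(vertex_id) % num_tasks

def partitioning_superstep(local_records, num_tasks):
    # outgoing[t] collects the records to be sent to task t; in a real
    # BSP runtime these would go out as messages before the sync barrier.
    outgoing = [[] for _ in range(num_tasks)]
    for record in local_records:
        outgoing[owner_task(record, num_tasks)].append(record)
    return outgoing
```

After the barrier, each task would read its incoming messages as its repartitioned input, replacing the separate partitioning job.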

On Mon, Mar 11, 2013 at 6:56 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> For graph processing, the partitioned files that result from the
>> partitioning job must be sorted. Currently only the partition files in
>
> I see.
>
>> For other partitionings and with regard to our superstep API, Suraj's idea
>> of injecting a preprocessing superstep that partitions the stuff into our
>> messaging system is actually the best.
>
> BTW, if some garbage objects can be accumulated in partitioning step,
> separated partitioning job may not be bad idea. Is there some special
> reason?
>
> On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
> <th...@gmail.com> wrote:
>> For graph processing, the partitioned files that result from the
>> partitioning job must be sorted. Currently only the partition files in
>> itself are sorted, thus more tasks result in not sorted data in the
>> completed file. This only applies for the graph processing package.
>> So as Suraj told, it would be much more simpler to solve this via
>> messaging, once it is scalable (it will be very very scalable!). So the
>> GraphJobRunner can be partitioning the stuff with a single superstep in
>> setup() as it was before ages ago. The messaging must be sorted anyway for
>> the algorithm so this is a nice side effect and saves us the partitioning
>> job for graph processing.
>>
>> For other partitionings and with regard to our superstep API, Suraj's idea
>> of injecting a preprocessing superstep that partitions the stuff into our
>> messaging system is actually the best.
>>
>>
>> 2013/3/6 Suraj Menon <su...@apache.org>
>>
>>> No, the partitions we write locally need not be sorted. Sorry for the
>>> confusion. The Superstep injection is possible with Superstep API. There
>>> are few enhancements needed to make it simpler after I last worked on it.
>>> We can then look into partitioning superstep being executed before the
>>> setup of first superstep of submitted job. I think it is feasible.
>>>
>>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>>> >wrote:
>>>
>>> > > spilling queue and sorted spilling queue, can we inject the
>>> partitioning
>>> > > superstep as the first superstep and use local memory?
>>> >
>>> > Actually, I wanted to add something before calling BSP.setup() method
>>> > to avoid execute additional BSP job. But, in my opinion, current is
>>> > enough. I think, we need to collect more experiences of input
>>> > partitioning on large environments. I'll do.
>>> >
>>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>>> >
>>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org>
>>> > wrote:
>>> > > Sorry, I am increasing the scope here to outside graph module. When we
>>> > have
>>> > > spilling queue and sorted spilling queue, can we inject the
>>> partitioning
>>> > > superstep as the first superstep and use local memory?
>>> > > Today we have partitioning job within a job and are creating two copies
>>> > of
>>> > > data on HDFS. This could be really costly. Is it possible to create or
>>> > > redistribute the partitions on local memory and initialize the record
>>> > > reader there?
>>> > > The user can run a separate job give in examples area to explicitly
>>> > > repartition the data on HDFS. The deployment question is how much of
>>> disk
>>> > > space gets allocated for local memory usage? Would it be a safe
>>> approach
>>> > > with the limitations?
>>> > >
>>> > > -Suraj
>>> > >
>>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>>> > > <th...@gmail.com>wrote:
>>> > >
>>> > >> yes. Once Suraj added merging of sorted files we can add this to the
>>> > >> partitioner pretty easily.
>>> > >>
>>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> > >>
>>> > >> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
>>> > >> >
>>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>>> > >> > <th...@gmail.com> wrote:
>>> > >> > > Now I get how the partitioning works, obviously if you merge n
>>> > sorted
>>> > >> > files
>>> > >> > > by just appending to each other, this will result in totally
>>> > unsorted
>>> > >> > data
>>> > >> > > ;-)
>>> > >> > > Why didn't you solve this via messaging?
>>> > >> > >
>>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>> > >> > >
>>> > >> > >> Seems that they are not correctly sorted:
>>> > >> > >>
>>> > >> > >> vertexID: 50
>>> > >> > >> vertexID: 52
>>> > >> > >> vertexID: 54
>>> > >> > >> vertexID: 56
>>> > >> > >> vertexID: 58
>>> > >> > >> vertexID: 61
>>> > >> > >> ...
>>> > >> > >> vertexID: 78
>>> > >> > >> vertexID: 81
>>> > >> > >> vertexID: 83
>>> > >> > >> vertexID: 85
>>> > >> > >> ...
>>> > >> > >> vertexID: 94
>>> > >> > >> vertexID: 96
>>> > >> > >> vertexID: 98
>>> > >> > >> vertexID: 1
>>> > >> > >> vertexID: 10
>>> > >> > >> vertexID: 12
>>> > >> > >> vertexID: 14
>>> > >> > >> vertexID: 16
>>> > >> > >> vertexID: 18
>>> > >> > >> vertexID: 21
>>> > >> > >> vertexID: 23
>>> > >> > >> vertexID: 25
>>> > >> > >> vertexID: 27
>>> > >> > >> vertexID: 29
>>> > >> > >> vertexID: 3
>>> > >> > >>
>>> > >> > >> So this won't work then correctly...
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>> > >> > >>
>>> > >> > >>> sure, have fun on your holidays.
>>> > >> > >>>
>>> > >> > >>>
>>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> > >> > >>>
>>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>>> > holiday[1]
>>> > >> so
>>> > >> > >>>> I'll appear next week.
>>> > >> > >>>>
>>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>> > >> > >>>>
>>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>> > >> > >>>> <th...@gmail.com> wrote:
>>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>>> items
>>> > >> were
>>> > >> > >>>> added.
>>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
>>> > the
>>> > >> > >>>> fastgen,
>>> > >> > >>>> > want to have a look into it?
>>> > >> > >>>> >
>>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> > >> > >>>> >
>>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
>>> matrix
>>> > >> into
>>> > >> > >>>> >> multiple files.
>>> > >> > >>>> >>
>>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>> > >> > >>>> >> <th...@gmail.com> wrote:
>>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>>> > >> > >>>> >> >
>>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> > >> > >>>> >> >
>>> > >> > >>>> >> >> It looks like a bug.
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>> > >> /tmp/randomgraph/
>>> > >> > >>>> >> >> total 44
>>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>> > .part-00000.crc
>>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>>> > .part-00001.crc
>>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>>> > >> > >>>> >> /tmp/randomgraph/partitions/
>>> > >> > >>>> >> >> total 24
>>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>> > .part-00000.crc
>>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>>> > .part-00001.crc
>>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>>> edward@udanax.org
>>> > >
>>> > >> > wrote:
>>> > >> > >>>> >> >> > yes i'll check again
>>> > >> > >>>> >> >> >
>>> > >> > >>>> >> >> > Sent from my iPhone
>>> > >> > >>>> >> >> >
>>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>>> > >> > >>>> >> thomas.jungblut@gmail.com>
>>> > >> > >>>> >> >> wrote:
>>> > >> > >>>> >> >> >
>>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>>> > >> > >>>> >> >> >>
>>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>>> > >> part-00001,
>>> > >> > >>>> both
>>> > >> > >>>> >> ~2.2kb
>>> > >> > >>>> >> >> >> sized.
>>> > >> > >>>> >> >> >> In the below partition directory, there is only a
>>> single
>>> > >> > 5.56kb
>>> > >> > >>>> file.
>>> > >> > >>>> >> >> >>
>>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single
>>> > file
>>> > >> if
>>> > >> > you
>>> > >> > >>>> >> >> configured
>>> > >> > >>>> >> >> >> two?
>>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
>>> > >> > >>>> >> >> >>
>>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>>> > >> > >>>> >> >> >>
>>> > >> > >>>> >> >> >>> Will have a look into it.
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe the
>>> > >> > >>>> partitioning
>>> > >> > >>>> >> >> doesn't
>>> > >> > >>>> >> >> >>> partition correctly with the input or something else.
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>> > >> bin/hama
>>> > >> > jar
>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>>> > >> > fastgen
>>> > >> > >>>> 100 10
>>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable
>>> > to
>>> > >> > load
>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>>> > >> > builtin-java
>>> > >> > >>>> >> classes
>>> > >> > >>>> >> >> >>>> where applicable
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running
>>> job:
>>> > >> > >>>> >> >> job_localrunner_0001
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting
>>> up
>>> > a
>>> > >> new
>>> > >> > >>>> barrier
>>> > >> > >>>> >> >> for 2
>>> > >> > >>>> >> >> >>>> tasks!
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current
>>> > >> supersteps
>>> > >> > >>>> >> number: 0
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total
>>> > number
>>> > >> > of
>>> > >> > >>>> >> >> supersteps: 0
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>> > SUPERSTEPS=0
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>> > >> > LAUNCHED_TASKS=2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>> > >> bin/hama
>>> > >> > jar
>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>>> > >> bin/hama
>>> > >> > jar
>>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>>> > pagerank
>>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable
>>> > to
>>> > >> > load
>>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>>> > >> > builtin-java
>>> > >> > >>>> >> classes
>>> > >> > >>>> >> >> >>>> where applicable
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>>> > input
>>> > >> > paths
>>> > >> > >>>> to
>>> > >> > >>>> >> >> process
>>> > >> > >>>> >> >> >>>> : 2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>>> > input
>>> > >> > paths
>>> > >> > >>>> to
>>> > >> > >>>> >> >> process
>>> > >> > >>>> >> >> >>>> : 2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running
>>> job:
>>> > >> > >>>> >> >> job_localrunner_0001
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting
>>> up
>>> > a
>>> > >> new
>>> > >> > >>>> barrier
>>> > >> > >>>> >> >> for 2
>>> > >> > >>>> >> >> >>>> tasks!
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current
>>> > >> supersteps
>>> > >> > >>>> >> number: 1
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total
>>> > number
>>> > >> > of
>>> > >> > >>>> >> >> supersteps: 1
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > SUPERSTEPS=1
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > LAUNCHED_TASKS=2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > SUPERSTEP_SUM=4
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > >>>> IO_BYTES_READ=4332
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > >>>> TIME_IN_SYNC_MS=14
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>>> > >> > >>>> TASK_INPUT_RECORDS=100
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total
>>> > input
>>> > >> > paths
>>> > >> > >>>> to
>>> > >> > >>>> >> >> process
>>> > >> > >>>> >> >> >>>> : 2
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running
>>> job:
>>> > >> > >>>> >> >> job_localrunner_0001
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting
>>> up
>>> > a
>>> > >> new
>>> > >> > >>>> barrier
>>> > >> > >>>> >> >> for 2
>>> > >> > >>>> >> >> >>>> tasks!
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>> > vertices
>>> > >> > are
>>> > >> > >>>> loaded
>>> > >> > >>>> >> >> into
>>> > >> > >>>> >> >> >>>> local:1
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>>> > vertices
>>> > >> > are
>>> > >> > >>>> loaded
>>> > >> > >>>> >> >> into
>>> > >> > >>>> >> >> >>>> local:0
>>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>>> Exception
>>> > >> > during
>>> > >> > >>>> BSP
>>> > >> > >>>> >> >> >>>> execution!
>>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must
>>> > never
>>> > >> be
>>> > >> > >>>> behind
>>> > >> > >>>> >> the
>>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >>
>>> > >> >
>>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>>
>>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > >>
>>> >
>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > >>
>>> >
>>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>>
>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>>
>>> > >> >
>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>>
>>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > >>
>>> >
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> > >> > >>>> >> >> >>>>        at
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> >
>>> > >>
>>> >
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >> >>>>
>>> > >> > >>>> >> >> >>>> --
>>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>>> > >> > >>>> >> >> >>>> @eddieyoon
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >> >>>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >>
>>> > >> > >>>> >> >> --
>>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>>> > >> > >>>> >> >> @eddieyoon
>>> > >> > >>>> >> >>
>>> > >> > >>>> >>
>>> > >> > >>>> >>
>>> > >> > >>>> >>
>>> > >> > >>>> >> --
>>> > >> > >>>> >> Best Regards, Edward J. Yoon
>>> > >> > >>>> >> @eddieyoon
>>> > >> > >>>> >>
>>> > >> > >>>>
>>> > >> > >>>>
>>> > >> > >>>>
>>> > >> > >>>> --
>>> > >> > >>>> Best Regards, Edward J. Yoon
>>> > >> > >>>> @eddieyoon
>>> > >> > >>>>
>>> > >> > >>>
>>> > >> > >>>
>>> > >> > >>
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > --
>>> > >> > Best Regards, Edward J. Yoon
>>> > >> > @eddieyoon
>>> > >> >
>>> > >>
>>> >
>>> >
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon
>>> > @eddieyoon
>>> >
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by "Edward J. Yoon" <ed...@apache.org>.
> For graph processing, the partitioned files that result from the
> partitioning job must be sorted. Currently only the partition files in

I see.

> For other partitionings and with regard to our superstep API, Suraj's idea
> of injecting a preprocessing superstep that partitions the stuff into our
> messaging system is actually the best.

BTW, if garbage objects can accumulate in the partitioning step, a
separate partitioning job may not be a bad idea. Is there some special
reason?
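The sorting problem in the quoted explanation below can be illustrated with a small sketch (plain Python, not Hama code): each partition file is sorted on its own, so appending the files back to back loses the global order, while a k-way merge preserves it.

```python
import heapq

# Two per-partition outputs, each individually sorted (as in the
# vertexID listing earlier in this thread).
sorted_runs = [[50, 52, 54, 56], [1, 3, 10, 12]]

# Appending the runs keeps each run's order but not the whole file's.
appended = [v for run in sorted_runs for v in run]

# A k-way merge reads each run sequentially yet yields global order.
merged = list(heapq.merge(*sorted_runs))
```

This is why merging of sorted files (rather than concatenation) is needed before the graph package can rely on sorted input.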

On Wed, Mar 6, 2013 at 6:15 PM, Thomas Jungblut
<th...@gmail.com> wrote:
> For graph processing, the partitioned files that result from the
> partitioning job must be sorted. Currently only the partition files in
> itself are sorted, thus more tasks result in not sorted data in the
> completed file. This only applies for the graph processing package.
> So as Suraj told, it would be much more simpler to solve this via
> messaging, once it is scalable (it will be very very scalable!). So the
> GraphJobRunner can be partitioning the stuff with a single superstep in
> setup() as it was before ages ago. The messaging must be sorted anyway for
> the algorithm so this is a nice side effect and saves us the partitioning
> job for graph processing.
>
> For other partitionings and with regard to our superstep API, Suraj's idea
> of injecting a preprocessing superstep that partitions the stuff into our
> messaging system is actually the best.
>
>
> 2013/3/6 Suraj Menon <su...@apache.org>
>
>> No, the partitions we write locally need not be sorted. Sorry for the
>> confusion. The Superstep injection is possible with Superstep API. There
>> are few enhancements needed to make it simpler after I last worked on it.
>> We can then look into partitioning superstep being executed before the
>> setup of first superstep of submitted job. I think it is feasible.
>>
>> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>>
>> > > spilling queue and sorted spilling queue, can we inject the
>> partitioning
>> > > superstep as the first superstep and use local memory?
>> >
>> > Actually, I wanted to add something before calling BSP.setup() method
>> > to avoid execute additional BSP job. But, in my opinion, current is
>> > enough. I think, we need to collect more experiences of input
>> > partitioning on large environments. I'll do.
>> >
>> > BTW, I still don't know why it need to be Sorted?! MR-like?
>> >
>> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org>
>> > wrote:
>> > > Sorry, I am increasing the scope here to outside graph module. When we
>> > have
>> > > spilling queue and sorted spilling queue, can we inject the
>> partitioning
>> > > superstep as the first superstep and use local memory?
>> > > Today we have partitioning job within a job and are creating two copies
>> > of
>> > > data on HDFS. This could be really costly. Is it possible to create or
>> > > redistribute the partitions on local memory and initialize the record
>> > > reader there?
>> > > The user can run a separate job give in examples area to explicitly
>> > > repartition the data on HDFS. The deployment question is how much of
>> disk
>> > > space gets allocated for local memory usage? Would it be a safe
>> approach
>> > > with the limitations?
>> > >
>> > > -Suraj
>> > >
>> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
>> > > <th...@gmail.com>wrote:
>> > >
>> > >> yes. Once Suraj added merging of sorted files we can add this to the
>> > >> partitioner pretty easily.
>> > >>
>> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >>
>> > >> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
>> > >> >
>> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> > >> > <th...@gmail.com> wrote:
>> > >> > > Now I get how the partitioning works, obviously if you merge n
>> > sorted
>> > >> > files
>> > >> > > by just appending to each other, this will result in totally
>> > unsorted
>> > >> > data
>> > >> > > ;-)
>> > >> > > Why didn't you solve this via messaging?
>> > >> > >
>> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >> > >
>> > >> > >> Seems that they are not correctly sorted:
>> > >> > >>
>> > >> > >> vertexID: 50
>> > >> > >> vertexID: 52
>> > >> > >> vertexID: 54
>> > >> > >> vertexID: 56
>> > >> > >> vertexID: 58
>> > >> > >> vertexID: 61
>> > >> > >> ...
>> > >> > >> vertexID: 78
>> > >> > >> vertexID: 81
>> > >> > >> vertexID: 83
>> > >> > >> vertexID: 85
>> > >> > >> ...
>> > >> > >> vertexID: 94
>> > >> > >> vertexID: 96
>> > >> > >> vertexID: 98
>> > >> > >> vertexID: 1
>> > >> > >> vertexID: 10
>> > >> > >> vertexID: 12
>> > >> > >> vertexID: 14
>> > >> > >> vertexID: 16
>> > >> > >> vertexID: 18
>> > >> > >> vertexID: 21
>> > >> > >> vertexID: 23
>> > >> > >> vertexID: 25
>> > >> > >> vertexID: 27
>> > >> > >> vertexID: 29
>> > >> > >> vertexID: 3
>> > >> > >>
>> > >> > >> So this won't work then correctly...
>> > >> > >>
>> > >> > >>
>> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >> > >>
>> > >> > >>> sure, have fun on your holidays.
>> > >> > >>>
>> > >> > >>>
>> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >> > >>>
>> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
>> > holiday[1]
>> > >> so
>> > >> > >>>> I'll appear next week.
>> > >> > >>>>
>> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> > >> > >>>>
>> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> > >> > >>>> <th...@gmail.com> wrote:
>> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
>> items
>> > >> were
>> > >> > >>>> added.
>> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
>> > the
>> > >> > >>>> fastgen,
>> > >> > >>>> > want to have a look into it?
>> > >> > >>>> >
>> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >> > >>>> >
>> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
>> matrix
>> > >> into
>> > >> > >>>> >> multiple files.
>> > >> > >>>> >>
>> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> > >> > >>>> >> <th...@gmail.com> wrote:
>> > >> > >>>> >> > You have two files, are they partitioned correctly?
>> > >> > >>>> >> >
>> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >> > >>>> >> >
>> > >> > >>>> >> >> It looks like a bug.
>> > >> > >>>> >> >>
>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>> > >> /tmp/randomgraph/
>> > >> > >>>> >> >> total 44
>> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>> > .part-00000.crc
>> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
>> > .part-00001.crc
>> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
>> > >> > >>>> >> /tmp/randomgraph/partitions/
>> > >> > >>>> >> >> total 24
>> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> > .part-00000.crc
>> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
>> > .part-00001.crc
>> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> > >> > >>>> >> >>
>> > >> > >>>> >> >>
>> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
>> edward@udanax.org
>> > >
>> > >> > wrote:
>> > >> > >>>> >> >> > yes i'll check again
>> > >> > >>>> >> >> >
>> > >> > >>>> >> >> > Sent from my iPhone
>> > >> > >>>> >> >> >
>> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
>> > >> > >>>> >> thomas.jungblut@gmail.com>
>> > >> > >>>> >> >> wrote:
>> > >> > >>>> >> >> >
>> > >> > >>>> >> >> >> Can you verify an observation for me please?
>> > >> > >>>> >> >> >>
>> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
>> > >> part-00001,
>> > >> > >>>> both
>> > >> > >>>> >> ~2.2kb
>> > >> > >>>> >> >> >> sized.
>> > >> > >>>> >> >> >> In the below partition directory, there is only a
>> single
>> > >> > 5.56kb
>> > >> > >>>> file.
>> > >> > >>>> >> >> >>
>> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single
>> > file
>> > >> if
>> > >> > you
>> > >> > >>>> >> >> configured
>> > >> > >>>> >> >> >> two?
>> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
>> > >> > >>>> >> >> >>
>> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
>> > >> > >>>> >> >> >>
>> > >> > >>>> >> >> >>> Will have a look into it.
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe the
>> > >> > >>>> partitioning
>> > >> > >>>> >> >> doesn't
>> > >> > >>>> >> >> >>> partition correctly with the input or something else.
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>> > >> bin/hama
>> > >> > jar
>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
>> > >> > fastgen
>> > >> > >>>> 100 10
>> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable
>> > to
>> > >> > load
>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>> > >> > builtin-java
>> > >> > >>>> >> classes
>> > >> > >>>> >> >> >>>> where applicable
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running
>> job:
>> > >> > >>>> >> >> job_localrunner_0001
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting
>> up
>> > a
>> > >> new
>> > >> > >>>> barrier
>> > >> > >>>> >> >> for 2
>> > >> > >>>> >> >> >>>> tasks!
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current
>> > >> supersteps
>> > >> > >>>> >> number: 0
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total
>> > number
>> > >> > of
>> > >> > >>>> >> >> supersteps: 0
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > SUPERSTEPS=0
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >> > LAUNCHED_TASKS=2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
>> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
>> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>> > >> bin/hama
>> > >> > jar
>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
>> > >> bin/hama
>> > >> > jar
>> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
>> > pagerank
>> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable
>> > to
>> > >> > load
>> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
>> > >> > builtin-java
>> > >> > >>>> >> classes
>> > >> > >>>> >> >> >>>> where applicable
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>> > input
>> > >> > paths
>> > >> > >>>> to
>> > >> > >>>> >> >> process
>> > >> > >>>> >> >> >>>> : 2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
>> > input
>> > >> > paths
>> > >> > >>>> to
>> > >> > >>>> >> >> process
>> > >> > >>>> >> >> >>>> : 2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running
>> job:
>> > >> > >>>> >> >> job_localrunner_0001
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting
>> up
>> > a
>> > >> new
>> > >> > >>>> barrier
>> > >> > >>>> >> >> for 2
>> > >> > >>>> >> >> >>>> tasks!
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current
>> > >> supersteps
>> > >> > >>>> >> number: 1
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total
>> > number
>> > >> > of
>> > >> > >>>> >> >> supersteps: 1
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > SUPERSTEPS=1
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > LAUNCHED_TASKS=2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > SUPERSTEP_SUM=4
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > >>>> IO_BYTES_READ=4332
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > >>>> TIME_IN_SYNC_MS=14
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
>> > >> > >>>> TASK_INPUT_RECORDS=100
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total
>> > input
>> > >> > paths
>> > >> > >>>> to
>> > >> > >>>> >> >> process
>> > >> > >>>> >> >> >>>> : 2
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running
>> job:
>> > >> > >>>> >> >> job_localrunner_0001
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting
>> up
>> > a
>> > >> new
>> > >> > >>>> barrier
>> > >> > >>>> >> >> for 2
>> > >> > >>>> >> >> >>>> tasks!
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>> > vertices
>> > >> > are
>> > >> > >>>> loaded
>> > >> > >>>> >> >> into
>> > >> > >>>> >> >> >>>> local:1
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
>> > vertices
>> > >> > are
>> > >> > >>>> loaded
>> > >> > >>>> >> >> into
>> > >> > >>>> >> >> >>>> local:0
>> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
>> Exception
>> > >> > during
>> > >> > >>>> BSP
>> > >> > >>>> >> >> >>>> execution!
>> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must
>> > never
>> > >> be
>> > >> > >>>> behind
>> > >> > >>>> >> the
>> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >>
>> > >> >
>> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>>
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > >>
>> >
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > >>
>> >
>> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>>
>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>>
>> > >> >
>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>>
>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > >>
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > >> > >>>> >> >> >>>>        at
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> >
>> > >>
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >> >>>>
>> > >> > >>>> >> >> >>>> --
>> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> > >> > >>>> >> >> >>>> @eddieyoon
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >> >>>
>> > >> > >>>> >> >>
>> > >> > >>>> >> >>
>> > >> > >>>> >> >>
>> > >> > >>>> >> >> --
>> > >> > >>>> >> >> Best Regards, Edward J. Yoon
>> > >> > >>>> >> >> @eddieyoon
>> > >> > >>>> >> >>
>> > >> > >>>> >>
>> > >> > >>>> >>
>> > >> > >>>> >>
>> > >> > >>>> >> --
>> > >> > >>>> >> Best Regards, Edward J. Yoon
>> > >> > >>>> >> @eddieyoon
>> > >> > >>>> >>
>> > >> > >>>>
>> > >> > >>>>
>> > >> > >>>>
>> > >> > >>>> --
>> > >> > >>>> Best Regards, Edward J. Yoon
>> > >> > >>>> @eddieyoon
>> > >> > >>>>
>> > >> > >>>
>> > >> > >>>
>> > >> > >>
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> > Best Regards, Edward J. Yoon
>> > >> > @eddieyoon
>> > >> >
>> > >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Error with fastgen input

Posted by Thomas Jungblut <th...@gmail.com>.
For graph processing, the partitioned files that result from the
partitioning job must be sorted. Currently only the individual partition
files are sorted internally, so using more tasks results in unsorted data
in the merged file. This only applies to the graph processing package.
So as Suraj said, it would be much simpler to solve this via messaging
once it is scalable (it will be very, very scalable!). The GraphJobRunner
could then partition the data in a single superstep in setup(), as it did
ages ago. The messaging must be sorted anyway for the algorithm, so this
is a nice side effect and saves us the partitioning job for graph
processing.

For other kinds of partitioning, and with regard to our superstep API,
Suraj's idea of injecting a preprocessing superstep that partitions the
data through our messaging system is actually the best approach.


2013/3/6 Suraj Menon <su...@apache.org>

> No, the partitions we write locally need not be sorted. Sorry for the
> confusion. The Superstep injection is possible with Superstep API. There
> are few enhancements needed to make it simpler after I last worked on it.
> We can then look into partitioning superstep being executed before the
> setup of first superstep of submitted job. I think it is feasible.
>
> On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
>
> > > spilling queue and sorted spilling queue, can we inject the
> partitioning
> > > superstep as the first superstep and use local memory?
> >
> > Actually, I wanted to add something before calling BSP.setup() method
> > to avoid execute additional BSP job. But, in my opinion, current is
> > enough. I think, we need to collect more experiences of input
> > partitioning on large environments. I'll do.
> >
> > BTW, I still don't know why it need to be Sorted?! MR-like?
> >
> > On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org>
> > wrote:
> > > Sorry, I am increasing the scope here to outside graph module. When we
> > have
> > > spilling queue and sorted spilling queue, can we inject the
> partitioning
> > > superstep as the first superstep and use local memory?
> > > Today we have partitioning job within a job and are creating two copies
> > of
> > > data on HDFS. This could be really costly. Is it possible to create or
> > > redistribute the partitions on local memory and initialize the record
> > > reader there?
> > > The user can run a separate job give in examples area to explicitly
> > > repartition the data on HDFS. The deployment question is how much of
> disk
> > > space gets allocated for local memory usage? Would it be a safe
> approach
> > > with the limitations?
> > >
> > > -Suraj
> > >
> > > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> > > <th...@gmail.com>wrote:
> > >
> > >> yes. Once Suraj added merging of sorted files we can add this to the
> > >> partitioner pretty easily.
> > >>
> > >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> > >>
> > >> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
> > >> >
> > >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> > >> > <th...@gmail.com> wrote:
> > >> > > Now I get how the partitioning works, obviously if you merge n
> > sorted
> > >> > files
> > >> > > by just appending to each other, this will result in totally
> > unsorted
> > >> > data
> > >> > > ;-)
> > >> > > Why didn't you solve this via messaging?
> > >> > >
> > >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
> > >> > >
> > >> > >> Seems that they are not correctly sorted:
> > >> > >>
> > >> > >> vertexID: 50
> > >> > >> vertexID: 52
> > >> > >> vertexID: 54
> > >> > >> vertexID: 56
> > >> > >> vertexID: 58
> > >> > >> vertexID: 61
> > >> > >> ...
> > >> > >> vertexID: 78
> > >> > >> vertexID: 81
> > >> > >> vertexID: 83
> > >> > >> vertexID: 85
> > >> > >> ...
> > >> > >> vertexID: 94
> > >> > >> vertexID: 96
> > >> > >> vertexID: 98
> > >> > >> vertexID: 1
> > >> > >> vertexID: 10
> > >> > >> vertexID: 12
> > >> > >> vertexID: 14
> > >> > >> vertexID: 16
> > >> > >> vertexID: 18
> > >> > >> vertexID: 21
> > >> > >> vertexID: 23
> > >> > >> vertexID: 25
> > >> > >> vertexID: 27
> > >> > >> vertexID: 29
> > >> > >> vertexID: 3
> > >> > >>
> > >> > >> So this won't work then correctly...
> > >> > >>
> > >> > >>
> > >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
> > >> > >>
> > >> > >>> sure, have fun on your holidays.
> > >> > >>>
> > >> > >>>
> > >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> > >> > >>>
> > >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
> > holiday[1]
> > >> so
> > >> > >>>> I'll appear next week.
> > >> > >>>>
> > >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> > >> > >>>>
> > >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> > >> > >>>> <th...@gmail.com> wrote:
> > >> > >>>> > Maybe 50 is missing from the file, didn't observe if all
> items
> > >> were
> > >> > >>>> added.
> > >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
> > the
> > >> > >>>> fastgen,
> > >> > >>>> > want to have a look into it?
> > >> > >>>> >
> > >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> > >> > >>>> >
> > >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency
> matrix
> > >> into
> > >> > >>>> >> multiple files.
> > >> > >>>> >>
> > >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> > >> > >>>> >> <th...@gmail.com> wrote:
> > >> > >>>> >> > You have two files, are they partitioned correctly?
> > >> > >>>> >> >
> > >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> > >> > >>>> >> >
> > >> > >>>> >> >> It looks like a bug.
> > >> > >>>> >> >>
> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> > >> /tmp/randomgraph/
> > >> > >>>> >> >> total 44
> > >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> > >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> > .part-00000.crc
> > >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> > >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> > .part-00001.crc
> > >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> > >> > >>>> >> /tmp/randomgraph/partitions/
> > >> > >>>> >> >> total 24
> > >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> > >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> > .part-00000.crc
> > >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> > >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> > .part-00001.crc
> > >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> > >> > >>>> >> >>
> > >> > >>>> >> >>
> > >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <
> edward@udanax.org
> > >
> > >> > wrote:
> > >> > >>>> >> >> > yes i'll check again
> > >> > >>>> >> >> >
> > >> > >>>> >> >> > Sent from my iPhone
> > >> > >>>> >> >> >
> > >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
> > >> > >>>> >> thomas.jungblut@gmail.com>
> > >> > >>>> >> >> wrote:
> > >> > >>>> >> >> >
> > >> > >>>> >> >> >> Can you verify an observation for me please?
> > >> > >>>> >> >> >>
> > >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
> > >> part-00001,
> > >> > >>>> both
> > >> > >>>> >> ~2.2kb
> > >> > >>>> >> >> >> sized.
> > >> > >>>> >> >> >> In the below partition directory, there is only a
> single
> > >> > 5.56kb
> > >> > >>>> file.
> > >> > >>>> >> >> >>
> > >> > >>>> >> >> >> Is it intended for the partitioner to write a single
> > file
> > >> if
> > >> > you
> > >> > >>>> >> >> configured
> > >> > >>>> >> >> >> two?
> > >> > >>>> >> >> >> It even reads it as a two files, strange huh?
> > >> > >>>> >> >> >>
> > >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
> > >> > >>>> >> >> >>
> > >> > >>>> >> >> >>> Will have a look into it.
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> > >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>> did work for me the last time I profiled, maybe the
> > >> > >>>> partitioning
> > >> > >>>> >> >> doesn't
> > >> > >>>> >> >> >>> partition correctly with the input or something else.
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> > >> bin/hama
> > >> > jar
> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
> > >> > fastgen
> > >> > >>>> 100 10
> > >> > >>>> >> >> >>>> /tmp/randomgraph 2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable
> > to
> > >> > load
> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
> > >> > builtin-java
> > >> > >>>> >> classes
> > >> > >>>> >> >> >>>> where applicable
> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running
> job:
> > >> > >>>> >> >> job_localrunner_0001
> > >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting
> up
> > a
> > >> new
> > >> > >>>> barrier
> > >> > >>>> >> >> for 2
> > >> > >>>> >> >> >>>> tasks!
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current
> > >> supersteps
> > >> > >>>> >> number: 0
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total
> > number
> > >> > of
> > >> > >>>> >> >> supersteps: 0
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > SUPERSTEPS=0
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >> > LAUNCHED_TASKS=2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >> > >>>> >> TASK_OUTPUT_RECORDS=100
> > >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> > >> bin/hama
> > >> > jar
> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> > >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> > >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> > >> bin/hama
> > >> > jar
> > >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> > pagerank
> > >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable
> > to
> > >> > load
> > >> > >>>> >> >> >>>> native-hadoop library for your platform... using
> > >> > builtin-java
> > >> > >>>> >> classes
> > >> > >>>> >> >> >>>> where applicable
> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
> > input
> > >> > paths
> > >> > >>>> to
> > >> > >>>> >> >> process
> > >> > >>>> >> >> >>>> : 2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
> > input
> > >> > paths
> > >> > >>>> to
> > >> > >>>> >> >> process
> > >> > >>>> >> >> >>>> : 2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running
> job:
> > >> > >>>> >> >> job_localrunner_0001
> > >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting
> up
> > a
> > >> new
> > >> > >>>> barrier
> > >> > >>>> >> >> for 2
> > >> > >>>> >> >> >>>> tasks!
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current
> > >> supersteps
> > >> > >>>> >> number: 1
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total
> > number
> > >> > of
> > >> > >>>> >> >> supersteps: 1
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > SUPERSTEPS=1
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > LAUNCHED_TASKS=2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > SUPERSTEP_SUM=4
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > >>>> IO_BYTES_READ=4332
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > >>>> TIME_IN_SYNC_MS=14
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >> > >>>> TASK_INPUT_RECORDS=100
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total
> > input
> > >> > paths
> > >> > >>>> to
> > >> > >>>> >> >> process
> > >> > >>>> >> >> >>>> : 2
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running
> job:
> > >> > >>>> >> >> job_localrunner_0001
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting
> up
> > a
> > >> new
> > >> > >>>> barrier
> > >> > >>>> >> >> for 2
> > >> > >>>> >> >> >>>> tasks!
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> > vertices
> > >> > are
> > >> > >>>> loaded
> > >> > >>>> >> >> into
> > >> > >>>> >> >> >>>> local:1
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> > vertices
> > >> > are
> > >> > >>>> loaded
> > >> > >>>> >> >> into
> > >> > >>>> >> >> >>>> local:0
> > >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner:
> Exception
> > >> > during
> > >> > >>>> BSP
> > >> > >>>> >> >> >>>> execution!
> > >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must
> > never
> > >> be
> > >> > >>>> behind
> > >> > >>>> >> the
> > >> > >>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs. 50
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >>
> > >> >
> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>>
> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > >>
> >
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > >>
> >
> org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>>
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>>
> > >> >
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>>
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >> > >>>> >> >> >>>>        at
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>>
> > >> >
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >> >>>>
> > >> > >>>> >> >> >>>> --
> > >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> > >> > >>>> >> >> >>>> @eddieyoon
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >> >>>
> > >> > >>>> >> >>
> > >> > >>>> >> >>
> > >> > >>>> >> >>
> > >> > >>>> >> >> --
> > >> > >>>> >> >> Best Regards, Edward J. Yoon
> > >> > >>>> >> >> @eddieyoon
> > >> > >>>> >> >>
> > >> > >>>> >>
> > >> > >>>> >>
> > >> > >>>> >>
> > >> > >>>> >> --
> > >> > >>>> >> Best Regards, Edward J. Yoon
> > >> > >>>> >> @eddieyoon
> > >> > >>>> >>
> > >> > >>>>
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> --
> > >> > >>>> Best Regards, Edward J. Yoon
> > >> > >>>> @eddieyoon
> > >> > >>>>
> > >> > >>>
> > >> > >>>
> > >> > >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Best Regards, Edward J. Yoon
> > >> > @eddieyoon
> > >> >
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>

Re: Error with fastgen input

Posted by Suraj Menon <su...@apache.org>.
No, the partitions we write locally need not be sorted. Sorry for the
confusion. The superstep injection is possible with the Superstep API.
There are a few enhancements needed to make it simpler since I last worked
on it. We can then look into having the partitioning superstep execute
before the setup of the first superstep of the submitted job. I think it
is feasible.

On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <ed...@apache.org>wrote:

> > spilling queue and sorted spilling queue, can we inject the partitioning
> > superstep as the first superstep and use local memory?
>
> Actually, I wanted to add something before calling BSP.setup() method
> to avoid execute additional BSP job. But, in my opinion, current is
> enough. I think, we need to collect more experiences of input
> partitioning on large environments. I'll do.
>
> BTW, I still don't know why it need to be Sorted?! MR-like?
>
> On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <su...@apache.org>
> wrote:
> > Sorry, I am increasing the scope here to outside graph module. When we
> have
> > spilling queue and sorted spilling queue, can we inject the partitioning
> > superstep as the first superstep and use local memory?
> > Today we have partitioning job within a job and are creating two copies
> of
> > data on HDFS. This could be really costly. Is it possible to create or
> > redistribute the partitions on local memory and initialize the record
> > reader there?
> > The user can run a separate job give in examples area to explicitly
> > repartition the data on HDFS. The deployment question is how much of disk
> > space gets allocated for local memory usage? Would it be a safe approach
> > with the limitations?
> >
> > -Suraj
> >
> > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> > <th...@gmail.com>wrote:
> >
> >> yes. Once Suraj added merging of sorted files we can add this to the
> >> partitioner pretty easily.
> >>
> >> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >>
> >> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
> >> >
> >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> >> > <th...@gmail.com> wrote:
> >> > > Now I get how the partitioning works, obviously if you merge n
> sorted
> >> > files
> >> > > by just appending to each other, this will result in totally
> unsorted
> >> > data
> >> > > ;-)
> >> > > Why didn't you solve this via messaging?
> >> > >
> >> > > 2013/2/28 Thomas Jungblut <th...@gmail.com>
> >> > >
> >> > >> Seems that they are not correctly sorted:
> >> > >>
> >> > >> vertexID: 50
> >> > >> vertexID: 52
> >> > >> vertexID: 54
> >> > >> vertexID: 56
> >> > >> vertexID: 58
> >> > >> vertexID: 61
> >> > >> ...
> >> > >> vertexID: 78
> >> > >> vertexID: 81
> >> > >> vertexID: 83
> >> > >> vertexID: 85
> >> > >> ...
> >> > >> vertexID: 94
> >> > >> vertexID: 96
> >> > >> vertexID: 98
> >> > >> vertexID: 1
> >> > >> vertexID: 10
> >> > >> vertexID: 12
> >> > >> vertexID: 14
> >> > >> vertexID: 16
> >> > >> vertexID: 18
> >> > >> vertexID: 21
> >> > >> vertexID: 23
> >> > >> vertexID: 25
> >> > >> vertexID: 27
> >> > >> vertexID: 29
> >> > >> vertexID: 3
> >> > >>
> >> > >> So this won't work then correctly...
> >> > >>
> >> > >>
> >> > >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
> >> > >>
> >> > >>> sure, have fun on your holidays.
> >> > >>>
> >> > >>>
> >> > >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >> > >>>
> >> > >>>> Sure, but if you can fix quickly, please do. March 1 is
> holiday[1]
> >> so
> >> > >>>> I'll appear next week.
> >> > >>>>
> >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >> > >>>>
> >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> >> > >>>> <th...@gmail.com> wrote:
> >> > >>>> > Maybe 50 is missing from the file, didn't observe if all items
> >> were
> >> > >>>> added.
> >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
> the
> >> > >>>> fastgen,
> >> > >>>> > want to have a look into it?
> >> > >>>> >
> >> > >>>> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >> > >>>> >
> >> > >>>> >> I guess, it's a bug of fastgen, when generate adjacency matrix
> >> into
> >> > >>>> >> multiple files.
> >> > >>>> >>
> >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >> > >>>> >> <th...@gmail.com> wrote:
> >> > >>>> >> > You have two files, are they partitioned correctly?
> >> > >>>> >> >
> >> > >>>> >> > 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >> > >>>> >> >
> >> > >>>> >> >> It looks like a bug.
> >> > >>>> >> >>
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> >> /tmp/randomgraph/
> >> > >>>> >> >> total 44
> >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> >> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> .part-00000.crc
> >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> >> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01
> .part-00001.crc
> >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al
> >> > >>>> >> /tmp/randomgraph/partitions/
> >> > >>>> >> >> total 24
> >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> .part-00000.crc
> >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03
> .part-00001.crc
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org
> >
> >> > wrote:
> >> > >>>> >> >> > yes i'll check again
> >> > >>>> >> >> >
> >> > >>>> >> >> > Sent from my iPhone
> >> > >>>> >> >> >
> >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <
> >> > >>>> >> thomas.jungblut@gmail.com>
> >> > >>>> >> >> wrote:
> >> > >>>> >> >> >
> >> > >>>> >> >> >> Can you verify an observation for me please?
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
> >> part-00001,
> >> > >>>> both
> >> > >>>> >> ~2.2kb
> >> > >>>> >> >> >> sized.
> >> > >>>> >> >> >> In the partition directory below, there is only a single 5.56kb file.
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> Is it intended for the partitioner to write a single file if you configured two?
> >> > >>>> >> >> >> It even reads it as two files, strange huh?
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <th...@gmail.com>
> >> > >>>> >> >> >>
> >> > >>>> >> >> >>> Will have a look into it.
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> did work for me the last time I profiled; maybe the partitioning doesn't partition this input correctly, or it's something else.
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <ed...@apache.org>
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> Fastgen input seems not work for graph examples.
> >> > >>>> >> >> >>>>
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> >> bin/hama
> >> > jar
> >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen
> >> > fastgen
> >> > >>>> 100 10
> >> > >>>> >> >> >>>> /tmp/randomgraph 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable
> to
> >> > load
> >> > >>>> >> >> >>>> native-hadoop library for your platform... using
> >> > builtin-java
> >> > >>>> >> classes
> >> > >>>> >> >> >>>> where applicable
> >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job:
> >> > >>>> >> >> job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up
> a
> >> new
> >> > >>>> barrier
> >> > >>>> >> >> for 2
> >> > >>>> >> >> >>>> tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current
> >> supersteps
> >> > >>>> >> number: 0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total
> number
> >> > of
> >> > >>>> >> >> supersteps: 0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> SUPERSTEPS=0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >> > LAUNCHED_TASKS=2
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> >> > >>>> >> TASK_OUTPUT_RECORDS=100
> >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> >> bin/hama
> >> > jar
> >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
> >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$
> >> bin/hama
> >> > jar
> >> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar
> pagerank
> >> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable
> to
> >> > load
> >> > >>>> >> >> >>>> native-hadoop library for your platform... using
> >> > builtin-java
> >> > >>>> >> classes
> >> > >>>> >> >> >>>> where applicable
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
> input
> >> > paths
> >> > >>>> to
> >> > >>>> >> >> process
> >> > >>>> >> >> >>>> : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total
> input
> >> > paths
> >> > >>>> to
> >> > >>>> >> >> process
> >> > >>>> >> >> >>>> : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job:
> >> > >>>> >> >> job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up
> a
> >> new
> >> > >>>> barrier
> >> > >>>> >> >> for 2
> >> > >>>> >> >> >>>> tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current
> >> supersteps
> >> > >>>> >> number: 1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total
> number
> >> > of
> >> > >>>> >> >> supersteps: 1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> SUPERSTEPS=1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > LAUNCHED_TASKS=2
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > SUPERSTEP_SUM=4
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > >>>> IO_BYTES_READ=4332
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > >>>> TIME_IN_SYNC_MS=14
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> >> > >>>> TASK_INPUT_RECORDS=100
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total
> input
> >> > paths
> >> > >>>> to
> >> > >>>> >> >> process
> >> > >>>> >> >> >>>> : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job:
> >> > >>>> >> >> job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up
> a
> >> new
> >> > >>>> barrier
> >> > >>>> >> >> for 2
> >> > >>>> >> >> >>>> tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> vertices
> >> > are
> >> > >>>> loaded
> >> > >>>> >> >> into
> >> > >>>> >> >> >>>> local:1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50
> vertices
> >> > are
> >> > >>>> loaded
> >> > >>>> >> >> into
> >> > >>>> >> >> >>>> local:0
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> >> > >>>> >> >> >>>>
> >> > >>>> >> >> >>>>
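(Editor's note: the check that throws here is a merge-join style invariant. Vertices and their incoming messages are both consumed in sorted ID order, so a message whose ID sorts before the current vertex can never find its target; that only happens when the vertex list itself is not globally sorted. A simplified sketch of that invariant follows; it is not the actual GraphJobRunner code.)

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MergeJoinSketch {
    // Walks vertices and messages as two ID-sorted streams; each message
    // is delivered to the vertex whose ID equals the message ID.
    // If a message ID sorts *before* the current vertex, the vertex list
    // was not globally sorted and the join cannot recover.
    static void deliver(List<String> sortedVertexIds, List<String> sortedMsgIds) {
        Iterator<String> msgs = sortedMsgIds.iterator();
        String msg = msgs.hasNext() ? msgs.next() : null;
        for (String vertex : sortedVertexIds) {
            while (msg != null && msg.compareTo(vertex) <= 0) {
                if (msg.compareTo(vertex) < 0) {
                    throw new IllegalArgumentException(
                        "Messages must never be behind the vertex in ID! "
                            + "Current Message ID: " + msg + " vs. " + vertex);
                }
                // msg.equals(vertex): deliver the message to this vertex
                msg = msgs.hasNext() ? msgs.next() : null;
            }
        }
    }

    public static void main(String[] args) {
        // Globally sorted vertex list: every message finds its target.
        deliver(Arrays.asList("1", "10", "50"), Arrays.asList("10", "50"));
        // Unsorted vertex list (as produced by appending sorted partition
        // files): throws "... Current Message ID: 1 vs. 50", matching the log.
        deliver(Arrays.asList("50", "1", "10"), Arrays.asList("1"));
    }
}
```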
> >> > >>>> >> >> >>>> --
> >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >> > >>>> >> >> >>>> @eddieyoon
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>>
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >> --
> >> > >>>> >> >> Best Regards, Edward J. Yoon
> >> > >>>> >> >> @eddieyoon
> >> > >>>> >> >>
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >> --
> >> > >>>> >> Best Regards, Edward J. Yoon
> >> > >>>> >> @eddieyoon
> >> > >>>> >>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Best Regards, Edward J. Yoon
> >> > >>>> @eddieyoon
> >> > >>>>
> >> > >>>
> >> > >>>
> >> > >>
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >> >
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>