Posted to dev@hama.apache.org by "Thomas Jungblut (Updated) (JIRA)" <ji...@apache.org> on 2012/01/27 18:26:10 UTC

[jira] [Updated] (HAMA-493) Provide text to seq-file utils for graph examples

     [ https://issues.apache.org/jira/browse/HAMA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jungblut updated HAMA-493:
---------------------------------

    Attachment: HAMA-493.patch


Here is an example of the usage:

{noformat}
~$ /usr/local/hama/bin/hama jar /usr/local/hama/hama-examples-0.4.0-incubating-SNAPSHOT.jar pagerank-text2seq /tmp/test_seq/in.txt hdfs://localhost:9000/tmp/test_seq/out.seq
12/01/27 17:33:55 INFO util.TextToSequenceFile: Processing file : file:/tmp/test_seq/in.txt
12/01/27 17:33:55 INFO util.TextToSequenceFile: Written 246 to hdfs://localhost:9000/tmp/test_seq/out.seq/in.txt.seq
{noformat}
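For reference, here is a minimal sketch of the per-line step such a converter performs before it writes a record. It assumes, without confirmation from the patch, that each text line holds a vertex followed by its neighbors, all delimited by a configurable separator string. The class names `TextLineParser` and `AdjacencyRecord` are hypothetical, and the actual SequenceFile writing is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical record: one vertex and its outgoing neighbors.
final class AdjacencyRecord {
    final String vertex;
    final List<String> neighbors;

    AdjacencyRecord(String vertex, List<String> neighbors) {
        this.vertex = vertex;
        this.neighbors = neighbors;
    }
}

// Hypothetical parser; assumed line format: vertex<sep>neighbor<sep>neighbor...
final class TextLineParser {
    private final Pattern separator;

    TextLineParser(String separator) {
        // Pattern.quote treats the separator literally, so "|" or "." are safe.
        this.separator = Pattern.compile(Pattern.quote(separator));
    }

    // Returns null for lines without a vertex id so the caller can skip them.
    AdjacencyRecord parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] parts = separator.split(trimmed);
        if (parts.length == 0 || parts[0].isEmpty()) {
            return null;
        }
        // Everything after the first field is treated as a neighbor id.
        List<String> neighbors = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
        return new AdjacencyRecord(parts[0], neighbors);
    }
}
```

For example, `new TextLineParser("\t").parse("1\t2\t3")` yields vertex `1` with neighbors `[2, 3]`.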

Then you can run pagerank on it:

{noformat}
~$ /usr/local/hama/bin/hama jar /usr/local/hama/hama-examples-0.4.0-incubating-SNAPSHOT.jar pagerank /tmp/test_seq/out.seq/ /tmp/test_seq/out/
{noformat}

It works the same way for SSSP.
In both cases, you can customize the separator string that delimits the records.
Play around with it a bit. It also allows regexes in the input paths and can transform multiple text files into sequence files.
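A rough sketch of the multi-file case, assuming the pattern is a Java regex matched against the file name only (the patch may instead match full paths or use Hadoop globs). Each selected input is mapped to `<outputDir>/<fileName>.seq`, the naming visible in the log above (in.txt becomes out.seq/in.txt.seq). `SeqFilePlanner` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical planner: which text files to convert and where each .seq file goes.
final class SeqFilePlanner {
    // Maps each matching input path to "<outputDir>/<fileName>.seq", keeping input order.
    static Map<String, String> plan(List<String> inputPaths, String fileNameRegex, String outputDir) {
        Pattern pattern = Pattern.compile(fileNameRegex);
        // Drop a trailing slash so we never produce "dir//file.seq".
        String dir = outputDir.endsWith("/")
                ? outputDir.substring(0, outputDir.length() - 1)
                : outputDir;
        Map<String, String> plan = new LinkedHashMap<>();
        for (String input : inputPaths) {
            String fileName = input.substring(input.lastIndexOf('/') + 1);
            // matches() requires the whole file name to fit the regex.
            if (pattern.matcher(fileName).matches()) {
                plan.put(input, dir + "/" + fileName + ".seq");
            }
        }
        return plan;
    }
}
```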

BTW, we should delete the partition directory in the input directory once the job has run; otherwise the user gets "Not a file" errors when rerunning the job.
Didn't we have a cleanup issue for that?
I added code to FileInputFormat that removes the partition directory. Please review it and let me know whether you are okay with this solution.
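A sketch of the cleanup idea on the local filesystem, deleting the deepest entries first so each directory is empty by the time it is removed. The directory names in the test are hypothetical; on HDFS the corresponding call would be Hadoop's `FileSystem.delete(path, true)` rather than `java.nio.file`, and this is not the code in the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical cleanup helper for a leftover partition directory (local FS only).
final class PartitionCleanup {
    // Returns true if the directory existed and was removed, false if it was absent.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return true;
    }
}
```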

One more thing: once a task has thrown an exception, we should kill the whole job. Right now the job just hangs indefinitely; is that because the task doesn't report back to the groom?
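A minimal sketch of the fail-fast behaviour, using plain `java.util.concurrent` rather than Hama's groom/task protocol (which this sketch does not model): as soon as any task throws, every other task is cancelled and the job reports failure instead of waiting forever. `FailFastJob` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical job runner: one failed task fails (and stops) the whole job.
final class FailFastJob {
    // Returns true if every task succeeded, false as soon as one task throws.
    static boolean run(List<Callable<Void>> tasks) throws InterruptedException {
        if (tasks.isEmpty()) {
            return true;
        }
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        CompletionService<Void> done = new ExecutorCompletionService<>(pool);
        List<Future<Void>> futures = new ArrayList<>();
        for (Callable<Void> task : tasks) {
            futures.add(done.submit(task));
        }
        try {
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    done.take().get(); // blocks until the next task finishes
                } catch (ExecutionException failed) {
                    // A task threw: interrupt all the others instead of hanging.
                    for (Future<Void> f : futures) {
                        f.cancel(true);
                    }
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdownNow();
        }
    }
}
```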

However, I still need to add test cases for this patch and document the public methods.
                
> Provide text to seq-file utils for graph examples
> -------------------------------------------------
>
>                 Key: HAMA-493
>                 URL: https://issues.apache.org/jira/browse/HAMA-493
>             Project: Hama
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>             Fix For: 0.4.0
>
>         Attachments: HAMA-493.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Updated] (HAMA-493) Provide text to seq-file utils for graph examples

Posted by "Edward J. Yoon" <ed...@apache.org>.
Also, the job hangs when the zookeeper.clientPort setting is not correct (SyncException).

Killing a job/task does not seem to be stable.
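A sketch of one way to turn that hang into a failure: wait on the barrier with a deadline. This is a plain `java.util.concurrent.CountDownLatch`, not Hama's ZooKeeper-based sync; `BoundedBarrier` and the timeout values are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical barrier: a peer that never arrives (e.g. it cannot reach the
// sync service) makes the wait fail instead of blocking forever.
final class BoundedBarrier {
    private final CountDownLatch arrivals;

    BoundedBarrier(int peers) {
        this.arrivals = new CountDownLatch(peers);
    }

    void arrive() {
        arrivals.countDown();
    }

    // Returns if all peers arrived in time; otherwise throws so the caller can kill the job.
    void awaitOrFail(long timeout, TimeUnit unit) throws InterruptedException {
        if (!arrivals.await(timeout, unit)) {
            throw new IllegalStateException(
                    "barrier timed out: " + arrivals.getCount() + " peer(s) missing");
        }
    }
}
```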




-- 
Best Regards, Edward J. Yoon
@eddieyoon