Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2008/12/11 09:57:44 UTC

[jira] Created: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred
-------------------------------------------------------------------------

                 Key: HAMA-133
                 URL: https://issues.apache.org/jira/browse/HAMA-133
             Project: Hama
          Issue Type: Sub-task
          Components: implementation
    Affects Versions: 0.1.0
            Reporter: Edward J. Yoon
             Fix For: 0.1.0


> If we remove 'reduce phase', I guess we can reduce the disk I/O operations.


Yes.


>
>
> In the map, read { Constants.BLOCK_STARTROW, Constants.BLOCK_ENDROW,
> Constants.BLOCK_STARTCOLUMN, Constants.BLOCK_ENDCOLUMN } instead of {
> Constants.COLUMN }, and write the blocks directly.


Two methods to be considered:
1) We need an InputFormat that partitions the matrix table according to the
row boundaries of the blocks.
   This should be done carefully to make sure a single block is not divided
between two or more mappers.

2) Like what RandomMatrixMap does, we just tell the mappers the row/column
boundaries of the blocks of a matrix table.
   Scanning that portion of the table is then done in the mapper.

I think 1) may be better than 2).
An InputFormat can get the locality of a range of the table and let MR know
how to move the MR computations close to it.
In 2), if we do it like RandomMatrixMap, we may lose some locality
information about the table, so the network transfer overhead may
increase.

It is just my guess and thoughts.
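
A minimal sketch of the arithmetic behind method 1), assuming row-wise
blocks of a fixed height. This is not the Hama or Hadoop InputFormat API:
the class BlockAlignedSplits and its split method are hypothetical names
used only for illustration. It shows how split boundaries can be placed
only on block row boundaries, so that no block is divided between two
mappers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hama or Hadoop. */
final class BlockAlignedSplits {

    /**
     * Divides rows [0, totalRows) into at most numSplits half-open ranges
     * {startRow, endRow}. Every boundary between two ranges is a multiple
     * of blockRows, so each block belongs to exactly one range.
     */
    static List<int[]> split(int totalRows, int blockRows, int numSplits) {
        if (totalRows <= 0 || blockRows <= 0 || numSplits <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Number of row blocks; the last block may be shorter than blockRows.
        int blockCount = (totalRows + blockRows - 1) / blockRows;
        // Never create more splits than there are blocks.
        int splits = Math.min(numSplits, blockCount);
        int base = blockCount / splits;   // blocks every split receives
        int extra = blockCount % splits;  // first 'extra' splits get one more

        List<int[]> ranges = new ArrayList<>();
        int firstBlock = 0;
        for (int i = 0; i < splits; i++) {
            int blocksHere = base + (i < extra ? 1 : 0);
            int startRow = firstBlock * blockRows;
            int endRow = Math.min((firstBlock + blocksHere) * blockRows, totalRows);
            ranges.add(new int[] {startRow, endRow});
            firstBlock += blocksHere;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows, blocks of 2 rows, up to 3 mappers.
        for (int[] r : split(10, 2, 3)) {
            System.out.println("rows [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

A real InputFormat would additionally have to report where each row range
is stored so that the framework can schedule the mapper near the data;
that part is left out of this sketch.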

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Attachment: HAMA-133.patch


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Status: Patch Available  (was: Open)

Submitting to Hudson.


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Attachment: HAMA-133_v02.patch


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Status: Open  (was: Patch Available)

Disk I/O is obviously reduced.

----
08/12/15 17:28:12 INFO hama.AbstractMatrix: Create 2 * 2 blocked matrix
08/12/15 17:28:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/12/15 17:28:12 WARN mapred.JobClient: Use genericOptions for the option -libjars
08/12/15 17:28:12 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/12/15 17:28:13 INFO mapred.JobClient: Running job: job_200812151723_0003
08/12/15 17:28:14 INFO mapred.JobClient:  map 0% reduce 0%
08/12/15 17:28:18 INFO mapred.JobClient:  map 75% reduce 0%
08/12/15 17:28:25 INFO mapred.JobClient:  map 75% reduce 6%
08/12/15 17:28:28 INFO mapred.JobClient:  map 75% reduce 22%
08/12/15 17:28:38 INFO mapred.JobClient:  map 75% reduce 25%
08/12/15 17:31:23 INFO mapred.JobClient:  map 100% reduce 25%
08/12/15 17:31:27 INFO mapred.JobClient:  map 100% reduce 62%
08/12/15 17:31:30 INFO mapred.JobClient:  map 100% reduce 81%
08/12/15 17:31:32 INFO mapred.JobClient: Job complete: job_200812151723_0003
08/12/15 17:31:32 INFO mapred.JobClient: Counters: 13
08/12/15 17:31:32 INFO mapred.JobClient:   File Systems
08/12/15 17:31:32 INFO mapred.JobClient:     Local bytes read=56
08/12/15 17:31:32 INFO mapred.JobClient:     Local bytes written=568
08/12/15 17:31:32 INFO mapred.JobClient:   Job Counters 
08/12/15 17:31:32 INFO mapred.JobClient:     Launched reduce tasks=5
08/12/15 17:31:32 INFO mapred.JobClient:     Launched map tasks=5
08/12/15 17:31:32 INFO mapred.JobClient:   Map-Reduce Framework
08/12/15 17:31:32 INFO mapred.JobClient:     Reduce input groups=0
08/12/15 17:31:32 INFO mapred.JobClient:     Combine output records=0
08/12/15 17:31:32 INFO mapred.JobClient:     Map input records=4
08/12/15 17:31:32 INFO mapred.JobClient:     Reduce output records=0
08/12/15 17:31:32 INFO mapred.JobClient:     Map output bytes=0
08/12/15 17:31:32 INFO mapred.JobClient:     Map input bytes=0
08/12/15 17:31:32 INFO mapred.JobClient:     Combine input records=0
08/12/15 17:31:32 INFO mapred.JobClient:     Map output records=0
08/12/15 17:31:32 INFO mapred.JobClient:     Reduce input records=0


[jira] Commented: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656584#action_12656584 ] 

Hudson commented on HAMA-133:
-----------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12396063/HAMA-133_v02.patch
against trunk revision 726076.

    @author +1.  The patch does not contain any @author tags.

    tests included +1.  The patch appears to include 9 new or modified tests.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hama-Patch/133/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hama-Patch/133/artifact/trunk/build/reports/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hama-Patch/133/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hama-Patch/133/console

This message is automatically generated.


[jira] Commented: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656567#action_12656567 ] 

Hudson commented on HAMA-133:
-----------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12396060/HAMA-133.patch
against trunk revision 726076.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hama-Patch/132/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hama-Patch/132/artifact/trunk/build/reports/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hama-Patch/132/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hama-Patch/132/console

This message is automatically generated.


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Attachment: HAMA-133_v01.patch

Re-attach


[jira] Updated: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-133:
--------------------------------

    Status: Patch Available  (was: Open)

Local tests passed. Submitting my patch.


[jira] Assigned: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon reassigned HAMA-133:
-----------------------------------

    Assignee: Edward J. Yoon
