Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2009/05/06 08:40:30 UTC

[jira] Created: (HAMA-182) Separate column families

Separate column families
------------------------

                 Key: HAMA-182
                 URL: https://issues.apache.org/jira/browse/HAMA-182
             Project: Hama
          Issue Type: Improvement
          Components: implementation
    Affects Versions: 0.1.0
            Reporter: Edward J. Yoon
            Assignee: Edward J. Yoon
             Fix For: 0.1.0


All columns are currently stored in a single column family. I propose splitting them into separate column families by column index range, so that we reduce I/O during column-based processing.
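
One way to picture the proposal is a fixed-width mapping from column index to family name. The sketch below is illustrative only: the class name ColumnFamilyRanges, the rangeWidth parameter, and the "cf<N>" naming are assumptions, not taken from columns.patch, and no HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assigns each column index to a family by index range. */
final class ColumnFamilyRanges {
    // Number of consecutive column indices per family (assumed parameter).
    private final int rangeWidth;

    ColumnFamilyRanges(int rangeWidth) {
        if (rangeWidth <= 0) {
            throw new IllegalArgumentException("rangeWidth must be positive");
        }
        this.rangeWidth = rangeWidth;
    }

    /** Family name that holds the given column index. */
    String familyFor(int columnIndex) {
        if (columnIndex < 0) {
            throw new IllegalArgumentException("columnIndex must be non-negative");
        }
        return "cf" + (columnIndex / rangeWidth);
    }

    /** Families that a read of columns first..last (inclusive) has to touch. */
    List<String> familiesFor(int first, int last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("invalid column range");
        }
        List<String> families = new ArrayList<>();
        for (int f = first / rangeWidth; f <= last / rangeWidth; f++) {
            families.add("cf" + f);
        }
        return families;
    }
}
```

With a width of 1000, a job that touches only columns 0 to 999 would need to read a single family instead of every column in the row, which is where the I/O saving would come from.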

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HAMA-182) Separate column families

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon resolved HAMA-182.
---------------------------------

    Resolution: Won't Fix

Won't fix.

> Separate column families
> ------------------------
>
>                 Key: HAMA-182
>                 URL: https://issues.apache.org/jira/browse/HAMA-182
>             Project: Hama
>          Issue Type: Improvement
>          Components: matrix
>    Affects Versions: 0.1.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.1.0
>
>         Attachments: columns.patch
>
>
> All columns are currently stored in a single column family. I propose splitting them into separate column families by column index range, so that we reduce I/O during column-based processing.


[jira] Commented: (HAMA-182) Separate column families

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706698#action_12706698 ] 

Edward J. Yoon commented on HAMA-182:
-------------------------------------

When one table has too many families (100), I get the messages below.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 61.247.200.35:60020 for region DenseMatrix_randonmwq,,1241660972061, row '000000000000442', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /61.247.200.35:60020 failed on local exception: java.io.EOFException
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
	at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
	at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
	at org.apache.hama.mapred.VectorOutputFormat$TableRecordWriter.write(VectorOutputFormat.java:71)
	at org.apache.hama.mapred.VectorOutputFormat$TableRecordWriter.write(VectorOutputFormat.java:51)
	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
	at org.apache.hama.mapred.RandomMatrixReduce.reduce(RandomMatrixReduce.java:71)
	at org.apache.hama.mapred.RandomMatrixReduce.reduce(RandomMatrixReduce.java:36)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
	at org.apache.hadoop.mapred.Child.main(Child.java:158)



[jira] Commented: (HAMA-182) Separate column families

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707264#action_12707264 ] 

Edward J. Yoon commented on HAMA-182:
-------------------------------------

This doesn't seem to be a good idea. It doesn't show any improvement.


[jira] Updated: (HAMA-182) Separate column families

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-182:
--------------------------------

    Attachment: columns.patch

Here's the patch. I'll check the performance and algorithms.


[jira] Commented: (HAMA-182) Separate column families

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706701#action_12706701 ] 

Edward J. Yoon commented on HAMA-182:
-------------------------------------

With a lot of families, flushes have to happen often. If I reduce the number of families, it works. However, I'm not sure whether having only a few families is worth anything.
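
The trade-off can be put in rough numbers: with fixed-width index ranges, the family count is the column count divided by the range width, rounded up. The sketch below is hypothetical arithmetic only. The class FamilyCount, its method names, and the 10,000-column example are assumptions, and the family count at which flushing becomes a problem is not established beyond the 100-family case reported in this thread.

```java
/** Hypothetical arithmetic for sizing index-range column families. */
final class FamilyCount {
    private FamilyCount() {}

    /** Number of families needed when each family covers rangeWidth columns. */
    static int familyCount(int numColumns, int rangeWidth) {
        if (numColumns < 0 || rangeWidth <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Integer ceiling of numColumns / rangeWidth.
        return (numColumns + rangeWidth - 1) / rangeWidth;
    }

    /** Smallest range width that keeps the family count at or below maxFamilies. */
    static int minRangeWidth(int numColumns, int maxFamilies) {
        if (numColumns < 0 || maxFamilies <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Integer ceiling of numColumns / maxFamilies.
        return (numColumns + maxFamilies - 1) / maxFamilies;
    }
}
```

For an assumed 10,000-column matrix, a width of 100 gives 100 families, while capping the table at 4 families requires a width of at least 2,500. The wider each range, the less a column-based job can skip, which is the doubt raised above.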
