Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2009/05/06 08:40:30 UTC
[jira] Created: (HAMA-182) Seperate column families
Seperate column families
------------------------
Key: HAMA-182
URL: https://issues.apache.org/jira/browse/HAMA-182
Project: Hama
Issue Type: Improvement
Components: implementation
Affects Versions: 0.1.0
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
Fix For: 0.1.0
All columns are stored in a single column family. I propose splitting the columns into separate column families by column index range, to reduce I/O during column-based processing.
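A minimal sketch of the proposed layout, assuming a fixed number of columns per family. The class name ColumnFamilyRanges, the "column_<k>:" family naming, and the range size of 1000 are hypothetical illustrations, not the scheme in columns.patch. Under this layout, a column-based job that needs columns [from, to] would read only the families overlapping that range instead of the single family holding every column.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: assign matrix column indices to column families by
 * fixed-size index ranges. Naming and range size are illustrative assumptions.
 */
public class ColumnFamilyRanges {

    /** 0-based index of the family that holds the given column. */
    public static int familyIndex(int column, int columnsPerFamily) {
        if (column < 0 || columnsPerFamily <= 0) {
            throw new IllegalArgumentException("column must be >= 0 and range size > 0");
        }
        return column / columnsPerFamily;
    }

    /** Family name for a column, e.g. "column_2:" (hypothetical naming scheme). */
    public static String familyName(int column, int columnsPerFamily) {
        return "column_" + familyIndex(column, columnsPerFamily) + ":";
    }

    /** Families a column-range job must read: only those overlapping [from, to]. */
    public static List<String> familiesForRange(int from, int to, int columnsPerFamily) {
        List<String> families = new ArrayList<>();
        int first = familyIndex(from, columnsPerFamily);
        int last = familyIndex(to, columnsPerFamily);
        for (int f = first; f <= last; f++) {
            families.add("column_" + f + ":");
        }
        return families;
    }

    public static void main(String[] args) {
        int columnsPerFamily = 1000; // assumed range size
        System.out.println(familyName(2500, columnsPerFamily));            // column_2:
        System.out.println(familiesForRange(900, 2100, columnsPerFamily)); // [column_0:, column_1:, column_2:]
    }
}
```

Integer division keeps the mapping stateless, so writers and readers can compute a column's family without a lookup table.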
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HAMA-182) Seperate column families
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward J. Yoon resolved HAMA-182.
---------------------------------
Resolution: Won't Fix
Won't fix.
> Seperate column families
> ------------------------
>
> Key: HAMA-182
> URL: https://issues.apache.org/jira/browse/HAMA-182
> Project: Hama
> Issue Type: Improvement
> Components: matrix
> Affects Versions: 0.1.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.1.0
>
> Attachments: columns.patch
>
>
> All columns are stored in a single column family. I propose splitting the columns into separate column families by column index range, to reduce I/O during column-based processing.
[jira] Commented: (HAMA-182) Seperate column families
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706698#action_12706698 ]
Edward J. Yoon commented on HAMA-182:
-------------------------------------
With too many column families (100) in one table, I received the following exception:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 61.247.200.35:60020 for region DenseMatrix_randonmwq,,1241660972061, row '000000000000442', but failed after 10 attempts.
Exceptions:
java.io.IOException: Call to /61.247.200.35:60020 failed on local exception: java.io.EOFException
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to /61.247.200.35:60020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:841)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:932)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
at org.apache.hama.mapred.VectorOutputFormat$TableRecordWriter.write(VectorOutputFormat.java:71)
at org.apache.hama.mapred.VectorOutputFormat$TableRecordWriter.write(VectorOutputFormat.java:51)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
at org.apache.hama.mapred.RandomMatrixReduce.reduce(RandomMatrixReduce.java:71)
at org.apache.hama.mapred.RandomMatrixReduce.reduce(RandomMatrixReduce.java:36)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
[jira] Commented: (HAMA-182) Seperate column families
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707264#action_12707264 ]
Edward J. Yoon commented on HAMA-182:
-------------------------------------
This doesn't seem like a good idea. It doesn't show any improvement.
[jira] Updated: (HAMA-182) Seperate column families
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward J. Yoon updated HAMA-182:
--------------------------------
Attachment: columns.patch
Here's the patch. I'll check the performance and algorithms.
[jira] Commented: (HAMA-182) Seperate column families
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706701#action_12706701 ]
Edward J. Yoon commented on HAMA-182:
-------------------------------------
With many column families, flushes happen too often. Reducing the number of families avoids the problem, but I'm not sure whether splitting the columns into only a few families is worth anything.
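The trade-off in this comment can be sketched as simple arithmetic. Assuming a hypothetical n-column matrix split into equal ranges, with reads aligned to a family boundary, a smaller range size means more families (more stores to flush), while a larger range size means a column-range job reads columns it does not need. The class and method names, the maxFamilies cap, and the 10,000-column example are illustrative assumptions.

```java
/**
 * Hypothetical sketch of the family-count trade-off: narrow ranges create many
 * column families, wide ranges make column-range reads fetch extra columns.
 * Assumes equal-sized ranges and reads that start at a family boundary.
 */
public class FamilyCountTradeoff {

    /** Ceiling division for non-negative a and positive b. */
    private static int ceilDiv(int a, int b) {
        if (a < 0 || b <= 0) {
            throw new IllegalArgumentException("need a >= 0 and b > 0");
        }
        return (a + b - 1) / b;
    }

    /** Families needed to cover n columns with a given range size. */
    public static int familyCount(int n, int columnsPerFamily) {
        return ceilDiv(n, columnsPerFamily);
    }

    /** Smallest range size that keeps the family count at or below maxFamilies (assumed cap). */
    public static int rangeSizeForCap(int n, int maxFamilies) {
        return ceilDiv(n, maxFamilies);
    }

    /** Columns read when a job needs `wanted` consecutive columns starting at a family boundary. */
    public static int columnsRead(int wanted, int columnsPerFamily) {
        return ceilDiv(wanted, columnsPerFamily) * columnsPerFamily;
    }

    public static void main(String[] args) {
        int n = 10_000; // hypothetical matrix width
        System.out.println(familyCount(n, 100));     // 100 families
        System.out.println(rangeSizeForCap(n, 10));  // 1000 columns per family
        System.out.println(columnsRead(50, 100));    // 100 columns read for 50 wanted
        System.out.println(columnsRead(50, 1000));   // 1000 columns read for 50 wanted
    }
}
```

Capping the family count bounds the number of stores per region, at the cost of reading up to one extra range's worth of columns per scan.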