Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2010/12/27 22:44:45 UTC

[jira] Created: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name
---------------------------------------------------------------------------------

                 Key: CHUKWA-569
                 URL: https://issues.apache.org/jira/browse/CHUKWA-569
             Project: Chukwa
          Issue Type: Improvement
          Components: User Interface
         Environment: Java 6, MacOSX 10.6
            Reporter: Eric Yang
            Assignee: Eric Yang
             Fix For: 0.5.0


When a column family is selected, the only way to find all the columns inside it is to do a scan.  Since HBase does not have an API that returns column names only, we have to provide a temporary workaround.  We can add an option to do a full table scan, and by default scan only a small amount of data to figure out the column patterns.  This shortcut assumes that the data follow a repeating pattern, so scanning one row should be sufficient to find all the columns.  The same principle applies when scanning for unique row names.
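
To illustrate the trade-off described above, here is a minimal sketch in plain Java. It is not the attached patch and does not use the HBase client: the table is modeled as an in-memory sorted map, and the names ColumnSampler, columnNames, demoTable, fullScan and sampleRows are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for one column family: row key -> (column name -> value).
final class ColumnSampler {

    /**
     * Collects the distinct column names seen while iterating rows in key order.
     * When fullScan is false, iteration stops after sampleRows rows (the fast default).
     */
    public static Set<String> columnNames(NavigableMap<String, Map<String, String>> table,
                                          boolean fullScan, int sampleRows) {
        Set<String> names = new TreeSet<>();
        int seen = 0;
        for (Map<String, String> row : table.values()) {
            if (!fullScan && seen >= sampleRows) {
                break; // sampled enough rows; skip the rest of the table
            }
            names.addAll(row.keySet());
            seen++;
        }
        return names;
    }

    /** Small assumed data set: the third row carries an extra "disk" column. */
    public static NavigableMap<String, Map<String, String>> demoTable() {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", row("cpu", "mem"));
        table.put("row2", row("cpu", "mem"));
        table.put("row3", row("cpu", "mem", "disk"));
        return table;
    }

    private static Map<String, String> row(String... columns) {
        Map<String, String> r = new LinkedHashMap<>();
        for (String c : columns) {
            r.put(c, "0");
        }
        return r;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = demoTable();
        System.out.println(columnNames(table, false, 1)); // one-row sample
        System.out.println(columnNames(table, true, 1));  // full scan
    }
}
```

With this assumed data the one-row sample misses the "disk" column that only the full scan finds, which is the case the full-scan option is meant to cover when the data do not follow a repeating pattern.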

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-569:
-----------------------------

    Attachment: CHUKWA-569.patch

This is a temporary fix to speed up graph_explorer until we have an aggregation system that scans for all columns and stores them in a metadata table.


[jira] Commented: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975703#action_12975703 ] 

Eric Yang commented on CHUKWA-569:
----------------------------------

A separate metadata table to keep track of the columns is a better implementation than the one this JIRA proposes.  Two pieces are missing for that recommendation to work: a workflow scheduler to run jobs periodically, and a MapReduce job that scans for extra columns given a table name and time range.

For the first piece, the workflow scheduler, we can use Oozie.  For the MapReduce job that scans for columns and updates the metadata table, I filed CHUKWA-570.
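
As a rough sketch of the metadata idea, and not the design tracked in CHUKWA-570, the following plain Java models the metadata table as an in-memory map and the periodic job as one update call per table and time range. ColumnMetadata, update, and columns are hypothetical names; a real implementation would run as a MapReduce job against HBase.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the metadata table: table name -> known column names.
final class ColumnMetadata {
    private final Map<String, Set<String>> known = new HashMap<>();

    /**
     * Merges the columns of all rows with a timestamp in [start, end) into the
     * metadata for tableName. Returns how many previously unknown columns were added.
     */
    public int update(String tableName, NavigableMap<Long, Set<String>> rowsByTime,
                      long start, long end) {
        Set<String> cols = known.computeIfAbsent(tableName, k -> new TreeSet<>());
        int before = cols.size();
        // subMap restricts the scan to the requested time range only.
        for (Set<String> rowCols : rowsByTime.subMap(start, true, end, false).values()) {
            cols.addAll(rowCols);
        }
        return cols.size() - before;
    }

    /** Returns the known columns without scanning any data. */
    public Set<String> columns(String tableName) {
        Set<String> cols = known.get(tableName);
        return cols == null ? Collections.<String>emptySet()
                            : Collections.unmodifiableSet(cols);
    }
}
```

Under this sketch a scheduler would call update once per new time window, and the UI would read columns without touching the data table at all.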


[jira] Updated: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-569:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.


[jira] Updated: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CHUKWA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-569:
-----------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (CHUKWA-569) Add an option to speed up graph_explorer.jsp in fetching Column Name and Row Name

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CHUKWA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975656#action_12975656 ] 

Ari Rabkin commented on CHUKWA-569:
-----------------------------------

I think that assumption is usually a safe one -- particularly if the scan only covers, say, the most recent 20 rows.

What about explicitly keeping a separate metadata table, either in HBase or in Zookeeper?
