Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2011/03/23 18:38:05 UTC

[jira] [Created] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase
------------------------------------------------------------------------------

                 Key: HBASE-3693
                 URL: https://issues.apache.org/jira/browse/HBASE-3693
             Project: HBase
          Issue Type: Improvement
            Reporter: Kannan Muthukkaruppan
            Assignee: Liyin Tang


We noticed that there are lots of listStatus calls on the ColumnFamily directories within each region, coming from this codepath:

{code}
compactionSelection()
 --> isMajorCompaction 
    --> getLowestTimestamp()
       -->  FileStatus[] stats = fs.listStatus(p);
{code}

So on every compactionSelection() we're taking this hit. While not immediately an issue, just from log inspection, this accounts for quite a large number of RPCs to the namenode at the moment and seems like an unnecessary load to be sending to the namenode.
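To make the cost concrete, here is a minimal Java sketch of the uncached pattern. FakeNamenode and its listStatus() are hypothetical stand-ins (not Hadoop's FileSystem API) that only count calls; the point is that the directory is listed again on every check, so N compaction selections on an unchanged store cost N namenode RPCs.

```java
import java.util.ArrayList;
import java.util.List;

public class UncachedLowestTimestamp {

    // Hypothetical stand-in for the namenode: every listStatus() call counts
    // as one RPC. This is not Hadoop's FileSystem API.
    static class FakeNamenode {
        int listStatusCalls = 0;
        private final List<Long> modificationTimes = new ArrayList<>();

        void addFile(long modificationTime) {
            modificationTimes.add(modificationTime);
        }

        // Returns the modification times of all files in one directory.
        List<Long> listStatus() {
            listStatusCalls++;
            return new ArrayList<>(modificationTimes);
        }
    }

    // Same shape as the reported path: the directory is listed again on
    // every call, so every compaction check costs one namenode RPC.
    static long getLowestTimestamp(FakeNamenode nn) {
        long lowest = Long.MAX_VALUE;
        for (long t : nn.listStatus()) {
            lowest = Math.min(lowest, t);
        }
        return lowest;
    }

    public static void main(String[] args) {
        FakeNamenode nn = new FakeNamenode();
        nn.addFile(1_000L);
        nn.addFile(500L);
        // Simulate ten compaction selections on the same, unchanged store.
        for (int i = 0; i < 10; i++) {
            getLowestTimestamp(nn);
        }
        System.out.println("listStatus calls: " + nn.listStatusCalls); // 10
    }
}
```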

Seems like it would be easy to cache the timestamp for each opened/created StoreFile, in memory, in the region server, and avoid going to DFS each time for this information.
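One way to read this proposal, sketched in Java: a hypothetical region-server-wide map from store file path to modification time, filled in when a file is opened or created and dropped when it is closed. TimestampCache, its methods, and the example paths are illustrative names, not HBase code. The age check is simplified to "oldest file is older than the major compaction period", which is an assumption about the general shape of the check in isMajorCompaction(), not a copy of it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedTimestamps {

    // Hypothetical in-memory cache: store file path -> modification time,
    // recorded when the file is opened or created, so later compaction
    // checks never need to ask the namenode.
    static class TimestampCache {
        private final Map<String, Long> modificationTimes = new HashMap<>();

        void onStoreFileOpened(String path, long modificationTime) {
            modificationTimes.put(path, modificationTime);
        }

        void onStoreFileClosed(String path) {
            modificationTimes.remove(path);
        }

        // Lowest cached modification time among the given files, or
        // Long.MAX_VALUE if none of them is known.
        long getLowestTimestamp(List<String> paths) {
            long lowest = Long.MAX_VALUE;
            for (String p : paths) {
                Long t = modificationTimes.get(p);
                if (t != null) {
                    lowest = Math.min(lowest, t);
                }
            }
            return lowest;
        }
    }

    // Simplified (assumed) age check: major-compact once the oldest file is
    // older than the configured period.
    static boolean isMajorCompaction(TimestampCache cache, List<String> paths,
                                     long now, long periodMs) {
        if (paths.isEmpty() || periodMs <= 0) {
            return false;
        }
        long lowest = cache.getLowestTimestamp(paths);
        return lowest != Long.MAX_VALUE && lowest < now - periodMs;
    }

    public static void main(String[] args) {
        TimestampCache cache = new TimestampCache();
        cache.onStoreFileOpened("/hbase/t/r1/cf/a", 1_000L);
        cache.onStoreFileOpened("/hbase/t/r1/cf/b", 5_000L);
        List<String> files = List.of("/hbase/t/r1/cf/a", "/hbase/t/r1/cf/b");

        System.out.println(isMajorCompaction(cache, files, 10_000L, 8_000L)); // true
        System.out.println(isMajorCompaction(cache, files, 10_000L, 9_500L)); // false
    }
}
```

A shared map like this has to be kept in sync with the set of open store files (and would need synchronization if several threads use it); keeping the timestamp on each StoreFile object avoids that bookkeeping.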


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-3693:
------------------------------

    Attachment: Hbase-3693[r1085248]_2.patch

Thanks Stack:)

I removed the set function and corrected the typo. 

This patch should be fine:)


[jira] [Resolved] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3693.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to TRUNK (not to the branch, since this is only an 'improvement'). Nice patch, Liyin. Thanks.


[jira] [Updated] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-3693:
-----------------------------------------

    Description: 
We noticed that there are lots of listStatus calls on the ColumnFamily directories within each region, coming from this codepath:

{code}
compactionSelection()
 --> isMajorCompaction 
    --> getLowestTimestamp()
       -->  FileStatus[] stats = fs.listStatus(p);
{code}

So on every compactionSelection() we're taking this hit. While not immediately an issue, just from log inspection, this accounts for quite a large number of RPCs to namenode at the moment and seems like an unnecessary load to be sending to the namenode.

Seems like it would be easy to cache the timestamp for each opened/created StoreFile, in memory, in the region server, and avoid going to DFS each time for this information.


  was:
We noticed that are lots of listStatus calls on the ColumnFamily directories within each regions, coming from this codepath:

{code}
compactionSelection()
 --> isMajorCompaction 
    --> getLowestTimestamp()
       -->  FileStatus[] stats = fs.listStatus(p);
{code}

So on every compactionSelection() we're taking this hit. While not immediately an issue, just from log inspection, this accounts for quite a large number of RPCs to namenode at the moment and seems like an unnecessary load to be sending to the namenode.

Seems like it would be easy to cache the timestamp for each opened/created StoreFile, in memory, in the region server, and avoid going to DFS each time for this information.




[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Kazuki Ohta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010321#comment-13010321 ] 

Kazuki Ohta commented on HBASE-3693:
------------------------------------

+1 on good catch!

It would be better to have FileSystem access statistics exposed as Ganglia metrics. Maybe by modifying hadoop-hdfs or adding a FileSystem wrapper?
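A minimal sketch of the wrapper idea, assuming a hypothetical single-method SimpleFs interface rather than Hadoop's real FileSystem class: a decorator counts calls per operation before delegating, and a metrics publisher (for example one reporting to Ganglia) could read the counters periodically. With the real Hadoop classes, org.apache.hadoop.fs.FilterFileSystem would be a natural base for such a wrapper, though that is not shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class CountingFsSketch {

    // Hypothetical minimal file system interface (not Hadoop's API).
    interface SimpleFs {
        List<String> listStatus(String dir);
    }

    // Decorator that counts calls per operation name, then delegates.
    static class CountingFs implements SimpleFs {
        private final SimpleFs delegate;
        private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

        CountingFs(SimpleFs delegate) {
            this.delegate = delegate;
        }

        private void record(String op) {
            counts.computeIfAbsent(op, k -> new LongAdder()).increment();
        }

        @Override
        public List<String> listStatus(String dir) {
            record("listStatus");
            return delegate.listStatus(dir);
        }

        // Current value of a counter, e.g. for a periodic metrics publisher.
        long count(String op) {
            LongAdder a = counts.get(op);
            return a == null ? 0L : a.sum();
        }
    }

    public static void main(String[] args) {
        SimpleFs backing = dir -> List.of(dir + "/file1", dir + "/file2");
        CountingFs fs = new CountingFs(backing);
        for (int i = 0; i < 3; i++) {
            fs.listStatus("/hbase/t/r1/cf");
        }
        System.out.println("listStatus calls: " + fs.count("listStatus")); // 3
    }
}
```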



[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010252#comment-13010252 ] 

Jonathan Gray commented on HBASE-3693:
--------------------------------------

+1 on caching this.  Good stuff!


[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010250#comment-13010250 ] 

Jean-Daniel Cryans commented on HBASE-3693:
-------------------------------------------

Wow, good on you for finding this! +1


[jira] [Updated] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-3693:
------------------------------

    Attachment: Hbase-3693[r1085306].patch

Cache the modification time in the StoreFile, so there is no need to keep checking the modification time on DFS. 
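A minimal sketch of this approach, assuming the timestamp is read once from the file's status when the StoreFile is opened. StoreFile, FileStatusLike and getModificationTimeStamp() here are hypothetical simplifications, not the actual patch. The field is final with only a getter, in line with the review point that this value comes from the fs and should not need a setter.

```java
import java.util.ArrayList;
import java.util.List;

public class StoreFileMtimeSketch {

    // Hypothetical stand-in for the FileStatus the filesystem returns.
    static final class FileStatusLike {
        final String path;
        final long modificationTime;

        FileStatusLike(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    // Hypothetical StoreFile that captures its modification time once, when
    // it is opened. The field is final and there is no setter: the value
    // comes from the filesystem, not from callers.
    static final class StoreFile {
        private final String path;
        private final long modificationTimeStamp;

        StoreFile(FileStatusLike status) {
            this.path = status.path;
            this.modificationTimeStamp = status.modificationTime;
        }

        String getPath() {
            return path;
        }

        long getModificationTimeStamp() {
            return modificationTimeStamp;
        }
    }

    // Lowest timestamp computed from in-memory StoreFile objects only;
    // no listStatus() call is involved.
    static long getLowestTimestamp(List<StoreFile> files) {
        long lowest = Long.MAX_VALUE;
        for (StoreFile sf : files) {
            lowest = Math.min(lowest, sf.getModificationTimeStamp());
        }
        return lowest;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile(new FileStatusLike("cf/a", 3_000L)));
        files.add(new StoreFile(new FileStatusLike("cf/b", 2_000L)));
        System.out.println(getLowestTimestamp(files)); // 2000
    }
}
```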


[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011088#comment-13011088 ] 

stack commented on HBASE-3693:
------------------------------

I like your patch, Liyin. Can I commit it? (I'm nervous about committing your stuff -- I seem to be jumping the gun.)

I'd fix this misspelling: stroeFileNum 

Also, why a setModificationTimeStamp? It seems odd to be able to set this. Doesn't it come from the fs?

I'd like to get this into 0.90.2.
