You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2011/01/05 17:04:46 UTC

[jira] Created: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3417
                 URL: https://issues.apache.org/jira/browse/HBASE-3417
             Project: HBase
          Issue Type: Bug
          Components: io, regionserver
    Affects Versions: 0.92.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
            Priority: Critical
             Fix For: 0.92.0


Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.

The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3417:
---------------------------------

    Attachment: HBASE-3417-redux-v1.patch

Adds a new unit test in TestFromClientSide for testing cache on write and evict on close by looking inside the block cache.

This test fails on 92 and trunk without the rest of the changes in this patch (CacheOnWrite is currently completely broken with hfile v2).
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009449#comment-13009449 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

Just verified that this is the same as what we have been running with in production (since the patch was put up in January).

I'm ready to commit if you want to +1 me :)

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977943#action_12977943 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

One idea from discussion with stack is to use a UUID for the filename.  That way we can generate it once for the temporary file and then just move it in place without doing a rename.  Would then just use UUID + blockNumber as the blockName.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126856#comment-13126856 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 19:33:27, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 166
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49945#file49945line166>
bq.  >
bq.  >     @Mikhail.... yeah.   I figured it out.  I was thinking we should allow dashes... then it looks like a uuid and makes compare  by human easier.
bq.  
bq.  Jonathan Gray wrote:
bq.      we had this argument months ago.  there's a good consistency in the UIs w/ the current region encoded names (used as dir names) w/ these uuid (minus -) names.  The dashes makes mismatched (as I recall my thoughts on the matter way back when it originally had dashes).
bq.      
bq.      i stand by the no dashes!  :)

I took a look.  The UUID ends up being the key in a LRU cache.  I'm fine w/ it (I argued for keeping them back then too... sorry for rehearsing old arg)


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2565
-----------------------------------------------------------


On 2011-10-13 19:36:33, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 19:36:33)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183049 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183049 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009537#comment-13009537 ] 

stack commented on HBASE-3417:
------------------------------

+1 on patch for 0.92 (But don't include the TestResettingCounters pollution).

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reopened HBASE-3417:
----------------------------------


Reopening this.  I believe CacheOnWrite is actually broken right now, this exact issue still exists.

Will attach a new unit test that shows COW not working.  Will work on the fix.
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978427#action_12978427 ] 

stack commented on HBASE-3417:
------------------------------

Should you add in A-Z in below just in case?

{code}
+    Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");
{code}

Yeah, don't replace the '-' I'd say:

{code}
+    return new Path(dir, UUID.randomUUID().toString().replaceAll("-", "")
+        + ((suffix == null || suffix.length() <= 0) ? "" : suffix));
{code}

Then its easy to go back to UUID.  You might want to do that so you can use the 128 bits as key in LRU rather than String?

Otherwise, +1 IFF verified backward compatible.




> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126842#comment-13126842 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2564
-----------------------------------------------------------



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5748>

    I like this suggestion... keeping config immutable and go make a new one if you need to change something... w/ all final.


- Michael


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127197#comment-13127197 ] 

Hudson commented on HBASE-3417:
-------------------------------

Integrated in HBase-0.92 #63 (See [https://builds.apache.org/job/HBase-0.92/63/])
    HBASE-3417  CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme (jgray)

jgray : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java

                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978431#action_12978431 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

bq. Should you add in A-Z in below just in case?
Could add A-F (uuid is hex chars only), but it's unnecessary.

bq. Then its easy to go back to UUID. You might want to do that so you can use the 128 bits as key in LRU rather than String?
LRU uses a String for block name.  I think it looks much nicer with a consistent looking naming scheme for region directories and storefiles.  And I don't think we need to be overly concerned about the size... If 64K block, in the LRU we're talking about 0.05% overhead (or like 0.02% over a more compact version).

Also, traditional GUID format reminds me of Microsoft SQL Server :)

This latest v5 patch is being deployed on a 100 node cluster with existing data tonight.  Will commit once verified that it's working there.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3417:
---------------------------------

    Release Note: 
StoreFile naming changes from random ascii longs to [0-9a-f] uuids.  The same name is used in the tmp and perm location.  This name is what gets used for the block name in the cache.

This change is a backwards compatible slow migration, all new files will be named with the new scheme, old files will still be readable.  However, once you have new files created in the new scheme, previous installations will fail.

  was:StoreFile naming changes from random ascii longs to [0-9a-f] uuids.  The same name is used in the tmp and perm location.  This name is what gets used for the block name in the cache.

    
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126764#comment-13126764 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2555
-----------------------------------------------------------



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5731>

    Can we keep these final and CacheConfig immutable, and create a separate instance of CacheConfig during testing instead?



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5732>

    The same as above.



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5736>

    Using the approach above we can avoid creating new setters.
    
    If it's not feasible, then appending "ForTesting" to these method names will probably be enough to discourage use of them in production.



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5733>

    Maybe we can avoid these setters using the approach I described above? If not, it should either be package-private, or the name should include something like "ForTesting".



/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
<https://reviews.apache.org/r/2379/#comment5734>

    The same as above.



/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/2379/#comment5737>

    This does not return null anymore in the new version? This will probably result in an IOException down the line, is that handled well? We could throw an IOException right here as an alternative.



/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2379/#comment5738>

    @Michael: This pattern does not allow dashes in the UUID, it only allows 0-9 and a-f.



/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2379/#comment5739>

    suffix.length() <= 0 is unnecessary.



/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
<https://reviews.apache.org/r/2379/#comment5740>

    Nice test! This goes all the way from client side to HFile, unlike TestCacheOnWrite.


- Mikhail


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978124#action_12978124 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

I changed regex to be {{([0-9a-z]+)}}

I kind of like how it is.  It looks just like the encoded region names used for region directory names.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126639#comment-13126639 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2548
-----------------------------------------------------------

Ship it!


Just a few questions in the below.  I'm good w/ commit.  Would suggest open new issue rather than reopen an old one in future -- makes it easier on the fellas coming up behind us trying to make some sense of what we did.  Good stuff.


/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/2379/#comment5719>

    This is the tmp name?  Thats what you want?   I don't see where it gets set to renamed filename.



/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2379/#comment5720>

    Why?  To be more specific?
    
    Oh, I see.... how about not removing the '-' from uuid?  Is it illegal in filesystem?



/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2379/#comment5721>

    This is better.


- Michael


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126851#comment-13126851 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/
-----------------------------------------------------------

(Updated 2011-10-13 19:36:33.845621)


Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.


Changes
-------

Renamed setters to be prefixed with:  forTestsOnly_

(I'm still opting to keep it this way because I know that the next set of tests I write for the compressed cache stuff is going to want to do the same thing)


Summary
-------

Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.

The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.


This addresses bug HBASE-3417.
    https://issues.apache.org/jira/browse/HBASE-3417


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183049 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183049 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183049 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183049 
  /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183049 

Diff: https://reviews.apache.org/r/2379/diff


Testing
-------

TestFromClientSide and TestCacheOnWrite both working


Thanks,

Jonathan


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126838#comment-13126838 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 19:24:46, Jonathan Gray wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, lines 1228-1229
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49944#file49944line1228>
bq.  >
bq.  >     I think i'll restore the old behavior of returning null, but if it gets an IOE, that will be bubbled up (that should happen for a harder failure than just a simple rename failure, in which case, we probably want to bubble it up?)

Sounds good. By the way, is a null return value handled correctly upstream? The whole compaction probably needs to be considered failed in that case.


- Mikhail


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2561
-----------------------------------------------------------


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126848#comment-13126848 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 19:31:38, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java, line 100
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49943#file49943line100>
bq.  >
bq.  >     I like this suggestion... keeping config immutable and go make a new one if you need to change something... w/ all final.

I like the idea too, but it's more clunky.  Then CacheConfig can't be final in things like Store and it becomes not very useful in other tests.  I'm going to make the test method names more explicit.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2564
-----------------------------------------------------------


On 2011-10-13 19:33:02, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 19:33:02)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183046 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183046 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978125#action_12978125 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

Old random file name was using rand.nextLong() so it could be any length >= 1.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3417:
-------------------------

    Status: Patch Available  (was: Open)

This patch did not go in.  Want me to massage it in?

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126839#comment-13126839 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 19:24:46, Jonathan Gray wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, lines 1228-1229
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49944#file49944line1228>
bq.  >
bq.  >     I think i'll restore the old behavior of returning null, but if it gets an IOE, that will be bubbled up (that should happen for a harder failure than just a simple rename failure, in which case, we probably want to bubble it up?)
bq.  
bq.  Mikhail Bautin wrote:
bq.      Sounds good. By the way, is a null return value handled correctly upstream? The whole compaction probably needs to be considered failed in that case.

Guess I should have looked at that.  Seems that it is not handled.

I'm going to change the behavior to throw an IOException rather than return null.

I will kick off the test suite before committing.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2561
-----------------------------------------------------------


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127018#comment-13127018 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

Stack, I was going to open a new JIRA, but it is the exact same issue and a nearly identical patch (primary difference is pre/post hfile v2).  It was just incorrect to close this following commit of hfile v2 which was unrelated to this bug.  Nothing was ever committed under this JIRA so just reopened with an updated patch.

I think things get confusing when there is more than one commit per branch per jira.  We should probably ban that practice.  Or at least institute some kind of standardized commit message (HBASE-3417, HBASE-3417-B, HBASE-3417-C, etc) or some such thing.
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3417:
---------------------------------

    Attachment: HBASE-3417-v5.patch

Latest version.  Adds proper handling in the compaction path (previous patch only dealt with flush).  Changes regex to use [0-9a-f]+ to be backwards compatible.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126766#comment-13126766 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 17:58:45, Mikhail Bautin wrote:
bq.  >

Nice fix, Jonathan! For some reason I assumed that HBASE-3417 was part of the HFile v2 patch when I submitted the patch, but apparently it was not.


- Mikhail


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2555
-----------------------------------------------------------


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126841#comment-13126841 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/
-----------------------------------------------------------

(Updated 2011-10-13 19:33:02.273799)


Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.


Changes
-------

Changes from Mikhail review.  Changes it so if we fail rename we throw an exception, otherwise it looks like we would eventually get an NPE down the line?  There's no special casing for a null return.

Will run test suite before committing.


Summary
-------

Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.

The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.


This addresses bug HBASE-3417.
    https://issues.apache.org/jira/browse/HBASE-3417


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183046 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183046 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183046 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183046 
  /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183046 

Diff: https://reviews.apache.org/r/2379/diff


Testing
-------

TestFromClientSide and TestCacheOnWrite both working


Thanks,

Jonathan


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126370#comment-13126370 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/
-----------------------------------------------------------

Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.


Summary
-------

Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.

The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.


This addresses bug HBASE-3417.
    https://issues.apache.org/jira/browse/HBASE-3417


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182636 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182636 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182636 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182636 
  /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182636 

Diff: https://reviews.apache.org/r/2379/diff


Testing
-------

TestFromClientSide and TestCacheOnWrite both working


Thanks,

Jonathan


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3417:
---------------------------------

    Attachment: HBASE-3417-v1.patch

Changes storefile names to be UUIDs.  Makes it so we use the same name for the tmp file and the permanent file.  Updates a regex which now matches against 32 char word string instead of digits.  Changes HFile to use the file name for block cache block names rather than full path.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126378#comment-13126378 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/
-----------------------------------------------------------

(Updated 2011-10-13 06:31:13.710051)


Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.


Changes
-------

Applies cleanly.


Summary
-------

Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.

The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.


This addresses bug HBASE-3417.
    https://issues.apache.org/jira/browse/HBASE-3417


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
  /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 

Diff: https://reviews.apache.org/r/2379/diff


Testing
-------

TestFromClientSide and TestCacheOnWrite both working


Thanks,

Jonathan


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978122#action_12978122 ] 

stack commented on HBASE-3417:
------------------------------

As discussed up on IRC, this is not backward compatible:

{code}
+    Pattern.compile("^(\\w{32})(?:\\.(.+))?$");
{code}

You can do a range IIRC 20-32 (was old length 20 chars?)

The below is a little bit messy.:

{code}
+    return new Path(dir, UUID.randomUUID().toString().replaceAll("-", "")
+        + ((suffix == null || suffix.length() <= 0) ? "" : suffix));
{code}

Up on IRC, was thinking should base64 because then it'd be more compact.   See http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string.  There is also in hbase util a Base64#encodeBytes method that will take the 128 UUID bits and emit them as base64 (Possible to get it all down to 22 chars).  But looking at the base64 vocabulary, http://en.wikipedia.org/wiki/Base64, it includes '+' and '/' which are illegal in URL, a hdfs filepath.  Base32?  http://en.wikipedia.org/wiki/Base32? But that won't work either.  Has to be multiples of 40 bits.

Maybe leave it as it comes out of UUID.toString w/ hyphens.  Then its plain its a UUID and its easier to read?



> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068829#comment-13068829 ] 

stack commented on HBASE-3417:
------------------------------

Will this issue still be pertinent after the hfile v2 goes in (It seems to support cacheOnWrite natively).

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3417:
---------------------------------

    Attachment: HBASE-3417-v2.patch

Makes it so we don't have to parse fileName for each block when doing CacheOnWrite.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126853#comment-13126853 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 19:33:27, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 166
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49945#file49945line166>
bq.  >
bq.  >     @Mikhail.... yeah.   I figured it out.  I was thinking we should allow dashes... then it looks like a uuid and makes compare  by human easier.

we had this argument months ago.  there's a good consistency in the UIs w/ the current region encoded names (used as dir names) w/ these uuid (minus -) names.  The dashes makes mismatched (as I recall my thoughts on the matter way back when it originally had dashes).

i stand by the no dashes!  :)


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2565
-----------------------------------------------------------


On 2011-10-13 19:36:33, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 19:36:33)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183049 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183049 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183049 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127051#comment-13127051 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

I didn't mark as incompatible but it is only one-way compatible.

There is actually a very trivial change that can be made in the 0.90 branch (or any other branches) to make this change compatible in all directions.  Just need to update the REF_NAME_PARSER regex to be what it is in this change (tolerant of [a-f] in addition to digits).  That's it.
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127054#comment-13127054 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

In StoreFile.java:
{code}
   private static final Pattern REF_NAME_PARSER =
-    Pattern.compile("^(\\d+)(?:\\.(.+))?$");
+    Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");
{code}

If you ever need to go backwards from 92 to a previous version.
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-3417.
----------------------------------

      Resolution: Fixed
    Release Note: StoreFile naming changes from random ascii longs to [0-9a-f] uuids.  The same name is used in the tmp and perm location.  This name is what gets used for the block name in the cache.

Committed to 92 and trunk.  Thanks Stack and Mikhail for the speedy reviews.
                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126847#comment-13126847 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2565
-----------------------------------------------------------



/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2379/#comment5749>

    @Mikhail.... yeah.   I figured it out.  I was thinking we should allow dashes... then it looks like a uuid and makes compare  by human easier.


- Michael


On 2011-10-13 19:33:02, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 19:33:02)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183046 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1183046 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1183046 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126832#comment-13126832 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------



bq.  On 2011-10-13 14:49:35, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 496
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49944#file49944line496>
bq.  >
bq.  >     This is the tmp name?  Thats what you want?   I don't see where it gets set to renamed filename.

The tmp name and perm name are the same.  The tmp is just in .tmp


bq.  On 2011-10-13 14:49:35, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 166
bq.  > <https://reviews.apache.org/r/2379/diff/2/?file=49945#file49945line166>
bq.  >
bq.  >     Why?  To be more specific?
bq.  >     
bq.  >     Oh, I see.... how about not removing the '-' from uuid?  Is it illegal in filesystem?

I am removing the '-'.  This checks for alphanumerics only.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2548
-----------------------------------------------------------


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069204#comment-13069204 ] 

Jonathan Gray commented on HBASE-3417:
--------------------------------------

It does support COW but if it doesn't include changes to how files are named, it will still need this fix.  Will follow-up with Mikhail.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128386#comment-13128386 ] 

Hudson commented on HBASE-3417:
-------------------------------

Integrated in HBase-TRUNK #2325 (See [https://builds.apache.org/job/HBase-TRUNK/2325/])
    HBASE-3417  CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme (jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java

                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-3417:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

This was committed as part of HFile v2 (HBASE-3857). Resolving.

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126834#comment-13126834 ] 

jiraposter@reviews.apache.org commented on HBASE-3417:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2379/#review2561
-----------------------------------------------------------


Thanks for the reviews stack and mikhail!


/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/2379/#comment5744>

    I think i'll restore the old behavior of returning null, but if it gets an IOE, that will be bubbled up (that should happen for a harder failure than just a simple rename failure, in which case, we probably want to bubble it up?)


- Jonathan


On 2011-10-13 06:31:13, Jonathan Gray wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2379/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-13 06:31:13)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Adds a new test that fails w/o this fix and changes the naming scheme for storefiles to use UUID instead of random longs as ascii.
bq.  
bq.  The big change is that the name of the tmp file used when flushing and compacting is the same name (but different dir) when moved in place.  This makes it so block names are consistent for COW.
bq.  
bq.  
bq.  This addresses bug HBASE-3417.
bq.      https://issues.apache.org/jira/browse/HBASE-3417
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1182679 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1182679 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1182679 
bq.  
bq.  Diff: https://reviews.apache.org/r/2379/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestFromClientSide and TestCacheOnWrite both working
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009396#comment-13009396 ] 

stack commented on HBASE-3417:
------------------------------

You running this patch still Jon or did you do something else?

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069216#comment-13069216 ] 

Mikhail Bautin commented on HBASE-3417:
---------------------------------------

@stack: This patch is already part of HFile-v2. Thanks!

> CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch
>
>
> Currently the block names used in the block cache are built using the filesystem path.  However, for cache on write, the path is a temporary output file.
> The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough.  Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira