Posted to issues@hbase.apache.org by "Andrew Purtell (Created) (JIRA)" <ji...@apache.org> on 2011/12/04 02:50:39 UTC

[jira] [Created] (HBASE-4944) Optionally verify bulk loaded HFiles

Optionally verify bulk loaded HFiles
------------------------------------

                 Key: HBASE-4944
                 URL: https://issues.apache.org/jira/browse/HBASE-4944
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: Andrew Purtell
            Priority: Minor
             Fix For: 0.92.0, 0.94.0, 0.90.5


We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.

Patch is against trunk but can apply against all active branches.
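
The behavior described above can be sketched in plain Java, outside HBase. This is an illustration under stated assumptions, not the patch itself: `BulkLoadCheckSketch`, `isVerifyEnabled`, and `firstOutOfOrder` are hypothetical names, `java.util.Properties` stands in for the HBase configuration object, and the property name `hbase.hstore.bulkload.verify` is the one used in the patch posted in this thread. Rows are compared as unsigned bytes in lexicographic order, which is an assumption about the intended ordering rather than a copy of the HBase comparator.

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical stand-in for an opt-in sort-order check; not HBase code. */
final class BulkLoadCheckSketch {
  /** Property name taken from the patch in this thread. */
  static final String VERIFY_KEY = "hbase.hstore.bulkload.verify";

  /** Verification is off unless the property is explicitly set to "true". */
  static boolean isVerifyEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(VERIFY_KEY, "false"));
  }

  /** Unsigned, lexicographic comparison of two row keys. */
  static int compareRows(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  /**
   * Returns the index of the first row that sorts before its predecessor,
   * or -1 if the rows are in non-decreasing order or the check is disabled.
   */
  static int firstOutOfOrder(Properties conf, List<byte[]> rows) {
    if (!isVerifyEnabled(conf)) {
      return -1; // default: skip the scan entirely
    }
    byte[] prev = null;
    for (int i = 0; i < rows.size(); i++) {
      byte[] cur = rows.get(i);
      if (prev != null && compareRows(prev, cur) > 0) {
        return i;
      }
      prev = cur;
    }
    return -1;
  }
}
```

Equal adjacent rows are accepted in this sketch, since one row normally spans several key-values; only a row that sorts strictly before its predecessor is reported.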

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162767#comment-13162767 ] 

Hudson commented on HBASE-4944:
-------------------------------

Integrated in HBase-0.92-security #30 (See [https://builds.apache.org/job/HBase-0.92-security/30/])
    HBASE-4944. Optionally verify bulk loaded HFiles

apurtell : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4944-v2.patch, HBASE-4944-v3.patch
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4944:
----------------------------------

        Fix Version/s:     (was: 0.90.5)
                           (was: 0.94.0)
                           (was: 0.92.0)
    Affects Version/s: 0.90.5
                       0.94.0
                       0.92.0
               Status: Patch Available  (was: Open)
    


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4944:
----------------------------------

    Attachment:     (was: 4944.txt)
    


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162528#comment-13162528 ] 

Andrew Purtell commented on HBASE-4944:
---------------------------------------

Test failures are unrelated to the patch. All tests pass locally for me. 

@Ted, what do you think of v3?
                


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4944:
----------------------------------

    Attachment: HBASE-4944-v2.patch

Rebased patch addressing Ted's comments.
                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162291#comment-13162291 ] 

Hadoop QA commented on HBASE-4944:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12506018/HBASE-4944-v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -160 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed
                       org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
                       org.apache.hadoop.hbase.TestDrainingServer
                       org.apache.hadoop.hbase.TestFullLogReconstruction
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                       org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/437//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/437//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/437//console

This message is automatically generated.
                


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4944:
--------------------------

    Attachment: 4944.txt

Patch from Andy.
                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162592#comment-13162592 ] 

Hudson commented on HBASE-4944:
-------------------------------

Integrated in HBase-0.92 #170 (See [https://builds.apache.org/job/HBase-0.92/170/])
    HBASE-4944. Optionally verify bulk loaded HFiles

apurtell : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162279#comment-13162279 ] 

Hadoop QA commented on HBASE-4944:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12506016/4944.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/436//console

This message is automatically generated.
                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162557#comment-13162557 ] 

Andrew Purtell commented on HBASE-4944:
---------------------------------------

@Ted Thanks for taking a look. Sure, I will make that change on commit. 

                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162280#comment-13162280 ] 

Ted Yu commented on HBASE-4944:
-------------------------------

Minor comments:
{code}
+        KeyValue pkv = null;
{code}
The variable can be named prevKV which is clearer.
{code}
+              throw new InvalidHFileException("Previous row is greater then"
{code}
Typo above, should be 'greater than'.


                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162253#comment-13162253 ] 

Andrew Purtell commented on HBASE-4944:
---------------------------------------

From JIRA: "Cannot attach file HBASE-4944.patch: Unknown server error (500)."

The patch is pretty small, so here it is:

{code}
Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(revision 1210044)
+++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(working copy)
@@ -50,6 +50,7 @@
 import org.apache.hadoop.hbase.io.hfile.Compression;
 import org.apache.hadoop.hbase.io.hfile.HFile;
 import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.io.hfile.InvalidHFileException;
 import org.apache.hadoop.hbase.monitoring.MonitoredTask;
 import org.apache.hadoop.hbase.regionserver.StoreScanner.ScanType;
 import org.apache.hadoop.hbase.regionserver.compactions.CompactionProgress;
@@ -123,6 +124,7 @@
   private final String storeNameStr;
   private CompactionProgress progress;
   private final int compactionKVMax;
+  private final boolean verifyBulkLoads;
 
   // not private for testing
   /* package */ScanInfo scanInfo;
@@ -222,6 +224,9 @@
       = conf.getLong("hbase.hstore.compaction.max.size", Long.MAX_VALUE);
     this.compactionKVMax = conf.getInt("hbase.hstore.compaction.kv.max", 10);
 
+    this.verifyBulkLoads = conf.getBoolean("hbase.hstore.bulkload.verify",
+        false);
+
     if (Store.closeCheckInterval == 0) {
       Store.closeCheckInterval = conf.getInt(
           "hbase.hstore.close.check.interval", 10*1000*1000 /* 10 MB */);
@@ -355,8 +360,8 @@
   }
 
   /**
-   * This throws a WrongRegionException if the bulkHFile does not fit in this
-   * region.
+   * This throws a WrongRegionException if the HFile does not fit in this
+   * region, or an InvalidHFileException if the HFile is not valid.
    *
    */
   void assertBulkLoadHFileOk(Path srcPath) throws IOException {
@@ -386,6 +391,34 @@
             "Bulk load file " + srcPath.toString() + " does not fit inside region "
             + this.region);
       }
+
+      if (verifyBulkLoads) {
+        KeyValue pkv = null;
+        HFileScanner scanner = reader.getScanner(false, false, false);
+        scanner.seekTo();
+        do {
+          KeyValue kv = scanner.getKeyValue();
+          if (pkv != null) {
+            if (Bytes.compareTo(pkv.getBuffer(), pkv.getRowOffset(),
+                pkv.getRowLength(), kv.getBuffer(), kv.getRowOffset(),
+                kv.getRowLength()) > 0) {
+              throw new InvalidHFileException("Previous row is greater then"
+                  + " current row: path=" + srcPath + " previous="
+                  + Bytes.toStringBinary(pkv.getKey()) + " current="
+                  + Bytes.toStringBinary(kv.getKey()));
+            }
+            if (Bytes.compareTo(pkv.getBuffer(), pkv.getFamilyOffset(),
+                pkv.getFamilyLength(), kv.getBuffer(), kv.getFamilyOffset(),
+                kv.getFamilyLength()) != 0) {
+              throw new InvalidHFileException("Previous key had different"
+                  + " family compared to current key: path=" + srcPath
+                  + " previous=" + Bytes.toStringBinary(pkv.getKey())
+                  + " current=" + Bytes.toStringBinary(kv.getKey()));
+            }
+          }
+          pkv = kv;
+        } while (scanner.next());
+      }
     } finally {
       if (reader != null) reader.close();
     }
Index: src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
+++ src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
@@ -0,0 +1,40 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.io.hfile;
+
+import java.io.IOException;
+
+/**
+ * Thrown when an invalid HFile format is detected
+ */
+public class InvalidHFileException extends IOException {
+  private static final long serialVersionUID = 4660352028739861249L;
+
+  /** constructor */
+  public InvalidHFileException() {
+    super();
+  }
+
+  /**
+   * Constructor
+   * @param s message
+   */
+  public InvalidHFileException(String s) {
+    super(s);
+  }
+}
\ No newline at end of file
{code}
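
Besides row order, the patch above rejects a file in which two consecutive key-values belong to different column families, and it reports the problem through a dedicated `IOException` subclass. A minimal, HBase-free sketch of that second check follows. Everything in it is a hypothetical stand-in: `FamilyCheckSketch`, the reduced `Cell` type (only the family bytes), and `InvalidFileException`, which merely mirrors the role of `InvalidHFileException`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a single-family check; not HBase code. */
final class FamilyCheckSketch {
  /** Minimal stand-in for a key-value: only the part this check needs. */
  static final class Cell {
    final byte[] family;

    Cell(byte[] family) {
      this.family = family;
    }
  }

  /** Stand-in for the InvalidHFileException added by the patch. */
  static final class InvalidFileException extends IOException {
    private static final long serialVersionUID = 1L;

    InvalidFileException(String message) {
      super(message);
    }
  }

  /** Throws if any cell's family differs from that of its predecessor. */
  static void assertSingleFamily(List<Cell> cells) throws InvalidFileException {
    Cell prev = null;
    for (Cell cur : cells) {
      if (prev != null && !Arrays.equals(prev.family, cur.family)) {
        throw new InvalidFileException(
            "Previous key had different family compared to current key");
      }
      prev = cur;
    }
  }
}
```

Extending `IOException` lets callers that already handle I/O failures on the load path treat an invalid file the same way, while still allowing a more specific `catch` where that is useful. The for-each loop in this sketch also accepts an empty input without special handling.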
                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162620#comment-13162620 ] 

Hudson commented on HBASE-4944:
-------------------------------

Integrated in HBase-TRUNK #2516 (See [https://builds.apache.org/job/HBase-TRUNK/2516/])
    HBASE-4944. Optionally verify bulk loaded HFiles

apurtell : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162549#comment-13162549 ] 

Ted Yu commented on HBASE-4944:
-------------------------------

Patch v3 looks good.

Minor comment for the case of different families:
{code}
+                  + " previous=" + Bytes.toStringBinary(prevKV.getKey())
+                  + " current=" + Bytes.toStringBinary(kv.getKey()));
{code}
I think it would be nice to include family names by calling getFamily() in the above message.
This can be done at time of commit.
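
A message along the lines suggested here might be built as follows. This is a sketch only, not the committed code: `FamilyMessageSketch`, `differentFamilyMessage`, and `toPrintable` are hypothetical names, and `toPrintable` is a simplified stand-in for `Bytes.toStringBinary` whose escaping rules are an assumption and may not match HBase's exactly.

```java
/** Hypothetical message builder; not the committed HBase code. */
final class FamilyMessageSketch {
  /** Renders bytes as printable ASCII, escaping everything else as \xNN. */
  static String toPrintable(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int v = b & 0xff;
      if (v >= 0x20 && v < 0x7f && v != '\\') {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }

  /** Builds the error text with both family names spelled out. */
  static String differentFamilyMessage(String path, byte[] prevFamily,
      byte[] curFamily) {
    return "Previous key had different family compared to current key:"
        + " path=" + path
        + " previous=" + toPrintable(prevFamily)
        + " current=" + toPrintable(curFamily);
  }
}
```

Printing the family names directly saves the reader from picking them out of the full binary key when diagnosing a rejected file.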
                


        

[jira] [Assigned] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-4944:
-------------------------------------

    Assignee: Andrew Purtell
    
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: HBASE-4944-v2.patch, HBASE-4944-v3.patch
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4944:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.90.5
                   0.94.0
                   0.92.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)
    
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4944-v2.patch, HBASE-4944-v3.patch
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162281#comment-13162281 ] 

Ted Yu commented on HBASE-4944:
-------------------------------

Looks like the patch should be rebased:
{code}
4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
{code}
                
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Priority: Minor
>         Attachments: 4944.txt
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.


        

[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Andrew Purtell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4944:
----------------------------------

    Attachment: HBASE-4944-v3.patch

Sorry, the v3 patch restores the default to false (the current behavior).
                
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Priority: Minor
>         Attachments: HBASE-4944-v2.patch, HBASE-4944-v3.patch
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.


        

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162658#comment-13162658 ] 

Hudson commented on HBASE-4944:
-------------------------------

Integrated in HBase-TRUNK-security #22 (See [https://builds.apache.org/job/HBase-TRUNK-security/22/])
    HBASE-4944. Optionally verify bulk loaded HFiles

apurtell : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Optionally verify bulk loaded HFiles
> ------------------------------------
>
>                 Key: HBASE-4944
>                 URL: https://issues.apache.org/jira/browse/HBASE-4944
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4944-v2.patch, HBASE-4944-v3.patch
>
>
> We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.
> Patch is against trunk but can apply against all active branches.
