Posted to dev@hbase.apache.org by Irfan Mohammed <ir...@gmail.com> on 2009/06/18 12:16:15 UTC

Scan returns rows beyond the endRow when the column is specified

Hi,
We ran into an issue where the scan resulted in rows beyond the endRow. Are we doing something incorrectly here? The test case is given below. When the scan.addColumn(...) is specified, the rows has { "row333" } but having the scan.addColumn(...) gives rows { "row555" }.

dumpTable results :
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row111], [publisher_id:Pub111] => [10]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row222], [publisher_id:Pub111] => [15]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row333], [publisher_id:Pub222] => [20]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row444], [publisher_id:Pub222] => [30]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555], family : [publisher_id]
09/06/18 06:00:43 WARN persistence.QueryEngineTest.: [qws_internal] row : [row555], [publisher_id:Pub111] => [40]

We are using 0.20.0 from mainline on ubuntu 9.04.

Thanks,
Irfan

    /**
     * Test case to confirm a bug in HBase Scan method.
     * 
     * Scenario -
     * a) We have an HTable as follows
     *  
     *          |    publisher_id:Pub111         publisher_id:Pub222
     * -------------------------------------------------------------
     * row111   |        x                  |
     * row222   |        x                  |
     * row333   |                           |         x
     * row444   |                           |         x
     * row555   |        x                  |
     * --------------------------------------------------------------
     * Where 'x' denotes some data
     * 
     * b) We set up a Scan from "row333" to "row444", and specify column publisher_id:Pub111
     * c) We expect to get 0 Result objects, because row333 and row444 do not have any data for Pub111
     * d) BUG - Instead we get the Result for row555, which is totally unexpected (it's outside the range we specified in the Scan)
     * 
     * @throws Exception just propagates the Exception
     */
    @Test
    public void testBugInHBaseScan() throws Exception
    {
        /*
         * Create the table
         */
        HTable table = createTable("test_get_range_value", new FilterableDimension[]{Dimension.PUBLISHER}, null);
        
        /*
         * Add rows
         */
        Put row1 = new Put(Bytes.toBytes("row111"));
        Put row2 = new Put(Bytes.toBytes("row222"));
        Put row3 = new Put(Bytes.toBytes("row333"));
        Put row4 = new Put(Bytes.toBytes("row444"));        
        Put row5 = new Put(Bytes.toBytes("row555"));
        
        row1.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(10L));
        row2.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(15L));        
        row3.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub222"), Bytes.toBytes(20L));
        row4.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub222"), Bytes.toBytes(30L));
        row5.add(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"), Bytes.toBytes(40L));
        
        List<Put> rows = new ArrayList<Put>();
        rows.add(row1);
        rows.add(row2);
        rows.add(row3);
        rows.add(row4);        
        rows.add(row5);
        
        table.put(rows);
        
        dumpTable(table);
        
        /*
         * Per the above setup, row333 and row444 don't have any data for the column publisher_id:Pub111,
         * BUT they have data for the column publisher_id:Pub222
         * 
         * We now set up a Scan from row333 (inclusive) to row444 (exclusive) on the column publisher_id:Pub111
         * 
         * Expected behavior - No Result objects should be returned.
         */
        
        Scan scan = new Scan(Bytes.toBytes("row333"), Bytes.toBytes("row444"));
        scan.setMaxVersions(100000);
        scan.addColumn(Bytes.toBytes(Dimension.PUBLISHER.getFamilyName()), Bytes.toBytes("Pub111"));
        
        boolean success = true;
        StringBuilder buffer = new StringBuilder();
        ResultScanner scanner = table.getScanner(scan);
        try {
            // Any Result at all is a failure: no row in the range has data for Pub111.
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                success = false;
                buffer.append(result).append("\n");
            }
        } finally {
            // Always release the scanner, even if next() throws.
            scanner.close();
        }
        
        assertTrue("Did not expect the scanner to return any Result, but got a Result for these rows - " + buffer, success);
    }

Re: Scan returns rows beyond the endRow when the column is specified

Posted by stack <st...@duboce.net>.
First, thank you for going to the bother of writing up a test to prove the
bug you are seeing.  Excellent.

Would you mind filing an issue and attaching the full source of your test?
(For example, dumpTable is missing from the code you posted.)  Hopefully I
can then just run it and see the issue you are reporting.

Thanks Irfan.
St.Ack





Re: Scan returns rows beyond the endRow when the column is specified

Posted by Irfan Mohammed <ir...@gmail.com>.
Scratch the first line in my previous mail. It should read as follows:

We ran into an issue where a scan returned rows beyond the endRow. Are we doing something incorrectly here? The test case is given below. When the scan.addColumn(...) call is commented out, the scan returns { "row333" }, but with scan.addColumn(...) in the scan it returns { "row555" }.
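
For reference, the semantics the test expects can be sketched without HBase at all. The following is only a model under stated assumptions, not HBase code: the table is a sorted in-memory map, the start row is inclusive and the stop row exclusive, and row keys are plain ASCII so that String ordering matches byte ordering. The names ScanRangeModel, scan and sampleTable are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the expected scan behavior (not HBase code).
final class ScanRangeModel {

    // Returns the row keys in [startRow, stopRow) that hold a value for the
    // given column. A null column means "no column restriction".
    public static List<String> scan(NavigableMap<String, Map<String, Long>> table,
                                    String startRow, String stopRow, String column) {
        List<String> hits = new ArrayList<String>();
        // subMap(..., true, ..., false): start row inclusive, stop row exclusive.
        for (Map.Entry<String, Map<String, Long>> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            if (column == null || row.getValue().containsKey(column)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Same five rows as in the dumpTable output from the report.
    public static NavigableMap<String, Map<String, Long>> sampleTable() {
        NavigableMap<String, Map<String, Long>> t = new TreeMap<String, Map<String, Long>>();
        t.put("row111", Collections.singletonMap("publisher_id:Pub111", 10L));
        t.put("row222", Collections.singletonMap("publisher_id:Pub111", 15L));
        t.put("row333", Collections.singletonMap("publisher_id:Pub222", 20L));
        t.put("row444", Collections.singletonMap("publisher_id:Pub222", 30L));
        t.put("row555", Collections.singletonMap("publisher_id:Pub111", 40L));
        return t;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, Long>> table = sampleTable();
        // No column restriction: only row333 falls inside the range.
        System.out.println(scan(table, "row333", "row444", null));
        // Restricted to Pub111: nothing in the range matches, so the list is empty.
        System.out.println(scan(table, "row333", "row444", "publisher_id:Pub111"));
    }
}
```

In this model the column restriction can only narrow the set of rows inside the range; it can never pull in row555, which is what the failing test asserts against the real scanner.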
