You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by BinShi-SecularBird <gi...@git.apache.org> on 2018/09/13 21:52:29 UTC

[GitHub] phoenix pull request #347: In BaseResultIterators.getParallelScans(...), per...

GitHub user BinShi-SecularBird opened a pull request:

    https://github.com/apache/phoenix/pull/347

    In BaseResultIterators.getParallelScans(...), performa binary search …

    …instead of linear search for the first guide post in the first region of the targeted scan ranges. In details:
    
        1. The aglorithm continuously decodes and loads guide posts in batches and perform binary search in each batch, until it finds the start key of the first region in the scan ranges or its insertion position.
           There are two reasons to use moving windows to load guide posts in batches.
           a. Firstly, we don't want to load all guide posts in the memory to increase memory footprint.
           b. Secondly, we don't want to decode and load the guide posts beyond the targed scan ranges to increase the system overhead.
        2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of moving window used for loading guid posts.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BinShi-SecularBird/phoenix PHOENIX-4594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #347
    
----
commit 7b528cb324c1941c554575be207c44f9b81ef420
Author: Bin Shi <bs...@...>
Date:   2018-09-13T21:45:05Z

    In BaseResultIterators.getParallelScans(...), performa binary search instead of linear search for the first guide post in the first region of the targeted scan ranges. In details:
        1. The aglorithm continuously decodes and loads guide posts in batches and perform binary search in each batch, until it finds the start key of the first region in the scan ranges or its insertion position.
           There are two reasons to use moving windows to load guide posts in batches.
           a. Firstly, we don't want to load all guide posts in the memory to increase memory footprint.
           b. Secondly, we don't want to decode and load the guide posts beyond the targed scan ranges to increase the system overhead.
        2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of moving window used for loading guid posts.

----


---

[GitHub] phoenix pull request #347: PHOENIX-4594: Perform binary search on guideposts...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/347#discussion_r221142871
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java ---
    @@ -853,12 +1032,52 @@ private void testSelectQueriesWithFilters(boolean useStatsForParallelization) th
                     assertEquals(100 + i, rs.getInt(1));
                     i++;
                 }
    +            assertEquals(numRows, i);
                 info = getByteRowEstimates(conn, sql, binds);
                 // Depending on the guidepost boundary, this estimate
                 // can be slightly off. It's called estimate for a reason.
                 assertEquals((Long) 10L, info.getEstimatedRows());
                 assertEquals((Long) 720L, info.getEstimatedBytes());
                 assertTrue(info.getEstimateInfoTs() > 0);
    +
    +            // Query with multiple scan ranges, and each range's start key and end key are both between data
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 103 AND K >= 101 OR K <= 108 AND K >= 106";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 6;
    +            int[] result = new int[] { 101, 102, 103, 106, 107, 108 };
    +            while (rs.next()) {
    +                assertEquals(result[i++], rs.getInt(1));
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 6L, info.getEstimatedRows());
    +            assertEquals((Long) 460L, info.getEstimatedBytes());
    +            // TODO: the original code before this change will hit the following assertion. Need to investigate it.
    +            // assertTrue(info.getEstimateInfoTs() > 0);
    --- End diff --
    
    Why does this assert fail now?


---

[GitHub] phoenix issue #347: In BaseResultIterators.getParallelScans(...), performa b...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on the issue:

    https://github.com/apache/phoenix/pull/347
  
    @karanmehta93  Can you also review this?


---

[GitHub] phoenix pull request #347: PHOENIX-4594: Perform binary search on guideposts...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/347#discussion_r218595912
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -972,32 +977,84 @@ private static boolean clipKeyRangeBytes(RowKeySchema schema, int fieldIndex, in
                     decoder = new PrefixByteDecoder(gps.getMaxLength());
                     firstRegionStartKey = new ImmutableBytesWritable(regionLocations.get(regionIndex).getRegionInfo().getStartKey());
                     try {
    -                    int c;
    -                    // Continue walking guideposts until we get past the currentKey
    -                    while ((c=currentKey.compareTo(currentGuidePost = PrefixByteCodec.decode(decoder, input))) >= 0) {
    -                        // Detect if we found a guidepost that might be in the first region. This
    -                        // is for the case where the start key may be past the only guidepost in
    -                        // the first region.
    -                        if (!gpsForFirstRegion && firstRegionStartKey.compareTo(currentGuidePost) <= 0) {
    -                            gpsForFirstRegion = true;
    +                    if (firstRegionStartKey.getLength() > 0 && this.gpsMovingWindowSize > 0) {
    +                        // Continuously decode and load guide posts in batches (moving window). For each moving window,
    +                        // firstly compare the searching key with the last element to see whether the searching key is
    +                        // in the current window. If it isn't, perform binary search in the window; otherwise, move to
    --- End diff --
    
    nit: `If it is`


---

[GitHub] phoenix pull request #347: PHOENIX-4594: Perform binary search on guideposts...

Posted by BinShi-SecularBird <gi...@git.apache.org>.
Github user BinShi-SecularBird commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/347#discussion_r223111306
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java ---
    @@ -823,12 +823,189 @@ private void testSelectQueriesWithFilters(boolean useStatsForParallelization) th
                     assertEquals(100 + i, rs.getInt(1));
                     i++;
                 }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 3L, info.getEstimatedRows());
    +            assertEquals((Long) 160L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +            // Query whose start key is between and end key is after data.
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 120 AND K >= 100";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 10;
    +            while (rs.next()) {
    +                assertEquals(100 + i, rs.getInt(1));
    +                i++;
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 10L, info.getEstimatedRows());
    +            assertEquals((Long) 720L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +            // Query whose start key and end key are both between data.
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 109 AND K >= 100";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 10;
    +            while (rs.next()) {
    +                assertEquals(100 + i, rs.getInt(1));
    +                i++;
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 10L, info.getEstimatedRows());
    +            assertEquals((Long) 720L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +        }
    +    }
    +
    +    @Test
    +    public void testSelectQueriesWithGuidePostMovingWindows() throws Exception {
    +        testSelectQueriesWithGuidePostMovingWindows(0);
    +        testSelectQueriesWithGuidePostMovingWindows(1);
    +        testSelectQueriesWithGuidePostMovingWindows(2);
    +        testSelectQueriesWithGuidePostMovingWindows(10);
    +        testSelectQueriesWithGuidePostMovingWindows(256);
    +    }
    +
    +    private void testSelectQueriesWithGuidePostMovingWindows(int movingWindowSize) throws Exception {
    +        String tableName = generateUniqueName();
    +        try (Connection conn = DriverManager.getConnection(getUrl())) {
    +            int guidePostWidth = 20;
    +            String ddl =
    +                    "CREATE TABLE " + tableName + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
    +                            + " GUIDE_POSTS_WIDTH=" + guidePostWidth
    +                            + ", USE_STATS_FOR_PARALLELIZATION=true" + " SPLIT ON (102, 105, 108)";
    +            conn.createStatement().execute(ddl);
    --- End diff --
    
    I'll hold this change as discussed in PHOENIX-4594.


---

[GitHub] phoenix pull request #347: PHOENIX-4594: Perform binary search on guideposts...

Posted by BinShi-SecularBird <gi...@git.apache.org>.
Github user BinShi-SecularBird commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/347#discussion_r223111017
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java ---
    @@ -853,12 +1032,52 @@ private void testSelectQueriesWithFilters(boolean useStatsForParallelization) th
                     assertEquals(100 + i, rs.getInt(1));
                     i++;
                 }
    +            assertEquals(numRows, i);
                 info = getByteRowEstimates(conn, sql, binds);
                 // Depending on the guidepost boundary, this estimate
                 // can be slightly off. It's called estimate for a reason.
                 assertEquals((Long) 10L, info.getEstimatedRows());
                 assertEquals((Long) 720L, info.getEstimatedBytes());
                 assertTrue(info.getEstimateInfoTs() > 0);
    +
    +            // Query with multiple scan ranges, and each range's start key and end key are both between data
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 103 AND K >= 101 OR K <= 108 AND K >= 106";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 6;
    +            int[] result = new int[] { 101, 102, 103, 106, 107, 108 };
    +            while (rs.next()) {
    +                assertEquals(result[i++], rs.getInt(1));
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 6L, info.getEstimatedRows());
    +            assertEquals((Long) 460L, info.getEstimatedBytes());
    +            // TODO: the original code before this change will hit the following assertion. Need to investigate it.
    +            // assertTrue(info.getEstimateInfoTs() > 0);
    --- End diff --
    
    As the "TODO" in the comment said, this is a new test case I wanted to add in this change, but it failed. So I reverted all the changes except this new test case, but it still failed, which means there is problem in the current code base without any of my change. I commented out the assertion here, but I opened another JIRA PHOENIX-4914 to track the original problem in the code base.


---

[GitHub] phoenix pull request #347: PHOENIX-4594: Perform binary search on guideposts...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/347#discussion_r221761936
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java ---
    @@ -823,12 +823,189 @@ private void testSelectQueriesWithFilters(boolean useStatsForParallelization) th
                     assertEquals(100 + i, rs.getInt(1));
                     i++;
                 }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 3L, info.getEstimatedRows());
    +            assertEquals((Long) 160L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +            // Query whose start key is between and end key is after data.
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 120 AND K >= 100";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 10;
    +            while (rs.next()) {
    +                assertEquals(100 + i, rs.getInt(1));
    +                i++;
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 10L, info.getEstimatedRows());
    +            assertEquals((Long) 720L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +            // Query whose start key and end key are both between data.
    +            sql = "SELECT a FROM " + tableName + " WHERE K <= 109 AND K >= 100";
    +            rs = conn.createStatement().executeQuery(sql);
    +            i = 0;
    +            numRows = 10;
    +            while (rs.next()) {
    +                assertEquals(100 + i, rs.getInt(1));
    +                i++;
    +            }
    +            assertEquals(numRows, i);
    +            info = getByteRowEstimates(conn, sql, binds);
    +            // Depending on the guidepost boundary, this estimate
    +            // can be slightly off. It's called estimate for a reason.
    +            assertEquals((Long) 10L, info.getEstimatedRows());
    +            assertEquals((Long) 720L, info.getEstimatedBytes());
    +            assertTrue(info.getEstimateInfoTs() > 0);
    +        }
    +    }
    +
    +    @Test
    +    public void testSelectQueriesWithGuidePostMovingWindows() throws Exception {
    +        testSelectQueriesWithGuidePostMovingWindows(0);
    +        testSelectQueriesWithGuidePostMovingWindows(1);
    +        testSelectQueriesWithGuidePostMovingWindows(2);
    +        testSelectQueriesWithGuidePostMovingWindows(10);
    +        testSelectQueriesWithGuidePostMovingWindows(256);
    +    }
    +
    +    private void testSelectQueriesWithGuidePostMovingWindows(int movingWindowSize) throws Exception {
    +        String tableName = generateUniqueName();
    +        try (Connection conn = DriverManager.getConnection(getUrl())) {
    +            int guidePostWidth = 20;
    +            String ddl =
    +                    "CREATE TABLE " + tableName + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
    +                            + " GUIDE_POSTS_WIDTH=" + guidePostWidth
    +                            + ", USE_STATS_FOR_PARALLELIZATION=true" + " SPLIT ON (102, 105, 108)";
    +            conn.createStatement().execute(ddl);
    --- End diff --
    
    refactor the common init code in initDataAndStats and reuse it here.


---