Posted to dev@phoenix.apache.org by ankitsinghal <gi...@git.apache.org> on 2016/01/15 11:34:35 UTC

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

GitHub user ankitsinghal opened a pull request:

    https://github.com/apache/phoenix/pull/147

    PHOENIX-2417 Compress memory used by row key byte[] of guideposts

    - Still need to tweak some copying of bytes
    - Only these two test cases are failing currently:
        ViewIT.testNonSaltedUpdatableViewWithIndex:129->BaseViewIT.testUpdatableViewWithIndex:85->BaseViewIT.testUpdatableViewIndex:158 expected:<6> but was:<2>
        ViewIT.testNonSaltedUpdatableViewWithIndex:129->BaseViewIT.testUpdatableViewWithIndex:85->BaseViewIT.testUpdatableViewIndex:158 expected:<6> but was:<2>
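
    For context, the compression being added is prefix encoding of the sorted
    guidepost row keys: each key is stored as the length of the prefix it shares
    with the previous key plus the remaining suffix bytes. A toy, standalone
    sketch of the idea (illustrative keys only; not Phoenix's actual
    PrefixByteEncoder):

        import java.nio.charset.StandardCharsets;

        public class PrefixEncodeDemo {
            // Length of the common prefix of two byte arrays.
            static int commonPrefix(byte[] a, byte[] b) {
                int n = Math.min(a.length, b.length), i = 0;
                while (i < n && a[i] == b[i]) i++;
                return i;
            }

            public static void main(String[] args) {
                String[] keys = { "2016-01-14|user123", "2016-01-14|user456", "2016-01-15|user001" };
                byte[] prev = new byte[0];
                for (String k : keys) {
                    byte[] cur = k.getBytes(StandardCharsets.UTF_8);
                    int p = commonPrefix(prev, cur);
                    // Persist only (p, suffix) rather than the full key.
                    System.out.println("prefixLen=" + p + ", suffix="
                            + new String(cur, p, cur.length - p, StandardCharsets.UTF_8));
                    prev = cur;
                }
            }
        }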

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankitsinghal/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #147
    
----
commit 5ebcd9e2f100ab7eb15452e41ae27be8cbe4070e
Author: Ankit Singhal <an...@gmail.com>
Date:   2016-01-15T10:30:23Z

    PHOENIX-2417 Compress memory used by row key byte[] of guideposts

----



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50264321
  
    --- Diff: phoenix-protocol/src/main/PGuidePosts.proto ---
    @@ -24,7 +24,9 @@ option java_generic_services = true;
     option java_generate_equals_and_hash = true;
     option optimize_for = SPEED;
     message PGuidePosts {
    -  repeated bytes guidePosts = 1;
    --- End diff --
    
    These fields were optional and will have no impact, since we are not using stats with the old client. But anyway, I have made the change to add a new field and keep the old one as it is.
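
    A sketch of how a reader could dispatch between the two formats, assuming
    protobuf-java's standard generated accessors (the legacy repeated field from
    this diff, plus a new optional encoded field named as in the PTable.proto
    discussion below); the generated class path and the new field's accessors
    are assumptions here:

        import java.util.List;
        import com.google.protobuf.ByteString;
        import org.apache.phoenix.coprocessor.generated.PGuidePostsProtos.PGuidePosts;

        class GuidePostsCompat {
            // Old clients keep reading the repeated field; new code prefers the
            // single prefix-encoded byte string when the server populated it.
            static List<ByteString> legacyGuidePosts(PGuidePosts proto) {
                return proto.getGuidePostsList();      // repeated bytes guidePosts = 1
            }

            static ByteString encodedGuidePosts(PGuidePosts proto) {
                return proto.hasEncodedGuidePosts()    // assumed new optional field
                        ? proto.getEncodedGuidePosts()
                        : ByteString.EMPTY;
            }
        }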



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172366554
  
    Made the changes in GuidePostInfoBuilder as suggested and attached a WIP patch for the server-side stats upgrade on the ticket.
    Please review and confirm whether I'm on the right track.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925760
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -461,9 +463,9 @@ private static String toString(List<byte[]> gps) {
             for(Pair<byte[], byte[]> where : context.getWhereConditionColumns()) {
               whereConditions.add(where.getFirst());
             }
    -        List<byte[]> gps = getGuidePosts(whereConditions);
    +        GuidePostsInfo gps = getGuidePosts(whereConditions);
             if (logger.isDebugEnabled()) {
    -            logger.debug("Guideposts: " + toString(gps));
    +            logger.debug("Guideposts: " + gps);
             }
    --- End diff --
    
    Minor: I'd get rid of this logger if statement. The output could be pretty huge.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49883072
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +493,37 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            if (gps.getMaxLength() > 0) {
    +                decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            } else {
    +                decoder = new PrefixByteDecoder();
    +            }
    +            try {
    +                currentGuidePost = CodecUtils.decode(decoder, input);
    +                while (currentKey.getLength() != 0 && Bytes.compareTo(currentKey.get(), currentGuidePost.get()) <= 0) {
    --- End diff --
    
    You can't treat an ImmutableBytesWritable the same as a byte[]. To compare them, you do this:
    
        while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) <= 0) {
            guideIndex++;
        }
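
    A minimal standalone illustration of the pitfall (toy byte values): an
    ImmutableBytesWritable is a (buffer, offset, length) view, often over a
    reused buffer, so comparing the raw backing arrays can read stale bytes past
    the logical end of the value:

        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.hadoop.hbase.util.Bytes;

        public class IbwCompareDemo {
            public static void main(String[] args) {
                // Both views represent {1, 2, 3}, but `a` is backed by a larger
                // buffer, much like the decoder's reused buffer discussed above.
                ImmutableBytesWritable a =
                        new ImmutableBytesWritable(new byte[] { 1, 2, 3, 9 }, 0, 3);
                ImmutableBytesWritable b =
                        new ImmutableBytesWritable(new byte[] { 1, 2, 3 }, 0, 3);

                System.out.println(Bytes.compareTo(a.get(), b.get())); // > 0: wrong, sees the stale 9
                System.out.println(a.compareTo(b));                    // 0: honors offset/length
            }
        }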



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49935446
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    I'm leaning toward having this release be 5.0. We can add some code in MetaDataRegionObserver.postOpen() that does a checkAndPut on the SYSTEM.CATALOG table for the SYSTEM.STATS row, where we conditionally truncate the table (i.e., invoke the code you already wrote).



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49932690
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -510,14 +532,18 @@ private static String toString(List<byte[]> gps) {
                     endRegionKey = regionInfo.getEndKey();
                     keyOffset = ScanUtil.getRowKeyOffset(regionInfo.getStartKey(), endRegionKey);
                 }
    -            while (guideIndex < gpsSize
    -                    && (Bytes.compareTo(currentGuidePost = gps.get(guideIndex), endKey) <= 0 || endKey.length == 0)) {
    -                Scan newScan = scanRanges.intersectScan(scan, currentKey, currentGuidePost, keyOffset, false);
    -                scans = addNewScan(parallelScans, scans, newScan, currentGuidePost, false, regionLocation);
    -                currentKey = currentGuidePost;
    -                guideIndex++;
    -            }
    -            Scan newScan = scanRanges.intersectScan(scan, currentKey, endKey, keyOffset, true);
    +            try {
    +                while (guideIndex < gpsSize && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
    +                    Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset,
    +                            false);
    +                    scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation);
    +                    currentKeyBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    same as above.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49938247
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    Yes, it makes sense, as truncating SYSTEM.STATS is necessary. I'll try this tomorrow.
    
    Yes, as per your previous comments, I'll keep the "values" field at position 2, so this will automatically ensure that an empty PGuidePosts is returned when a client older than 4.7 is used, right?
    As per the code below, from versions older than 4.7 (PTableImpl.createFromProto()):
    
        GuidePostsInfo info =
                new GuidePostsInfo(guidePostsByteCount, value, rowCount); // Prior to 4.7: empty "value" list.
    
    3) I have created a GuidePostsInfoWriter and you can review the changes in this pull request now.
    
    




[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50270038
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -57,6 +57,8 @@ message PTableStats {
       optional int64 keyBytesCount = 4;
       optional int32 guidePostsCount = 5;
       optional PGuidePosts pGuidePosts = 6;
    +  optional bytes encodedGuidePosts = 7;
    --- End diff --
    
    I'm a bit confused by this, though. Doesn't the PTable still send multiple PGuidePosts (one per column family)? It's just the PGuidePosts that have changed, no? You already have maxLength on PGuidePosts (which is where it belongs). Why is it needed here?



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-173700290
  
    Thanks @JamesRTaylor for all the review and help.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939273
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
    +        }
    +        return false;
    +    }
    +    
    +    public boolean writeGuidePosts(byte[] row){
    --- End diff --
    
    Here too: addGuidePosts



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50269285
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/UpgradeUtil.java ---
    @@ -1215,4 +1218,67 @@ public static void addRowKeyOrderOptimizableCell(List<Mutation> tableMetadata, b
                     MetaDataEndpointImpl.ROW_KEY_ORDER_OPTIMIZABLE_BYTES, PBoolean.INSTANCE.toBytes(true));
             tableMetadata.add(put);
         }
    +
    +    public static boolean truncateStats(HTableInterface metaTable, HTableInterface statsTable)
    +            throws IOException, InterruptedException {
    +        List<Cell> columnCells = metaTable
    +                .get(new Get(SchemaUtil.getTableKey(null, PhoenixDatabaseMetaData.SYSTEM_SCHEMA_NAME,
    +                        PhoenixDatabaseMetaData.SYSTEM_CATALOG_TABLE)))
    +                .getColumnCells(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES);
    +        if (!columnCells.isEmpty()
    +                && columnCells.get(0).getTimestamp() < MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0) {
    --- End diff --
    
    The if isn't needed because the client-side code that truncates the stats table has been removed. This is the only place we do it. If stats building gets triggered *before* the client-side upgrade code has run (for example, through compaction), then it will build using the new logic with the new schema. It should be fine, b/c the code uses straight HBase APIs, not Phoenix APIs. Since we always send back empty guideposts for the protobuf field that old clients will be looking at, clients will just get empty stats until the client-side upgrade code runs.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939246
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/CodecUtils.java ---
    @@ -0,0 +1,104 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.util;
    +
    +import java.io.ByteArrayInputStream;
    +import java.io.ByteArrayOutputStream;
    +import java.io.DataInput;
    +import java.io.DataInputStream;
    +import java.io.DataOutput;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +import java.util.List;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +
    +import com.google.common.collect.Lists;
    +
    +public class CodecUtils {
    --- End diff --
    
    Rename to PrefixByteCodec



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50166028
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataRegionObserver.java ---
    @@ -96,10 +111,109 @@ public void start(CoprocessorEnvironment env) throws IOException {
             rebuildIndexTimeInterval = env.getConfiguration().getLong(QueryServices.INDEX_FAILURE_HANDLING_REBUILD_INTERVAL_ATTRIB,
                 QueryServicesOptions.DEFAULT_INDEX_FAILURE_HANDLING_REBUILD_INTERVAL);
         }
    -
    +    
    +    private static String getJdbcUrl(RegionCoprocessorEnvironment env) {
    +        String zkQuorum = env.getConfiguration().get(HConstants.ZOOKEEPER_QUORUM);
    +        String zkClientPort = env.getConfiguration().get(HConstants.ZOOKEEPER_CLIENT_PORT,
    +            Integer.toString(HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT));
    +        String zkParentNode = env.getConfiguration().get(HConstants.ZOOKEEPER_ZNODE_PARENT,
    +            HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT);
    +        return PhoenixRuntime.JDBC_PROTOCOL + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkClientPort
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkParentNode;
    +    }
     
         @Override
         public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
    +        final RegionCoprocessorEnvironment env = e.getEnvironment();
    +
    +        Runnable r = new Runnable() {
    +            @Override
    +            public void run() {
    +                HTableInterface metaTable = null;
    --- End diff --
    
    Move the Thread.sleep(1000) outside of here, and move this logic into a new UpgradeUtil.truncateStats(HTableInterface metaTable, HTableInterface statsTable) method.
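
    Roughly this shape (a sketch, assuming env.getTable() and the usual Phoenix
    constants; error handling left to the actual patch):

        @Override
        public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
            final RegionCoprocessorEnvironment env = e.getEnvironment();
            Runnable r = new Runnable() {
                @Override
                public void run() {
                    HTableInterface metaTable = null;
                    HTableInterface statsTable = null;
                    try {
                        // The sleep lives here now, so UpgradeUtil stays free
                        // of scheduling concerns.
                        Thread.sleep(1000);
                        metaTable = env.getTable(TableName.valueOf(
                                PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME));
                        statsTable = env.getTable(TableName.valueOf(
                                PhoenixDatabaseMetaData.SYSTEM_STATS_NAME));
                        UpgradeUtil.truncateStats(metaTable, statsTable);
                    } catch (Exception ex) {
                        // Best effort; stats are rebuilt on the next collection.
                    } finally {
                        try {
                            if (metaTable != null) metaTable.close();
                            if (statsTable != null) statsTable.close();
                        } catch (IOException io) {
                            // ignore on close
                        }
                    }
                }
            };
            new Thread(r).start();
        }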



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49934050
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            try {
    +                while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                        && currentKey.getLength() != 0) {
    +                    guideIndex++;
    +                }
    +            } catch (EOFException e) {}
    +        }
    +        byte[] currentKeyBytes = currentKey.copyBytes();
    --- End diff --
    
    Yes, you're right, @ankitsinghal. Never mind about these changes.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925681
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -510,14 +532,18 @@ private static String toString(List<byte[]> gps) {
                     endRegionKey = regionInfo.getEndKey();
                     keyOffset = ScanUtil.getRowKeyOffset(regionInfo.getStartKey(), endRegionKey);
                 }
    -            while (guideIndex < gpsSize
    -                    && (Bytes.compareTo(currentGuidePost = gps.get(guideIndex), endKey) <= 0 || endKey.length == 0)) {
    -                Scan newScan = scanRanges.intersectScan(scan, currentKey, currentGuidePost, keyOffset, false);
    -                scans = addNewScan(parallelScans, scans, newScan, currentGuidePost, false, regionLocation);
    -                currentKey = currentGuidePost;
    -                guideIndex++;
    -            }
    -            Scan newScan = scanRanges.intersectScan(scan, currentKey, endKey, keyOffset, true);
    +            try {
    +                while (guideIndex < gpsSize && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
    +                    Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset,
    +                            false);
    +                    scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation);
    +                    currentKeyBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    And here



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49932689
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            try {
    +                while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                        && currentKey.getLength() != 0) {
    +                    guideIndex++;
    +                }
    +            } catch (EOFException e) {}
    +        }
    +        byte[] currentKeyBytes = currentKey.copyBytes();
    +
             // Merge bisect with guideposts for all but the last region
             while (regionIndex <= stopIndex) {
    -            byte[] currentGuidePost, endKey, endRegionKey = EMPTY_BYTE_ARRAY;
    +            byte[] currentGuidePostBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    same as above.



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939293
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
    +        }
    +        return false;
    +    }
    +    
    +    public boolean writeGuidePosts(byte[] row){
    +        return writeGuidePosts(row, 0, 0);
    +    }
    +
    +    public boolean writeGuidePosts(byte[] row, long byteCount){
    +        return writeGuidePosts(row, byteCount, 0);
    +    }
    +
    +    public void close() {
    +        CodecUtils.close(stream);
    +    }
    +
    +    public void incrementRowCount() {
    +        this.rowCount++;
    +    }
    +
    +    public long getByteCount() {
    +        return byteCount;
    +    }
    +
    +    public int getGuidePostsCount() {
    +        return guidePostsCount;
    +    }
    +
    +    public long getRowCount() {
    +        return rowCount;
    +    }
    +
    +    public ImmutableBytesWritable getGuidePosts() {
    +        this.guidePosts.set(stream.getBuffer(), 0, stream.size());
    +        return guidePosts;
    +    }
    +    
    +    public GuidePostsInfo createGuidePostInfo(){
    --- End diff --
    
    Rename to build() and call close() at end.
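
    That is, something along these lines (a sketch; the GuidePostsInfo
    constructor argument order is illustrative, mirroring the EMPTY_GUIDEPOST
    call above):

        public GuidePostsInfo build() {
            // Materialize the accumulated, prefix-encoded guideposts.
            GuidePostsInfo info = new GuidePostsInfo(byteCount, getGuidePosts(),
                    rowCount, maxLength, guidePostsCount);
            close(); // release the underlying stream once the bytes are captured
            return info;
        }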



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172279531
  
    Looking good. One more pass, and I think we'll be good to go. Thanks!



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49935021
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    There's a pretty big backward compatibility issue due to PHOENIX-2143 and this one. The case you'll need to make work is an old pre 4.7.0 client that's running against a new 4.7.0 server. The client will expect the stats to be in the original format. In the following call:
    
        public void getTable(RpcController controller, GetTableRequest request,
                RpcCallback<MetaDataResponse> done) {
    
    You'll need to pass request.getClientVersion() through doGetTable(), into getTable() and finally into StatisticsUtil.readStatistics(). You should preserve the old code (we can dump it when we do a major release), and use that code path if the stats have not been regenerated yet. You can detect this based on the existence of the GUIDE_POSTS key value (which you'll want to project into the scan for the new code for this b/w compatibility case). If the stats have been regenerated, there'd be two cases: the client is pre 4.7.0 in which case you'd want to use the new code but put the data in the old format, or the client is 4.7.0 or above in which case your existing code is fine.
    
    With PHOENIX-2143, when compaction runs, we'll generate stats in the new format. It's possible that the SYSTEM.STATS table hasn't been updated yet (as this gets triggered when a new 4.7.0 client connects to the server which may not yet have happened). We'd need to issue the previous Delete marker based on the old row key structure to ensure that the stats for the region are deleted. We wouldn't want to issue the query that does the range delete in this case because it might delete rows across multiple regions (ugh). So we'd need to know if the schema upgrade has been done yet when compaction runs. We could detect this by querying the SYSTEM.CATALOG table directly or by using the MetaDataProtocol.getTable() call and pulling over the PTable and then conditionally do the delete the old way versus the new way.
    
    WDYT, @ankitsinghal? A more radical alternative would be to call this release 5.0. Users could still upgrade the server and client as with a minor release, but they'd need to truncate the SYSTEM.STATS table manually before upgrading the server. In that case, I think it'd be acceptable to return an empty guidepost for the protobuf values field (as essentially stats would be disabled for older clients running against the newer server).




[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49932700
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -510,14 +532,18 @@ private static String toString(List<byte[]> gps) {
                     endRegionKey = regionInfo.getEndKey();
                     keyOffset = ScanUtil.getRowKeyOffset(regionInfo.getStartKey(), endRegionKey);
                 }
    -            while (guideIndex < gpsSize
    -                    && (Bytes.compareTo(currentGuidePost = gps.get(guideIndex), endKey) <= 0 || endKey.length == 0)) {
    -                Scan newScan = scanRanges.intersectScan(scan, currentKey, currentGuidePost, keyOffset, false);
    -                scans = addNewScan(parallelScans, scans, newScan, currentGuidePost, false, regionLocation);
    -                currentKey = currentGuidePost;
    -                guideIndex++;
    -            }
    -            Scan newScan = scanRanges.intersectScan(scan, currentKey, endKey, keyOffset, true);
    +            try {
    +                while (guideIndex < gpsSize && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
    +                    Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset,
    +                            false);
    +                    scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation);
    +                    currentKeyBytes = currentGuidePost.copyBytes();
    +                    currentGuidePost = CodecUtils.decode(decoder, input);
    +                    currentGuidePostBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    Same as above: all of these share the same byte[] object that the decoder reuses internally.
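
    In other words, with the PR's own CodecUtils/PrefixByteDecoder (sketch):

        ImmutableBytesWritable current = CodecUtils.decode(decoder, input);
        byte[] stable = current.copyBytes();  // deep copy, safe to keep
        ImmutableBytesWritable next = CodecUtils.decode(decoder, input);
        // `current` and `next` now both view the decoder's single reused
        // buffer, so only `stable` still holds the previous guidepost.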



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49935621
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    Let's do something in-the-middle. We can stick with the plan that this is still the 4.7.0 release, but we can do the above in MetaDataRegionObserver to ensure that the SYSTEM.STATS table is truncated. Here's what needs to be done:
    * conditionally truncate the SYSTEM.STATS table in MetaDataRegionObserver.postOpen() based on a checkAndPut (see the sketch below)
    * keep the values field at protobuf position 2 and return an empty PGuidePosts for that field. We'll document that stats are essentially disabled for an old client once you upgrade your server (but nothing will break).
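
    A minimal sketch of the first bullet, assuming a hypothetical marker
    qualifier UPGRADE_TO_4_7 on the SYSTEM.CATALOG row for SYSTEM.STATS and a
    SYSTEM_STATS_TABLE constant following the SYSTEM_CATALOG_TABLE pattern;
    checkAndPut is atomic, so only one region server wins and truncates:

        byte[] statsTableKey = SchemaUtil.getTableKey(null,
                PhoenixDatabaseMetaData.SYSTEM_SCHEMA_NAME,
                PhoenixDatabaseMetaData.SYSTEM_STATS_TABLE);
        Put marker = new Put(statsTableKey);
        marker.add(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
                UPGRADE_TO_4_7, PBoolean.INSTANCE.toBytes(true)); // hypothetical qualifier
        // Succeeds only while the marker cell is still absent (expected value null).
        if (metaTable.checkAndPut(statsTableKey,
                PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES, UPGRADE_TO_4_7,
                null, marker)) {
            UpgradeUtil.truncateStats(metaTable, statsTable);
        }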



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49882616
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +493,37 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            if (gps.getMaxLength() > 0) {
    +                decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            } else {
    +                decoder = new PrefixByteDecoder();
    +            }
    --- End diff --
    
    Instead of this if statement everywhere, just do this (and ensure that PrefixByteDecoder does the right thing if the maxLength is zero):
    
        decoder = new PrefixByteDecoder(gps.getMaxLength());
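
    And inside the decoder, roughly (a sketch; field names illustrative):

        public PrefixByteDecoder(int maxLength) {
            if (maxLength > 0) {
                // Known upper bound: allocate the reusable buffer once.
                this.maxLength = maxLength;
                this.previous = new ImmutableBytesWritable(new byte[maxLength], 0, 0);
            } else {
                // Unknown bound (e.g. zero from older stats): decode() grows
                // the buffer on demand instead.
                this.maxLength = -1;
                this.previous = new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
            }
        }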



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939300
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
    +        }
    +        return false;
    +    }
    +    
    +    public boolean writeGuidePosts(byte[] row){
    +        return writeGuidePosts(row, 0, 0);
    +    }
    +
    +    public boolean writeGuidePosts(byte[] row, long byteCount){
    +        return writeGuidePosts(row, byteCount, 0);
    +    }
    +
    +    public void close() {
    +        CodecUtils.close(stream);
    +    }
    +
    +    public void incrementRowCount() {
    +        this.rowCount++;
    +    }
    +
    +    public long getByteCount() {
    --- End diff --
    
    Make the getters private if possible (otherwise leave as is)



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172969102
  
    Looks good. Couple of minor nits and one b/w compat question. Looks like it needs to be rebased too. Thanks for the contribution, @ankitsinghal  - nice work!



[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939279
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
    +        }
    +        return false;
    +    }
    +    
    +    public boolean writeGuidePosts(byte[] row){
    +        return writeGuidePosts(row, 0, 0);
    +    }
    +
    +    public boolean writeGuidePosts(byte[] row, long byteCount){
    +        return writeGuidePosts(row, byteCount, 0);
    +    }
    +
    +    public void close() {
    --- End diff --
    
    Make this private and call at end of build() method.
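    
    A rough sketch of the shape that could take (assuming TrustedByteArrayOutputStream's getBuffer()/size() and the new five-arg GuidePostsInfo constructor shown in this patch):
    
        public GuidePostsInfo build() {
            close(); // seal the underlying stream before exposing its buffer
            return new GuidePostsInfo(byteCount,
                    new ImmutableBytesWritable(stream.getBuffer(), 0, stream.size()),
                    rowCount, maxLength, guidePostsCount);
        }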


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939265
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    --- End diff --
    
    Much better, but let's rename to GuidePostsInfoBuilder


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925674
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            try {
    +                while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                        && currentKey.getLength() != 0) {
    +                    guideIndex++;
    +                }
    +            } catch (EOFException e) {}
    +        }
    +        byte[] currentKeyBytes = currentKey.copyBytes();
    +
             // Merge bisect with guideposts for all but the last region
             while (regionIndex <= stopIndex) {
    -            byte[] currentGuidePost, endKey, endRegionKey = EMPTY_BYTE_ARRAY;
    +            byte[] currentGuidePostBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    And here


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49937657
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfo.java ---
    @@ -79,67 +96,66 @@ public long getRowCount() {
         public void incrementRowCount() {
             this.rowCount++;
         }
    -
    -    /**
    -     * Combines the GuidePosts per region into one.
    -     * @param oldInfo
    -     */
    -    public void combine(GuidePostsInfo oldInfo) {
    -        if (!oldInfo.getGuidePosts().isEmpty()) {
    -            byte[] newFirstKey = oldInfo.getGuidePosts().get(0);
    -            byte[] existingLastKey;
    -            if (!this.getGuidePosts().isEmpty()) {
    -                existingLastKey = this.getGuidePosts().get(this.getGuidePosts().size() - 1);
    -            } else {
    -                existingLastKey = HConstants.EMPTY_BYTE_ARRAY;
    -            }
    -            int size = oldInfo.getGuidePosts().size();
    -            // If the existing guidePosts is lesser than the new RegionInfo that we are combining
    -            // then add the new Region info to the end of the current GuidePosts.
    -            // If the new region info is smaller than the existing guideposts then add the existing
    -            // guide posts after the new guideposts.
    -            List<byte[]> newTotalGuidePosts = new ArrayList<byte[]>(this.getGuidePosts().size() + size);
    -            if (Bytes.compareTo(existingLastKey, newFirstKey) <= 0) {
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -            } else {
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -            }
    -            this.guidePosts = ImmutableList.copyOf(newTotalGuidePosts);
    -        }
    -        this.byteCount += oldInfo.getByteCount();
    -        this.keyByteSize += oldInfo.keyByteSize;
    -        this.rowCount += oldInfo.getRowCount();
    -    }
         
    +    public int getGuidePostsCount() {
    +        return guidePostsCount;
    +    }
    +
         /**
          * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
          * reached while collecting stats.
          * @param row
          * @param byteCount
          * @return
    +     * @throws IOException 
          */
    -    public boolean addGuidePost(byte[] row, long byteCount, long rowCount) {
    -        if (guidePosts.isEmpty() || Bytes.compareTo(row, guidePosts.get(guidePosts.size() - 1)) > 0) {
    -            List<byte[]> newGuidePosts = Lists.newArrayListWithExpectedSize(this.getGuidePosts().size() + 1);
    -            newGuidePosts.addAll(guidePosts);
    -            newGuidePosts.add(row);
    -            this.guidePosts = ImmutableList.copyOf(newGuidePosts);
    -            this.byteCount += byteCount;
    -            this.keyByteSize += row.length;
    -            this.rowCount+=rowCount;
    -            return true;
    +    public boolean encodeAndCollectGuidePost(byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                if(!isStreamInitialized){
    +                    stream = new TrustedByteArrayOutputStream(guidePosts.getLength());
    +                    output = new DataOutputStream(stream);
    +                    stream.write(ByteUtil.copyKeyBytesIfNecessary(guidePosts));
    +                    encoder = new PrefixByteEncoder();
    +                    isStreamInitialized=true;
    +                }
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
             }
             return false;
         }
    -    
    -    public boolean addGuidePost(byte[] row) {
    -        return addGuidePost(row, 0, 0);
    +  
    +    public boolean encodeAndCollectGuidePost(byte[] row){
    +        return encodeAndCollectGuidePost(row, 0, 0);
         }
     
    -    public boolean addGuidePost(byte[] row, long byteCount) {
    -        return addGuidePost(row, byteCount, 0);
    +    public boolean encodeAndCollectGuidePost(byte[] row, long byteCount){
    +        return encodeAndCollectGuidePost(row, byteCount, 0);
         }
     
    +    public void close() {
    --- End diff --
    
    Moved it to GuidePostsInfoWriter, but it has no effect since ByteArrayOutputStream's close() is a no-op. So there is also no need to copy the buffer out of TrustedByteArrayOutputStream, as it remains available after close.
    
    I'm creating the stream in the writer itself, as that makes it easier to maintain and track the streams for guideposts collected across multiple column families.
    
    Updated the code with the GuidePostsInfoWriter.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50165631
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataRegionObserver.java ---
    @@ -96,10 +111,109 @@ public void start(CoprocessorEnvironment env) throws IOException {
             rebuildIndexTimeInterval = env.getConfiguration().getLong(QueryServices.INDEX_FAILURE_HANDLING_REBUILD_INTERVAL_ATTRIB,
                 QueryServicesOptions.DEFAULT_INDEX_FAILURE_HANDLING_REBUILD_INTERVAL);
         }
    -
    +    
    +    private static String getJdbcUrl(RegionCoprocessorEnvironment env) {
    +        String zkQuorum = env.getConfiguration().get(HConstants.ZOOKEEPER_QUORUM);
    +        String zkClientPort = env.getConfiguration().get(HConstants.ZOOKEEPER_CLIENT_PORT,
    +            Integer.toString(HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT));
    +        String zkParentNode = env.getConfiguration().get(HConstants.ZOOKEEPER_ZNODE_PARENT,
    +            HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT);
    +        return PhoenixRuntime.JDBC_PROTOCOL + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkClientPort
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkParentNode;
    +    }
     
         @Override
         public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
    +        final RegionCoprocessorEnvironment env = e.getEnvironment();
    +
    +        Runnable r = new Runnable() {
    +            @Override
    +            public void run() {
    +                HTableInterface metaTable = null;
    +                HTableInterface statsTable = null;
    +                try {
    +                    Thread.sleep(1000);
    +                    LOG.info("Stats will be deleted for upgrade 4.7 requirement!!");
    +                    metaTable = env.getTable(TableName.valueOf(PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME));
    +                    List<Cell> columnCells = metaTable
    +                            .get(new Get(SchemaUtil.getTableKey(null, PhoenixDatabaseMetaData.SYSTEM_SCHEMA_NAME,
    +                                    PhoenixDatabaseMetaData.SYSTEM_CATALOG_TABLE)))
    +                            .getColumnCells(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
    +                                    QueryConstants.EMPTY_COLUMN_BYTES);
    +                    if (!columnCells.isEmpty()
    +                            && columnCells.get(0).getTimestamp() < MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0) {
    --- End diff --
    
    This if is not needed.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939268
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    --- End diff --
    
    Rename to addGuidePosts


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49882278
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/PTableImpl.java ---
    @@ -1003,26 +1003,16 @@ public static PTable createFromProto(PTableProtos.PTable table) {
           boolean isImmutableRows = table.getIsImmutableRows();
           SortedMap<byte[], GuidePostsInfo> tableGuidePosts = new TreeMap<byte[], GuidePostsInfo>(Bytes.BYTES_COMPARATOR);
             for (PTableProtos.PTableStats pTableStatsProto : table.getGuidePostsList()) {
    -            List<byte[]> value = Lists.newArrayListWithExpectedSize(pTableStatsProto.getValuesCount());
    -            for (int j = 0; j < pTableStatsProto.getValuesCount(); j++) {
    -                value.add(pTableStatsProto.getValues(j).toByteArray());
    -            }
    -            // No op
    -            pTableStatsProto.getGuidePostsByteCount();
    -            value = Lists.newArrayListWithExpectedSize(pTableStatsProto.getValuesCount());
                 PGuidePosts pGuidePosts = pTableStatsProto.getPGuidePosts();
    -            for(int j = 0; j < pGuidePosts.getGuidePostsCount(); j++) {
    -                value.add(pGuidePosts.getGuidePosts(j).toByteArray());
    -            }
                 long guidePostsByteCount = pGuidePosts.getByteCount();
                 long rowCount = pGuidePosts.getRowCount();
    -            // TODO : Not exposing MIN/MAX key outside to client 
    -            GuidePostsInfo info =
    -                    new GuidePostsInfo(guidePostsByteCount, value, rowCount);
    +            int maxLength = pGuidePosts.getMaxLength();
    +            int guidePostsCount = pGuidePosts.getGuidePostsCount();
    +            GuidePostsInfo info = new GuidePostsInfo(guidePostsByteCount,
    +                    new ImmutableBytesWritable(pGuidePosts.getGuidePosts().toByteArray()), rowCount, maxLength, guidePostsCount);
    --- End diff --
    
    Use new ImmutableBytesWritable(HBaseZeroCopyByteString.zeroCopyGetBytes(pGuidePosts.getGuidePosts())) here (and elsewhere) instead to prevent copying the underlying byte array.
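    
    A minimal sketch of the difference (both calls are existing HBase/protobuf API; nothing new here):
    
        ByteString bs = pGuidePosts.getGuidePosts();
        byte[] copied = bs.toByteArray();                             // allocates a new array and copies
        byte[] shared = HBaseZeroCopyByteString.zeroCopyGetBytes(bs); // exposes the backing array, no copy
    
    Sharing the backing array is safe here because the guidepost bytes are never mutated afterwards.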


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49880434
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/compile/ScanRanges.java ---
    @@ -237,7 +237,9 @@ public void initializeScan(Scan scan) {
             return temp;
         }
         
    -    public Scan intersectScan(Scan scan, final byte[] originalStartKey, final byte[] originalStopKey, final int keyOffset, boolean crossesRegionBoundary) {
    +    public Scan intersectScan(Scan scan, final ImmutableBytesWritable originalStartKeyPtr, final ImmutableBytesWritable originalStopKeyPtr, final int keyOffset, boolean crossesRegionBoundary) {
    +        byte[] originalStartKey= originalStartKeyPtr.get();
    +        byte[] originalStopKey= originalStopKeyPtr.get();
    --- End diff --
    
    You can't treat an ImmutableBytesWritable the same as what was a byte[] before because an ImmutableBytesWritable has an offset and a length. By doing it this way, you're assuming that the offset is 0 and the length is byte[].length. Instead, you'd want to change the type and adjust the code as necessary:
    
        ImmutableBytesWritable originalStartKey= originalStartKeyPtr;
        ImmutableBytesWritable originalStopKey = originalStopKeyPtr;



---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49880903
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfo.java ---
    @@ -53,22 +70,21 @@
          * @param guidePosts
          * @param rowCount
          */
    -    public GuidePostsInfo(long byteCount, List<byte[]> guidePosts, long rowCount) {
    -        this.guidePosts = ImmutableList.copyOf(guidePosts);
    -        int size = 0;
    -        for (byte[] key : guidePosts) {
    -            size += key.length;
    -        }
    -        this.keyByteSize = size;
    +    public GuidePostsInfo(long byteCount, ImmutableBytesWritable guidePosts, long rowCount, int maxLength, int guidePostsCount) {
    +        this.guidePosts = guidePosts;
    --- End diff --
    
    Best to make a copy of the ImmutableBytesWritable here in case the caller uses the ptr again. Note that this doesn't copy the underlying bytes:
    
        this.guidePosts = new ImmutableBytesWritable(guidePosts);
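    
    A tiny sketch of those semantics (rowA/rowB are hypothetical): the copy constructor duplicates only the (bytes, offset, length) view, not the array, so a later set() on the caller's ptr can't re-point our reference:
    
        ImmutableBytesWritable ptr = new ImmutableBytesWritable(rowA);
        ImmutableBytesWritable copy = new ImmutableBytesWritable(ptr); // copies the view, shares the array
        ptr.set(rowB);                                                 // caller reuses the ptr...
        assert copy.get() == rowA;                                     // ...but the copy still points at rowA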


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925866
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfo.java ---
    @@ -79,67 +96,66 @@ public long getRowCount() {
         public void incrementRowCount() {
             this.rowCount++;
         }
    -
    -    /**
    -     * Combines the GuidePosts per region into one.
    -     * @param oldInfo
    -     */
    -    public void combine(GuidePostsInfo oldInfo) {
    -        if (!oldInfo.getGuidePosts().isEmpty()) {
    -            byte[] newFirstKey = oldInfo.getGuidePosts().get(0);
    -            byte[] existingLastKey;
    -            if (!this.getGuidePosts().isEmpty()) {
    -                existingLastKey = this.getGuidePosts().get(this.getGuidePosts().size() - 1);
    -            } else {
    -                existingLastKey = HConstants.EMPTY_BYTE_ARRAY;
    -            }
    -            int size = oldInfo.getGuidePosts().size();
    -            // If the existing guidePosts is lesser than the new RegionInfo that we are combining
    -            // then add the new Region info to the end of the current GuidePosts.
    -            // If the new region info is smaller than the existing guideposts then add the existing
    -            // guide posts after the new guideposts.
    -            List<byte[]> newTotalGuidePosts = new ArrayList<byte[]>(this.getGuidePosts().size() + size);
    -            if (Bytes.compareTo(existingLastKey, newFirstKey) <= 0) {
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -            } else {
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -            }
    -            this.guidePosts = ImmutableList.copyOf(newTotalGuidePosts);
    -        }
    -        this.byteCount += oldInfo.getByteCount();
    -        this.keyByteSize += oldInfo.keyByteSize;
    -        this.rowCount += oldInfo.getRowCount();
    -    }
         
    +    public int getGuidePostsCount() {
    +        return guidePostsCount;
    +    }
    +
         /**
          * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
          * reached while collecting stats.
          * @param row
          * @param byteCount
          * @return
    +     * @throws IOException 
          */
    -    public boolean addGuidePost(byte[] row, long byteCount, long rowCount) {
    -        if (guidePosts.isEmpty() || Bytes.compareTo(row, guidePosts.get(guidePosts.size() - 1)) > 0) {
    -            List<byte[]> newGuidePosts = Lists.newArrayListWithExpectedSize(this.getGuidePosts().size() + 1);
    -            newGuidePosts.addAll(guidePosts);
    -            newGuidePosts.add(row);
    -            this.guidePosts = ImmutableList.copyOf(newGuidePosts);
    -            this.byteCount += byteCount;
    -            this.keyByteSize += row.length;
    -            this.rowCount+=rowCount;
    -            return true;
    +    public boolean encodeAndCollectGuidePost(byte[] row, long byteCount, long rowCount) {
    --- End diff --
    
    This feels a little funky here. How about if we have a GuidePostsInfoBuilder or GuidePostsInfoWriter with a writeGuidePost(DataOutput out, byte[] row, long byteCount, long rowCount) method?


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925890
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfo.java ---
    @@ -79,67 +96,66 @@ public long getRowCount() {
         public void incrementRowCount() {
             this.rowCount++;
         }
    -
    -    /**
    -     * Combines the GuidePosts per region into one.
    -     * @param oldInfo
    -     */
    -    public void combine(GuidePostsInfo oldInfo) {
    -        if (!oldInfo.getGuidePosts().isEmpty()) {
    -            byte[] newFirstKey = oldInfo.getGuidePosts().get(0);
    -            byte[] existingLastKey;
    -            if (!this.getGuidePosts().isEmpty()) {
    -                existingLastKey = this.getGuidePosts().get(this.getGuidePosts().size() - 1);
    -            } else {
    -                existingLastKey = HConstants.EMPTY_BYTE_ARRAY;
    -            }
    -            int size = oldInfo.getGuidePosts().size();
    -            // If the existing guidePosts is lesser than the new RegionInfo that we are combining
    -            // then add the new Region info to the end of the current GuidePosts.
    -            // If the new region info is smaller than the existing guideposts then add the existing
    -            // guide posts after the new guideposts.
    -            List<byte[]> newTotalGuidePosts = new ArrayList<byte[]>(this.getGuidePosts().size() + size);
    -            if (Bytes.compareTo(existingLastKey, newFirstKey) <= 0) {
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -            } else {
    -                newTotalGuidePosts.addAll(oldInfo.getGuidePosts());
    -                newTotalGuidePosts.addAll(this.getGuidePosts());
    -            }
    -            this.guidePosts = ImmutableList.copyOf(newTotalGuidePosts);
    -        }
    -        this.byteCount += oldInfo.getByteCount();
    -        this.keyByteSize += oldInfo.keyByteSize;
    -        this.rowCount += oldInfo.getRowCount();
    -    }
         
    +    public int getGuidePostsCount() {
    +        return guidePostsCount;
    +    }
    +
         /**
          * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
          * reached while collecting stats.
          * @param row
          * @param byteCount
          * @return
    +     * @throws IOException 
          */
    -    public boolean addGuidePost(byte[] row, long byteCount, long rowCount) {
    -        if (guidePosts.isEmpty() || Bytes.compareTo(row, guidePosts.get(guidePosts.size() - 1)) > 0) {
    -            List<byte[]> newGuidePosts = Lists.newArrayListWithExpectedSize(this.getGuidePosts().size() + 1);
    -            newGuidePosts.addAll(guidePosts);
    -            newGuidePosts.add(row);
    -            this.guidePosts = ImmutableList.copyOf(newGuidePosts);
    -            this.byteCount += byteCount;
    -            this.keyByteSize += row.length;
    -            this.rowCount+=rowCount;
    -            return true;
    +    public boolean encodeAndCollectGuidePost(byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                if(!isStreamInitialized){
    +                    stream = new TrustedByteArrayOutputStream(guidePosts.getLength());
    +                    output = new DataOutputStream(stream);
    +                    stream.write(ByteUtil.copyKeyBytesIfNecessary(guidePosts));
    +                    encoder = new PrefixByteEncoder();
    +                    isStreamInitialized=true;
    +                }
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
             }
             return false;
         }
    -    
    -    public boolean addGuidePost(byte[] row) {
    -        return addGuidePost(row, 0, 0);
    +  
    +    public boolean encodeAndCollectGuidePost(byte[] row){
    +        return encodeAndCollectGuidePost(row, 0, 0);
         }
     
    -    public boolean addGuidePost(byte[] row, long byteCount) {
    -        return addGuidePost(row, byteCount, 0);
    +    public boolean encodeAndCollectGuidePost(byte[] row, long byteCount){
    +        return encodeAndCollectGuidePost(row, byteCount, 0);
         }
     
    +    public void close() {
    --- End diff --
    
    This would move to the GuidePostsInfoWriter class, or may be unnecessary altogether, as the stream would be created outside of the class.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925628
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    --- End diff --
    
    Minor nit: you can just do this instead:
    
        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey);


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49939275
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stats/GuidePostsInfoWriter.java ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stats;
    +
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.util.ByteUtil;
    +import org.apache.phoenix.util.CodecUtils;
    +import org.apache.phoenix.util.PrefixByteEncoder;
    +import org.apache.phoenix.util.TrustedByteArrayOutputStream;
    +
    +/*
    + * Writer to help in writing guidePosts and creating guidePostInfo. This is used when we are collecting stats or reading stats for a table.
    + */
    +
    +public class GuidePostsInfoWriter {
    +    private PrefixByteEncoder encoder;
    +    private byte[] lastRow;
    +    private ImmutableBytesWritable guidePosts=new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +    private long byteCount = 0;
    +    private int guidePostsCount;
    +    
    +    /**
    +     * The rowCount that is flattened across the total number of guide posts.
    +     */
    +    private long rowCount = 0;
    +    
    +    /**
    +     * Maximum length of a guidePost collected
    +     */
    +    private int maxLength;
    +    private DataOutputStream output;
    +    private TrustedByteArrayOutputStream stream;
    +    
    +    public final static GuidePostsInfo EMPTY_GUIDEPOST = new GuidePostsInfo(0,
    +            new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY), 0, 0, 0);
    +
    +    public int getMaxLength() {
    +        return maxLength;
    +    }
    +    public GuidePostsInfoWriter(){
    +        this.stream = new TrustedByteArrayOutputStream(1);
    +        this.output = new DataOutputStream(stream);
    +        this.encoder=new PrefixByteEncoder();
    +        lastRow = ByteUtil.EMPTY_BYTE_ARRAY;
    +    }
    +
    +    /**
    +     * The guide posts, rowCount and byteCount are accumulated every time a guidePosts depth is
    +     * reached while collecting stats.
    +     * @param row
    +     * @param byteCount
    +     * @return
    +     * @throws IOException 
    +     */
    +    public boolean writeGuidePosts( byte[] row, long byteCount, long rowCount) {
    +        if (row.length != 0 && Bytes.compareTo(lastRow, row) < 0) {
    +            try {
    +                encoder.encode(output, row, 0, row.length);
    +                this.byteCount += byteCount;
    +                this.guidePostsCount++;
    +                this.maxLength = encoder.getMaxLength();
    +                this.rowCount += rowCount;
    +                lastRow = row;
    +                return true;
    +            } catch (IOException e) {
    +                return false;
    +            }
    +        }
    +        return false;
    +    }
    +    
    +    public boolean writeGuidePosts(byte[] row){
    +        return writeGuidePosts(row, 0, 0);
    +    }
    +
    +    public boolean writeGuidePosts(byte[] row, long byteCount){
    --- End diff --
    
    And here: addGuidePosts


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50166279
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataRegionObserver.java ---
    @@ -69,6 +82,8 @@
         protected ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
         private boolean enableRebuildIndex = QueryServicesOptions.DEFAULT_INDEX_FAILURE_HANDLING_REBUILD;
         private long rebuildIndexTimeInterval = QueryServicesOptions.DEFAULT_INDEX_FAILURE_HANDLING_REBUILD_INTERVAL;
    +    public static final byte[] UPGRADE_TO_4_7_COLUMN_NAME = Bytes.toBytes("UPGRADE_TO_4_7");
    +    public static final String UPGRADE_TO_4_7_COLUMN_VALUE = "TRUNCATED_4_7";
    --- End diff --
    
    Minor nit - no need for a column value. Just use ByteUtil.EMPTY_BYTE_ARRAY.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50245176
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataRegionObserver.java ---
    @@ -96,10 +111,109 @@ public void start(CoprocessorEnvironment env) throws IOException {
             rebuildIndexTimeInterval = env.getConfiguration().getLong(QueryServices.INDEX_FAILURE_HANDLING_REBUILD_INTERVAL_ATTRIB,
                 QueryServicesOptions.DEFAULT_INDEX_FAILURE_HANDLING_REBUILD_INTERVAL);
         }
    -
    +    
    +    private static String getJdbcUrl(RegionCoprocessorEnvironment env) {
    +        String zkQuorum = env.getConfiguration().get(HConstants.ZOOKEEPER_QUORUM);
    +        String zkClientPort = env.getConfiguration().get(HConstants.ZOOKEEPER_CLIENT_PORT,
    +            Integer.toString(HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT));
    +        String zkParentNode = env.getConfiguration().get(HConstants.ZOOKEEPER_ZNODE_PARENT,
    +            HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT);
    +        return PhoenixRuntime.JDBC_PROTOCOL + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkClientPort
    +            + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkParentNode;
    +    }
     
         @Override
         public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
    +        final RegionCoprocessorEnvironment env = e.getEnvironment();
    +
    +        Runnable r = new Runnable() {
    +            @Override
    +            public void run() {
    +                HTableInterface metaTable = null;
    +                HTableInterface statsTable = null;
    +                try {
    +                    Thread.sleep(1000);
    +                    LOG.info("Stats will be deleted for upgrade 4.7 requirement!!");
    +                    metaTable = env.getTable(TableName.valueOf(PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME));
    +                    List<Cell> columnCells = metaTable
    +                            .get(new Get(SchemaUtil.getTableKey(null, PhoenixDatabaseMetaData.SYSTEM_SCHEMA_NAME,
    +                                    PhoenixDatabaseMetaData.SYSTEM_CATALOG_TABLE)))
    +                            .getColumnCells(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
    +                                    QueryConstants.EMPTY_COLUMN_BYTES);
    +                    if (!columnCells.isEmpty()
    +                            && columnCells.get(0).getTimestamp() < MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0) {
    --- End diff --
    
    I think this check is needed to prevent SYSTEM.STATS from being deleted again after the client upgrade as well, since the upgrade in ConnectionQueryServicesImpl will delete all the SYSTEM.STATS rows, including the dummy column. WDYT?


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172037773
  
    Thanks for the patch, @ankitsinghal. The main remaining issues are around not treating an ImmutableBytesWritable the same as a byte[]. Maybe easiest for now to have BaseResultIterators.getParallelScans() declare currentKey as an ImmutableBytesWritable, but keep
    
        // Do this as infrequently as possible to prevent a copy of the backing byte array
        byte[] currentKeyBytes = SchemaUtil.copyKeyIfNecessary(currentKey);
        byte[] currentGuidePostBytes = SchemaUtil.copyKeyIfNecessary(currentGuidePost);
    
        Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset, false);
        scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation);



---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50284550
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/UpgradeUtil.java ---
    @@ -1215,4 +1218,67 @@ public static void addRowKeyOrderOptimizableCell(List<Mutation> tableMetadata, b
                     MetaDataEndpointImpl.ROW_KEY_ORDER_OPTIMIZABLE_BYTES, PBoolean.INSTANCE.toBytes(true));
             tableMetadata.add(put);
         }
    +
    +    public static boolean truncateStats(HTableInterface metaTable, HTableInterface statsTable)
    +            throws IOException, InterruptedException {
    +        List<Cell> columnCells = metaTable
    +                .get(new Get(SchemaUtil.getTableKey(null, PhoenixDatabaseMetaData.SYSTEM_SCHEMA_NAME,
    +                        PhoenixDatabaseMetaData.SYSTEM_CATALOG_TABLE)))
    +                .getColumnCells(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES);
    +        if (!columnCells.isEmpty()
    +                && columnCells.get(0).getTimestamp() < MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0) {
    --- End diff --
    
    Actually, I take it back. Your way is better.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r50162555
  
    --- Diff: phoenix-protocol/src/main/PGuidePosts.proto ---
    @@ -24,7 +24,9 @@ option java_generic_services = true;
     option java_generate_equals_and_hash = true;
     option optimize_for = SPEED;
     message PGuidePosts {
    -  repeated bytes guidePosts = 1;
    --- End diff --
    
    Is this protobuf change required? Will it cause b/w compat issues?


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172089737
  
    Thanks @JamesRTaylor for the review.
    I have made all the changes you suggested above except this one:
    byte[] currentGuidePostBytes = SchemaUtil.copyKeyIfNecessary(currentGuidePost);
    
    The reason is that PrefixByteDecoder reuses the previous buffer whenever maxLength is passed, as an optimization:
    
    public ImmutableBytesWritable decode(DataInput in) throws IOException {
            int prefixLen = WritableUtils.readVInt(in);
            int suffixLen = WritableUtils.readVInt(in);
            int length = prefixLen + suffixLen;
            byte[] b;
            if (maxLength == -1) { // Allocate new byte array each time
                b = new byte[length];
                System.arraycopy(previous.get(), previous.getOffset(), b, 0, prefixLen);
            } else { // Reuse same buffer each time
                b = previous.get();
            }
            in.readFully(b, prefixLen, suffixLen);
            previous.set(b, 0, length);
            return previous;
        }
    
    So I need to copy the bytes even when the length of the ImmutableBytesWritable equals the length of the byte[] it wraps.
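    
    As a small sketch of the hazard (hypothetical input; assumes the decoder was created with a positive maxLength, as in this code path):
    
        PrefixByteDecoder decoder = new PrefixByteDecoder(maxLength);
        ImmutableBytesWritable first = decoder.decode(input);
        byte[] firstCopy = first.copyBytes();                  // must copy before decoding again
        ImmutableBytesWritable second = decoder.decode(input); // same object, same reused buffer
        // "first" and "second" now both see the second guidepost's bytes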
    
    What do you think about this?
    
    I have also fixed the failing test cases; there was a logical operator problem while advancing the
    guideposts up to the start key.
    
    
    I am running the complete test suite now and will confirm once it completes.


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925693
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -510,14 +532,18 @@ private static String toString(List<byte[]> gps) {
                     endRegionKey = regionInfo.getEndKey();
                     keyOffset = ScanUtil.getRowKeyOffset(regionInfo.getStartKey(), endRegionKey);
                 }
    -            while (guideIndex < gpsSize
    -                    && (Bytes.compareTo(currentGuidePost = gps.get(guideIndex), endKey) <= 0 || endKey.length == 0)) {
    -                Scan newScan = scanRanges.intersectScan(scan, currentKey, currentGuidePost, keyOffset, false);
    -                scans = addNewScan(parallelScans, scans, newScan, currentGuidePost, false, regionLocation);
    -                currentKey = currentGuidePost;
    -                guideIndex++;
    -            }
    -            Scan newScan = scanRanges.intersectScan(scan, currentKey, endKey, keyOffset, true);
    +            try {
    +                while (guideIndex < gpsSize && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
    +                    Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset,
    +                            false);
    +                    scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation);
    +                    currentKeyBytes = currentGuidePost.copyBytes();
    +                    currentGuidePost = CodecUtils.decode(decoder, input);
    +                    currentGuidePostBytes = currentGuidePost.copyBytes();
    --- End diff --
    
    And here


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal closed the pull request at:

    https://github.com/apache/phoenix/pull/147


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925667
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            try {
    +                while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                        && currentKey.getLength() != 0) {
    +                    guideIndex++;
    +                }
    +            } catch (EOFException e) {}
    +        }
    +        byte[] currentKeyBytes = currentKey.copyBytes();
    --- End diff --
    
    Always use ByteUtil.copyKeyBytesIfNecessary(currentKey) instead of copyBytes() as it's often not necessary to actually copy the bytes.
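    
    Roughly, it only copies when the writable is a window into a larger array (a sketch of the intent, not necessarily the exact source):
    
        byte[] key;
        if (ptr.getOffset() == 0 && ptr.getLength() == ptr.get().length) {
            key = ptr.get();        // the array already is exactly the key
        } else {
            key = ptr.copyBytes();  // a sub-range, so copying is unavoidable
        }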


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49925601
  
    --- Diff: phoenix-protocol/src/main/PTable.proto ---
    @@ -52,11 +52,12 @@ message PColumn {
     
     message PTableStats {
       required bytes key = 1;
    -  repeated bytes values = 2;
    +  optional bytes guidePosts = 2;
    --- End diff --
    
    This might be a b/w compat issue. Safest would be to add guidePosts at the end and keep values as-is.
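    
    For illustration, the backward-compatible shape would keep the existing tag untouched and append the new field at the end (a sketch; the new tag number and placement are illustrative):
    
        message PTableStats {
          required bytes key = 1;
          repeated bytes values = 2;     // kept as-is so old clients still parse
          // ... other existing fields unchanged ...
          optional bytes guidePosts = 7; // new encoding appended under a fresh tag
        }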


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/147#issuecomment-172148769
  
    Looks good functionally now, @ankitsinghal. Just needs a little bit of cleanup. Nice work!


---

[GitHub] phoenix pull request: PHOENIX-2417 Compress memory used by row key...

Posted by ankitsinghal <gi...@git.apache.org>.
Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/147#discussion_r49932688
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java ---
    @@ -490,15 +492,35 @@ private static String toString(List<byte[]> gps) {
             }
             List<List<Scan>> parallelScans = Lists.newArrayListWithExpectedSize(stopIndex - regionIndex + 1);
             
    -        byte[] currentKey = startKey;
    -        int guideIndex = currentKey.length == 0 ? 0 : getIndexContainingInclusive(gps, currentKey);
    -        int gpsSize = gps.size();
    +        ImmutableBytesWritable currentKey = new ImmutableBytesWritable(startKey, 0, startKey.length);
    +        
    +        int gpsSize = gps.getGuidePostsCount();
             int estGuidepostsPerRegion = gpsSize == 0 ? 1 : gpsSize / regionLocations.size() + 1;
             int keyOffset = 0;
    +        ImmutableBytesWritable currentGuidePost = ByteUtil.EMPTY_IMMUTABLE_BYTE_ARRAY;
             List<Scan> scans = Lists.newArrayListWithExpectedSize(estGuidepostsPerRegion);
    +        ImmutableBytesWritable guidePosts = gps.getGuidePosts();
    +        ByteArrayInputStream stream = null;
    +        DataInput input = null;
    +        PrefixByteDecoder decoder = null;
    +        int guideIndex = 0;
    +        if (gpsSize > 0) {
    +            stream = new ByteArrayInputStream(guidePosts.get(), guidePosts.getOffset(), guidePosts.getLength());
    +            input = new DataInputStream(stream);
    +            decoder = new PrefixByteDecoder(gps.getMaxLength());
    +            try {
    +                while (currentKey.compareTo(currentGuidePost = CodecUtils.decode(decoder, input)) >= 0
    +                        && currentKey.getLength() != 0) {
    +                    guideIndex++;
    +                }
    +            } catch (EOFException e) {}
    +        }
    +        byte[] currentKeyBytes = currentKey.copyBytes();
    --- End diff --
    
    But @JamesRTaylor, the problem here is that PrefixByteDecoder reuses the same byte[] for subsequent values too:
    
    public ImmutableBytesWritable decode(DataInput in) throws IOException {
        int prefixLen = WritableUtils.readVInt(in);
        int suffixLen = WritableUtils.readVInt(in);
        int length = prefixLen + suffixLen;
        byte[] b;
        if (maxLength == -1) { // Allocate a new byte array each time
            b = new byte[length];
            System.arraycopy(previous.get(), previous.getOffset(), b, 0, prefixLen);
        } else { // Reuse the same buffer each time
            b = previous.get();
        }
        in.readFully(b, prefixLen, suffixLen);
        previous.set(b, 0, length);
        return previous;
    }
    
    And ByteUtil.copyKeyBytesIfNecessary(currentKey) copies the bytes only when the ImmutableBytesWritable length is not equal to the length of the contained byte[].
    
    So we need to copyBytes() every time, as the subsequent call to decode() will overwrite the same array object.
    
    If you agree with the above, the comments below will follow the same approach.
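    
    To make the aliasing concrete, a minimal sketch (assuming the decode() method pasted above and the usual Phoenix/HBase package names):
    
        import java.io.DataInput;
        import java.io.IOException;
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.phoenix.util.ByteUtil;
        import org.apache.phoenix.util.PrefixByteDecoder;
        
        static void demo(PrefixByteDecoder decoder, DataInput in) throws IOException {
            ImmutableBytesWritable first = decoder.decode(in);
            byte[] maybeShared = ByteUtil.copyKeyBytesIfNecessary(first); // can hand back the reused buffer itself
            byte[] detached = first.copyBytes();                          // always a private copy
            decoder.decode(in); // overwrites the shared buffer when maxLength != -1
            // 'maybeShared' may now hold the second guidepost's bytes; 'detached' is still the first.
        }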


---