Posted to dev@tephra.apache.org by poornachandra <gi...@git.apache.org> on 2016/11/02 23:34:35 UTC

[GitHub] incubator-tephra pull request #19: Save compaction state for pruning invalid...

GitHub user poornachandra opened a pull request:

    https://github.com/apache/incubator-tephra/pull/19

    Save compaction state for pruning invalid list

    JIRA - https://issues.apache.org/jira/browse/TEPHRA-35
    
    Adds the ability to save the prune upper bound from the transaction snapshot used for compaction.
    
    Note that the first two commits refactor existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/poornachandra/incubator-tephra feature/transaction-pruning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tephra/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19
    
----
commit 3075cb3cf1b2b52c8946f18e9adec21e8a90d589
Author: poorna <po...@cask.co>
Date:   2016-10-28T22:12:23Z

    Save compaction state for pruning invalid list

commit be048335024fe03ec567090f0dc2c121d9bff08a
Author: poorna <po...@cask.co>
Date:   2016-10-29T00:46:01Z

    Refactor existing test

commit 40ab5259722e8e138524c81b90bac2a16d455d24
Author: poorna <po...@cask.co>
Date:   2016-11-01T03:59:23Z

    Refactor createTable to not add transaction co-processor by default

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86656252
  
    --- Diff: tephra-core/src/main/java/org/apache/tephra/util/TxUtils.java ---
    @@ -149,4 +149,15 @@ private static long getMaxTTL(Map<byte[], Long> ttlByFamily) {
       public static boolean isPreExistingVersion(long version) {
         return version < MAX_NON_TX_TIMESTAMP;
       }
    +
    +  /**
    +   * Returns the maximum transaction that can be removed from the invalid list for the state represented by the given
    +   * transaction.
    +   */
    +  public static long getPruneUpperBound(Transaction tx) {
    +    long maxInvalidTx =
    +      tx.getInvalids().length > 0 ? tx.getInvalids()[tx.getInvalids().length - 1] : Transaction.NO_TX_IN_PROGRESS;
    +    long firstInProgress = tx.getFirstInProgress();
    +    return Math.min(maxInvalidTx, firstInProgress - 1);
    --- End diff --
    
    If there is no invalid tx and also no in-progress tx, shouldn't the upper bound still be the current tx id? That is, return tx.getTransactionId() - 1?



[GitHub] incubator-tephra issue #19: TEPHRA-35 Save compaction state for pruning inva...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on the issue:

    https://github.com/apache/incubator-tephra/pull/19
  
    Question: Suppose I start a transaction that times out and therefore goes into the invalid list. A little later, HBase performs a major compaction, and the DataJanitor removes this transaction and all its writes from the table. Later still, the TxManager prunes its invalid list and, because this tx's data has been removed from HBase, removes the tx from the invalid list.
    
    The problem arises if the program that started the transaction is still running. What if it performs another write after the pruning? That write would carry an invalid version, but since the tx has been pruned from the invalid list, the write becomes visible.
    
    Isn't that a problem?
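
The scenario in this question can be sketched with a deliberately simplified model. The class below is hypothetical and is not Tephra's visibility logic: it ignores in-progress transactions and treats a version as visible when it is at or below the read pointer and absent from the invalid set. Under those assumptions it shows how a late write from a pruned transaction would pass the filter.

```java
import java.util.TreeSet;

// Simplified, hypothetical model of the scenario; not Tephra's actual code.
final class LateWriteAfterPruneSketch {
  // Sorted invalid transaction ids (stand-in for the invalid list).
  private final TreeSet<Long> invalid = new TreeSet<>();
  private long readPointer;

  void setReadPointer(long readPointer) {
    this.readPointer = readPointer;
  }

  void invalidate(long txId) {
    invalid.add(txId);
  }

  // Pruning drops every invalid id up to and including the upper bound.
  void prune(long pruneUpperBound) {
    invalid.headSet(pruneUpperBound, true).clear();
  }

  // Assumed rule: visible if at or below the read pointer and not invalid.
  boolean isVisible(long version) {
    return version <= readPointer && !invalid.contains(version);
  }

  public static void main(String[] args) {
    LateWriteAfterPruneSketch state = new LateWriteAfterPruneSketch();
    state.setReadPointer(100L);
    state.invalidate(42L);                     // tx 42 times out
    System.out.println(state.isVisible(42L));  // false: writes of tx 42 are filtered
    state.prune(42L);                          // compaction ran, so 42 is pruned
    // The client that started tx 42 is still running and writes again.
    System.out.println(state.isVisible(42L));  // true: the late write is visible
  }
}
```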



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86656117
  
    --- Diff: tephra-core/src/main/java/org/apache/tephra/TxConstants.java ---
    @@ -345,4 +345,14 @@
         public static final byte CURRENT_VERSION = 3;
       }
     
    +  /**
    +   * Configuration for data janitor
    +   */
    +  public static final class DataJanitor {
    +    public static final String PRUNE_ENABLE = "data.tx.prune.enable";
    +    public static final String PRUNE_STATE_TABLE = "data.tx.prune.state.table";
    +
    +    public static final boolean DEFAULT_PRUNE_ENABLE = false;
    +    public static final String DEFAULT_PRUNE_STATE_TABLE = "default:data_tx_janitor_state";
    --- End diff --
    
    I am not sure whether all supported HBase versions support namespaces... this assumes we have a default namespace?



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86656266
  
    --- Diff: tephra-hbase-compat-1.1-base/pom.xml ---
    @@ -28,6 +28,11 @@
       <artifactId>tephra-hbase-compat-1.1-base</artifactId>
       <name>Apache Tephra HBase 1.1 Compatibility Base</name>
     
    +  <properties>
    --- End diff --
    
    unrelated change?



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by poornachandra <gi...@git.apache.org>.
Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86889803
  
    --- Diff: tephra-hbase-compat-1.1-base/pom.xml ---
    @@ -28,6 +28,11 @@
       <artifactId>tephra-hbase-compat-1.1-base</artifactId>
       <name>Apache Tephra HBase 1.1 Compatibility Base</name>
     
    +  <properties>
    --- End diff --
    
    Yes - it is required to run the tests through an IDE.



[GitHub] incubator-tephra issue #19: TEPHRA-35 Save compaction state for pruning inva...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on the issue:

    https://github.com/apache/incubator-tephra/pull/19
  
    @poornachandra are you planning to add this maximum duration as part of this implementation? Or is that a future improvement?




[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86933962
  
    --- Diff: tephra-hbase-compat-1.1-base/src/test/java/org/apache/tephra/hbase/InvalidListPruneTest.java ---
    @@ -185,6 +186,12 @@ public TransactionStateCache get() {
        */
       @SuppressWarnings("WeakerAccess")
       public static class InMemoryTransactionStateCache extends TransactionStateCache {
    +    private static TransactionVisibilityState transactionSnapshot;
    +
    +    public static void setTransactionSnapshot(TransactionVisibilityState transactionSnapshot) {
    +      InMemoryTransactionStateCache.transactionSnapshot = transactionSnapshot;
    +    }
    --- End diff --
    
    much nicer 👍



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-tephra/pull/19



[GitHub] incubator-tephra issue #19: TEPHRA-35 Save compaction state for pruning inva...

Posted by poornachandra <gi...@git.apache.org>.
Github user poornachandra commented on the issue:

    https://github.com/apache/incubator-tephra/pull/19
  
    @anew I'll add the maximum duration check as a separate PR. I'll file a JIRA for it.



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by anew <gi...@git.apache.org>.
Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86656565
  
    --- Diff: tephra-hbase-compat-1.1-base/src/test/java/org/apache/tephra/hbase/InvalidListPruneTest.java ---
    @@ -0,0 +1,203 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +package org.apache.tephra.hbase;
    +
    +import com.google.common.base.Supplier;
    +import com.google.common.collect.ImmutableSet;
    +import com.google.common.collect.ImmutableSortedMap;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.TableName;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Put;
    +import org.apache.hadoop.hbase.client.Table;
    +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.tephra.TransactionContext;
    +import org.apache.tephra.TransactionManager;
    +import org.apache.tephra.TransactionType;
    +import org.apache.tephra.TxConstants;
    +import org.apache.tephra.coprocessor.TransactionStateCache;
    +import org.apache.tephra.hbase.coprocessor.TransactionProcessor;
    +import org.apache.tephra.hbase.coprocessor.janitor.DataJanitorState;
    +import org.apache.tephra.inmemory.InMemoryTxSystemClient;
    +import org.apache.tephra.metrics.TxMetricsCollector;
    +import org.apache.tephra.persist.InMemoryTransactionStateStorage;
    +import org.apache.tephra.persist.TransactionSnapshot;
    +import org.apache.tephra.persist.TransactionStateStorage;
    +import org.apache.tephra.persist.TransactionVisibilityState;
    +import org.junit.After;
    +import org.junit.AfterClass;
    +import org.junit.Assert;
    +import org.junit.Before;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import java.io.IOException;
    +import java.util.Collections;
    +
    +/**
    + * Test invalid list pruning
    + */
    +public class InvalidListPruneTest extends AbstractHBaseTableTest {
    +  private static final byte[] family = Bytes.toBytes("f1");
    +  private static final byte[] qualifier = Bytes.toBytes("col1");
    +
    +  private static TableName dataTable;
    +  private static TableName pruneStateTable;
    +  private static TransactionSnapshot transactionSnapshot;
    +
    +  // Override AbstractHBaseTableTest.startMiniCluster to setup configuration
    +  @BeforeClass
    +  public static void startMiniCluster() throws Exception {
    +    // Setup the configuration to start HBase cluster with the invalid list pruning enabled
    +    conf = HBaseConfiguration.create();
    +    conf.setBoolean(TxConstants.DataJanitor.PRUNE_ENABLE, true);
    +    AbstractHBaseTableTest.startMiniCluster();
    +
    +    TransactionStateStorage txStateStorage = new InMemoryTransactionStateStorage();
    +    TransactionManager txManager = new TransactionManager(conf, txStateStorage, new TxMetricsCollector());
    +    txManager.startAndWait();
    +
    +    // Do some transactional data operations
    +    dataTable = TableName.valueOf("invalidListPruneTestTable");
    +    HTable hTable = createTable(dataTable.getName(), new byte[][]{family}, false,
    +                                Collections.singletonList(TestTransactionProcessor.class.getName()));
    +    try (TransactionAwareHTable txTable = new TransactionAwareHTable(hTable, TxConstants.ConflictDetection.ROW)) {
    +      TransactionContext txContext = new TransactionContext(new InMemoryTxSystemClient(txManager), txTable);
    +      txContext.start();
    +      for(int i = 0; i < 10; ++i) {
    +        txTable.put(new Put(Bytes.toBytes(i)).addColumn(family, qualifier, Bytes.toBytes(i)));
    +      }
    +      txContext.finish();
    +    }
    +
    +    testUtil.flush(dataTable);
    +    txManager.stopAndWait();
    +
    +    pruneStateTable = TableName.valueOf(conf.get(TxConstants.DataJanitor.PRUNE_STATE_TABLE,
    +                                                 TxConstants.DataJanitor.DEFAULT_PRUNE_STATE_TABLE));
    +  }
    +
    +  @AfterClass
    +  public static void shutdownAfterClass() throws Exception {
    +    hBaseAdmin.disableTable(dataTable);
    +    hBaseAdmin.deleteTable(dataTable);
    +  }
    +
    +  @Before
    +  public void beforeTest() throws Exception {
    +    HTable table = createTable(pruneStateTable.getName(), new byte[][]{DataJanitorState.FAMILY}, false,
    +                               // Prune state table is a non-transactional table, hence no transaction co-processor
    +                               Collections.<String>emptyList());
    +    table.close();
    +  }
    +
    +  @After
    +  public void afterTest() throws Exception {
    +    hBaseAdmin.disableTable(pruneStateTable);
    +    hBaseAdmin.deleteTable(pruneStateTable);
    +  }
    +
    +  @Test
    +  public void testRecordCompactionState() throws Exception {
    +    DataJanitorState dataJanitorState =
    +      new DataJanitorState(new DataJanitorState.TableSupplier() {
    +        @Override
    +        public Table get() throws IOException {
    +          return testUtil.getConnection().getTable(pruneStateTable);
    +        }
    +      });
    +
    +    // No prune upper bound initially
    +    Assert.assertEquals(-1, dataJanitorState.getPruneUpperBound(getRegionName(dataTable, Bytes.toBytes(0))));
    +
    +    // Create a new transaction snapshot
    +    transactionSnapshot = new TransactionSnapshot(100, 100, 100, ImmutableSet.of(50L),
    +                                                  ImmutableSortedMap.<Long, TransactionManager.InProgressTx>of());
    --- End diff --
    
    I am not sure how this works... why does creating a snapshot change the transaction state in the coprocessor?



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by poornachandra <gi...@git.apache.org>.
Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86888243
  
    --- Diff: tephra-core/src/main/java/org/apache/tephra/TxConstants.java ---
    @@ -345,4 +345,14 @@
         public static final byte CURRENT_VERSION = 3;
       }
     
    +  /**
    +   * Configuration for data janitor
    +   */
    +  public static final class DataJanitor {
    +    public static final String PRUNE_ENABLE = "data.tx.prune.enable";
    +    public static final String PRUNE_STATE_TABLE = "data.tx.prune.state.table";
    +
    +    public static final boolean DEFAULT_PRUNE_ENABLE = false;
    +    public static final String DEFAULT_PRUNE_STATE_TABLE = "default:data_tx_janitor_state";
    --- End diff --
    
    Good point. I removed the namespace from the default value; it can always be overridden in the configuration.
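
A minimal sketch of the lookup this reply describes, assuming the default becomes the bare table name once the `default:` prefix is dropped. `java.util.Properties` stands in for the Hadoop `Configuration` object here, and the overriding table name is hypothetical; only the property key comes from the diff.

```java
import java.util.Properties;

// Properties is a stand-in for the Hadoop Configuration used in the real code.
final class PruneStateTableConfigSketch {
  // Key taken from the TxConstants.DataJanitor diff in this thread.
  static final String PRUNE_STATE_TABLE = "data.tx.prune.state.table";
  // Assumed default after removing the namespace prefix.
  static final String DEFAULT_PRUNE_STATE_TABLE = "data_tx_janitor_state";

  // Returns the configured table name, or the namespace-free default.
  static String pruneStateTable(Properties conf) {
    return conf.getProperty(PRUNE_STATE_TABLE, DEFAULT_PRUNE_STATE_TABLE);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(pruneStateTable(conf));
    // Hypothetical override for a cluster that does use namespaces.
    conf.setProperty(PRUNE_STATE_TABLE, "my_ns:tx_prune_state");
    System.out.println(pruneStateTable(conf));
  }
}
```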



[GitHub] incubator-tephra pull request #19: TEPHRA-35 Save compaction state for pruni...

Posted by poornachandra <gi...@git.apache.org>.
Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/19#discussion_r86889731
  
    --- Diff: tephra-core/src/main/java/org/apache/tephra/util/TxUtils.java ---
    @@ -149,4 +149,15 @@ private static long getMaxTTL(Map<byte[], Long> ttlByFamily) {
       public static boolean isPreExistingVersion(long version) {
         return version < MAX_NON_TX_TIMESTAMP;
       }
    +
    +  /**
    +   * Returns the maximum transaction that can be removed from the invalid list for the state represented by the given
    +   * transaction.
    +   */
    +  public static long getPruneUpperBound(Transaction tx) {
    +    long maxInvalidTx =
    +      tx.getInvalids().length > 0 ? tx.getInvalids()[tx.getInvalids().length - 1] : Transaction.NO_TX_IN_PROGRESS;
    +    long firstInProgress = tx.getFirstInProgress();
    +    return Math.min(maxInvalidTx, firstInProgress - 1);
    --- End diff --
    
    Good catch. I updated the code to use the current read pointer in that case.
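
One way the fallback described in this reply might look, as a standalone sketch rather than the committed code. It takes plain values instead of a `Transaction`, uses a locally defined sentinel `NONE` (an assumption; Tephra's own constant may differ), and returns the read pointer itself in the empty case. The thread does not say whether the actual change returns the read pointer or the read pointer minus one.

```java
// Hypothetical standalone variant of getPruneUpperBound; not the committed code.
final class PruneUpperBoundSketch {
  // Assumed sentinel meaning "no such transaction".
  static final long NONE = Long.MAX_VALUE;

  static long pruneUpperBound(long[] sortedInvalids, long firstInProgress, long readPointer) {
    // No invalid and no in-progress transactions: fall back to the read pointer.
    if (sortedInvalids.length == 0 && firstInProgress == NONE) {
      return readPointer;
    }
    // Otherwise follow the logic of the diff: the largest invalid tx,
    // capped just below the first in-progress tx.
    long maxInvalidTx = sortedInvalids.length > 0 ? sortedInvalids[sortedInvalids.length - 1] : NONE;
    return Math.min(maxInvalidTx, firstInProgress - 1);
  }

  public static void main(String[] args) {
    System.out.println(pruneUpperBound(new long[] {50L}, NONE, 100L));     // 50
    System.out.println(pruneUpperBound(new long[] {}, 80L, 100L));         // 79
    System.out.println(pruneUpperBound(new long[] {50L, 90L}, 80L, 100L)); // 79
    System.out.println(pruneUpperBound(new long[] {}, NONE, 100L));        // 100
  }
}
```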



[GitHub] incubator-tephra issue #19: TEPHRA-35 Save compaction state for pruning inva...

Posted by poornachandra <gi...@git.apache.org>.
Github user poornachandra commented on the issue:

    https://github.com/apache/incubator-tephra/pull/19
  
    @anew One way to handle the above issue is to define a time limit: the maximum duration for which a transaction can be used for writing. During data writes, checks in the co-processor could enforce this limit. During pruning, we would then remove only invalid transactions that are older than this limit, since no further writes can arrive for them.
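
The proposal could be sketched roughly as follows. All names here are hypothetical, and the sketch assumes a transaction's start time in milliseconds is known to both the write path and the pruning path; it also ignores clock skew between servers. The two checks are exact complements, so a transaction is either still writable or prunable, never both.

```java
// Hypothetical sketch of the proposed time limit; not part of this pull request.
final class MaxWriteDurationSketch {
  private final long maxWriteDurationMillis;

  MaxWriteDurationSketch(long maxWriteDurationMillis) {
    this.maxWriteDurationMillis = maxWriteDurationMillis;
  }

  // Write path (e.g. in the co-processor): accept writes only within the limit.
  boolean isWriteAllowed(long txStartMillis, long nowMillis) {
    return nowMillis - txStartMillis <= maxWriteDurationMillis;
  }

  // Prune path: an invalid tx may be pruned only once its writes are rejected.
  boolean isSafeToPrune(long txStartMillis, long nowMillis) {
    return nowMillis - txStartMillis > maxWriteDurationMillis;
  }

  public static void main(String[] args) {
    MaxWriteDurationSketch limit = new MaxWriteDurationSketch(1000L);
    System.out.println(limit.isWriteAllowed(0L, 500L));   // true
    System.out.println(limit.isSafeToPrune(0L, 500L));    // false
    System.out.println(limit.isWriteAllowed(0L, 1001L));  // false
    System.out.println(limit.isSafeToPrune(0L, 1001L));   // true
  }
}
```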

