You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by karanmehta93 <gi...@git.apache.org> on 2018/10/27 03:30:27 UTC

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

GitHub user karanmehta93 opened a pull request:

    https://github.com/apache/phoenix/pull/397

    PHOENIX-4997 Phoenix MR on snapshots can produce duplicate rows

    @akshita-malhotra @gjacoby126 
    Could you please review?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karanmehta93/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit 5bf281f8ac51eb54fb8c80db2c3ac7f056d1644b
Author: Karan Mehta <ka...@...>
Date:   2018-10-27T03:28:08Z

    PHOENIX-4997 Phoenix MR on snapshots can produce duplicate rows

----


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r228699931
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
    @@ -66,74 +74,62 @@
       private static List<List<Object>> result;
       private long timestamp;
       private String tableName;
    +  private Job job;
    +  private Path tmpDir;
    +  private Configuration conf;
     
       @BeforeClass
       public static void doSetup() throws Exception {
           Map<String,String> props = Maps.newHashMapWithExpectedSize(1);
           setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
       }
     
    -  @Test
    -  public void testMapReduceSnapshots() throws Exception {
    +  @Before
    --- End diff --
    
    Refactoring done to move common code to before method.


---

[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/397
  
    Thanks @twdsilva and @gjacoby126 for the review. Will push it soon.


---

[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/397
  
    @BinShi-SecularBird


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r228699968
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
    @@ -191,31 +192,52 @@ private void upsertData(PreparedStatement stmt, String field1, String field2, in
         stmt.execute();
       }
     
    -  public void upsertAndSnapshot(String tableName) throws Exception {
    +  private void upsertAndSnapshot(String tableName, boolean shouldSplit) throws Exception {
         upsertData(tableName);
     
    +    TableName hbaseTableName = TableName.valueOf(tableName);
         Connection conn = DriverManager.getConnection(getUrl());
         Admin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin();
    -    admin.snapshot(SNAPSHOT_NAME, TableName.valueOf(tableName));
    -    // call flush to create new files in the region
    -    admin.flush(TableName.valueOf(tableName));
    +
    +    if (shouldSplit) {
    +      splitTableSync(admin, hbaseTableName, "BBBB".getBytes(), 2);
    +    }
    +
    +    admin.snapshot(SNAPSHOT_NAME, hbaseTableName);
     
         List<SnapshotDescription> snapshots = admin.listSnapshots();
         Assert.assertEquals(tableName, snapshots.get(0).getTable());
     
    +    // Capture the snapshot timestamp to use as SCN while reading the table later
    +    // Assigning the timestamp value here will make tests less flaky
    +    timestamp = System.currentTimeMillis();
    +
         // upsert data after snapshot
         PreparedStatement stmt = conn.prepareStatement(String.format(UPSERT, tableName));
    -    upsertData(stmt, "DDDD", "SNFB", 0004);
    +    upsertData(stmt, "DDDD", "SNFB", 45);
         conn.commit();
       }
     
    -    public void deleteSnapshot(String tableName) throws Exception {
    +  private void splitTableSync(Admin admin, TableName hbaseTableName,
    --- End diff --
    
    Didn't use the method provided in BaseTest.java since enabling/disabling the table causes the snapshot manifest file to not contain the offline regions. 
    The bug is caused because the snapshot manifest file on a regular production table can contain information about regions that are split or offline, which should ideally be ignored when restoring snapshot.


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r229055918
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java ---
    @@ -80,18 +83,39 @@ public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans,
     		}
     	}
     
    +	/**
    +	 * Get list of region locations from SnapshotManifest
    +	 * BaseResultIterators assume that regions are sorted using RegionInfo.COMPARATOR
    +	 */
     	private List<HRegionLocation> getRegionLocationsFromManifest(SnapshotManifest manifest) {
     		List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
     		Preconditions.checkNotNull(regionManifests);
     
    -		List<HRegionLocation> regionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
    +		List<RegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
    +		List<HRegionLocation> hRegionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
     
     		for (SnapshotRegionManifest regionManifest : regionManifests) {
    -			regionLocations.add(new HRegionLocation(
    -					ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo()), null));
    +			RegionInfo regionInfo = ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo());
    +			if (isValidRegion(regionInfo)) {
    +				regionInfos.add(regionInfo);
    +			}
    +		}
    +
    +		regionInfos.sort(RegionInfo.COMPARATOR);
    +
    +		for (RegionInfo regionInfo : regionInfos) {
    +			hRegionLocations.add(new HRegionLocation(regionInfo, null));
     		}
     
    -		return regionLocations;
    +		return hRegionLocations;
    +	}
    +
    +	// Exclude offline split parent regions
    +	private boolean isValidRegion(RegionInfo hri) {
    --- End diff --
    
    Maybe extract this to a util since its used in two classes.


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r229061395
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java ---
    @@ -80,18 +83,39 @@ public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans,
     		}
     	}
     
    +	/**
    +	 * Get list of region locations from SnapshotManifest
    +	 * BaseResultIterators assume that regions are sorted using RegionInfo.COMPARATOR
    +	 */
     	private List<HRegionLocation> getRegionLocationsFromManifest(SnapshotManifest manifest) {
     		List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
     		Preconditions.checkNotNull(regionManifests);
     
    -		List<HRegionLocation> regionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
    +		List<RegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
    +		List<HRegionLocation> hRegionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
     
     		for (SnapshotRegionManifest regionManifest : regionManifests) {
    -			regionLocations.add(new HRegionLocation(
    -					ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo()), null));
    +			RegionInfo regionInfo = ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo());
    +			if (isValidRegion(regionInfo)) {
    +				regionInfos.add(regionInfo);
    +			}
    +		}
    +
    +		regionInfos.sort(RegionInfo.COMPARATOR);
    +
    +		for (RegionInfo regionInfo : regionInfos) {
    +			hRegionLocations.add(new HRegionLocation(regionInfo, null));
     		}
     
    -		return regionLocations;
    +		return hRegionLocations;
    +	}
    +
    +	// Exclude offline split parent regions
    +	private boolean isValidRegion(RegionInfo hri) {
    --- End diff --
    
    I thought about it, however they don't have any classes in common and the implementations of these methods are slightly different. I agree that this is duplicate code and should not exist this way but I couldn't think of a better way for this.


---

[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...

Posted by gjacoby126 <gi...@git.apache.org>.
Github user gjacoby126 commented on the issue:

    https://github.com/apache/phoenix/pull/397
  
    lgtm.


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 closed the pull request at:

    https://github.com/apache/phoenix/pull/397


---

[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r229057356
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
    @@ -151,13 +147,17 @@ private void configureJob(Job job, String tableName, String inputQuery, String c
           if (condition != null) {
             selectQuery.append(" WHERE " + condition);
           }
    +
           if (inputQuery == null)
             inputQuery = selectQuery.toString();
     
           ResultSet rs = DriverManager.getConnection(getUrl(), props).createStatement().executeQuery(inputQuery);
     
           for (List<Object> r : result) {
    -        assertTrue("No data stored in the table!", rs.next());
    +        if (rs.next() == false) {
    +          System.out.println("Placeholder");
    --- End diff --
    
    remove 


---