You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by karanmehta93 <gi...@git.apache.org> on 2018/10/27 03:30:27 UTC
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
GitHub user karanmehta93 opened a pull request:
https://github.com/apache/phoenix/pull/397
PHOENIX-4997 Phoenix MR on snapshots can produce duplicate rows
@akshita-malhotra @gjacoby126
Could you please review?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/karanmehta93/phoenix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/phoenix/pull/397.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #397
----
commit 5bf281f8ac51eb54fb8c80db2c3ac7f056d1644b
Author: Karan Mehta <ka...@...>
Date: 2018-10-27T03:28:08Z
PHOENIX-4997 Phoenix MR on snapshots can produce duplicate rows
----
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/397#discussion_r228699931
--- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
@@ -66,74 +74,62 @@
private static List<List<Object>> result;
private long timestamp;
private String tableName;
+ private Job job;
+ private Path tmpDir;
+ private Configuration conf;
@BeforeClass
public static void doSetup() throws Exception {
Map<String,String> props = Maps.newHashMapWithExpectedSize(1);
setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
}
- @Test
- public void testMapReduceSnapshots() throws Exception {
+ @Before
--- End diff --
Refactoring done to move common code to before method.
---
[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:
https://github.com/apache/phoenix/pull/397
Thanks @twdsilva and @gjacoby126 for the review. Will push it soon.
---
[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:
https://github.com/apache/phoenix/pull/397
@BinShi-SecularBird
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/397#discussion_r228699968
--- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
@@ -191,31 +192,52 @@ private void upsertData(PreparedStatement stmt, String field1, String field2, in
stmt.execute();
}
- public void upsertAndSnapshot(String tableName) throws Exception {
+ private void upsertAndSnapshot(String tableName, boolean shouldSplit) throws Exception {
upsertData(tableName);
+ TableName hbaseTableName = TableName.valueOf(tableName);
Connection conn = DriverManager.getConnection(getUrl());
Admin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin();
- admin.snapshot(SNAPSHOT_NAME, TableName.valueOf(tableName));
- // call flush to create new files in the region
- admin.flush(TableName.valueOf(tableName));
+
+ if (shouldSplit) {
+ splitTableSync(admin, hbaseTableName, "BBBB".getBytes(), 2);
+ }
+
+ admin.snapshot(SNAPSHOT_NAME, hbaseTableName);
List<SnapshotDescription> snapshots = admin.listSnapshots();
Assert.assertEquals(tableName, snapshots.get(0).getTable());
+ // Capture the snapshot timestamp to use as SCN while reading the table later
+ // Assigning the timestamp value here will make tests less flaky
+ timestamp = System.currentTimeMillis();
+
// upsert data after snapshot
PreparedStatement stmt = conn.prepareStatement(String.format(UPSERT, tableName));
- upsertData(stmt, "DDDD", "SNFB", 0004);
+ upsertData(stmt, "DDDD", "SNFB", 45);
conn.commit();
}
- public void deleteSnapshot(String tableName) throws Exception {
+ private void splitTableSync(Admin admin, TableName hbaseTableName,
--- End diff --
Didn't use the method provided in BaseTest.java since enabling/disabling the table causes the snapshot manifest file to not contain the offline regions.
The bug is caused because the snapshot manifest file on a regular production table can contain information about regions that are split or offline, which should ideally be ignored when restoring snapshot.
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/397#discussion_r229055918
--- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java ---
@@ -80,18 +83,39 @@ public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans,
}
}
+ /**
+ * Get list of region locations from SnapshotManifest
+ * BaseResultIterators assume that regions are sorted using RegionInfo.COMPARATOR
+ */
private List<HRegionLocation> getRegionLocationsFromManifest(SnapshotManifest manifest) {
List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
Preconditions.checkNotNull(regionManifests);
- List<HRegionLocation> regionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
+ List<RegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
+ List<HRegionLocation> hRegionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
for (SnapshotRegionManifest regionManifest : regionManifests) {
- regionLocations.add(new HRegionLocation(
- ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo()), null));
+ RegionInfo regionInfo = ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo());
+ if (isValidRegion(regionInfo)) {
+ regionInfos.add(regionInfo);
+ }
+ }
+
+ regionInfos.sort(RegionInfo.COMPARATOR);
+
+ for (RegionInfo regionInfo : regionInfos) {
+ hRegionLocations.add(new HRegionLocation(regionInfo, null));
}
- return regionLocations;
+ return hRegionLocations;
+ }
+
+ // Exclude offline split parent regions
+ private boolean isValidRegion(RegionInfo hri) {
--- End diff --
Maybe extract this to a util since its used in two classes.
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/397#discussion_r229061395
--- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java ---
@@ -80,18 +83,39 @@ public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans,
}
}
+ /**
+ * Get list of region locations from SnapshotManifest
+ * BaseResultIterators assume that regions are sorted using RegionInfo.COMPARATOR
+ */
private List<HRegionLocation> getRegionLocationsFromManifest(SnapshotManifest manifest) {
List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
Preconditions.checkNotNull(regionManifests);
- List<HRegionLocation> regionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
+ List<RegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
+ List<HRegionLocation> hRegionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
for (SnapshotRegionManifest regionManifest : regionManifests) {
- regionLocations.add(new HRegionLocation(
- ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo()), null));
+ RegionInfo regionInfo = ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo());
+ if (isValidRegion(regionInfo)) {
+ regionInfos.add(regionInfo);
+ }
+ }
+
+ regionInfos.sort(RegionInfo.COMPARATOR);
+
+ for (RegionInfo regionInfo : regionInfos) {
+ hRegionLocations.add(new HRegionLocation(regionInfo, null));
}
- return regionLocations;
+ return hRegionLocations;
+ }
+
+ // Exclude offline split parent regions
+ private boolean isValidRegion(RegionInfo hri) {
--- End diff --
I thought about it, however they don't have any classes in common and the implementations of these methods are slightly different. I agree that this is duplicate code and should not exist this way but I couldn't think of a better way for this.
---
[GitHub] phoenix issue #397: PHOENIX-4997 Phoenix MR on snapshots can produce duplica...
Posted by gjacoby126 <gi...@git.apache.org>.
Github user gjacoby126 commented on the issue:
https://github.com/apache/phoenix/pull/397
lgtm.
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 closed the pull request at:
https://github.com/apache/phoenix/pull/397
---
[GitHub] phoenix pull request #397: PHOENIX-4997 Phoenix MR on snapshots can produce ...
Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/397#discussion_r229057356
--- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java ---
@@ -151,13 +147,17 @@ private void configureJob(Job job, String tableName, String inputQuery, String c
if (condition != null) {
selectQuery.append(" WHERE " + condition);
}
+
if (inputQuery == null)
inputQuery = selectQuery.toString();
ResultSet rs = DriverManager.getConnection(getUrl(), props).createStatement().executeQuery(inputQuery);
for (List<Object> r : result) {
- assertTrue("No data stored in the table!", rs.next());
+ if (rs.next() == false) {
+ System.out.println("Placeholder");
--- End diff --
remove
---