Posted to issues@phoenix.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/21 17:51:00 UTC

[jira] [Commented] (PHOENIX-6273) Add support to handle MR Snapshot restore externally

    [ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269485#comment-17269485 ] 

ASF GitHub Bot commented on PHOENIX-6273:
-----------------------------------------

sakshamgangwar commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r562080082



##########
File path: phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java
##########
@@ -274,6 +293,9 @@ private void configureJob(Job job, String tableName, String inputQuery, String c
 
       assertFalse("Should only have stored" + result.size() + "rows in the table for the timestamp!", rs.next());
     } finally {
+      if (isSnapshotRestoreDoneExternally) {
+        assertRestoreDirCount(conf, tmpDir.toString(), 1);

Review comment:
       @shahrs87 Previously, two levels of subdirectories were created for the snapshot restore on every scan:
   https://github.com/apache/phoenix/blob/55f1362fc52eeabed139728dae153518883743a5/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixMapReduceUtil.java#L185
   
   https://github.com/apache/phoenix/blob/55f1362fc52eeabed139728dae153518883743a5/phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L81
   
   I have removed those in the original flow, so the restore directory is now cleaned up after every scan and recreated for the next scan with the same directory structure.
   
   In the original flow, I can assert here that the restore directory does not exist.
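
   To make the two expectations concrete, here is a minimal sketch of such a check. It assumes a local filesystem and `java.nio.file` rather than the Hadoop `FileSystem` API the test actually runs against, so `RestoreDirCheck`, `countRestoreDirs`, and this two-argument `assertRestoreDirCount` are hypothetical stand-ins, not the helper from the PR:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only; the PR's helper takes a Hadoop Configuration
// and a path string, and may be implemented differently.
public class RestoreDirCheck {

    // Counts the immediate subdirectories of restoreRoot; a missing root counts as zero.
    static long countRestoreDirs(Path restoreRoot) {
        if (!Files.isDirectory(restoreRoot)) {
            return 0;
        }
        try (Stream<Path> children = Files.list(restoreRoot)) {
            return children.filter(Files::isDirectory).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fails if the number of restore directories differs from the expected count.
    static void assertRestoreDirCount(Path restoreRoot, long expected) {
        long actual = countRestoreDirs(restoreRoot);
        if (actual != expected) {
            throw new AssertionError("Expected " + expected + " restore dirs under "
                    + restoreRoot + " but found " + actual);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshot-restore");
        // Internal (original) flow: the directory is cleaned up after the scan.
        assertRestoreDirCount(root, 0);
        // External flow: the caller's single restore directory is left in place.
        Files.createDirectory(root.resolve("restore"));
        assertRestoreDirCount(root, 1);
    }
}
```

   With a helper of this shape, the `finally` block could assert a count of 1 when the restore is done externally and 0 otherwise.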






> Add support to handle MR Snapshot restore externally
> ----------------------------------------------------
>
>                 Key: PHOENIX-6273
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6273
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Saksham Gangwar
>            Assignee: Saksham Gangwar
>            Priority: Major
>             Fix For: 5.1.0, 4.16.0
>
>
> Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by the generation of overlapping scan splits. After some debugging we found that it had already been fixed via PHOENIX-4997. 
> We also *need not restore the snapshot per map task*. Currently, we restore the snapshot once per map task into a temp directory. For large tables on big clusters, this creates a storm of NN (NameNode) RPCs. We can do this once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806, and we can do something similar. Jira to correct this behavior: https://issues.apache.org/jira/browse/PHOENIX-6334
> *The purpose of this Jira* is to resolve this issue immediately by letting the caller decide whether the snapshot restore is handled externally or internally on the Phoenix side (the buggy approach). 
> All other performance suggestions here: https://issues.apache.org/jira/browse/PHOENIX-6081
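
The difference between the two modes described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Phoenix code: the class `SnapshotRestoreMode`, the configuration key `example.snapshot.restore.external`, and both method names are hypothetical, and the flag is assumed to default to the internal (per-task) behavior.

```java
import java.util.Properties;

// Hypothetical illustration; the key and method names below are not Phoenix's.
public class SnapshotRestoreMode {

    // Assumed key name, for illustration only.
    static final String EXTERNAL_RESTORE_KEY = "example.snapshot.restore.external";

    // True when the caller has declared that it restores the snapshot itself.
    static boolean isRestoreDoneExternally(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(EXTERNAL_RESTORE_KEY, "false"));
    }

    // Number of snapshot restores performed inside the map tasks of one job:
    // one per task in the internal flow, none when the caller restored once up front.
    static int restoresPerformedByTasks(Properties conf, int mapTasks) {
        return isRestoreDoneExternally(conf) ? 0 : mapTasks;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(restoresPerformedByTasks(conf, 500));
        conf.setProperty(EXTERNAL_RESTORE_KEY, "true");
        System.out.println(restoresPerformedByTasks(conf, 500));
    }
}
```

In the external mode the only restore is the single one the caller performs before submitting the job, which is what removes the per-task load on the NameNode.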


