Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/16 13:02:31 UTC

[GitHub] [incubator-doris] wangbo opened a new pull request #4109: [Spark Load] add job granularity global dict lock

wangbo opened a new pull request #4109:
URL: https://github.com/apache/incubator-doris/pull/4109


   #4058
   Implement a job-granularity lock for the global dict.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wangbo commented on a change in pull request #4109: [Spark Load] add job granularity global dict lock

Posted by GitBox <gi...@apache.org>.
wangbo commented on a change in pull request #4109:
URL: https://github.com/apache/incubator-doris/pull/4109#discussion_r459250963



##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/LoadJobScheduler.java
##########
@@ -47,6 +51,10 @@
 
     private LinkedBlockingQueue<LoadJob> needScheduleJobs = Queues.newLinkedBlockingQueue();
 
+    // Used to implement a job granularity lock for update global dict table serially
+    // The runningBitmapTables keeps the doris table id of the spark load job which in state pending or etl
+    private Map<Long, Set<Long>> runningBitmapTableMap = new HashMap<>();

Review comment:
       Maybe "runningTableWithBitmapColumns" is better?
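
For readers following the thread, the idea behind the map in the hunk above (job id mapped to the set of table ids that job holds) can be illustrated with a minimal sketch. This is not the code from the pull request: the class `GlobalDictJobLock` and the methods `tryAcquire` and `release` are hypothetical names chosen for illustration, and the sketch assumes a job may only proceed when none of its tables is held by another pending or ETL job.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a job-granularity lock; names are not from the PR.
final class GlobalDictJobLock {
    // job id -> ids of the tables whose global dict that job may update
    private final Map<Long, Set<Long>> runningTablesByJob = new HashMap<>();

    // Records the job and returns true only if no other job holds any of its tables.
    synchronized boolean tryAcquire(long jobId, Set<Long> tableIds) {
        for (Map.Entry<Long, Set<Long>> entry : runningTablesByJob.entrySet()) {
            if (entry.getKey().longValue() == jobId) {
                continue; // the same job may re-acquire its own tables
            }
            if (!Collections.disjoint(entry.getValue(), tableIds)) {
                return false; // another job is already working on a shared table
            }
        }
        runningTablesByJob.put(jobId, new HashSet<>(tableIds));
        return true;
    }

    // Called when the job leaves the pending/ETL states, whether it finished or failed.
    synchronized void release(long jobId) {
        runningTablesByJob.remove(jobId);
    }
}
```

A scheduler could call `tryAcquire` before submitting a job and put the job back in its queue when the call returns false, so that updates to one table's global dict run serially.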

##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/SparkLoadJob.java
##########
@@ -909,4 +949,30 @@ private void initTDescriptorTable(DescriptorTable descTable) {
             tDescriptorTable = descTable.toThrift();
         }
     }
+
+    public Set<Long> getTableWithBitmapColumn() {
+        if (tableWithBitmapColumn.size() != 0) {

Review comment:
       👌






[GitHub] [incubator-doris] wangbo commented on a change in pull request #4109: [Spark Load] add job granularity global dict lock

Posted by GitBox <gi...@apache.org>.
wangbo commented on a change in pull request #4109:
URL: https://github.com/apache/incubator-doris/pull/4109#discussion_r459228336



##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/SparkLoadJob.java
##########
@@ -742,6 +760,26 @@ private void unprotectedLogUpdateStateInfo() {
         Catalog.getCurrentCatalog().getEditLog().logUpdateLoadJob(info);
     }
 
+    // because current SparkLoadJob didn't persist job's tableId, so we need to get job's tableid from transaction mgr
+    private void fillJobTableIdSetWhenReplay() {
+        try {
+            DatabaseTransactionMgr databaseTransactionMgr =

Review comment:
       👌






[GitHub] [incubator-doris] wangbo closed pull request #4109: [Spark Load] add job granularity global dict lock

Posted by GitBox <gi...@apache.org>.
wangbo closed pull request #4109:
URL: https://github.com/apache/incubator-doris/pull/4109


   




[GitHub] [incubator-doris] morningman commented on a change in pull request #4109: [Spark Load] add job granularity global dict lock

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4109:
URL: https://github.com/apache/incubator-doris/pull/4109#discussion_r457882817



##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/LoadJobScheduler.java
##########
@@ -47,6 +51,10 @@
 
     private LinkedBlockingQueue<LoadJob> needScheduleJobs = Queues.newLinkedBlockingQueue();
 
+    // Used to implement a job granularity lock for update global dict table serially
+    // The runningBitmapTables keeps the doris table id of the spark load job which in state pending or etl
+    private Map<Long, Set<Long>> runningBitmapTableMap = new HashMap<>();

Review comment:
       Why call it a "bitmap" table?

##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/SparkLoadJob.java
##########
@@ -742,6 +760,26 @@ private void unprotectedLogUpdateStateInfo() {
         Catalog.getCurrentCatalog().getEditLog().logUpdateLoadJob(info);
     }
 
+    // because current SparkLoadJob didn't persist job's tableId, so we need to get job's tableid from transaction mgr
+    private void fillJobTableIdSetWhenReplay() {
+        try {
+            DatabaseTransactionMgr databaseTransactionMgr =

Review comment:
       How about just saving the table ids in SparkLoad, so that we don't need to get them from TxnMgr? With this logic, the coupling with TxnMgr may become very tight, and we would have to handle problems such as "db not exist"...
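
One way to read this suggestion is that the job serializes its own set of table ids, so replay can restore the set without consulting the transaction manager. The sketch below is an assumption about how that could look, not the Doris implementation: `TableIdSetCodec`, `write`, `read`, and `roundTrip` are hypothetical names, and only the standard `java.io.DataOutput` and `java.io.DataInput` interfaces are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: persists a set of table ids as a count followed by the ids.
final class TableIdSetCodec {
    static void write(DataOutput out, Set<Long> tableIds) throws IOException {
        out.writeInt(tableIds.size());
        for (long id : tableIds) {
            out.writeLong(id);
        }
    }

    static Set<Long> read(DataInput in) throws IOException {
        int size = in.readInt();
        Set<Long> tableIds = new HashSet<>();
        for (int i = 0; i < size; i++) {
            tableIds.add(in.readLong());
        }
        return tableIds;
    }

    // Writes the set to memory and reads it back, as a replay would.
    static Set<Long> roundTrip(Set<Long> tableIds) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer), tableIds);
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the ids stored alongside the job, replay would only need to call `read`, and cases such as a dropped database would no longer affect restoring the lock state.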

##########
File path: fe/src/main/java/org/apache/doris/load/loadv2/SparkLoadJob.java
##########
@@ -909,4 +949,30 @@ private void initTDescriptorTable(DescriptorTable descTable) {
             tDescriptorTable = descTable.toThrift();
         }
     }
+
+    public Set<Long> getTableWithBitmapColumn() {
+        if (tableWithBitmapColumn.size() != 0) {

Review comment:
       I think we can prepare this bitmap table map when analyzing the job, and then persist it.
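
As an illustration of computing this set once at analysis time, the sketch below scans each target table's columns and keeps the ids of tables that have at least one bitmap column. The types are assumptions made for the example: `ColumnInfo`, its `isBitmap` flag, and `tablesWithBitmapColumn` are hypothetical and do not correspond to Doris catalog classes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive the "tables with bitmap columns" set during job analysis.
final class BitmapTableAnalyzer {
    // Minimal stand-in for a column definition; not a Doris class.
    static final class ColumnInfo {
        final String name;
        final boolean isBitmap;

        ColumnInfo(String name, boolean isBitmap) {
            this.name = name;
            this.isBitmap = isBitmap;
        }
    }

    // Returns the ids of all tables that contain at least one bitmap column.
    static Set<Long> tablesWithBitmapColumn(Map<Long, List<ColumnInfo>> columnsByTableId) {
        Set<Long> result = new HashSet<>();
        for (Map.Entry<Long, List<ColumnInfo>> entry : columnsByTableId.entrySet()) {
            for (ColumnInfo column : entry.getValue()) {
                if (column.isBitmap) {
                    result.add(entry.getKey());
                    break; // one bitmap column is enough for this table
                }
            }
        }
        return result;
    }
}
```

The resulting set could then be stored as a field of the job and persisted, so a getter returns it directly instead of recomputing it.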



