You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by babokim <gi...@git.apache.org> on 2014/08/05 12:44:29 UTC

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

GitHub user babokim opened a pull request:

    https://github.com/apache/tajo/pull/107

    TAJO-991: Running PullServer on a dedicated JVM process which separates from worker.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/babokim/tajo TAJO-991

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #107
    
----
commit 85a9918779428c25669b6d6a187f124a37dffb86
Author: 김형준 <ba...@babokim-macbook-pro.local>
Date:   2014-08-04T11:22:49Z

    TAJO-991: Running PullServer on a dedicated JVM process which separates from worker.

commit 731d9160be1549fd8e13fecff321dfe3db604f2f
Author: 김형준 <ba...@babokim-macbook-pro.local>
Date:   2014-08-04T11:23:29Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 25f63a062840b540432eb50746f8f6eac6ebd4c6
Author: 김형준 <ba...@babokim-macbook-pro.local>
Date:   2014-08-04T13:38:26Z

    TAJO-991: Running PullServer on a dedicated JVM process which separates from worker.
    fix delete por file error in readPullServerPort.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16970239
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java ---
    @@ -622,6 +622,11 @@ public void handle(TaskEvent event) {
       }
     
       public void setIntermediateData(Collection<IntermediateEntry> partitions) {
    +    for (IntermediateEntry i: partitions) {
    --- End diff --
    
    looks like unused codes. please remove this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-51737626
  
    OK, I will do review this. @babokim  please merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16338685
  
    --- Diff: tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java ---
    @@ -312,6 +377,63 @@ public ChannelPipeline getPipeline() throws Exception {
         }
       }
     
    +
    +  Map<String, ProcessingStatus> processingStatusMap = new ConcurrentHashMap<String, ProcessingStatus>();
    +
    +  public void completeFileChunk(FadvisedFileRegion filePart,
    +                                   String requestUri,
    +                                   long startTime) {
    +    ProcessingStatus status = processingStatusMap.get(requestUri);
    +    if (status != null) {
    +      status.decrementRemainFiles(filePart, startTime);
    +    }
    +  }
    +
    +  class ProcessingStatus {
    +    String requestUri;
    +    int numFiles;
    +    AtomicInteger remainFiles;
    +    long startTime;
    +    long makeFileListTime;
    +    long minTime = Long.MAX_VALUE;
    +    long maxTime;
    +    int numSlowFile;
    +
    +    public ProcessingStatus(String requestUri) {
    +      this.requestUri = requestUri;
    +      this.startTime = System.currentTimeMillis();
    +    }
    +
    +    public void setNumFiles(int numFiles) {
    +      this.numFiles = numFiles;
    +      this.remainFiles = new AtomicInteger(numFiles);
    +    }
    +    public void decrementRemainFiles(FadvisedFileRegion filePart, long fileStartTime) {
    +      synchronized(remainFiles) {
    --- End diff --
    
    remainFiles is AtomicInteger. please remove the synchronized block. also synchronized(processingStatusMap)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16338681
  
    --- Diff: tajo-dist/src/main/conf/tajo-env.sh ---
    @@ -68,4 +74,7 @@ export TAJO_WORKER_STANDBY_MODE=true
     
     # It must be required to use HCatalogStore
     # export HIVE_HOME=
    -# export HIVE_JDBC_DRIVER_DIR=
    \ No newline at end of file
    +# export HIVE_JDBC_DRIVER_DIR=
    +
    +# Tajo PullServer mode. the default mode is dedicated mode(dedicated or embedded)
    + export TAJO_PULLSERVER_MODE=dedicated
    --- End diff --
    
    How about changing ‘TAJO_PULLSERVER_STANDALONE = true’ instead of dedicated/embeded ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16764955
  
    --- Diff: tajo-dist/src/main/conf/tajo-env.sh ---
    @@ -68,4 +74,7 @@ export TAJO_WORKER_STANDBY_MODE=true
     
     # It must be required to use HCatalogStore
     # export HIVE_HOME=
    -# export HIVE_JDBC_DRIVER_DIR=
    \ No newline at end of file
    +# export HIVE_JDBC_DRIVER_DIR=
    +
    +# Tajo PullServer mode. the default mode is dedicated mode(dedicated or embedded)
    + export TAJO_PULLSERVER_MODE=dedicated
    --- End diff --
    
    Ok. I'll change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-54012381
  
    I rebase and apply commented issues. Please review this patch.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16338689
  
    --- Diff: tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java ---
    @@ -312,6 +377,63 @@ public ChannelPipeline getPipeline() throws Exception {
         }
       }
     
    +
    +  Map<String, ProcessingStatus> processingStatusMap = new ConcurrentHashMap<String, ProcessingStatus>();
    +
    +  public void completeFileChunk(FadvisedFileRegion filePart,
    +                                   String requestUri,
    +                                   long startTime) {
    +    ProcessingStatus status = processingStatusMap.get(requestUri);
    +    if (status != null) {
    +      status.decrementRemainFiles(filePart, startTime);
    +    }
    +  }
    +
    +  class ProcessingStatus {
    +    String requestUri;
    +    int numFiles;
    +    AtomicInteger remainFiles;
    +    long startTime;
    +    long makeFileListTime;
    +    long minTime = Long.MAX_VALUE;
    +    long maxTime;
    +    int numSlowFile;
    +
    +    public ProcessingStatus(String requestUri) {
    +      this.requestUri = requestUri;
    +      this.startTime = System.currentTimeMillis();
    +    }
    +
    +    public void setNumFiles(int numFiles) {
    +      this.numFiles = numFiles;
    +      this.remainFiles = new AtomicInteger(numFiles);
    +    }
    +    public void decrementRemainFiles(FadvisedFileRegion filePart, long fileStartTime) {
    +      synchronized(remainFiles) {
    +        long fileSendTime = System.currentTimeMillis() - fileStartTime;
    +        if (fileSendTime > 20 * 1000) {
    +          LOG.info("PullServer send too long time: filePos=" + filePart.getPosition() + ", fileLen=" + filePart.getCount());
    +          numSlowFile++;
    +        }
    +        if (fileSendTime > maxTime) {
    +          maxTime = fileSendTime;
    +        }
    +        if (fileSendTime < minTime) {
    +          minTime = fileSendTime;
    +        }
    +        int remain = remainFiles.decrementAndGet();
    +        if (remain <= 0) {
    +          synchronized(processingStatusMap) {
    --- End diff --
    
    remove this block


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16764866
  
    --- Diff: tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java ---
    @@ -312,6 +377,63 @@ public ChannelPipeline getPipeline() throws Exception {
         }
       }
     
    +
    +  Map<String, ProcessingStatus> processingStatusMap = new ConcurrentHashMap<String, ProcessingStatus>();
    +
    +  public void completeFileChunk(FadvisedFileRegion filePart,
    +                                   String requestUri,
    +                                   long startTime) {
    +    ProcessingStatus status = processingStatusMap.get(requestUri);
    +    if (status != null) {
    +      status.decrementRemainFiles(filePart, startTime);
    +    }
    +  }
    +
    +  class ProcessingStatus {
    +    String requestUri;
    +    int numFiles;
    +    AtomicInteger remainFiles;
    +    long startTime;
    +    long makeFileListTime;
    +    long minTime = Long.MAX_VALUE;
    +    long maxTime;
    +    int numSlowFile;
    +
    +    public ProcessingStatus(String requestUri) {
    +      this.requestUri = requestUri;
    +      this.startTime = System.currentTimeMillis();
    +    }
    +
    +    public void setNumFiles(int numFiles) {
    +      this.numFiles = numFiles;
    +      this.remainFiles = new AtomicInteger(numFiles);
    +    }
    +    public void decrementRemainFiles(FadvisedFileRegion filePart, long fileStartTime) {
    +      synchronized(remainFiles) {
    --- End diff --
    
    The maxTime and minTime variable is not atomic variable and this variable should be guaranteed thread safe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16985288
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ---
    @@ -183,7 +181,13 @@ public void stop() {
       }
     
       protected void cleanupExecutionBlock(List<TajoIdProtos.ExecutionBlockIdProto> executionBlockIds) {
    -    LOG.info("cleanup executionBlocks: " + executionBlockIds);
    +    String cleanupMessage = "";
    +    String prefix = "";
    +    for (TajoIdProtos.ExecutionBlockIdProto eachEbId: executionBlockIds) {
    +      cleanupMessage += prefix + (new ExecutionBlockId(eachEbId)).toString();
    --- End diff --
    
    Yes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-52456322
  
    @babokim 
    Thank you for your contribution.
    I left some trivial comments. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-54262469
  
    Hi @jinossy.
    Thank you for your review.
    I've just committed it to the master branch. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-51735248
  
    Your patch is conflict against the latest revision. So, I've rebased it and sent a pull request to your working branch. Please review this.
    
    @jinossy This work is conflict against your recent work (TAJO-949). It would be great if you review Hyoungjun's work and my conflict resolution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16970744
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ---
    @@ -183,7 +181,13 @@ public void stop() {
       }
     
       protected void cleanupExecutionBlock(List<TajoIdProtos.ExecutionBlockIdProto> executionBlockIds) {
    -    LOG.info("cleanup executionBlocks: " + executionBlockIds);
    +    String cleanupMessage = "";
    +    String prefix = "";
    +    for (TajoIdProtos.ExecutionBlockIdProto eachEbId: executionBlockIds) {
    +      cleanupMessage += prefix + (new ExecutionBlockId(eachEbId)).toString();
    --- End diff --
    
    Could you change to StringBuilder ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/107#issuecomment-54252062
  
    +1 
    Looks great to me. Thank you for your contribution!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/107


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-991: Running PullServer on a dedicated JVM...

Posted by babokim <gi...@git.apache.org>.
Github user babokim commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/107#discussion_r16985311
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java ---
    @@ -622,6 +622,11 @@ public void handle(TaskEvent event) {
       }
     
       public void setIntermediateData(Collection<IntermediateEntry> partitions) {
    +    for (IntermediateEntry i: partitions) {
    --- End diff --
    
    Ok, will be deleted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---