Posted to issues@carbondata.apache.org by kumarvishal09 <gi...@git.apache.org> on 2017/06/12 10:46:04 UTC

[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/carbondata/pull/1019

    [CARBONDATA-1156]Improve IUD performance and fixed synchronization issue

    Loading delete delta files takes a long time because they are read at blocklet level. Added code to read them at block level instead.
    In the current IUD design, delete delta files are listed for each block at executor level, so when a query and an IUD operation run in parallel the query may return wrong results. Delete delta information is now passed from the driver to the executor.
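As a rough illustration of "passing delete delta information from driver to executor", here is a minimal sketch of a serializable value object that the driver could build once per block and ship to executors. The class name `DeleteDeltaInfoSketch` and its fields are hypothetical; only the idea of a latest delete delta file timestamp comes from the diff quoted later in this thread. Because the review below uses such an object as a map key, the sketch also defines `equals`/`hashCode`.

```java
import java.io.Serializable;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the PR's code) of delete delta information that the
 * driver lists once and sends to executors, so executors stop listing files.
 */
public class DeleteDeltaInfoSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Paths of the delete delta files for one block, listed once on the driver
  private final String[] deleteDeltaFiles;
  // Timestamp of the newest delete delta file, used by executors to detect staleness
  private final long latestDeleteDeltaFileTimestamp;

  public DeleteDeltaInfoSketch(String[] deleteDeltaFiles, long latestDeleteDeltaFileTimestamp) {
    this.deleteDeltaFiles = deleteDeltaFiles.clone();
    this.latestDeleteDeltaFileTimestamp = latestDeleteDeltaFileTimestamp;
  }

  public String[] getDeleteDeltaFiles() {
    return deleteDeltaFiles.clone();
  }

  public long getLatestDeleteDeltaFileTimestamp() {
    return latestDeleteDeltaFileTimestamp;
  }

  // equals/hashCode matter when this object is used as a key in an executor-side map
  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof DeleteDeltaInfoSketch)) {
      return false;
    }
    DeleteDeltaInfoSketch other = (DeleteDeltaInfoSketch) o;
    return latestDeleteDeltaFileTimestamp == other.latestDeleteDeltaFileTimestamp
        && Arrays.equals(deleteDeltaFiles, other.deleteDeltaFiles);
  }

  @Override public int hashCode() {
    return 31 * Arrays.hashCode(deleteDeltaFiles) + Long.hashCode(latestDeleteDeltaFileTimestamp);
  }
}
```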

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata IUDPerformanceImprovement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1019
    
----
commit 60cfc66fe1f2de4cc3c2395a4dd479abb2a602f4
Author: kumarvishal <ku...@gmail.com>
Date:   2017-06-12T10:36:24Z

    Fixed synchronization issue and improved IUD performance

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121395234
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
         }
       }
     
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock       data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map<String, DeleteDeltaVo> getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
    --- End diff --
    
    Call `deleteDeltaToLockObjectMap.get(deleteDeltaInfo)` again inside the synchronized block (the double check) to avoid a null pointer exception.
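The pattern being reviewed (skip the read if the block's cached timestamp is fresh, otherwise take a per-key lock created with double-checked locking, then re-check) can be sketched as below. This is a minimal, self-contained sketch, not the PR's code: `DeleteDeltaLoader`, `Block`, the `String` key and the `Supplier` reader are hypothetical stand-ins. It assumes the lock map is a `ConcurrentHashMap`, which is what makes the first, unsynchronized `get()` safe; with a plain `HashMap` that read would race with concurrent `put()` calls. On a `ConcurrentHashMap`, `computeIfAbsent(key, k -> new Object())` would be an equivalent one-liner.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class DeleteDeltaLoader {

  /** Hypothetical stand-in for a cached data block. */
  public static class Block {
    volatile long deleteDeltaTimestamp = -1L;
    volatile Map<String, BitSet> deletedRowsMap = Collections.emptyMap();
  }

  // Assumed ConcurrentHashMap, so the unsynchronized first get() is safe
  private final Map<String, Object> lockObjectMap = new ConcurrentHashMap<>();

  private Object getLockObject(String key) {
    Object lockObject = lockObjectMap.get(key);
    if (lockObject == null) {
      synchronized (lockObjectMap) {
        // Double check: another task may have created the lock while we waited
        lockObject = lockObjectMap.get(key);
        if (lockObject == null) {
          lockObject = new Object();
          lockObjectMap.put(key, lockObject);
        }
      }
    }
    return lockObject;
  }

  public Map<String, BitSet> getDeleteDeltaDetails(Block block, String key,
      long latestDeltaTimestamp, Supplier<Map<String, BitSet>> reader) {
    // Fast path: the block already reflects the newest delete delta files
    if (block.deleteDeltaTimestamp >= latestDeltaTimestamp) {
      return block.deletedRowsMap;
    }
    synchronized (getLockObject(key)) {
      // Re-check under the lock: the task that held it may already have loaded the files
      if (block.deleteDeltaTimestamp < latestDeltaTimestamp) {
        // Publish the rows before the timestamp so fast-path readers see both
        block.deletedRowsMap = reader.get();
        block.deleteDeltaTimestamp = latestDeltaTimestamp;
      }
      return block.deletedRowsMap;
    }
  }
}
```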



[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1019



[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/287/




[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121399194
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
         }
       }
     
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock       data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map<String, DeleteDeltaVo> getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
    --- End diff --
    
    ok. I missed it:)



[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2385/




[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121390830
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java ---
    @@ -120,7 +122,53 @@ private void initThreadPoolSize() {
           }
         }
         return pageIdDeleteRowsMap;
    +  }
     
    +  /**
    +   * Below method will be used to read the delete delta files
    +   * and get the map of blockletid and page id mapping to deleted
    +   * rows
    +   *
    +   * @param deltaFiles delete delta files array
    +   * @return map of blockletid_pageid to deleted rows
    +   */
    +  public Map<String, DeleteDeltaVo> getDeletedRowsDataVo(String[] deltaFiles) {
    +    List<Future<DeleteDeltaBlockDetails>> taskSubmitList = new ArrayList<>();
    +    ExecutorService executorService = Executors.newFixedThreadPool(thread_pool_size);
    +    for (final String deltaFile : deltaFiles) {
    +      taskSubmitList.add(executorService.submit(new Callable<DeleteDeltaBlockDetails>() {
    +        @Override public DeleteDeltaBlockDetails call() throws IOException {
    +          CarbonDeleteDeltaFileReaderImpl deltaFileReader =
    +              new CarbonDeleteDeltaFileReaderImpl(deltaFile, FileFactory.getFileType(deltaFile));
    +          return deltaFileReader.readJson();
    +        }
    +      }));
    +    }
    +    try {
    +      executorService.shutdown();
    +      executorService.awaitTermination(30, TimeUnit.MINUTES);
    +    } catch (InterruptedException e) {
    +      LOGGER.error("Error while reading the delete delta files : " + e.getMessage());
    +    }
    +    Map<String, DeleteDeltaVo> pageIdToBlockLetVo = new HashMap<>();
    +    List<DeleteDeltaBlockletDetails> blockletDetails = null;
    +    for (int i = 0; i < taskSubmitList.size(); i++) {
    +      try {
    +        blockletDetails = taskSubmitList.get(i).get().getBlockletDetails();
    +      } catch (InterruptedException | ExecutionException e) {
    +        throw new RuntimeException(e);
    +      }
    +      for (DeleteDeltaBlockletDetails blockletDetail : blockletDetails) {
    +        DeleteDeltaVo deleteDeltaVo = pageIdToBlockLetVo.get(blockletDetail.getBlockletKey());
    +        if (null == deleteDeltaVo) {
    +          deleteDeltaVo = new DeleteDeltaVo();
    +          pageIdToBlockLetVo.put(blockletDetail.getBlockletKey(), deleteDeltaVo);
    +        }
    +        deleteDeltaVo.insertData(blockletDetail.getDeletedRows());
    +        ;
    --- End diff --
    
    Remove the stray semicolon.
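The method under review reads each delete delta file in a thread pool and merges the deleted rows per blocklet key. A minimal sketch of the same idea follows, under stated assumptions: `DeleteDeltaMerger`, `readAndMerge` and the `fileReader` function (file path to a map of blocklet key to deleted row ids) are hypothetical and stand in for `CarbonDeleteDeltaFileReaderImpl.readJson()`. It uses `ExecutorService.invokeAll`, which waits for every task, instead of `shutdown()` plus `awaitTermination`, and it restores the thread's interrupt flag rather than only logging the `InterruptedException`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public final class DeleteDeltaMerger {

  private DeleteDeltaMerger() {
  }

  public static Map<String, BitSet> readAndMerge(String[] deltaFiles, int threadPoolSize,
      Function<String, Map<String, Set<Integer>>> fileReader) {
    ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
    try {
      // One read task per delete delta file
      List<Callable<Map<String, Set<Integer>>>> tasks = new ArrayList<>();
      for (String deltaFile : deltaFiles) {
        tasks.add(() -> fileReader.apply(deltaFile));
      }
      // invokeAll blocks until every task has completed, normally or exceptionally
      List<Future<Map<String, Set<Integer>>>> futures = executor.invokeAll(tasks);
      Map<String, BitSet> merged = new HashMap<>();
      for (Future<Map<String, Set<Integer>>> future : futures) {
        for (Map.Entry<String, Set<Integer>> entry : future.get().entrySet()) {
          // Union the deleted rows of all files that touch the same blocklet key
          BitSet rows = merged.computeIfAbsent(entry.getKey(), k -> new BitSet());
          for (int rowId : entry.getValue()) {
            rows.set(rowId);
          }
        }
      }
      return merged;
    } catch (InterruptedException e) {
      // Restore the interrupt flag instead of swallowing it
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while reading delete delta files", e);
    } catch (ExecutionException e) {
      throw new RuntimeException("Failed to read a delete delta file", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```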



[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    LGTM



[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/264/




[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/266/




[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2387/




[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121389622
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaVo.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.mutate;
    +
    +import java.util.BitSet;
    +import java.util.Iterator;
    +import java.util.Set;
    +
    +/**
    + * Class which keep the information about the rows
    + * while got deleted
    + */
    +public class DeleteDeltaVo {
    +
    --- End diff --
    
    Mo
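The new `DeleteDeltaVo` class imports `BitSet`, `Iterator` and `Set`, and the reader code above calls `deleteDeltaVo.insertData(blockletDetail.getDeletedRows())`. A minimal sketch of how such a holder might record deleted row ids in a `BitSet` is shown below; the class name `DeleteDeltaVoSketch` and the methods `containsRow` and `deletedRowCount` are hypothetical, and the diff does not show the real class body.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;

/**
 * Hypothetical sketch of a holder for the deleted rows of one blocklet/page.
 */
public class DeleteDeltaVoSketch {
  // Bit i is set when row i has been deleted
  private final BitSet deletedRows = new BitSet();

  // Marks every row id in the given set as deleted; repeated ids are harmless
  public void insertData(Set<Integer> data) {
    Iterator<Integer> iterator = data.iterator();
    while (iterator.hasNext()) {
      deletedRows.set(iterator.next());
    }
  }

  // Constant-time check used while scanning rows
  public boolean containsRow(int rowId) {
    return deletedRows.get(rowId);
  }

  public int deletedRowCount() {
    return deletedRows.cardinality();
  }
}
```

A `BitSet` keeps per-row lookups cheap during a scan and makes merging several delete delta files a simple union of bits.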



[GitHub] carbondata issue #1019: [CARBONDATA-1156]Improve IUD performance and fixed s...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1019
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2408/


