You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by dhatchayani <gi...@git.apache.org> on 2018/04/27 17:33:25 UTC

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/2244

    [CARBONDATA-2310] Refactored code to improve Distributable interface

    [CARBONDATA-2362] Changing the Cacheable object from DataMap to Wrapper
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2362

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2244
    
----
commit a2fa54a69c2ccd4edf4016e28cef7b5d2890aa60
Author: dhatchayani <dh...@...>
Date:   2018-04-27T17:33:52Z

    [CARBONDATA-2310] Refactored code to improve Distributable interface
    [CARBONDATA-2362] Changing the Cacheable object from DataMap to Wrapper

----


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5480/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4427/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4326/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4478/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5508/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4755/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4316/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4596/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185804764
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,194 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (null == blockMetaInfoMap.get(blockPath)) {
    +        blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
    +      }
    +    }
    +    return blockMetaInfoMap;
    +  }
    +
    +  /**
    +   * This method will create file name to block Meta Info Mapping. This method will reduce the
    +   * number of namenode calls and using this method one namenode will fetch 1000 entries
    +   *
    +   * @param segmentFilePath
    +   * @return
    +   * @throws IOException
    +   */
    +  public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
    +      String segmentFilePath) throws IOException {
    +    Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
    +    CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
    +    if (carbonFile instanceof AbstractDFSCarbonFile) {
    +      Path path = new Path(segmentFilePath);
    +      RemoteIterator<LocatedFileStatus> iter =
    +          ((AbstractDFSCarbonFile) carbonFile).fs.listLocatedStatus(path);
    +      PathFilter pathFilter = new PathFilter() {
    +        @Override public boolean accept(Path path) {
    +          return CarbonTablePath.isCarbonDataFile(path.getName());
    +        }
    +      };
    +      while (iter.hasNext()) {
    +        LocatedFileStatus fileStatus = iter.next();
    +        if (pathFilter.accept(fileStatus.getPath())) {
    +          String[] location = fileStatus.getBlockLocations()[0].getHosts();
    +          long len = fileStatus.getLen();
    +          BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
    +          fileNameToMetaInfoMapping.put(fileStatus.getPath().toString(), blockMetaInfo);
    +        }
    +      }
    +    }
    +    return fileNameToMetaInfoMapping;
    +  }
    +
    +  private static BlockMetaInfo createBlockMetaInfo(
    +      Map<String, BlockMetaInfo> fileNameToMetaInfoMapping, String carbonDataFile)
    +      throws IOException {
    --- End diff --
    
    no need to add throws exception


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185794118
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -159,16 +159,18 @@
           FileFactory.mkdirs(writePath, FileFactory.getFileType(writePath));
         }
         for (Blocklet blocklet : blocklets) {
    -      ExtendedBlocklet detailedBlocklet = blockletDetailsFetcher
    -          .getExtendedBlocklet(blocklet, distributable.getSegment());
    -      if (dataMapFactory.getDataMapType() == DataMapLevel.FG) {
    -        String blockletwritePath =
    -            writePath + CarbonCommonConstants.FILE_SEPARATOR + System.nanoTime();
    -        detailedBlocklet.setDataMapWriterPath(blockletwritePath);
    -        serializer.serializeBlocklet((FineGrainBlocklet) blocklet, blockletwritePath);
    +      List<ExtendedBlocklet> detailedBlockletsList =
    +          blockletDetailsFetcher.getExtendedBlocklet(blocklet, distributable.getSegment());
    --- End diff --
    
    The blocklets are already pruned so it one Blocklet should always be mapped with one ExtendedBlocklet. how it can return list?


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5492/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4313/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r184914643
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -277,4 +298,30 @@ public DataMapMeta getMeta() {
         }
         return blocklets;
       }
    +
    +  @Override public void cache(TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier,
    +      BlockletDataMapIndexWrapper blockletDataMapIndexWrapper) throws IOException, MemoryException {
    +    cache.put(tableBlockIndexUniqueIdentifier, blockletDataMapIndexWrapper);
    +  }
    +
    +  @Override
    +  public List<DataMapDistributable> getAllUncachedDistributables(
    +      List<DataMapDistributable> distributables) throws IOException {
    +    List<DataMapDistributable> distributablesToBeLoaded = new ArrayList<>(distributables.size());
    +    for (DataMapDistributable distributable : distributables) {
    +      Segment segment = distributable.getSegment();
    +      Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
    +          getTableBlockIndexUniqueIdentifiers(segment);
    +      // filter out the tableBlockIndexUniqueIdentifiers based on distributable
    +      TableBlockIndexUniqueIdentifier validIdentifier = BlockletDataMapUtil
    +          .filterIdentifiersBasedOnDistributable(tableBlockIndexUniqueIdentifiers,
    +              (BlockletDataMapDistributable) distributable);
    +      ((BlockletDataMapDistributable) distributable)
    +          .setTableBlockIndexUniqueIdentifier(validIdentifier);
    --- End diff --
    
    Move this line inside below if check


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4471/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185804179
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,194 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (null == blockMetaInfoMap.get(blockPath)) {
    +        blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
    +      }
    +    }
    +    return blockMetaInfoMap;
    +  }
    +
    +  /**
    +   * This method will create file name to block Meta Info Mapping. This method will reduce the
    +   * number of namenode calls and using this method one namenode will fetch 1000 entries
    +   *
    +   * @param segmentFilePath
    +   * @return
    +   * @throws IOException
    +   */
    +  public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
    +      String segmentFilePath) throws IOException {
    +    Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
    +    CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
    +    if (carbonFile instanceof AbstractDFSCarbonFile) {
    +      Path path = new Path(segmentFilePath);
    +      RemoteIterator<LocatedFileStatus> iter =
    +          ((AbstractDFSCarbonFile) carbonFile).fs.listLocatedStatus(path);
    +      PathFilter pathFilter = new PathFilter() {
    +        @Override public boolean accept(Path path) {
    +          return CarbonTablePath.isCarbonDataFile(path.getName());
    +        }
    +      };
    +      while (iter.hasNext()) {
    +        LocatedFileStatus fileStatus = iter.next();
    +        if (pathFilter.accept(fileStatus.getPath())) {
    +          String[] location = fileStatus.getBlockLocations()[0].getHosts();
    +          long len = fileStatus.getLen();
    +          BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
    +          fileNameToMetaInfoMapping.put(fileStatus.getPath().toString(), blockMetaInfo);
    +        }
    +      }
    +    }
    +    return fileNameToMetaInfoMapping;
    +  }
    +
    +  private static BlockMetaInfo createBlockMetaInfo(
    +      Map<String, BlockMetaInfo> fileNameToMetaInfoMapping, String carbonDataFile)
    +      throws IOException {
    +    FileFactory.FileType fileType = FileFactory.getFileType(carbonDataFile);
    +    switch (fileType) {
    +      case LOCAL:
    +        CarbonFile carbonFile = FileFactory.getCarbonFile(carbonDataFile, fileType);
    +        return new BlockMetaInfo(new String[] { "localhost" }, carbonFile.getSize());
    +      default:
    +        return fileNameToMetaInfoMapping.get(carbonDataFile);
    +    }
    +  }
    +
    +  public static Set<TableBlockIndexUniqueIdentifier> getTableBlockUniqueIdentifiers(Segment segment)
    +      throws IOException {
    +    Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers = new HashSet<>();
    +    Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +      Path indexFile = new Path(indexFileEntry.getKey());
    +      tableBlockIndexUniqueIdentifiers.add(
    +          new TableBlockIndexUniqueIdentifier(indexFile.getParent().toString(), indexFile.getName(),
    +              indexFileEntry.getValue(), segment.getSegmentNo()));
    +    }
    +    return tableBlockIndexUniqueIdentifiers;
    +  }
    +
    +  /**
    +   * This method will filter out the TableBlockIndexUniqueIdentifier belongs to that distributable
    +   *
    +   * @param tableBlockIndexUniqueIdentifiers
    +   * @param distributable
    +   * @return
    +   */
    +  public static TableBlockIndexUniqueIdentifier filterIdentifiersBasedOnDistributable(
    +      Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers,
    +      BlockletDataMapDistributable distributable) {
    +    TableBlockIndexUniqueIdentifier validIdentifier = null;
    +    for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier :
    +        tableBlockIndexUniqueIdentifiers) {
    +      if (distributable.getFilePath().equals(
    --- End diff --
    
    Better get the file name from distributable and compare only file name instead of comparing complete path


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4337/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4727/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    LGTM


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r184911430
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapperImpl.java ---
    @@ -86,4 +87,8 @@ public DataMapExprWrapperImpl(TableDataMap dataMap, FilterResolverIntf expressio
       @Override public DataMapLevel getDataMapType() {
         return dataMap.getDataMapFactory().getDataMapType();
       }
    +
    +  public DataMapFactory getDataMapFactory() {
    --- End diff --
    
    this is not an interface method...can we try to get the dataMapFactory in some other way and remove this method from here?


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    retest this please


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r184914866
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/DataFileFooter.java ---
    @@ -52,7 +52,7 @@
       /**
        * Information about leaf nodes of all columns in this file
        */
    -  private List<BlockletInfo> blockletList;
    +  private transient List<BlockletInfo> blockletList;
    --- End diff --
    
    why transient is required here?


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5576/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5682/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4343/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5588/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185435135
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,198 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (FileFactory.isFileExist(blockPath)) {
    --- End diff --
    
    This check is required in case of update/delete operation. 


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185807371
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,194 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (null == blockMetaInfoMap.get(blockPath)) {
    +        blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
    +      }
    +    }
    +    return blockMetaInfoMap;
    +  }
    +
    +  /**
    +   * This method will create file name to block Meta Info Mapping. This method will reduce the
    +   * number of namenode calls and using this method one namenode will fetch 1000 entries
    +   *
    +   * @param segmentFilePath
    +   * @return
    +   * @throws IOException
    +   */
    +  public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
    +      String segmentFilePath) throws IOException {
    +    Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
    +    CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
    --- End diff --
    
    You can directly call `carbonFile.locationAwareListFiles()` to get these instead of using this code


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2244


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4490/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4415/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4683/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r184915870
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,198 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (FileFactory.isFileExist(blockPath)) {
    --- End diff --
    
    Is this if check required?.. It will make a namenode call for every blocklet entry here so better to remove this


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5502/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r184915943
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CacheClient.java ---
    @@ -16,21 +16,40 @@
      */
     package org.apache.carbondata.hadoop;
     
    +import java.util.List;
    +import java.util.Map;
    +import java.util.concurrent.ConcurrentHashMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
     import org.apache.carbondata.core.cache.Cache;
     import org.apache.carbondata.core.cache.CacheProvider;
     import org.apache.carbondata.core.cache.CacheType;
    +import org.apache.carbondata.core.datastore.SegmentTaskIndexStore;
     import org.apache.carbondata.core.datastore.TableSegmentUniqueIdentifier;
    +import org.apache.carbondata.core.datastore.block.SegmentProperties;
     import org.apache.carbondata.core.datastore.block.SegmentTaskIndexWrapper;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
     
     /**
      * CacheClient : Holds all the Cache access clients for Btree, Dictionary
      */
     public class CacheClient {
     
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(CacheClient.class.getName());
    +
    +  private final Object lock = new Object();
    +
       // segment access client for driver LRU cache
       private CacheAccessClient<TableSegmentUniqueIdentifier, SegmentTaskIndexWrapper>
           segmentAccessClient;
     
    +  private static Map<SegmentTaskIndexStore.SegmentPropertiesWrapper, SegmentProperties>
    +      segmentProperties =
    +      new ConcurrentHashMap<>();
    --- End diff --
    
    Move this to above line


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5679/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4757/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4605/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Retest this please


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5650/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5639/



---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185484269
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -0,0 +1,198 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datamap.Segment;
    +import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
    +import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.BlockMetaInfo;
    +import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
    +import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +import org.apache.hadoop.fs.LocatedFileStatus;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +
    +public class BlockletDataMapUtil {
    +
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
    +
    +  public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
    +      TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
    +      Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
    +      throws IOException {
    +    if (identifier.getMergeIndexFileName() != null
    +        && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      CarbonFile indexMergeFile = FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getMergeIndexFileName());
    +      if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
    +        indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
    +        filesRead.add(indexMergeFile.getPath());
    +      }
    +    }
    +    if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
    +      indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
    +          identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +              .getIndexFileName()) });
    +    }
    +    DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
    +    Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
    +    List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
    +        identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
    +            .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
    +    for (DataFileFooter footer : indexInfo) {
    +      String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
    +      if (FileFactory.isFileExist(blockPath)) {
    --- End diff --
    
    fileNameToMetaInfoMapping will contain list of all the existing carbondata files. I think u can remove this namenode call and make use of this map to check whether a file physically exists or not


---

[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2244#discussion_r185796999
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -124,32 +132,41 @@ public DataMapWriter createWriter(Segment segment, String writeDirectoryPath) {
           }
           return detailedBlocklets;
         }
    -    List<TableBlockIndexUniqueIdentifier> identifiers =
    +
    +    Set<TableBlockIndexUniqueIdentifier> identifiers =
             getTableBlockIndexUniqueIdentifiers(segment);
         // Retrieve each blocklets detail information from blocklet datamap
         for (Blocklet blocklet : blocklets) {
    -      detailedBlocklets.add(getExtendedBlocklet(identifiers, blocklet));
    +      detailedBlocklets.addAll(getExtendedBlocklet(identifiers, blocklet));
         }
         return detailedBlocklets;
       }
     
       @Override
    -  public ExtendedBlocklet getExtendedBlocklet(Blocklet blocklet, Segment segment)
    +  public List<ExtendedBlocklet> getExtendedBlocklet(Blocklet blocklet, Segment segment)
           throws IOException {
    +    List<ExtendedBlocklet> extendedBlocklets = new ArrayList<>();
         if (blocklet instanceof ExtendedBlocklet) {
    -      return (ExtendedBlocklet) blocklet;
    +      extendedBlocklets.add((ExtendedBlocklet) blocklet);
    +      return extendedBlocklets;
         }
    -    List<TableBlockIndexUniqueIdentifier> identifiers =
    +    Set<TableBlockIndexUniqueIdentifier> identifiers =
             getTableBlockIndexUniqueIdentifiers(segment);
         return getExtendedBlocklet(identifiers, blocklet);
       }
     
    -  private ExtendedBlocklet getExtendedBlocklet(List<TableBlockIndexUniqueIdentifier> identifiers,
    -      Blocklet blocklet) throws IOException {
    +  private List<ExtendedBlocklet> getExtendedBlocklet(
    +      Set<TableBlockIndexUniqueIdentifier> identifiers, Blocklet blocklet) throws IOException {
    --- End diff --
    
    It should return only one blocklet not list.


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5631/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4333/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4522/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4484/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5499/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5644/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4482/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4756/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    @dhatchayani Please rebase


---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4720/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4603/



---

[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2244
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4519/



---