You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by dhatchayani <gi...@git.apache.org> on 2018/04/27 17:33:25 UTC
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
GitHub user dhatchayani opened a pull request:
https://github.com/apache/carbondata/pull/2244
[CARBONDATA-2310] Refactored code to improve Distributable interface
[CARBONDATA-2362] Changing the Cacheable object from DataMap to Wrapper
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2362
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2244.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2244
----
commit a2fa54a69c2ccd4edf4016e28cef7b5d2890aa60
Author: dhatchayani <dh...@...>
Date: 2018-04-27T17:33:52Z
[CARBONDATA-2310] Refactored code to improve Distributable interface
[CARBONDATA-2362] Changing the Cacheable object from DataMap to Wrapper
----
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5480/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4427/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4326/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4478/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5508/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4755/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4316/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4596/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185804764
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (null == blockMetaInfoMap.get(blockPath)) {
+ blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
+ }
+ }
+ return blockMetaInfoMap;
+ }
+
+ /**
+ * This method will create file name to block Meta Info Mapping. This method will reduce the
+ * number of namenode calls and using this method one namenode will fetch 1000 entries
+ *
+ * @param segmentFilePath
+ * @return
+ * @throws IOException
+ */
+ public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
+ String segmentFilePath) throws IOException {
+ Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
+ CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
+ if (carbonFile instanceof AbstractDFSCarbonFile) {
+ Path path = new Path(segmentFilePath);
+ RemoteIterator<LocatedFileStatus> iter =
+ ((AbstractDFSCarbonFile) carbonFile).fs.listLocatedStatus(path);
+ PathFilter pathFilter = new PathFilter() {
+ @Override public boolean accept(Path path) {
+ return CarbonTablePath.isCarbonDataFile(path.getName());
+ }
+ };
+ while (iter.hasNext()) {
+ LocatedFileStatus fileStatus = iter.next();
+ if (pathFilter.accept(fileStatus.getPath())) {
+ String[] location = fileStatus.getBlockLocations()[0].getHosts();
+ long len = fileStatus.getLen();
+ BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
+ fileNameToMetaInfoMapping.put(fileStatus.getPath().toString(), blockMetaInfo);
+ }
+ }
+ }
+ return fileNameToMetaInfoMapping;
+ }
+
+ private static BlockMetaInfo createBlockMetaInfo(
+ Map<String, BlockMetaInfo> fileNameToMetaInfoMapping, String carbonDataFile)
+ throws IOException {
--- End diff --
no need to add throws exception
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185794118
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -159,16 +159,18 @@
FileFactory.mkdirs(writePath, FileFactory.getFileType(writePath));
}
for (Blocklet blocklet : blocklets) {
- ExtendedBlocklet detailedBlocklet = blockletDetailsFetcher
- .getExtendedBlocklet(blocklet, distributable.getSegment());
- if (dataMapFactory.getDataMapType() == DataMapLevel.FG) {
- String blockletwritePath =
- writePath + CarbonCommonConstants.FILE_SEPARATOR + System.nanoTime();
- detailedBlocklet.setDataMapWriterPath(blockletwritePath);
- serializer.serializeBlocklet((FineGrainBlocklet) blocklet, blockletwritePath);
+ List<ExtendedBlocklet> detailedBlockletsList =
+ blockletDetailsFetcher.getExtendedBlocklet(blocklet, distributable.getSegment());
--- End diff --
The blocklets are already pruned so it one Blocklet should always be mapped with one ExtendedBlocklet. how it can return list?
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5492/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4313/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r184914643
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -277,4 +298,30 @@ public DataMapMeta getMeta() {
}
return blocklets;
}
+
+ @Override public void cache(TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier,
+ BlockletDataMapIndexWrapper blockletDataMapIndexWrapper) throws IOException, MemoryException {
+ cache.put(tableBlockIndexUniqueIdentifier, blockletDataMapIndexWrapper);
+ }
+
+ @Override
+ public List<DataMapDistributable> getAllUncachedDistributables(
+ List<DataMapDistributable> distributables) throws IOException {
+ List<DataMapDistributable> distributablesToBeLoaded = new ArrayList<>(distributables.size());
+ for (DataMapDistributable distributable : distributables) {
+ Segment segment = distributable.getSegment();
+ Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
+ getTableBlockIndexUniqueIdentifiers(segment);
+ // filter out the tableBlockIndexUniqueIdentifiers based on distributable
+ TableBlockIndexUniqueIdentifier validIdentifier = BlockletDataMapUtil
+ .filterIdentifiersBasedOnDistributable(tableBlockIndexUniqueIdentifiers,
+ (BlockletDataMapDistributable) distributable);
+ ((BlockletDataMapDistributable) distributable)
+ .setTableBlockIndexUniqueIdentifier(validIdentifier);
--- End diff --
Move this line inside below if check
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4471/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185804179
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (null == blockMetaInfoMap.get(blockPath)) {
+ blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
+ }
+ }
+ return blockMetaInfoMap;
+ }
+
+ /**
+ * This method will create file name to block Meta Info Mapping. This method will reduce the
+ * number of namenode calls and using this method one namenode will fetch 1000 entries
+ *
+ * @param segmentFilePath
+ * @return
+ * @throws IOException
+ */
+ public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
+ String segmentFilePath) throws IOException {
+ Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
+ CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
+ if (carbonFile instanceof AbstractDFSCarbonFile) {
+ Path path = new Path(segmentFilePath);
+ RemoteIterator<LocatedFileStatus> iter =
+ ((AbstractDFSCarbonFile) carbonFile).fs.listLocatedStatus(path);
+ PathFilter pathFilter = new PathFilter() {
+ @Override public boolean accept(Path path) {
+ return CarbonTablePath.isCarbonDataFile(path.getName());
+ }
+ };
+ while (iter.hasNext()) {
+ LocatedFileStatus fileStatus = iter.next();
+ if (pathFilter.accept(fileStatus.getPath())) {
+ String[] location = fileStatus.getBlockLocations()[0].getHosts();
+ long len = fileStatus.getLen();
+ BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
+ fileNameToMetaInfoMapping.put(fileStatus.getPath().toString(), blockMetaInfo);
+ }
+ }
+ }
+ return fileNameToMetaInfoMapping;
+ }
+
+ private static BlockMetaInfo createBlockMetaInfo(
+ Map<String, BlockMetaInfo> fileNameToMetaInfoMapping, String carbonDataFile)
+ throws IOException {
+ FileFactory.FileType fileType = FileFactory.getFileType(carbonDataFile);
+ switch (fileType) {
+ case LOCAL:
+ CarbonFile carbonFile = FileFactory.getCarbonFile(carbonDataFile, fileType);
+ return new BlockMetaInfo(new String[] { "localhost" }, carbonFile.getSize());
+ default:
+ return fileNameToMetaInfoMapping.get(carbonDataFile);
+ }
+ }
+
+ public static Set<TableBlockIndexUniqueIdentifier> getTableBlockUniqueIdentifiers(Segment segment)
+ throws IOException {
+ Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers = new HashSet<>();
+ Map<String, String> indexFiles = segment.getCommittedIndexFile();
+ for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+ Path indexFile = new Path(indexFileEntry.getKey());
+ tableBlockIndexUniqueIdentifiers.add(
+ new TableBlockIndexUniqueIdentifier(indexFile.getParent().toString(), indexFile.getName(),
+ indexFileEntry.getValue(), segment.getSegmentNo()));
+ }
+ return tableBlockIndexUniqueIdentifiers;
+ }
+
+ /**
+ * This method will filter out the TableBlockIndexUniqueIdentifier belongs to that distributable
+ *
+ * @param tableBlockIndexUniqueIdentifiers
+ * @param distributable
+ * @return
+ */
+ public static TableBlockIndexUniqueIdentifier filterIdentifiersBasedOnDistributable(
+ Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers,
+ BlockletDataMapDistributable distributable) {
+ TableBlockIndexUniqueIdentifier validIdentifier = null;
+ for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier :
+ tableBlockIndexUniqueIdentifiers) {
+ if (distributable.getFilePath().equals(
--- End diff --
Better get the file name from distributable and compare only file name instead of comparing complete path
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4337/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4727/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
LGTM
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r184911430
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapperImpl.java ---
@@ -86,4 +87,8 @@ public DataMapExprWrapperImpl(TableDataMap dataMap, FilterResolverIntf expressio
@Override public DataMapLevel getDataMapType() {
return dataMap.getDataMapFactory().getDataMapType();
}
+
+ public DataMapFactory getDataMapFactory() {
--- End diff --
this is not an interface method...can we try to get the dataMapFactory in some other way and remove this method from here?
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2244
retest this please
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r184914866
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/DataFileFooter.java ---
@@ -52,7 +52,7 @@
/**
* Information about leaf nodes of all columns in this file
*/
- private List<BlockletInfo> blockletList;
+ private transient List<BlockletInfo> blockletList;
--- End diff --
why transient is required here?
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5576/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5682/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4343/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5588/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185435135
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (FileFactory.isFileExist(blockPath)) {
--- End diff --
This check is required in case of update/delete operation.
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185807371
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (null == blockMetaInfoMap.get(blockPath)) {
+ blockMetaInfoMap.put(blockPath, createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath));
+ }
+ }
+ return blockMetaInfoMap;
+ }
+
+ /**
+ * This method will create file name to block Meta Info Mapping. This method will reduce the
+ * number of namenode calls and using this method one namenode will fetch 1000 entries
+ *
+ * @param segmentFilePath
+ * @return
+ * @throws IOException
+ */
+ public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
+ String segmentFilePath) throws IOException {
+ Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap();
+ CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath);
--- End diff --
You can directly call `carbonFile.locationAwareListFiles()` to get these instead of using this code
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2244
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4490/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4415/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4683/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r184915870
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (FileFactory.isFileExist(blockPath)) {
--- End diff --
Is this if check required?.. It will make a namenode call for every blocklet entry here so better to remove this
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5502/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r184915943
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CacheClient.java ---
@@ -16,21 +16,40 @@
*/
package org.apache.carbondata.hadoop;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
import org.apache.carbondata.core.cache.Cache;
import org.apache.carbondata.core.cache.CacheProvider;
import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.datastore.SegmentTaskIndexStore;
import org.apache.carbondata.core.datastore.TableSegmentUniqueIdentifier;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
import org.apache.carbondata.core.datastore.block.SegmentTaskIndexWrapper;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
/**
* CacheClient : Holds all the Cache access clients for Btree, Dictionary
*/
public class CacheClient {
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(CacheClient.class.getName());
+
+ private final Object lock = new Object();
+
// segment access client for driver LRU cache
private CacheAccessClient<TableSegmentUniqueIdentifier, SegmentTaskIndexWrapper>
segmentAccessClient;
+ private static Map<SegmentTaskIndexStore.SegmentPropertiesWrapper, SegmentProperties>
+ segmentProperties =
+ new ConcurrentHashMap<>();
--- End diff --
Move this to above line
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5679/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4757/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4605/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on the issue:
https://github.com/apache/carbondata/pull/2244
Retest this please
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5650/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5639/
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185484269
--- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.BlockMetaInfo;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+
+
+public class BlockletDataMapUtil {
+
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(BlockletDataMapUtil.class.getName());
+
+ public static Map<String, BlockMetaInfo> getBlockMetaInfoMap(
+ TableBlockIndexUniqueIdentifier identifier, SegmentIndexFileStore indexFileStore,
+ Set<String> filesRead, Map<String, BlockMetaInfo> fileNameToMetaInfoMapping)
+ throws IOException {
+ if (identifier.getMergeIndexFileName() != null
+ && indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ CarbonFile indexMergeFile = FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getMergeIndexFileName());
+ if (indexMergeFile.exists() && !filesRead.contains(indexMergeFile.getPath())) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { indexMergeFile });
+ filesRead.add(indexMergeFile.getPath());
+ }
+ }
+ if (indexFileStore.getFileData(identifier.getIndexFileName()) == null) {
+ indexFileStore.readAllIIndexOfSegment(new CarbonFile[] { FileFactory.getCarbonFile(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName()) });
+ }
+ DataFileFooterConverter fileFooterConverter = new DataFileFooterConverter();
+ Map<String, BlockMetaInfo> blockMetaInfoMap = new HashMap<>();
+ List<DataFileFooter> indexInfo = fileFooterConverter.getIndexInfo(
+ identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier
+ .getIndexFileName(), indexFileStore.getFileData(identifier.getIndexFileName()));
+ for (DataFileFooter footer : indexInfo) {
+ String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath();
+ if (FileFactory.isFileExist(blockPath)) {
--- End diff --
fileNameToMetaInfoMapping will contain list of all the existing carbondata files. I think u can remove this namenode call and make use of this map to check whether a file physically exists or not
---
[GitHub] carbondata pull request #2244: [CARBONDATA-2310] Refactored code to improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2244#discussion_r185796999
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -124,32 +132,41 @@ public DataMapWriter createWriter(Segment segment, String writeDirectoryPath) {
}
return detailedBlocklets;
}
- List<TableBlockIndexUniqueIdentifier> identifiers =
+
+ Set<TableBlockIndexUniqueIdentifier> identifiers =
getTableBlockIndexUniqueIdentifiers(segment);
// Retrieve each blocklets detail information from blocklet datamap
for (Blocklet blocklet : blocklets) {
- detailedBlocklets.add(getExtendedBlocklet(identifiers, blocklet));
+ detailedBlocklets.addAll(getExtendedBlocklet(identifiers, blocklet));
}
return detailedBlocklets;
}
@Override
- public ExtendedBlocklet getExtendedBlocklet(Blocklet blocklet, Segment segment)
+ public List<ExtendedBlocklet> getExtendedBlocklet(Blocklet blocklet, Segment segment)
throws IOException {
+ List<ExtendedBlocklet> extendedBlocklets = new ArrayList<>();
if (blocklet instanceof ExtendedBlocklet) {
- return (ExtendedBlocklet) blocklet;
+ extendedBlocklets.add((ExtendedBlocklet) blocklet);
+ return extendedBlocklets;
}
- List<TableBlockIndexUniqueIdentifier> identifiers =
+ Set<TableBlockIndexUniqueIdentifier> identifiers =
getTableBlockIndexUniqueIdentifiers(segment);
return getExtendedBlocklet(identifiers, blocklet);
}
- private ExtendedBlocklet getExtendedBlocklet(List<TableBlockIndexUniqueIdentifier> identifiers,
- Blocklet blocklet) throws IOException {
+ private List<ExtendedBlocklet> getExtendedBlocklet(
+ Set<TableBlockIndexUniqueIdentifier> identifiers, Blocklet blocklet) throws IOException {
--- End diff --
It should return only one blocklet not list.
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5631/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4333/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4522/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4484/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5499/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5644/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4482/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4756/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
@dhatchayani Please rebase
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4720/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2244
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4603/
---
[GitHub] carbondata issue #2244: [CARBONDATA-2310] Refactored code to improve Distrib...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2244
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4519/
---