Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2018/02/27 08:31:13 UTC
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/2003
[CARBONDATA-2206] support lucene index datamap
This PR is an initial effort to integrate Lucene into CarbonData as an index datamap.
A new module called carbondata-lucene is added to support the Lucene datamap:
1. Add LuceneFineGrainDataMap, which implements the FineGrainDataMap interface.
2. Add LuceneCoarseGrainDataMap, which implements the CoarseGrainDataMap interface.
3. Support writing the Lucene index via LuceneDataMapWriter.
4. Implement LuceneDataMapFactory.
5. Add a UDF called `text_match`.
Users can use the Lucene datamap as follows:
```
CREATE TABLE main(id INT, name STRING, city STRING, age INT)
STORED BY 'carbondata';

CREATE DATAMAP dm ON TABLE main
USING 'org.apache.carbondata.datamap.lucene.LuceneFineGrainDataMapFactory';

SELECT * FROM main WHERE TEXT_MATCH('name:n10');
```
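As a rough illustration of the argument passed to TEXT_MATCH above: 'name:n10' has a field:term shape, with the column name before the colon and the search term after it. The Java sketch below splits only that simplest shape. It is a toy under stated assumptions: the class TextMatchSketch and the method splitFieldTerm are hypothetical names, not part of this PR. The diffs in this thread import Lucene's classic QueryParser, which accepts a much richer query syntax than this sketch handles.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical illustration only; not code from this PR. */
final class TextMatchSketch {

  /**
   * Splits an expression of the form "field:term" at the first colon.
   * Anything more complex (boolean operators, wildcards, phrases) is out of
   * scope for this sketch and would need a real query parser.
   */
  static Map.Entry<String, String> splitFieldTerm(String expr) {
    int idx = expr.indexOf(':');
    // Reject input with no colon, an empty field, or an empty term.
    if (idx <= 0 || idx == expr.length() - 1) {
      throw new IllegalArgumentException("expected field:term, got: " + expr);
    }
    String field = expr.substring(0, idx).trim();
    String term = expr.substring(idx + 1).trim();
    return new SimpleEntry<>(field, term);
  }

  public static void main(String[] args) {
    Map.Entry<String, String> parts = splitFieldTerm("name:n10");
    System.out.println("field=" + parts.getKey() + ", term=" + parts.getValue());
  }
}
```

For the query in the example, the field is the `name` column of table `main` and the term is `n10`.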
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata lucene-datamap-initial2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2003.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2003
----
commit e1d5b6c88b06d0c9d418008002d10a52368a0d84
Author: Jacky Li <ja...@...>
Date: 2018-02-26T08:30:38Z
support lucene index datamap
----
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2702/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2003
Merged into the datamap branch.
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170864075
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/FilterExpressParser.java ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.queryparser.classic.ParseException;
+import org.apache.lucene.queryparser.classic.QueryParser;
+import org.apache.lucene.search.BooleanClause;
+import org.apache.lucene.search.Query;
+
+@InterfaceAudience.Internal
+public class FilterExpressParser extends QueryParser {
--- End diff --
Please remove this unused class.
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2693/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170865828
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/intf/ExpressionType.java ---
@@ -42,5 +42,6 @@
TRUE,
STARTSWITH,
ENDSWITH,
- CONTAINSWITH
+ CONTAINSWITH,
+ MATCH
--- End diff --
fixed
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk closed the pull request at:
https://github.com/apache/carbondata/pull/2003
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2003
SDV Build Failed. Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3711/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170864420
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMap.java ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datamap.dev.DataMapModel;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
+import org.apache.lucene.queryparser.classic.ParseException;
+import org.apache.lucene.queryparser.classic.QueryParser;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.Directory;
+import org.apache.solr.store.hdfs.HdfsDirectory;
+
+@InterfaceAudience.Internal
+public class LuceneCoarseGrainDataMap extends CoarseGrainDataMap {
+
+ /**
+ * log information
+ */
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(LuceneCoarseGrainDataMap.class.getName());
+
+ public static final int BLOCKID_ID = 0;
+
+ public static final int BLOCKLETID_ID = 1;
+
+ public static final int PAGEID_ID = 2;
+
+ public static final int ROWID_ID = 3;
+ /**
+ * searcher object for this datamap
+ */
+ private IndexSearcher indexSearcher = null;
+
+ /**
+ * default max values to return
+ */
+ private static int MAX_RESULT_NUMBER = 100;
--- End diff --
It would be better to make this configurable.
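One possible way to act on this suggestion, sketched in Java under assumptions: read the limit from a Properties object and fall back to the current default of 100. The property key carbon.lucene.max.result.number and the class LuceneMaxResultConfig are hypothetical names invented for illustration. CarbonData's own configuration mechanism is not shown in this thread and may differ.

```java
import java.util.Properties;

/** Hypothetical helper; not part of CarbonData or of this PR. */
final class LuceneMaxResultConfig {

  // Hypothetical property key, chosen only for this illustration.
  static final String MAX_RESULT_KEY = "carbon.lucene.max.result.number";

  // Default taken from the MAX_RESULT_NUMBER constant in the diff above.
  static final int DEFAULT_MAX_RESULT = 100;

  /** Returns the configured limit, or the default if it is unset or invalid. */
  static int resolve(Properties props) {
    String raw = props.getProperty(MAX_RESULT_KEY);
    if (raw == null) {
      return DEFAULT_MAX_RESULT;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      // A non-positive limit makes no sense for a top-N search.
      return value > 0 ? value : DEFAULT_MAX_RESULT;
    } catch (NumberFormatException e) {
      return DEFAULT_MAX_RESULT;
    }
  }
}
```

With this shape the static field becomes a value resolved once per datamap, and a malformed setting falls back to the default rather than failing the query.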
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170866114
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMapFactory.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.dev.DataMapModel;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import org.apache.carbondata.core.memory.MemoryException;
+
+@InterfaceAudience.Internal
+public class LuceneCoarseGrainDataMapFactory extends LuceneDataMapFactoryBase<CoarseGrainDataMap> {
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(LuceneCoarseGrainDataMapFactory.class.getName());
+
+ /**
+ * Get the datamap for segmentid
+ */
+ public List<CoarseGrainDataMap> getDataMaps(String segmentId) throws IOException {
+ List<CoarseGrainDataMap> lstDataMap = new ArrayList<>();
+ CoarseGrainDataMap dataMap = new LuceneCoarseGrainDataMap(analyzer);
+ try {
+ dataMap.init(new DataMapModel(
+ tableIdentifier.getTablePath() + "/Fact/Part0/Segment_" + segmentId + File.separator
--- End diff --
fixed
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3955/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170863724
--- Diff: datamap/lucene/pom.xml ---
@@ -0,0 +1,152 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+
+ <parent>
+ <groupId>org.apache.carbondata</groupId>
+ <artifactId>carbondata-parent</artifactId>
+ <version>1.4.0-SNAPSHOT</version>
+ <relativePath>../../pom.xml</relativePath>
+ </parent>
+
+ <artifactId>carbondata-lucene</artifactId>
+ <name>Apache CarbonData :: Lucene Index DataMap</name>
+
+ <properties>
+ <dev.path>${basedir}/../../dev</dev.path>
+ <lucene.version>6.3.0</lucene.version>
+ <solr.version>6.3.0</solr.version>
+ </properties>
+
+ <dependencies>
+ <dependency>
+ <groupId>org.apache.carbondata</groupId>
+ <artifactId>carbondata-spark2</artifactId>
+ <version>${project.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-core</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-analyzers-common</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-queryparser</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-sandbox</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.solr</groupId>
+ <artifactId>solr-core</artifactId>
+ <version>${solr.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.solr</groupId>
+ <artifactId>solr-solrj</artifactId>
+ <version>${solr.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.scalatest</groupId>
+ <artifactId>scalatest_${scala.binary.version}</artifactId>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+
+ <build>
+ <testSourceDirectory>src/test/scala</testSourceDirectory>
+ <resources>
+ <resource>
+ <directory>src/resources</directory>
+ </resource>
+ <resource>
+ <directory>.</directory>
+ <includes>
+ <include>CARBON_SPARK_INTERFACELogResource.properties</include>
--- End diff --
This is not required.
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170863075
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/intf/ExpressionType.java ---
@@ -42,5 +42,6 @@
TRUE,
STARTSWITH,
ENDSWITH,
- CONTAINSWITH
+ CONTAINSWITH,
+ MATCH
--- End diff --
Please use TEXT_MATCH.
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2710/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2695/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2691/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170866003
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/FilterExpressParser.java ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.queryparser.classic.ParseException;
+import org.apache.lucene.queryparser.classic.QueryParser;
+import org.apache.lucene.search.BooleanClause;
+import org.apache.lucene.search.Query;
+
+@InterfaceAudience.Internal
+public class FilterExpressParser extends QueryParser {
--- End diff --
fixed
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3933/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3937/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3952/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2688/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2698/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/2003
LGTM
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3944/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170865464
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMapFactory.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.dev.DataMapModel;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import org.apache.carbondata.core.memory.MemoryException;
+
+@InterfaceAudience.Internal
+public class LuceneCoarseGrainDataMapFactory extends LuceneDataMapFactoryBase<CoarseGrainDataMap> {
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(LuceneCoarseGrainDataMapFactory.class.getName());
+
+ /**
+ * Get the datamap for segmentid
+ */
+ public List<CoarseGrainDataMap> getDataMaps(String segmentId) throws IOException {
+ List<CoarseGrainDataMap> lstDataMap = new ArrayList<>();
+ CoarseGrainDataMap dataMap = new LuceneCoarseGrainDataMap(analyzer);
+ try {
+ dataMap.init(new DataMapModel(
+ tableIdentifier.getTablePath() + "/Fact/Part0/Segment_" + segmentId + File.separator
--- End diff --
Please invoke getSegmentDir.
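A minimal sketch of the idea behind this comment: build the segment directory in one helper instead of concatenating the path at each call site. The class SegmentPaths and its segmentDir method are hypothetical stand-ins, not the CarbonData method the reviewer refers to. The "/Fact/Part0/Segment_" layout is copied from the diff above.

```java
/** Hypothetical helper; CarbonData's real path utility may differ. */
final class SegmentPaths {

  // Layout copied from the string concatenation in the diff above.
  private static final String SEGMENT_PREFIX = "/Fact/Part0/Segment_";

  /** Builds the segment directory for a table path and segment id. */
  static String segmentDir(String tablePath, String segmentId) {
    // Drop a trailing slash so the result never contains "//".
    String base = tablePath.endsWith("/")
        ? tablePath.substring(0, tablePath.length() - 1)
        : tablePath;
    return base + SEGMENT_PREFIX + segmentId;
  }
}
```

Keeping the layout in a single place means a later change to the directory structure touches one method rather than every factory that opens a segment.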
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3939/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3947/
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170866077
--- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMap.java ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.lucene;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datamap.dev.DataMapModel;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
+import org.apache.lucene.queryparser.classic.ParseException;
+import org.apache.lucene.queryparser.classic.QueryParser;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.Directory;
+import org.apache.solr.store.hdfs.HdfsDirectory;
+
+@InterfaceAudience.Internal
+public class LuceneCoarseGrainDataMap extends CoarseGrainDataMap {
+
+ /**
+ * log information
+ */
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(LuceneCoarseGrainDataMap.class.getName());
+
+ public static final int BLOCKID_ID = 0;
+
+ public static final int BLOCKLETID_ID = 1;
+
+ public static final int PAGEID_ID = 2;
+
+ public static final int ROWID_ID = 3;
+ /**
+ * searcher object for this datamap
+ */
+ private IndexSearcher indexSearcher = null;
+
+ /**
+ * default max values to return
+ */
+ private static int MAX_RESULT_NUMBER = 100;
--- End diff --
I will leave it to a future PR.
---
[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2003#discussion_r170865978
--- Diff: datamap/lucene/pom.xml ---
@@ -0,0 +1,152 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+
+ <parent>
+ <groupId>org.apache.carbondata</groupId>
+ <artifactId>carbondata-parent</artifactId>
+ <version>1.4.0-SNAPSHOT</version>
+ <relativePath>../../pom.xml</relativePath>
+ </parent>
+
+ <artifactId>carbondata-lucene</artifactId>
+ <name>Apache CarbonData :: Lucene Index DataMap</name>
+
+ <properties>
+ <dev.path>${basedir}/../../dev</dev.path>
+ <lucene.version>6.3.0</lucene.version>
+ <solr.version>6.3.0</solr.version>
+ </properties>
+
+ <dependencies>
+ <dependency>
+ <groupId>org.apache.carbondata</groupId>
+ <artifactId>carbondata-spark2</artifactId>
+ <version>${project.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-core</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-analyzers-common</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-queryparser</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-sandbox</artifactId>
+ <version>${lucene.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.solr</groupId>
+ <artifactId>solr-core</artifactId>
+ <version>${solr.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.solr</groupId>
+ <artifactId>solr-solrj</artifactId>
+ <version>${solr.version}</version>
+ <exclusions>
+ <exclusion>
+ <groupId>*</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.scalatest</groupId>
+ <artifactId>scalatest_${scala.binary.version}</artifactId>
+ <scope>test</scope>
+ </dependency>
+ </dependencies>
+
+ <build>
+ <testSourceDirectory>src/test/scala</testSourceDirectory>
+ <resources>
+ <resource>
+ <directory>src/resources</directory>
+ </resource>
+ <resource>
+ <directory>.</directory>
+ <includes>
+ <include>CARBON_SPARK_INTERFACELogResource.properties</include>
--- End diff --
fixed
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2707/
---
[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2003
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2692/
---