Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2018/02/27 08:31:13 UTC

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/2003

    [CARBONDATA-2206] support lucene index datamap

    This PR is an initial effort to integrate Lucene as an index datamap into CarbonData.
    A new module called carbondata-lucene is added to support the Lucene datamap:
    1. Add LuceneFineGrainDataMap, which implements the FineGrainDataMap interface.
    2. Add LuceneCoarseGrainDataMap, which implements the CoarseGrainDataMap interface.
    3. Support writing the Lucene index via LuceneDataMapWriter.
    4. Implement LuceneDataMapFactory.
    5. Add a UDF called `text_match`.
    
    Users can use the Lucene datamap as follows:
    ```
        CREATE TABLE main(id INT, name STRING, city STRING, age INT)
        STORED BY 'carbondata'
    
        CREATE DATAMAP dm ON TABLE main
        USING 'org.apache.carbondata.datamap.lucene.LuceneFineGrainDataMapFactory'
    
        SELECT * FROM main WHERE TEXT_MATCH('name:n10')
    ```
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata lucene-datamap-initial2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2003
    
----
commit e1d5b6c88b06d0c9d418008002d10a52368a0d84
Author: Jacky Li <ja...@...>
Date:   2018-02-26T08:30:38Z

    support lucene index datamap

----


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2702/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Merged into the datamap branch.


---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170864075
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/FilterExpressParser.java ---
    @@ -0,0 +1,143 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.core.scan.expression.Expression;
    +import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
    +
    +import org.apache.lucene.analysis.Analyzer;
    +import org.apache.lucene.analysis.standard.StandardAnalyzer;
    +import org.apache.lucene.queryparser.classic.ParseException;
    +import org.apache.lucene.queryparser.classic.QueryParser;
    +import org.apache.lucene.search.BooleanClause;
    +import org.apache.lucene.search.Query;
    +
    +@InterfaceAudience.Internal
    +public class FilterExpressParser extends QueryParser {
    --- End diff --
    
    Please remove this unneeded class.


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2693/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170865828
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/intf/ExpressionType.java ---
    @@ -42,5 +42,6 @@
       TRUE,
       STARTSWITH,
       ENDSWITH,
    -  CONTAINSWITH
    +  CONTAINSWITH,
    +  MATCH
    --- End diff --
    
    fixed


---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk closed the pull request at:

    https://github.com/apache/carbondata/pull/2003


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3711/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170864420
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMap.java ---
    @@ -0,0 +1,233 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.dev.DataMapModel;
    +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
    +import org.apache.carbondata.core.datastore.block.SegmentProperties;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.Blocklet;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.lucene.analysis.Analyzer;
    +import org.apache.lucene.analysis.standard.StandardAnalyzer;
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DirectoryReader;
    +import org.apache.lucene.index.IndexReader;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
    +import org.apache.lucene.queryparser.classic.ParseException;
    +import org.apache.lucene.queryparser.classic.QueryParser;
    +import org.apache.lucene.search.IndexSearcher;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.ScoreDoc;
    +import org.apache.lucene.search.TopDocs;
    +import org.apache.lucene.store.Directory;
    +import org.apache.solr.store.hdfs.HdfsDirectory;
    +
    +@InterfaceAudience.Internal
    +public class LuceneCoarseGrainDataMap extends CoarseGrainDataMap {
    +
    +  /**
    +   * log information
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(LuceneCoarseGrainDataMap.class.getName());
    +
    +  public static final int BLOCKID_ID = 0;
    +
    +  public static final int BLOCKLETID_ID = 1;
    +
    +  public static final int PAGEID_ID = 2;
    +
    +  public static final int ROWID_ID = 3;
    +  /**
    +   * searcher object for this datamap
    +   */
    +  private IndexSearcher indexSearcher = null;
    +
    +  /**
    +   * default max values to return
    +   */
    +  private static int MAX_RESULT_NUMBER = 100;
    --- End diff --
    
    It would be better to make this configurable.
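
A minimal sketch of what making the result cap configurable could look like, assuming the value is read from a `java.util.Properties` object. The class name `LuceneResultLimit` and the property key `carbon.lucene.max.result.number` are hypothetical and are not taken from the PR or from CarbonData; only the default of 100 comes from the diff above.

```java
import java.util.Properties;

/** Hypothetical helper, not part of the PR: resolves the maximum number of Lucene hits. */
final class LuceneResultLimit {

  /** Hypothetical property key; CarbonData may use a different name or mechanism. */
  static final String MAX_RESULT_KEY = "carbon.lucene.max.result.number";

  /** Fallback that matches the hard-coded MAX_RESULT_NUMBER in the diff. */
  static final int DEFAULT_MAX_RESULT = 100;

  private LuceneResultLimit() { }

  /** Returns the configured limit, or the default if the value is missing or invalid. */
  static int resolve(Properties props) {
    String raw = props.getProperty(MAX_RESULT_KEY);
    if (raw == null) {
      return DEFAULT_MAX_RESULT;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      // A non-positive limit would make the search useless, so fall back to the default.
      return value > 0 ? value : DEFAULT_MAX_RESULT;
    } catch (NumberFormatException e) {
      return DEFAULT_MAX_RESULT;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(MAX_RESULT_KEY, "250");
    System.out.println(resolve(props));
  }
}
```

Falling back to the default on a missing or malformed value keeps the behaviour identical to the current hard-coded constant when nothing is configured.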


---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170866114
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMapFactory.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.io.File;
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.DataMapDistributable;
    +import org.apache.carbondata.core.datamap.DataMapLevel;
    +import org.apache.carbondata.core.datamap.dev.DataMapModel;
    +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
    +import org.apache.carbondata.core.memory.MemoryException;
    +
    +@InterfaceAudience.Internal
    +public class LuceneCoarseGrainDataMapFactory extends LuceneDataMapFactoryBase<CoarseGrainDataMap> {
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(LuceneCoarseGrainDataMapFactory.class.getName());
    +
    +  /**
    +   * Get the datamap for segmentid
    +   */
    +  public List<CoarseGrainDataMap> getDataMaps(String segmentId) throws IOException {
    +    List<CoarseGrainDataMap> lstDataMap = new ArrayList<>();
    +    CoarseGrainDataMap dataMap = new LuceneCoarseGrainDataMap(analyzer);
    +    try {
    +      dataMap.init(new DataMapModel(
    +          tableIdentifier.getTablePath() + "/Fact/Part0/Segment_" + segmentId + File.separator
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3955/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170863724
  
    --- Diff: datamap/lucene/pom.xml ---
    @@ -0,0 +1,152 @@
    +<project xmlns="http://maven.apache.org/POM/4.0.0"
    +         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    +         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    +  <modelVersion>4.0.0</modelVersion>
    +
    +  <parent>
    +    <groupId>org.apache.carbondata</groupId>
    +    <artifactId>carbondata-parent</artifactId>
    +    <version>1.4.0-SNAPSHOT</version>
    +    <relativePath>../../pom.xml</relativePath>
    +  </parent>
    +
    +  <artifactId>carbondata-lucene</artifactId>
    +  <name>Apache CarbonData :: Lucene Index DataMap</name>
    +
    +  <properties>
    +    <dev.path>${basedir}/../../dev</dev.path>
    +    <lucene.version>6.3.0</lucene.version>
    +    <solr.version>6.3.0</solr.version>
    +  </properties>
    +
    +  <dependencies>
    +    <dependency>
    +      <groupId>org.apache.carbondata</groupId>
    +      <artifactId>carbondata-spark2</artifactId>
    +      <version>${project.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-core</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-analyzers-common</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-queryparser</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-sandbox</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.solr</groupId>
    +      <artifactId>solr-core</artifactId>
    +      <version>${solr.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.solr</groupId>
    +      <artifactId>solr-solrj</artifactId>
    +      <version>${solr.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.scalatest</groupId>
    +      <artifactId>scalatest_${scala.binary.version}</artifactId>
    +      <scope>test</scope>
    +    </dependency>
    +  </dependencies>
    +
    +  <build>
    +    <testSourceDirectory>src/test/scala</testSourceDirectory>
    +    <resources>
    +      <resource>
    +        <directory>src/resources</directory>
    +      </resource>
    +      <resource>
    +        <directory>.</directory>
    +        <includes>
    +          <include>CARBON_SPARK_INTERFACELogResource.properties</include>
    --- End diff --
    
    This is not required.


---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170863075
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/intf/ExpressionType.java ---
    @@ -42,5 +42,6 @@
       TRUE,
       STARTSWITH,
       ENDSWITH,
    -  CONTAINSWITH
    +  CONTAINSWITH,
    +  MATCH
    --- End diff --
    
    Please use TEXT_MATCH.


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2710/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2695/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2691/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170866003
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/FilterExpressParser.java ---
    @@ -0,0 +1,143 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.core.scan.expression.Expression;
    +import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
    +
    +import org.apache.lucene.analysis.Analyzer;
    +import org.apache.lucene.analysis.standard.StandardAnalyzer;
    +import org.apache.lucene.queryparser.classic.ParseException;
    +import org.apache.lucene.queryparser.classic.QueryParser;
    +import org.apache.lucene.search.BooleanClause;
    +import org.apache.lucene.search.Query;
    +
    +@InterfaceAudience.Internal
    +public class FilterExpressParser extends QueryParser {
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3933/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3937/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3952/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2688/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2698/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    LGTM


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3944/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170865464
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMapFactory.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.io.File;
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.DataMapDistributable;
    +import org.apache.carbondata.core.datamap.DataMapLevel;
    +import org.apache.carbondata.core.datamap.dev.DataMapModel;
    +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
    +import org.apache.carbondata.core.memory.MemoryException;
    +
    +@InterfaceAudience.Internal
    +public class LuceneCoarseGrainDataMapFactory extends LuceneDataMapFactoryBase<CoarseGrainDataMap> {
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(LuceneCoarseGrainDataMapFactory.class.getName());
    +
    +  /**
    +   * Get the datamap for segmentid
    +   */
    +  public List<CoarseGrainDataMap> getDataMaps(String segmentId) throws IOException {
    +    List<CoarseGrainDataMap> lstDataMap = new ArrayList<>();
    +    CoarseGrainDataMap dataMap = new LuceneCoarseGrainDataMap(analyzer);
    +    try {
    +      dataMap.init(new DataMapModel(
    +          tableIdentifier.getTablePath() + "/Fact/Part0/Segment_" + segmentId + File.separator
    --- End diff --
    
    Please invoke getSegmentDir.
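
A minimal sketch of the idea behind this comment: build the segment directory in one helper instead of concatenating `"/Fact/Part0/Segment_"` at each call site. The class `SegmentPaths` and its method are hypothetical stand-ins written for illustration; they are not the CarbonData method the reviewer refers to, and the layout is only the one visible in the diff line above.

```java
/** Hypothetical stand-in for a path utility; not CarbonData's actual API. */
final class SegmentPaths {

  // Directory layout as it appears in the diff: <tablePath>/Fact/Part0/Segment_<id>
  private static final String FACT_DIR = "Fact";
  private static final String PARTITION_DIR = "Part0";
  private static final String SEGMENT_PREFIX = "Segment_";

  private SegmentPaths() { }

  /** Builds the segment directory for a table path, tolerating a trailing slash. */
  static String getSegmentDir(String tablePath, String segmentId) {
    String base = tablePath.endsWith("/")
        ? tablePath.substring(0, tablePath.length() - 1)
        : tablePath;
    // "/" is used rather than File.separator on the assumption that the path may be an HDFS path.
    return base + "/" + FACT_DIR + "/" + PARTITION_DIR + "/" + SEGMENT_PREFIX + segmentId;
  }

  public static void main(String[] args) {
    System.out.println(getSegmentDir("/store/default/main", "0"));
  }
}
```

Keeping the layout in one place means a later change to the directory structure touches a single method rather than every datamap factory.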


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3939/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3947/



---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170866077
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneCoarseGrainDataMap.java ---
    @@ -0,0 +1,233 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.datamap.lucene;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import org.apache.carbondata.common.annotations.InterfaceAudience;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.dev.DataMapModel;
    +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
    +import org.apache.carbondata.core.datastore.block.SegmentProperties;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.Blocklet;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.lucene.analysis.Analyzer;
    +import org.apache.lucene.analysis.standard.StandardAnalyzer;
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DirectoryReader;
    +import org.apache.lucene.index.IndexReader;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
    +import org.apache.lucene.queryparser.classic.ParseException;
    +import org.apache.lucene.queryparser.classic.QueryParser;
    +import org.apache.lucene.search.IndexSearcher;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.ScoreDoc;
    +import org.apache.lucene.search.TopDocs;
    +import org.apache.lucene.store.Directory;
    +import org.apache.solr.store.hdfs.HdfsDirectory;
    +
    +@InterfaceAudience.Internal
    +public class LuceneCoarseGrainDataMap extends CoarseGrainDataMap {
    +
    +  /**
    +   * log information
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(LuceneCoarseGrainDataMap.class.getName());
    +
    +  public static final int BLOCKID_ID = 0;
    +
    +  public static final int BLOCKLETID_ID = 1;
    +
    +  public static final int PAGEID_ID = 2;
    +
    +  public static final int ROWID_ID = 3;
    +  /**
    +   * searcher object for this datamap
    +   */
    +  private IndexSearcher indexSearcher = null;
    +
    +  /**
    +   * default max values to return
    +   */
    +  private static int MAX_RESULT_NUMBER = 100;
    --- End diff --
    
    I will leave it to a future PR.


---

[GitHub] carbondata pull request #2003: [CARBONDATA-2206] support lucene index datama...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2003#discussion_r170865978
  
    --- Diff: datamap/lucene/pom.xml ---
    @@ -0,0 +1,152 @@
    +<project xmlns="http://maven.apache.org/POM/4.0.0"
    +         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    +         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    +  <modelVersion>4.0.0</modelVersion>
    +
    +  <parent>
    +    <groupId>org.apache.carbondata</groupId>
    +    <artifactId>carbondata-parent</artifactId>
    +    <version>1.4.0-SNAPSHOT</version>
    +    <relativePath>../../pom.xml</relativePath>
    +  </parent>
    +
    +  <artifactId>carbondata-lucene</artifactId>
    +  <name>Apache CarbonData :: Lucene Index DataMap</name>
    +
    +  <properties>
    +    <dev.path>${basedir}/../../dev</dev.path>
    +    <lucene.version>6.3.0</lucene.version>
    +    <solr.version>6.3.0</solr.version>
    +  </properties>
    +
    +  <dependencies>
    +    <dependency>
    +      <groupId>org.apache.carbondata</groupId>
    +      <artifactId>carbondata-spark2</artifactId>
    +      <version>${project.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-core</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-analyzers-common</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-queryparser</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.lucene</groupId>
    +      <artifactId>lucene-sandbox</artifactId>
    +      <version>${lucene.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.solr</groupId>
    +      <artifactId>solr-core</artifactId>
    +      <version>${solr.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.apache.solr</groupId>
    +      <artifactId>solr-solrj</artifactId>
    +      <version>${solr.version}</version>
    +      <exclusions>
    +        <exclusion>
    +          <groupId>*</groupId>
    +          <artifactId>*</artifactId>
    +        </exclusion>
    +      </exclusions>
    +    </dependency>
    +    <dependency>
    +      <groupId>org.scalatest</groupId>
    +      <artifactId>scalatest_${scala.binary.version}</artifactId>
    +      <scope>test</scope>
    +    </dependency>
    +  </dependencies>
    +
    +  <build>
    +    <testSourceDirectory>src/test/scala</testSourceDirectory>
    +    <resources>
    +      <resource>
    +        <directory>src/resources</directory>
    +      </resource>
    +      <resource>
    +        <directory>.</directory>
    +        <includes>
    +          <include>CARBON_SPARK_INTERFACELogResource.properties</include>
    --- End diff --
    
    Fixed.


---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build failed with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2707/



---

[GitHub] carbondata issue #2003: [CARBONDATA-2206] support lucene index datamap

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2003
  
    Build failed with Spark 2.2.1. Please check CI: http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2692/



---