You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2016/10/27 04:07:11 UTC

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] [WIP] Use CarbonInp...

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/262

    [CARBONDATA-308] [WIP] Use CarbonInputFormat in CarbonScanRDD compute

    Use CarbonInputFormat in CarbonScanRDD compute function
    
    1. In driver side, only getSplit is required, so only filter condition is required, no need to create full QueryModel object, so creation of QueryModel is moved from driver side to executor side.
    2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of use 
    QueryExecutor directly

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata scanrdd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #262
    
----
commit ef4a889db9b86653c273794c9a810a9cd9683437
Author: jackylk <ja...@huawei.com>
Date:   2016-10-22T18:43:53Z

    use CarbonInputFormat in executor

commit a5c17f523c7127b538cc2d384cbff4fa454a007a
Author: jackylk <ja...@huawei.com>
Date:   2016-10-27T04:01:36Z

    modify getPartition

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87712823
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
        * @return List<InputSplit> list of CarbonInputSplit
        * @throws IOException
        */
    -  @Override public List<InputSplit> getSplits(JobContext job) throws IOException {
    -    try {
    +  @Override
    +  public List<InputSplit> getSplits(JobContext job) throws IOException {
    +    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
    +    addSegmentsIfEmpty(job, identifier);
    +    if (getSegmentsToAccess(job).length == 0) {
    --- End diff --
    
    Is this check required here? Any way in the down method it does again right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87713382
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/mapreduce/CarbonHadoopMapReduceUtil.scala ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mapreduce
    +
    +/**
    + * Utility that enable the usage of SparkHadoopMapReduceUtil out side of spark package
    + */
    +trait CarbonHadoopMapReduceUtil extends SparkHadoopMapReduceUtil {
    --- End diff --
    
    This class is already added in latest master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #262: [CARBONDATA-308] Use CarbonInputFormat in C...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/262
  
    CI testing
    http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/609/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87713012
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -61,7 +61,6 @@
     import org.apache.carbondata.lcm.status.SegmentStatusManager;
    --- End diff --
    
    I guess `getRowCount` can be removed as we are not using it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716124
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
    @@ -244,68 +195,40 @@ class CarbonScanRDD[V: ClassTag](
               throw new java.util.NoSuchElementException("End of stream")
             }
             havePair = false
    -        recordCount += 1
    -        keyClass.getValue(rowIterator.next())
    +        val value: V = reader.getCurrentValue
    +        value
           }
    +    }
    +  }
     
    -      def clearDictionaryCache(columnToDictionaryMap: java.util.Map[String, Dictionary]) = {
    -        if (null != columnToDictionaryMap) {
    -          org.apache.carbondata.spark.util.CarbonQueryUtil
    -            .clearColumnDictionaryCache(columnToDictionaryMap)
    -        }
    -      }
    +  private def prepareInputFormatForDriver(conf: Configuration): CarbonInputFormat[V] = {
    +    CarbonInputFormat.setCarbonTable(conf, carbonTable)
    +    createInputFormat(conf)
    +  }
     
    -      def logStatistics(): Unit = {
    -          if (null != queryModel.getStatisticsRecorder) {
    -            var queryStatistic = new QueryStatistic()
    -            queryStatistic
    -              .addFixedTimeStatistic(QueryStatisticsConstants.EXECUTOR_PART,
    -                System.currentTimeMillis - queryStartTime
    -              )
    -            queryModel.getStatisticsRecorder.recordStatistics(queryStatistic)
    -            // result size
    -            queryStatistic = new QueryStatistic()
    -            queryStatistic.addCountStatistic(QueryStatisticsConstants.RESULT_SIZE, recordCount)
    -            queryModel.getStatisticsRecorder.recordStatistics(queryStatistic)
    -            // print executor query statistics for each task_id
    -            queryModel.getStatisticsRecorder.logStatisticsAsTableExecutor()
    -          }
    -        }
    -      }
    +  private def prepareInputFormatForExecutor(conf: Configuration): CarbonInputFormat[V] = {
    +    CarbonInputFormat.setCarbonReadSupport(classOf[RawDataReadSupport], conf)
    +    createInputFormat(conf)
    +  }
     
    -    iter
    +  private def createInputFormat(conf: Configuration): CarbonInputFormat[V] = {
    +    val format = new CarbonInputFormat[V]
    +    CarbonInputFormat.setTablePath(conf, identifier.getTablePath)
    +    CarbonInputFormat.setFilterPredicates(conf, filterExpression)
    +    val projection = new CarbonProjection
    +    columnProjection.foreach { attr =>
    +      projection.addColumn(attr.asInstanceOf[AttributeReference].name)
    +    }
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86786673
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
         return result;
       }
     
    -  /**
    -   * get total number of rows. Same as count(*)
    -   *
    -   * @throws IOException
    -   * @throws IndexBuilderException
    -   */
    -  public long getRowCount(JobContext job) throws IOException, IndexBuilderException {
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87714121
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/mapreduce/CarbonHadoopMapReduceUtil.scala ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mapreduce
    +
    +/**
    + * Utility that enable the usage of SparkHadoopMapReduceUtil out side of spark package
    + */
    +trait CarbonHadoopMapReduceUtil extends SparkHadoopMapReduceUtil {
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716097
  
    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -97,15 +91,14 @@ public long getTableStatusLastModifiedTime() throws IOException {
        * @return
        * @throws IOException
        */
    -  public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOException {
    +  public static ValidSegmentsInfo getValidSegments(AbsoluteTableIdentifier identifier)
    --- End diff --
    
    ok, added back.
    invalid segments need to pass to task side, executor need to remove them in the index store


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86786637
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java ---
    @@ -16,10 +16,12 @@
      */
     package org.apache.carbondata.core.carbon.datastore.block;
     
    +import java.io.IOException;
    +
     /**
    - * Abstract class which is maintains the locations of node.
    + * interface to get the locations of node. Used for making task distribution based on locality
      */
    -public abstract class Distributable implements Comparable<Distributable> {
    +public interface Distributable extends Comparable<Distributable> {
     
    -  public abstract String[] getLocations();
    +  String[] getLocations() throws IOException;
    --- End diff --
    
    Because CarbonInputSplit need to implement Distributable, and InputSplit has a getLocation function that throws IOException


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #262: [CARBONDATA-308] Use CarbonInputFormat in C...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/262
  
    CI running
    http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/659/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86214621
  
    --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -953,66 +959,6 @@ public static void checkAndCreateCarbonDataLocation(String carbonStorePath, Stri
       }
     
       /**
    -   * method to distribute the blocklets of a block in multiple blocks
    --- End diff --
    
    May be we should take a call for removing blocklet distribution. For filter queries with small number of blocks to scan it is very helpful to process faster.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87712366
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -130,41 +130,27 @@ public static CarbonTable getCarbonTable(Configuration configuration) throws IOE
         return (CarbonTable) ObjectSerializationUtil.convertStringToObject(carbonTableStr);
       }
     
    -  /**
    -   * It sets unresolved filter expression.
    -   *
    -   * @param configuration
    -   * @param filterExpression
    -   */
    -  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
    -    try {
    -      String filterString = ObjectSerializationUtil.convertObjectToString(filterExpression);
    -      configuration.set(FILTER_PREDICATE, filterString);
    -    } catch (Exception e) {
    -      throw new RuntimeException("Error while setting filter expression to Job", e);
    -    }
    +  public static void setTablePath(Configuration configuration, String tablePath)
    +      throws IOException {
    +    configuration.set(FileInputFormat.INPUT_DIR, tablePath);
       }
     
       /**
    -   * It sets the resolved filter expression
    +   * It sets unresolved filter expression.
        *
        * @param configuration
        * @param filterExpression
        */
    -  public static void setFilterPredicates(Configuration configuration,
    -      FilterResolverIntf filterExpression) {
    +  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
    --- End diff --
    
    Can't the `filterExpression` null in any case? Don't require null check?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by QiangCai <gi...@git.apache.org>.

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
    @@ -22,28 +22,44 @@
     import java.io.DataOutput;
     import java.io.IOException;
     import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
    +import org.apache.carbondata.core.carbon.datastore.block.Distributable;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.path.CarbonTablePath;
     
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.io.Writable;
     import org.apache.hadoop.mapreduce.lib.input.FileSplit;
     
    +
     /**
      * Carbon input split to allow distributed read of CarbonInputFormat.
      */
    -public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
    +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
     
       private static final long serialVersionUID = 3520344046772190207L;
       private String segmentId;
    -  /**
    +  public String taskId = "0";
    +
    +  /*
        * Number of BlockLets in a block
        */
       private int numberOfBlocklets = 0;
     
    -  public CarbonInputSplit() {
    -    super(null, 0, 0, new String[0]);
    +  public  CarbonInputSplit() {
       }
     
    -  public CarbonInputSplit(String segmentId, Path path, long start, long length,
    +  private void parserPath(Path path) {
    +    String[] nameParts = path.getName().split("-");
    +    if (nameParts != null && nameParts.length >= 3) {
    +      this.taskId = nameParts[2];
    +    }
    +  }
    +
    +  private CarbonInputSplit(String segmentId, Path path, long start, long length,
    --- End diff --
    
    please initialize taskId


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716028
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
        * @return List<InputSplit> list of CarbonInputSplit
        * @throws IOException
        */
    -  @Override public List<InputSplit> getSplits(JobContext job) throws IOException {
    -    try {
    +  @Override
    +  public List<InputSplit> getSplits(JobContext job) throws IOException {
    +    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
    +    addSegmentsIfEmpty(job, identifier);
    +    if (getSegmentsToAccess(job).length == 0) {
    +      return new ArrayList<>(0);
    +    }
    +
    +    FilterResolverIntf filterInterface;
    +    Expression filter = getFilterPredicates(job.getConfiguration());
    +    if (filter == null) {
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by kumarvishal09 <gi...@git.apache.org>.

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86393676
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
         return result;
       }
     
    -  /**
    -   * get total number of rows. Same as count(*)
    -   *
    -   * @throws IOException
    -   * @throws IndexBuilderException
    -   */
    -  public long getRowCount(JobContext job) throws IOException, IndexBuilderException {
    --- End diff --
    
    This method is useful for count(*) query as we can return number of rows from driver itself , currently we are pushing down to executor, better keep this method it will be useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86362889
  
    --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -953,66 +959,6 @@ public static void checkAndCreateCarbonDataLocation(String carbonStorePath, Stri
       }
     
       /**
    -   * method to distribute the blocklets of a block in multiple blocks
    --- End diff --
    
    @kumarvishal09 can you have a look and probably verify any performance change for filter queries?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540836
  
    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOExcepti
           }
     
         }
    +
    +    // remove entry in the segment index if there are invalid segments
    +    if (listOfInvalidSegments.size() > 0) {
    --- End diff --
    
    ok, modified


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716071
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
        * @return List<InputSplit> list of CarbonInputSplit
        * @throws IOException
        */
    -  @Override public List<InputSplit> getSplits(JobContext job) throws IOException {
    -    try {
    +  @Override
    +  public List<InputSplit> getSplits(JobContext job) throws IOException {
    +    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
    +    addSegmentsIfEmpty(job, identifier);
    +    if (getSegmentsToAccess(job).length == 0) {
    --- End diff --
    
    If this check is removed, some test case of MergeRDD is failing


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86211496
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterExpressionProcessor.java ---
    @@ -352,4 +354,18 @@ private FilterResolverIntf getFilterResolverBasedOnExpressionType(
         return new RowLevelFilterResolverImpl(expression, false, false, tableIdentifier);
       }
     
    +  public static FilterResolverIntf getResolvedFilter(AbsoluteTableIdentifier identifier,
    --- End diff --
    
    Why it was added?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540816
  
    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOExcepti
           }
     
         }
    +
    +    // remove entry in the segment index if there are invalid segments
    +    if (listOfInvalidSegments.size() > 0) {
    --- End diff --
    
    ok, modified


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716151
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -130,41 +130,27 @@ public static CarbonTable getCarbonTable(Configuration configuration) throws IOE
         return (CarbonTable) ObjectSerializationUtil.convertStringToObject(carbonTableStr);
       }
     
    -  /**
    -   * It sets unresolved filter expression.
    -   *
    -   * @param configuration
    -   * @param filterExpression
    -   */
    -  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
    -    try {
    -      String filterString = ObjectSerializationUtil.convertObjectToString(filterExpression);
    -      configuration.set(FILTER_PREDICATE, filterString);
    -    } catch (Exception e) {
    -      throw new RuntimeException("Error while setting filter expression to Job", e);
    -    }
    +  public static void setTablePath(Configuration configuration, String tablePath)
    +      throws IOException {
    +    configuration.set(FileInputFormat.INPUT_DIR, tablePath);
       }
     
       /**
    -   * It sets the resolved filter expression
    +   * It sets unresolved filter expression.
        *
        * @param configuration
        * @param filterExpression
        */
    -  public static void setFilterPredicates(Configuration configuration,
    -      FilterResolverIntf filterExpression) {
    +  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
    --- End diff --
    
    ok, added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #262: [CARBONDATA-308] Use CarbonInputFormat in C...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/262
  
    CI running
    http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/661/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by QiangCai <gi...@git.apache.org>.

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058166
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
    @@ -22,28 +22,44 @@
     import java.io.DataOutput;
     import java.io.IOException;
     import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
    +import org.apache.carbondata.core.carbon.datastore.block.Distributable;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.path.CarbonTablePath;
     
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.io.Writable;
     import org.apache.hadoop.mapreduce.lib.input.FileSplit;
     
    +
     /**
      * Carbon input split to allow distributed read of CarbonInputFormat.
      */
    -public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
    +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
     
       private static final long serialVersionUID = 3520344046772190207L;
       private String segmentId;
    -  /**
    +  public String taskId = "0";
    +
    +  /*
        * Number of BlockLets in a block
        */
       private int numberOfBlocklets = 0;
     
    -  public CarbonInputSplit() {
    -    super(null, 0, 0, new String[0]);
    +  public  CarbonInputSplit() {
       }
     
    -  public CarbonInputSplit(String segmentId, Path path, long start, long length,
    +  private void parserPath(Path path) {
    --- End diff --
    
    please use CarbonTablePath.DataFileUtil.getTaskNo


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r87712886
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
        * @return List<InputSplit> list of CarbonInputSplit
        * @throws IOException
        */
    -  @Override public List<InputSplit> getSplits(JobContext job) throws IOException {
    -    try {
    +  @Override
    +  public List<InputSplit> getSplits(JobContext job) throws IOException {
    +    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
    +    addSegmentsIfEmpty(job, identifier);
    +    if (getSegmentsToAccess(job).length == 0) {
    +      return new ArrayList<>(0);
    +    }
    +
    +    FilterResolverIntf filterInterface;
    +    Expression filter = getFilterPredicates(job.getConfiguration());
    +    if (filter == null) {
    --- End diff --
    
    No need of this check. Please initialize `FilterResolverIntf filterInterface = null` at first


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/262


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

Posted by kumarvishal09 <gi...@git.apache.org>.

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86391469
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java ---
    @@ -16,10 +16,12 @@
      */
     package org.apache.carbondata.core.carbon.datastore.block;
     
    +import java.io.IOException;
    +
     /**
    - * Abstract class which is maintains the locations of node.
    + * interface to get the locations of node. Used for making task distribution based on locality
      */
    -public abstract class Distributable implements Comparable<Distributable> {
    +public interface Distributable extends Comparable<Distributable> {
     
    -  public abstract String[] getLocations();
    +  String[] getLocations() throws IOException;
    --- End diff --
    
    Any reason to throw IOException form this method, I think this is not required ?? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---