You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2016/12/08 10:49:21 UTC

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/412

    [WIP]Added vector reader in Carbon scan.

    This PR enables carbon to read the data in vector columnar format.
    New interface classes added `CarbonColumnarBatch` and `CarbonColumnVector`  to read data in vector format directly from scanner.
    In case of Spark2.0 batch reader we can directly pass wrapper class of 'org.apache.spark.sql.execution.vectorized.ColumnarBatch' to carbon and set the data to it. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata vectorreader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #412
    
----
commit 1bdf725452092b23fc50b4c8cb513541b430fae2
Author: ravipesala <ra...@gmail.com>
Date:   2016-12-06T17:54:05Z

    add initial check in for vector reader

commit 6bce6a80d09a7853e3fb883a320368471d5e739a
Author: ravipesala <ra...@gmail.com>
Date:   2016-12-08T10:39:44Z

    Added vector reader in carbon

commit c52f8644160a1ac7ad2531e80f3154ca893a03ad
Author: ravipesala <ra...@gmail.com>
Date:   2016-12-08T10:43:34Z

    Fixed check style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/101/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    @jackylk, rebased please review


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830120
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --
    
    missing file header


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92533460
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    +    int batchSize = 10000;
    --- End diff --
    
    Can we set batchSize through configuration? which is useful for tuning.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835427
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93042507
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
    @@ -231,4 +248,21 @@ class CarbonScanRDD[V: ClassTag](
         val firstOptionLocation = theSplit.split.value.getLocations.filter(_ != "localhost")
         firstOptionLocation
       }
    +
    +  def createVectorizedCarbonRecordReader(queryModel: QueryModel): RecordReader[Void, Object] = {
    +    val name = "org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader"
    +    try {
    +      val cons = Class.forName(name).getDeclaredConstructors
    +      cons.head.setAccessible(true)
    +      cons.head.newInstance(queryModel).asInstanceOf[RecordReader[Void, Object]]
    +    } catch {
    +      case e: Exception =>
    +        null
    --- End diff --
    
    should not ignore exception


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92743691
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    --- End diff --
    
    This method is not used now, I am removing it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    please rebase


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92534171
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    +    int batchSize = 10000;
    +    List<QueryDimension> queryDimension = queryModel.getQueryDimension();
    +    List<QueryMeasure> queryMeasures = queryModel.getQueryMeasures();
    +    CarbonColumnVector[] vectors =
    +        new CarbonColumnVector[queryDimension.size() + queryMeasures.size()];
    +    for (int i = 0; i < queryDimension.size(); i++) {
    +      QueryDimension dim = queryDimension.get(i);
    +      if (dim.getDimension().hasEncoding(Encoding.DIRECT_DICTIONARY)) {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.LONG);
    +      } else if (!dim.getDimension().hasEncoding(Encoding.DICTIONARY)) {
    +        vectors[dim.getQueryOrder()] =
    +            new CarbonColumnVectorImpl(batchSize, dim.getDimension().getDataType());
    +      } else if (dim.getDimension().isComplex()) {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.STRUCT);
    +      } else {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.INT);
    +      }
    +    }
    +
    +    for (int i = 0; i < queryMeasures.size(); i++) {
    +      QueryMeasure msr = queryMeasures.get(i);
    +      switch (msr.getMeasure().getDataType()) {
    +        case SHORT:
    +        case INT:
    +        case LONG:
    +          vectors[msr.getQueryOrder()] =
    +              new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType());
    +          break;
    +        case DECIMAL:
    +          vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DECIMAL);
    +          break;
    +        default:
    +          vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DOUBLE);
    +      }
    --- End diff --
    
    what is the difference between msr.getMeasure().getDataType() with DataType.Decimal and DataType.DOUBLE?
    Seems that case DECIMAL logic can merge witch LONG
    
    can we just use vectors[msr.getQueryOrder()] =
                  new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType());  for all measure data type ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830121
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    --- End diff --
    
    name it `ColumnVector`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830068
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
    @@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
         readSupport.initialize(queryModel.getProjectionColumns(),
             queryModel.getAbsoluteTableIdentifier());
         try {
    -      carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
    +      if (queryModel.isVectorReader()) {
    +        carbonIterator = new VectorChunkRowIterator(
    --- End diff --
    
    use factory to create it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835416
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +public class CarbonColumnarBatch {
    --- End diff --
    
    In spark the name is aready `ColumnarBatch` so I just used this name to avoid confusion


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91885134
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --
    
    Yes, never used, just added for future purpose


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830226
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    --- End diff --
    
    Why is it an interface? Are you considering make it offheap in the future? If not, I think use class directly is enough


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93042113
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
    @@ -393,4 +426,10 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
           case others => None
         }
       }
    +
    +  def supportBatchedDataSource(sqlContext: SQLContext, cols: Seq[Attribute]): Boolean = {
    +    sqlContext.conf.wholeStageEnabled &&
    +    sqlContext.sparkSession.conf.get("carbon.enable.vector.reader", "true").toBoolean &&
    --- End diff --
    
    better to add this string to CarbonCommonConstants and make it a property. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/87/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92108917
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
    @@ -87,19 +90,17 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
       private[this] def toCatalystRDD(
           relation: LogicalRelation,
           output: Seq[Attribute],
    -      rdd: RDD[Row],
    +      rdd: RDD[InternalRow],
           needDecode: ArrayBuffer[AttributeReference]):
       RDD[InternalRow] = {
    -    val newRdd = if (needDecode.size > 0) {
    +    if (needDecode.size > 0) {
    +      rdd.asInstanceOf[CarbonScanRDD].setVectorReaderSupport(false)
           getDecoderRDD(relation, needDecode, rdd, output)
    --- End diff --
    
    hi, i want to know what will happen if setVectorReaderSupport(true)  when needDecode.size > 0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830271
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    --- End diff --
    
    can we not introduce spark dependency in this interface and implementation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92744134
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    +    int batchSize = 10000;
    +    List<QueryDimension> queryDimension = queryModel.getQueryDimension();
    +    List<QueryMeasure> queryMeasures = queryModel.getQueryMeasures();
    +    CarbonColumnVector[] vectors =
    +        new CarbonColumnVector[queryDimension.size() + queryMeasures.size()];
    +    for (int i = 0; i < queryDimension.size(); i++) {
    +      QueryDimension dim = queryDimension.get(i);
    +      if (dim.getDimension().hasEncoding(Encoding.DIRECT_DICTIONARY)) {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.LONG);
    +      } else if (!dim.getDimension().hasEncoding(Encoding.DICTIONARY)) {
    +        vectors[dim.getQueryOrder()] =
    +            new CarbonColumnVectorImpl(batchSize, dim.getDimension().getDataType());
    +      } else if (dim.getDimension().isComplex()) {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.STRUCT);
    +      } else {
    +        vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.INT);
    +      }
    +    }
    +
    +    for (int i = 0; i < queryMeasures.size(); i++) {
    +      QueryMeasure msr = queryMeasures.get(i);
    +      switch (msr.getMeasure().getDataType()) {
    +        case SHORT:
    +        case INT:
    +        case LONG:
    +          vectors[msr.getQueryOrder()] =
    +              new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType());
    +          break;
    +        case DECIMAL:
    +          vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DECIMAL);
    +          break;
    +        default:
    +          vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DOUBLE);
    +      }
    --- End diff --
    
    No we can't use as we support few datatypes while storing reading measure data.
    Anyway this method is not used now, I am removing it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/243/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91829957
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/impl/DetailQueryExecutor.java ---
    @@ -36,8 +37,13 @@
       @Override public CarbonIterator<Object[]> execute(QueryModel queryModel)
           throws QueryExecutionException {
         List<BlockExecutionInfo> blockExecutionInfoList = getBlockExecutionInfos(queryModel);
    -    return new DetailQueryResultIterator(blockExecutionInfoList, queryModel,
    -        queryProperties.executorService);
    +    if (queryModel.isVectorReader()) {
    --- End diff --
    
    can we create different implementation of AbstractQueryExecutor and new a different one in CarbonRecordReader?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/412


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/106/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92533279
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    --- End diff --
    
    spell error\uff0c  Columinar ->  Columnar


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92743655
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
    @@ -18,15 +18,69 @@
      */
     package org.apache.carbondata.scan.executor;
     
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
     import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
    +import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;
     
     /**
      * Factory class to get the query executor from RDD
      * This will return the executor based on query type
      */
     public class QueryExecutorFactory {
     
    -  public static QueryExecutor getQueryExecutor() {
    -    return new DetailQueryExecutor();
    +  public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
    +    if (queryModel.isVectorReader()) {
    +      return new VectorDetailQueryExecutor();
    +    } else {
    +      return new DetailQueryExecutor();
    +    }
    +  }
    +
    +  public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
    +    int batchSize = 10000;
    --- End diff --
    
    This method is not used now, I am removing it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/71/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93043995
  
    --- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java ---
    @@ -0,0 +1,248 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.spark.vectorreader;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.carbondata.core.cache.dictionary.Dictionary;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +import org.apache.carbondata.hadoop.CarbonInputSplit;
    +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
    +import org.apache.carbondata.scan.executor.QueryExecutor;
    +import org.apache.carbondata.scan.executor.QueryExecutorFactory;
    +import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.spark.util.CarbonScalaUtil;
    +
    +import org.apache.hadoop.mapreduce.InputSplit;
    +import org.apache.hadoop.mapreduce.RecordReader;
    +import org.apache.hadoop.mapreduce.TaskAttemptContext;
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.vectorized.ColumnarBatch;
    +import org.apache.spark.sql.types.DecimalType;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +
    +public class VectorizedCarbonRecordReader extends RecordReader<Void, Object> {
    +
    +  private int batchIdx = 0;
    +
    +  private int numBatched = 0;
    +
    +  private ColumnarBatch columnarBatch;
    +
    +  private CarbonColumnarBatch carbonColumnarBatch;
    +
    +  /**
    +   * If true, this class returns batches instead of rows.
    +   */
    +  private boolean returnColumnarBatch;
    +
    +  /**
    +   * The default config on whether columnarBatch should be offheap.
    +   */
    +  private static final MemoryMode DEFAULT_MEMORY_MODE = MemoryMode.ON_HEAP;
    +
    +  private QueryModel queryModel;
    +
    +  private AbstractDetailQueryResultIterator iterator;
    +
    +  private QueryExecutor queryExecutor;
    +
    +  public VectorizedCarbonRecordReader(QueryModel queryModel) {
    +    this.queryModel = queryModel;
    +    enableReturningBatches();
    +  }
    +
    +  /**
    +   * Implementation of RecordReader API.
    +   */
    +  @Override public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
    --- End diff --
    
    can you abstract this function into a base class, most of the initialization logic is duplicate with `CarbonRecordReader`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91839108
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --
    
    `ColumnarVectorWrapper` is the implementation class for this interface


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92741003
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
    @@ -150,12 +153,25 @@ class CarbonScanRDD[V: ClassTag](
         val attemptContext = new TaskAttemptContextImpl(new Configuration(), attemptId)
         val format = prepareInputFormatForExecutor(attemptContext.getConfiguration)
         val inputSplit = split.asInstanceOf[CarbonSparkPartition].split.value
    -    val reader = format.createRecordReader(inputSplit, attemptContext)
    +    val model = format.getQueryModel(inputSplit, attemptContext)
    +    val reader = {
    +      if (vectorReader) {
    +        val carbonRecordReader = createVectorizedCarbonRecordReader(model)
    +        if (carbonRecordReader == null) {
    +          new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
    +        } else {
    +          carbonRecordReader
    +        }
    +      } else {
    +        new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
    --- End diff --
    
    This vector carbon reader is specific to spark and it has spark dependencies that is why it is created in spark module. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/244/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91871166
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --
    
    Yes, but I mean this function `putBytes(int rowId, int offset, int length, byte[] value)` is never used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92742973
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
    @@ -87,19 +90,17 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
       private[this] def toCatalystRDD(
           relation: LogicalRelation,
           output: Seq[Attribute],
    -      rdd: RDD[Row],
    +      rdd: RDD[InternalRow],
           needDecode: ArrayBuffer[AttributeReference]):
       RDD[InternalRow] = {
    -    val newRdd = if (needDecode.size > 0) {
    +    if (needDecode.size > 0) {
    +      rdd.asInstanceOf[CarbonScanRDD].setVectorReaderSupport(false)
           getDecoderRDD(relation, needDecode, rdd, output)
    --- End diff --
    
    if vector reader is true and `needDecode.size > 0` then it uses dictionary decoder rdd in its parent.But decoder rdd is not capable of handling columnar batches. 
    I will raise another PR to move the decoder RDD logic to carbon layer. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93044529
  
    --- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java ---
    @@ -0,0 +1,248 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.spark.vectorreader;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.carbondata.core.cache.dictionary.Dictionary;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +import org.apache.carbondata.hadoop.CarbonInputSplit;
    +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
    +import org.apache.carbondata.scan.executor.QueryExecutor;
    +import org.apache.carbondata.scan.executor.QueryExecutorFactory;
    +import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.spark.util.CarbonScalaUtil;
    +
    +import org.apache.hadoop.mapreduce.InputSplit;
    +import org.apache.hadoop.mapreduce.RecordReader;
    +import org.apache.hadoop.mapreduce.TaskAttemptContext;
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.vectorized.ColumnarBatch;
    +import org.apache.spark.sql.types.DecimalType;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +
    +public class VectorizedCarbonRecordReader extends RecordReader<Void, Object> {
    +
    +  private int batchIdx = 0;
    +
    +  private int numBatched = 0;
    +
    +  private ColumnarBatch columnarBatch;
    +
    +  private CarbonColumnarBatch carbonColumnarBatch;
    +
    +  /**
    +   * If true, this class returns batches instead of rows.
    +   */
    +  private boolean returnColumnarBatch;
    +
    +  /**
    +   * The default config on whether columnarBatch should be offheap.
    +   */
    +  private static final MemoryMode DEFAULT_MEMORY_MODE = MemoryMode.ON_HEAP;
    +
    +  private QueryModel queryModel;
    +
    +  private AbstractDetailQueryResultIterator iterator;
    +
    +  private QueryExecutor queryExecutor;
    +
    +  public VectorizedCarbonRecordReader(QueryModel queryModel) {
    +    this.queryModel = queryModel;
    +    enableReturningBatches();
    +  }
    +
    +  /**
    +   * Implementation of RecordReader API.
    +   */
    +  @Override public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
    +      throws IOException, InterruptedException, UnsupportedOperationException {
    +    // The input split can contain single HDFS block or multiple blocks, so firstly get all the
    +    // blocks and then set them in the query model.
    +    List<CarbonInputSplit> splitList;
    +    if (inputSplit instanceof CarbonInputSplit) {
    +      splitList = new ArrayList<>(1);
    +      splitList.add((CarbonInputSplit) inputSplit);
    +    } else if (inputSplit instanceof CarbonMultiBlockSplit) {
    +      // contains multiple blocks, this is an optimization for concurrent query.
    +      CarbonMultiBlockSplit multiBlockSplit = (CarbonMultiBlockSplit) inputSplit;
    +      splitList = multiBlockSplit.getAllSplits();
    +    } else {
    +      throw new RuntimeException("unsupported input split type: " + inputSplit);
    +    }
    +    List<TableBlockInfo> tableBlockInfoList = CarbonInputSplit.createBlocks(splitList);
    +    queryModel.setTableBlockInfos(tableBlockInfoList);
    +    queryModel.setVectorReader(true);
    +    try {
    +      queryExecutor = QueryExecutorFactory.getQueryExecutor(queryModel);
    +      iterator = (AbstractDetailQueryResultIterator) queryExecutor.execute(queryModel);
    +    } catch (QueryExecutionException e) {
    +      throw new InterruptedException(e.getMessage());
    +    }
    +  }
    +
    +  @Override public void close() throws IOException {
    +    if (columnarBatch != null) {
    +      columnarBatch.close();
    +      columnarBatch = null;
    +    }
    +    // clear dictionary cache
    +    Map<String, Dictionary> columnToDictionaryMapping = queryModel.getColumnToDictionaryMapping();
    +    if (null != columnToDictionaryMapping) {
    +      for (Map.Entry<String, Dictionary> entry : columnToDictionaryMapping.entrySet()) {
    +        CarbonUtil.clearDictionaryCache(entry.getValue());
    +      }
    +    }
    +    try {
    +      queryExecutor.finish();
    +    } catch (QueryExecutionException e) {
    +      throw new IOException(e);
    +    }
    +  }
    +
    +  @Override public boolean nextKeyValue() throws IOException, InterruptedException {
    +    resultBatch();
    +
    +    if (returnColumnarBatch) return nextBatch();
    +
    +    if (batchIdx >= numBatched) {
    +      if (!nextBatch()) return false;
    +    }
    +    ++batchIdx;
    +    return true;
    +  }
    +
    +  @Override public Object getCurrentValue() throws IOException, InterruptedException {
    +    if (returnColumnarBatch) return columnarBatch;
    +    return columnarBatch.getRow(batchIdx - 1);
    +  }
    +
    +  @Override public Void getCurrentKey() throws IOException, InterruptedException {
    +    return null;
    +  }
    +
    +  @Override public float getProgress() throws IOException, InterruptedException {
    +    // TODO : Implement it based on total number of rows it is going to retrive.
    +    return 0;
    +  }
    +
    +  /**
    +   * Returns the ColumnarBatch object that will be used for all rows returned by this reader.
    +   * This object is reused. Calling this enables the vectorized reader. This should be called
    +   * before any calls to nextKeyValue/nextBatch.
    +   */
    +
    +  public void initBatch(MemoryMode memMode) {
    --- End diff --
    
    please check the visibility of each function in this class, many of them do not need to be public


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835389
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
    @@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
         readSupport.initialize(queryModel.getProjectionColumns(),
             queryModel.getAbsoluteTableIdentifier());
         try {
    -      carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
    +      if (queryModel.isVectorReader()) {
    +        carbonIterator = new VectorChunkRowIterator(
    --- End diff --
    
    I moved out this logic out of the class. And new `VectorizedCarbonRecordReader` is created.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92531150
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java ---
    @@ -77,13 +79,75 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
           rowId = chunkAttributes.getInvertedIndexesReverse()[rowId];
         }
         int start = rowId * chunkAttributes.getColumnValueSize();
    +    int dict = getInt(chunkAttributes.getColumnValueSize(), start);
    +    row[columnIndex] = dict;
    +    return columnIndex + 1;
    +  }
    +
    +  @Override public int fillConvertedChunkData(ColumnVectorInfo[] vectorInfo, int column,
    +      KeyStructureInfo restructuringInfo) {
    --- End diff --
    
    What does KeyStructureInfo use? Seem we did not use this parameter


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93044312
  
    --- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java ---
    @@ -0,0 +1,248 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.spark.vectorreader;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.carbondata.core.cache.dictionary.Dictionary;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
    +import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +import org.apache.carbondata.hadoop.CarbonInputSplit;
    +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
    +import org.apache.carbondata.scan.executor.QueryExecutor;
    +import org.apache.carbondata.scan.executor.QueryExecutorFactory;
    +import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.scan.model.QueryDimension;
    +import org.apache.carbondata.scan.model.QueryMeasure;
    +import org.apache.carbondata.scan.model.QueryModel;
    +import org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
    +import org.apache.carbondata.spark.util.CarbonScalaUtil;
    +
    +import org.apache.hadoop.mapreduce.InputSplit;
    +import org.apache.hadoop.mapreduce.RecordReader;
    +import org.apache.hadoop.mapreduce.TaskAttemptContext;
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.vectorized.ColumnarBatch;
    +import org.apache.spark.sql.types.DecimalType;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +
    +public class VectorizedCarbonRecordReader extends RecordReader<Void, Object> {
    --- End diff --
    
    add description to this class, describe the difference from CarbonRecordReader


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    @jackylk Added testcase, please review


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830109
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --
    
    missing file header


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830305
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --
    
    This interface is never used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Is there a test case for this feature? I could not find it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93071362
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
    @@ -428,8 +429,15 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
       }
     
       def supportBatchedDataSource(sqlContext: SQLContext, cols: Seq[Attribute]): Boolean = {
    -    sqlContext.conf.wholeStageEnabled &&
    -    sqlContext.sparkSession.conf.get("carbon.enable.vector.reader", "true").toBoolean &&
    +    val enableReader = {
    +      if (sqlContext.sparkSession.conf.contains(CarbonCommonConstants.ENABLE_VECTOR_READER)) {
    +        sqlContext.sparkSession.conf.get(CarbonCommonConstants.ENABLE_VECTOR_READER).toBoolean
    +      } else {
    +        System.getProperty(CarbonCommonConstants.ENABLE_VECTOR_READER,
    +          CarbonCommonConstants.ENABLE_VECTOR_READER_DEFAULT).toBoolean
    +      }
    +    }
    +    sqlContext.conf.wholeStageEnabled && enableReader &&
         cols.forall(_.dataType.isInstanceOf[AtomicType])
    --- End diff --
    
    incorrect indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Failed  with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/207/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/241/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/86/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r93013995
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -455,12 +463,10 @@ private Expression getFilterPredicates(Configuration configuration) {
             queryModel.setInvalidSegmentIds(invalidSegments);
           }
         }
    -
    -    CarbonReadSupport readSupport = getReadSupportClass(configuration);
    -    return new CarbonRecordReader<T>(queryModel, readSupport);
    +    return queryModel;
       }
     
    -  private CarbonReadSupport getReadSupportClass(Configuration configuration) {
    +  public CarbonReadSupport getReadSupportClass(Configuration configuration) {
    --- End diff --
    
    why is it public? it is used internal only, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91961576
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
    @@ -150,12 +153,25 @@ class CarbonScanRDD[V: ClassTag](
         val attemptContext = new TaskAttemptContextImpl(new Configuration(), attemptId)
         val format = prepareInputFormatForExecutor(attemptContext.getConfiguration)
         val inputSplit = split.asInstanceOf[CarbonSparkPartition].split.value
    -    val reader = format.createRecordReader(inputSplit, attemptContext)
    +    val model = format.getQueryModel(inputSplit, attemptContext)
    +    val reader = {
    +      if (vectorReader) {
    +        val carbonRecordReader = createVectorizedCarbonRecordReader(model)
    +        if (carbonRecordReader == null) {
    +          new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
    +        } else {
    +          carbonRecordReader
    +        }
    +      } else {
    +        new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
    --- End diff --
    
    should not new CarbonRecordReader directly, can we choose:
    option 1: create two InputFormat, one for batch and another for non-batch
    option 2: one InputFormat, create RecordReader according to configuration.
    I prefer option 2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835397
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

Posted by piaoyats <gi...@git.apache.org>.

Github user piaoyats commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r92531597
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java ---
    @@ -77,13 +79,75 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
           rowId = chunkAttributes.getInvertedIndexesReverse()[rowId];
         }
         int start = rowId * chunkAttributes.getColumnValueSize();
    +    int dict = getInt(chunkAttributes.getColumnValueSize(), start);
    +    row[columnIndex] = dict;
    +    return columnIndex + 1;
    +  }
    +
    +  @Override public int fillConvertedChunkData(ColumnVectorInfo[] vectorInfo, int column,
    +      KeyStructureInfo restructuringInfo) {
    +    ColumnVectorInfo columnVectorInfo = vectorInfo[column];
    +    int offset = columnVectorInfo.offset;
    +    int vectorOffset = columnVectorInfo.vectorOffset;
    +    int len = columnVectorInfo.size + offset;
    +    int[] indexesReverse = chunkAttributes.getInvertedIndexesReverse();
    +    int columnValueSize = chunkAttributes.getColumnValueSize();
    +    CarbonColumnVector vector = columnVectorInfo.vector;
    +    for (int j = offset; j < len; j++) {
    +      int start =
    +          indexesReverse == null ? j * columnValueSize : indexesReverse[j] * columnValueSize;
    +      int dict = getInt(columnValueSize, start);
    +      if (columnVectorInfo.directDictionaryGenerator == null) {
    +        vector.putInt(vectorOffset++, dict);
    +      } else {
    +        Object valueFromSurrogate =
    +            columnVectorInfo.directDictionaryGenerator.getValueFromSurrogate(dict);
    +        if (valueFromSurrogate == null) {
    +          vector.putNull(vectorOffset++);
    +        } else {
    +          vector.putLong(vectorOffset++, (long) valueFromSurrogate);
    +        }
    +      }
    +    }
    +    return column + 1;
    +  }
    +
    +  @Override
    +  public int fillConvertedChunkData(int[] rowMapping, ColumnVectorInfo[] vectorInfo, int column,
    +      KeyStructureInfo restructuringInfo) {
    +    ColumnVectorInfo columnVectorInfo = vectorInfo[column];
    +    int offset = columnVectorInfo.offset;
    +    int vectorOffset = columnVectorInfo.vectorOffset;
    +    int len = columnVectorInfo.size + offset;
    +    int[] indexesReverse = chunkAttributes.getInvertedIndexesReverse();
    +    int columnValueSize = chunkAttributes.getColumnValueSize();
    +    CarbonColumnVector vector = columnVectorInfo.vector;
    +    for (int j = offset; j < len; j++) {
    +      int start = indexesReverse == null ?
    +          rowMapping[j] * columnValueSize :indexesReverse[rowMapping[j]] * columnValueSize;
    --- End diff --
    
    add space after : 
    can we add some notes or docs to show the logic, so readers can understand easily


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/107/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830114
  
    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +public class CarbonColumnarBatch {
    --- End diff --
    
    I think it is better to use name `ColumnarBatch` directly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/412
  
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/208/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---