You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/05/28 06:25:55 UTC
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3778: [WIP] Support array with SI
Indhumathi27 opened a new pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480920128
##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
/**
* This method will prepare the data from raw object that will take part in sorting
*/
- private Object[] prepareRowObjectForSorting(Object[] row) {
+ private Object[] prepareRowObjectForSorting(Object[] row,
+ Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+ throws SecondaryIndexException {
ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
- // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+ byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
List<CarbonDimension> dimensions = segmentProperties.getDimensions();
Object[] preparedRow = new Object[dimensions.size() + measureCount];
+ Map<Integer, Object[]> complexDataMap = new HashMap<>();
int noDictionaryIndex = 0;
int dictionaryIndex = 0;
+ int complexIndex = 0;
int i = 0;
// loop excluding last dimension as last one is implicit column.
for (; i < dimensions.size() - 1; i++) {
CarbonDimension dims = dimensions.get(i);
- if (dims.hasEncoding(Encoding.DICTIONARY)) {
+ boolean isComplexColumn = false;
+ // check if dimension is a complex data type
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
Review comment:
Can we use dim.isComplex to do the same check?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661763876
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458509321
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,35 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
@Override
public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
- Object[] data = fillData(dataBuffer);
+ return getDataBasedOnDataType(dataBuffer, false);
+ }
+
+ @Override
+ public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
+ Object[] data = fillData(dataBuffer, false);
if (data == null) {
return null;
}
return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data);
}
- protected Object[] fillData(ByteBuffer dataBuffer) {
+ @Override
+ public Object[] getObjectArrayDataBasedOnDataType(ByteBuffer dataBuffer) {
+ Object[] data = fillData(dataBuffer, true);
+ if (data == null) {
+ return null;
+ }
+ return data;
Review comment:
return fillData(dataBuffer, true);
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -281,12 +310,45 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
}
}
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+ // In case of complex array type, flatten the data and add for sorting
+ // TODO: Handle for nested array and other complex types
+ for (int k = 0; k < wrapper.getComplexTypesKeys().length; k++) {
+ byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(k);
+ ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+ GenericQueryType genericQueryType =
+ complexDimensionInfoMap.get(complexColumnParentBlockIndexes[k]);
+ short length = byteArrayInput.getShort(2);
+ // get flattened array data
+ Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+ if (length != 1) {
+ for (int j = 1; j < length; j++) {
+ preparedRow[i] = getData(data, j);
+ preparedRow[i + 1] = implicitColumnByteArray;
+ addRowForSorting(preparedRow.clone());
+ }
+ // add first row
+ preparedRow[i] = getData(data, 0);
+ } else {
+ preparedRow[i] = getData(data, 0);
+ }
Review comment:
}
preparedRow[i] = getData(data, 0);
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -226,8 +239,16 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
for (CarbonIterator<RowBatch> detailQueryIterator : detailQueryResultIteratorList) {
while (detailQueryIterator.hasNext()) {
RowBatch batchResult = detailQueryIterator.next();
+ DetailQueryResultIterator queryIterator = (DetailQueryResultIterator) detailQueryIterator;
+ BlockExecutionInfo blockExecutionInfo = queryIterator.getBlockExecutionInfo();
+ // get complex dimension info map from block execution info
+ Map<Integer, GenericQueryType> complexDimensionInfoMap =
+ blockExecutionInfo.getComplexDimensionInfoMap();
+ int[] complexColumnParentBlockIndexes =
+ blockExecutionInfo.getComplexColumnParentBlockIndexes();
Review comment:
move to outside of while loop
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708488
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3818/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671499205
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1948/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674787907
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2004/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677642207
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3810/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365673
##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##########
@@ -221,4 +221,9 @@ public int numberOfNodes() {
public List<TableBlockInfo> getBlockInfos() {
Review comment:
removed getBlockInfos method
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673324648
please rebase
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r467978296
##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+ override def beforeEach(): Unit = {
+ sql("drop table if exists complextable")
+ }
+
+ override def afterEach(): Unit = {
+ sql("drop index if exists index_1 on complextable")
+ sql("drop table if exists complextable")
+ }
+
+ test("test array<string> on secondary index") {
Review comment:
currenlty, for d), for query having more than one array contains filter will not SI. query will hit main table only. added testcase for c)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r469772276
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
@Override
public void addChildren(GenericQueryType children) {
- if (this.getName().equals(children.getParentName())) {
+ if (null == this.getName() || this.getName().equals(children.getParentName())) {
Review comment:
When the name can be null?
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
Review comment:
better to move all constants of SI together to one place of this class.
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that block which contain
* the data as per the applied filter
*/
- private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+ private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+ /**
+ * checks if implicit filter exceeds complex filter threshold
+ */
+ private boolean isThresholdReached;
public ImplicitExpression(List<Expression> implicitFilterList) {
+ final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
Review comment:
move LOGGER to be a static field of this class
##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##########
@@ -221,4 +221,9 @@ public int numberOfNodes() {
public List<TableBlockInfo> getBlockInfos() {
Review comment:
after we added getTableBlockInfo(), can we remove this method?
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that block which contain
* the data as per the applied filter
*/
- private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+ private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+ /**
+ * checks if implicit filter exceeds complex filter threshold
+ */
+ private boolean isThresholdReached;
public ImplicitExpression(List<Expression> implicitFilterList) {
+ final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
// initialize map with half the size of filter list as one block id can contain
// multiple blocklets
blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
for (Expression value : implicitFilterList) {
String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString();
addBlockEntry(blockletPath);
}
+ int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+ isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+ if (isThresholdReached) {
+ LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: "
+ + complexFilterThreshold);
+ }
}
- public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+ public ImplicitExpression(Map<String, Set<String>> blockIdToBlockletIdMapping) {
this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
}
private void addBlockEntry(String blockletPath) {
Review comment:
The logic of this method is hard to understand.
Can we add a flag into ImplicitExpression when it is created?
if it is blocklet level, we addBlockletEntry.
if it is row level, we addRowEntry.
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
@Override
public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
- Object[] data = fillData(dataBuffer);
+ return getDataBasedOnDataType(dataBuffer, false);
+ }
+
+ @Override
+ public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
Review comment:
how about to keep the old method and add a new method getObjectDataBasedOnDataType?
It will not need this boolean parameter.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661191872
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1701/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684727237
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3942/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660967562
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3438/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458547213
##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+ override def beforeEach(): Unit = {
+ sql("drop table if exists complextable")
+ }
+
+ override def afterEach(): Unit = {
+ sql("drop index if exists index_1 on complextable")
+ sql("drop table if exists complextable")
+ }
+
+ test("test array<string> on secondary index") {
Review comment:
a) I feel we should handle the SI support for all array of primitive not just string.
b) Better to discuss the row level SI solution in community once ?
c) Also if multiple SI created for one primitive and one complex column. query gives 0 rows. Need to handle it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-680679287
@QiangCai @ajantha-bhat Reverted row level position reference code changes. Please review
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661187440
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3443/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662267686
Agree with @ajantha-bhat
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480960162
##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
/**
* This method will prepare the data from raw object that will take part in sorting
*/
- private Object[] prepareRowObjectForSorting(Object[] row) {
+ private Object[] prepareRowObjectForSorting(Object[] row,
+ Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+ throws SecondaryIndexException {
ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
- // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+ byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
List<CarbonDimension> dimensions = segmentProperties.getDimensions();
Object[] preparedRow = new Object[dimensions.size() + measureCount];
+ Map<Integer, Object[]> complexDataMap = new HashMap<>();
int noDictionaryIndex = 0;
int dictionaryIndex = 0;
+ int complexIndex = 0;
int i = 0;
// loop excluding last dimension as last one is implicit column.
for (; i < dimensions.size() - 1; i++) {
CarbonDimension dims = dimensions.get(i);
- if (dims.hasEncoding(Encoding.DICTIONARY)) {
+ boolean isComplexColumn = false;
+ // check if dimension is a complex data type
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+ for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+ if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+ isComplexColumn = true;
+ break;
+ }
+ }
+ }
+ if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
// dictionary
preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
} else {
- // no dictionary dims
- byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
- // no dictionary primitive columns are expected to be in original data while loading,
- // so convert it to original data
- if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
- Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
- noDictionaryKeyByIndex, dims.getDataType());
- if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
- dataFromBytes = (long) dataFromBytes / 1000L;
+ if (isComplexColumn) {
+ byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+ ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+ GenericQueryType genericQueryType =
+ complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+ int complexDataLength = byteArrayInput.getShort(2);
+ // In case, if array is empty
+ if (complexDataLength == 0) {
+ complexDataLength = complexDataLength + 1;
+ }
+ // get flattened array data
+ Object[] complexFlattenedData = new Object[complexDataLength];
+ Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+ for (int index = 0; index < complexDataLength; index++) {
+ complexFlattenedData[index] =
+ getData(data, index, dims.getColumnSchema().getDataType());
}
- preparedRow[i] = dataFromBytes;
+ complexDataMap.put(i, complexFlattenedData);
} else {
- preparedRow[i] = noDictionaryKeyByIndex;
+ // no dictionary dims
+ byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+ // no dictionary primitive columns are expected to be in original data while loading,
+ // so convert it to original data
+ if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+ Object dataFromBytes = DataTypeUtil
+ .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+ dims.getDataType());
+ if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+ dataFromBytes = (long) dataFromBytes / 1000L;
+ }
+ preparedRow[i] = dataFromBytes;
+ } else {
+ preparedRow[i] = noDictionaryKeyByIndex;
+ }
}
}
}
// at last add implicit column position reference(PID)
+ preparedRow[i] = implicitColumnByteArray;
- preparedRow[i] = wrapper.getImplicitColumnByteArray();
+ // In case of complex array type, flatten the data and add for sorting
+ // TODO: Handle for nested array and other complex types
Review comment:
added
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480922354
##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
/**
* This method will prepare the data from raw object that will take part in sorting
*/
- private Object[] prepareRowObjectForSorting(Object[] row) {
+ private Object[] prepareRowObjectForSorting(Object[] row,
+ Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+ throws SecondaryIndexException {
ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
- // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+ byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
List<CarbonDimension> dimensions = segmentProperties.getDimensions();
Object[] preparedRow = new Object[dimensions.size() + measureCount];
+ Map<Integer, Object[]> complexDataMap = new HashMap<>();
int noDictionaryIndex = 0;
int dictionaryIndex = 0;
+ int complexIndex = 0;
int i = 0;
// loop excluding last dimension as last one is implicit column.
for (; i < dimensions.size() - 1; i++) {
CarbonDimension dims = dimensions.get(i);
- if (dims.hasEncoding(Encoding.DICTIONARY)) {
+ boolean isComplexColumn = false;
+ // check if dimension is a complex data type
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+ for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+ if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+ isComplexColumn = true;
+ break;
+ }
+ }
+ }
+ if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
// dictionary
preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
} else {
- // no dictionary dims
- byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
- // no dictionary primitive columns are expected to be in original data while loading,
- // so convert it to original data
- if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
- Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
- noDictionaryKeyByIndex, dims.getDataType());
- if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
- dataFromBytes = (long) dataFromBytes / 1000L;
+ if (isComplexColumn) {
+ byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+ ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+ GenericQueryType genericQueryType =
+ complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+ int complexDataLength = byteArrayInput.getShort(2);
+ // In case, if array is empty
+ if (complexDataLength == 0) {
+ complexDataLength = complexDataLength + 1;
+ }
+ // get flattened array data
+ Object[] complexFlattenedData = new Object[complexDataLength];
+ Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+ for (int index = 0; index < complexDataLength; index++) {
+ complexFlattenedData[index] =
+ getData(data, index, dims.getColumnSchema().getDataType());
}
- preparedRow[i] = dataFromBytes;
+ complexDataMap.put(i, complexFlattenedData);
} else {
- preparedRow[i] = noDictionaryKeyByIndex;
+ // no dictionary dims
+ byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+ // no dictionary primitive columns are expected to be in original data while loading,
+ // so convert it to original data
+ if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+ Object dataFromBytes = DataTypeUtil
+ .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+ dims.getDataType());
+ if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+ dataFromBytes = (long) dataFromBytes / 1000L;
+ }
+ preparedRow[i] = dataFromBytes;
+ } else {
+ preparedRow[i] = noDictionaryKeyByIndex;
+ }
}
}
}
// at last add implicit column position reference(PID)
+ preparedRow[i] = implicitColumnByteArray;
- preparedRow[i] = wrapper.getImplicitColumnByteArray();
+ // In case of complex array type, flatten the data and add for sorting
+ // TODO: Handle for nested array and other complex types
Review comment:
Please mention in some doc, that nexted complex is not supported
##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
/**
* This method will prepare the data from raw object that will take part in sorting
*/
- private Object[] prepareRowObjectForSorting(Object[] row) {
+ private Object[] prepareRowObjectForSorting(Object[] row,
+ Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+ throws SecondaryIndexException {
ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
- // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+ byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
List<CarbonDimension> dimensions = segmentProperties.getDimensions();
Object[] preparedRow = new Object[dimensions.size() + measureCount];
+ Map<Integer, Object[]> complexDataMap = new HashMap<>();
int noDictionaryIndex = 0;
int dictionaryIndex = 0;
+ int complexIndex = 0;
int i = 0;
// loop excluding last dimension as last one is implicit column.
for (; i < dimensions.size() - 1; i++) {
CarbonDimension dims = dimensions.get(i);
- if (dims.hasEncoding(Encoding.DICTIONARY)) {
+ boolean isComplexColumn = false;
+ // check if dimension is a complex data type
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+ for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+ if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+ isComplexColumn = true;
+ break;
+ }
+ }
+ }
+ if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
// dictionary
preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
} else {
- // no dictionary dims
- byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
- // no dictionary primitive columns are expected to be in original data while loading,
- // so convert it to original data
- if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
- Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
- noDictionaryKeyByIndex, dims.getDataType());
- if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
- dataFromBytes = (long) dataFromBytes / 1000L;
+ if (isComplexColumn) {
+ byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+ ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+ GenericQueryType genericQueryType =
+ complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+ int complexDataLength = byteArrayInput.getShort(2);
+ // In case, if array is empty
+ if (complexDataLength == 0) {
+ complexDataLength = complexDataLength + 1;
+ }
+ // get flattened array data
+ Object[] complexFlattenedData = new Object[complexDataLength];
+ Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+ for (int index = 0; index < complexDataLength; index++) {
+ complexFlattenedData[index] =
+ getData(data, index, dims.getColumnSchema().getDataType());
}
- preparedRow[i] = dataFromBytes;
+ complexDataMap.put(i, complexFlattenedData);
} else {
- preparedRow[i] = noDictionaryKeyByIndex;
+ // no dictionary dims
+ byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+ // no dictionary primitive columns are expected to be in original data while loading,
+ // so convert it to original data
+ if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+ Object dataFromBytes = DataTypeUtil
+ .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+ dims.getDataType());
+ if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+ dataFromBytes = (long) dataFromBytes / 1000L;
+ }
+ preparedRow[i] = dataFromBytes;
+ } else {
+ preparedRow[i] = noDictionaryKeyByIndex;
+ }
}
}
}
// at last add implicit column position reference(PID)
+ preparedRow[i] = implicitColumnByteArray;
- preparedRow[i] = wrapper.getImplicitColumnByteArray();
+ // In case of complex array type, flatten the data and add for sorting
+ // TODO: Handle for nested array and other complex types
Review comment:
Please mention in some doc, that nested complex is not supported
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671341729
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1943/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471459506
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
@Override
public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
- Object[] data = fillData(dataBuffer);
+ return getDataBasedOnDataType(dataBuffer, false);
+ }
+
+ @Override
+ public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
Review comment:
handled
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660968353
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1696/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677789319
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3820/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660048799
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3422/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600323
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+ /**
+ * Maximum value for complex filter threshold
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+ /**
+ * Property to decide if position id till row level or not
+ */
+ public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =
Review comment:
In the community, we concluded that no need of row level position reference. So, why this is required ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673649958
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1974/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r460795333
##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+ override def beforeEach(): Unit = {
+ sql("drop table if exists complextable")
+ }
+
+ override def afterEach(): Unit = {
+ sql("drop index if exists index_1 on complextable")
+ sql("drop table if exists complextable")
+ }
+
+ test("test array<string> on secondary index") {
Review comment:
d) If two array_contains() present with AND in query. When it is pushed down as equal to filter in SI. It will give 0 rows as SI is flattened and it cannot find two values in one row. Need to handle that also
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684965465
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2212/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671340388
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3682/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660869540
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1689/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635224307
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677647032
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2069/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682146723
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2155/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480961113
##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
/**
* This method will prepare the data from raw object that will take part in sorting
*/
- private Object[] prepareRowObjectForSorting(Object[] row) {
+ private Object[] prepareRowObjectForSorting(Object[] row,
+ Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+ throws SecondaryIndexException {
ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
- // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+ byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
List<CarbonDimension> dimensions = segmentProperties.getDimensions();
Object[] preparedRow = new Object[dimensions.size() + measureCount];
+ Map<Integer, Object[]> complexDataMap = new HashMap<>();
int noDictionaryIndex = 0;
int dictionaryIndex = 0;
+ int complexIndex = 0;
int i = 0;
// loop excluding last dimension as last one is implicit column.
for (; i < dimensions.size() - 1; i++) {
CarbonDimension dims = dimensions.get(i);
- if (dims.hasEncoding(Encoding.DICTIONARY)) {
+ boolean isComplexColumn = false;
+ // check if dimension is a complex data type
+ if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
Review comment:
dim is actually the index table dimension, where complex type of main table is stored as its primitive type in SI. Hence, we cannot check isComplex from dimension.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365424
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
@Override
public void addChildren(GenericQueryType children) {
- if (this.getName().equals(children.getParentName())) {
+ if (null == this.getName() || this.getName().equals(children.getParentName())) {
Review comment:
removed this check
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
Review comment:
handled
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660867814
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3431/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] brijoobopanna commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
brijoobopanna commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682087451
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677790243
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2079/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659944291
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1676/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671194649
please rebase it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684734917
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2202/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473620842
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+ /**
+ * Maximum value for complex filter threshold
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+ /**
+ * Property to decide if position id till row level or not
+ */
+ public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =
Review comment:
cc: @kunal642 , @QiangCai , @ravipesala
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] kunal642 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-685454195
LGTM
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635205017
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3081/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661078111
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1700/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674854838
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2005/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471254176
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
@Override
public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
- Object[] data = fillData(dataBuffer);
+ return getDataBasedOnDataType(dataBuffer, false);
+ }
+
+ @Override
+ public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
Review comment:
why not call getObjectDataBasedOnDataType?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661065769
please rebase it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471242699
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
@Override
public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
- Object[] data = fillData(dataBuffer);
+ return getDataBasedOnDataType(dataBuffer, false);
+ }
+
+ @Override
+ public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
Review comment:
Already added a new method getObjectDataBasedOnDataType. this boolen is still required, as in filldata() method, complex children getDataBasedOnDataType will be called
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660051811
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1681/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-665812800
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662251770
a) I feel we should handle the SI support for all array<primitive> not just string.
b) Better to discuss the row level SI solution in community once ?
@QiangCai , @kunal642 @Indhumathi27
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671496551
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3687/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] asfgit closed pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659945334
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3418/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635205935
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1360/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat edited a comment on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662251770
a) I feel we should handle the SI support for all array of primitive not just string.
b) Better to discuss the row level SI solution in community once ?
@QiangCai , @kunal642 @Indhumathi27
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458561948
##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+ override def beforeEach(): Unit = {
+ sql("drop table if exists complextable")
+ }
+
+ override def afterEach(): Unit = {
+ sql("drop index if exists index_1 on complextable")
+ sql("drop table if exists complextable")
+ }
+
+ test("test array<string> on secondary index") {
Review comment:
also add a test case for c)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473601787
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+ /**
+ * Maximum value for complex filter threshold
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+ /**
+ * Property to decide if position id till row level or not
+ */
+ public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =
Review comment:
CC: @kunal642 , @QiangCai
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682145902
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3896/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600574
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+ /**
+ * Maximum value for complex filter threshold
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+ /**
+ * Property to decide if position id till row level or not
+ */
+ public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =
Review comment:
I think above 2 properties also not needed if it is not row level position reference
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684738151
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473599181
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
databaseName: String, tableName: String, indexTableName: String,
absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
var schemaOrdinal = -1
- var allColumns = indexModel.columnNames.map { indexCol =>
- val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+ val complexDimensions = carbonTable.getAllDimensions.asScala
+ .filter(dim => dim.getDataType.isComplexType &&
+ indexModel.columnNames.asJava.contains(dim.getColName))
+ if (complexDimensions.size > 1) {
+ throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+ }
+ var allColumns = List[ColumnSchema]()
Review comment:
Consider the scenario where one SI table contains (complex, primitive1, primitive2), we need to maintain the same order. But now it will become primitive1,primitive2,complex1. which is wrong.
So, suggest to keep the user-specified order.
Can refer below code
var allColumns = List[ColumnSchema]()
indexModel.columnNames.foreach { indexCol =>
val dimension = carbonTable.getDimensionByName(tableName, indexCol)
val colSchema = dimension.getColumnSchema
schemaOrdinal += 1
allColumns = allColumns :+ cloneColumnSchema(colSchema, schemaOrdinal)
}
complexDimensions.foreach { complexDim =>
if (complexDim.getNumberOfChild > 0) {
if (complexDim.getListOfChildDimensions.asScala
.exists(col => DataTypes.isArrayType(col.getDataType))) {
throw new ErrorMessage("SI creation with nested array complex type is not supported yet");
}
}
}
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674861462
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3746/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684963426
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3952/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673883264
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3716/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673940891
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708011
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2077/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661075813
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3442/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473601787
##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
*/
public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
+ /**
+ * Until the threshold for complex filter is reached, row id will be set to the bitset in
+ * implicit filter during secondary index pruning
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+ /**
+ * Maximum value for complex filter threshold
+ */
+ public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+ /**
+ * Property to decide if position id till row level or not
+ */
+ public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =
Review comment:
CC: @kunal642 , @QiangCai
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673637872
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3713/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473873670
##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
databaseName: String, tableName: String, indexTableName: String,
absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
var schemaOrdinal = -1
- var allColumns = indexModel.columnNames.map { indexCol =>
- val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+ val complexDimensions = carbonTable.getAllDimensions.asScala
+ .filter(dim => dim.getDataType.isComplexType &&
+ indexModel.columnNames.asJava.contains(dim.getColName))
+ if (complexDimensions.size > 1) {
+ throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+ }
+ var allColumns = List[ColumnSchema]()
Review comment:
handled
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471366000
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that block which contain
* the data as per the applied filter
*/
- private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+ private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+ /**
+ * checks if implicit filter exceeds complex filter threshold
+ */
+ private boolean isThresholdReached;
public ImplicitExpression(List<Expression> implicitFilterList) {
+ final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
// initialize map with half the size of filter list as one block id can contain
// multiple blocklets
blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
for (Expression value : implicitFilterList) {
String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString();
addBlockEntry(blockletPath);
}
+ int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+ isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+ if (isThresholdReached) {
+ LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: "
+ + complexFilterThreshold);
+ }
}
- public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+ public ImplicitExpression(Map<String, Set<String>> blockIdToBlockletIdMapping) {
this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
}
private void addBlockEntry(String blockletPath) {
Review comment:
handled
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684847720
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3948/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI
Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659830702
please describe the PR and fix the failure.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673882872
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1977/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365947
##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that block which contain
* the data as per the applied filter
*/
- private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+ private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+ /**
+ * checks if implicit filter exceeds complex filter threshold
+ */
+ private boolean isThresholdReached;
public ImplicitExpression(List<Expression> implicitFilterList) {
+ final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
Review comment:
moved
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org