You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/14 00:47:44 UTC

[GitHub] [pinot] MeihanLi opened a new issue #8019: Group by multi-value columns without aggregation can not work

MeihanLi opened a new issue #8019:
URL: https://github.com/apache/pinot/issues/8019


   [Pinot SQL] It seems that non-aggregation group by only works for single value columns. Grouping by multi-value columns throws exceptions with error code 200. The exception message from Pinot Controller UI is redundant and it repeats the first message over 20 times and the detailed stack tree is not meaningful for us to understand what causes the issue. 
   
   Example tags_value (String array):
   `restaurant_professional,restaurant_fast,restaurant_fresh,restaurant_tasty`
   
   Example query:
   `select tags_value from myTable GROUP BY tags_value`
   
   Error message from Pinot Controller UI:
   `  {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:278)\n\tat org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:88)\n\tat org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:99)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getDictionaryIdsSV(ProjectionBlockValSet.java:69)\n\tat org.apache.pinot.core.query.distinct.dictionary.DictionaryBasedSingleColumnDistinctOnlyExecutor.process(DictionaryBasedSingleColumnDistinctOnlyExecutor.java:42)\n\tat org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:61)\n\tat org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:38)\n\tat org.apache.pinot.core.operator.BaseOperator
 .nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator.processSegments(BaseCombineOperator.java:150)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:105)\n\tat org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)"
     },`
   
   Error message from the server:
   `2022-01-10 19:16:24.768 [pqw-10] ERROR org.apache.pinot.core.operator.combine.BaseCombineOperator  - Caught exception while executing operator of index: 1 (query: QueryContext{_tableName='re
   staurant_bi_feedback_OFFLINE', _selectExpressions=[distinct(tags_value)], _aliasList=[null], _filter=rating_value > '0', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=nul
   l, _limit=10, _offset=0, _queryOptions={responseFormat=sql, trace=true, groupByMode=sql, timeoutMs=16000}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:
   restaurant_bi_feedback_OFFLINE), pinotQuery:PinotQuery(dataSource:DataSource(tableName:restaurant_bi_feedback_OFFLINE), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:D
   ISTINCT, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:tags_value))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:GREATER_THAN, operands:[
   Expression(type:IDENTIFIER, identifier:Identifier(name:rating_value)), Expression(type:LITERAL, literal:<Literal longValue:0>)])), groupByList:[], orderByList:[], limit:10, queryOptions:{res
   ponseFormat=sql, trace=true, groupByMode=sql, timeoutMs=16000}))})
   java.lang.UnsupportedOperationException: null
           at org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)
           at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:278)
           at org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:88)
           at org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:99)
           at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getDictionaryIdsSV(ProjectionBlockValSet.java:69)
           at org.apache.pinot.core.query.distinct.dictionary.DictionaryBasedSingleColumnDistinctOnlyExecutor.process(DictionaryBasedSingleColumnDistinctOnlyExecutor.java:42)
           at org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:61)
           at org.apache.pinot.core.operator.query.DistinctOperator.getNextBlock(DistinctOperator.java:38)
           at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
           at org.apache.pinot.core.operator.combine.BaseCombineOperator.processSegments(BaseCombineOperator.java:150)
           at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:105)
           at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
           at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
           at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   `
   
   Also, It is weired that aggregation group by works for multi-value columns. This query shows more details and if this query can work, the non-aggregation group by should also work for multi-value columns.
   Workable query:
   `select tags_value, count(*) from myTable GROUP BY tags_value`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1015729615


   @atris For single column, we can do that. But if there are multiple MV columns in the distinct, should we return all the combinations (the same semantic for group-by right now)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] yupeng9 commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1016775388


   As @MeihanLi said, why don't we take the same behavior as the query **with** aggregations, namely `select tags_value, count(*) from myTable GROUP BY tags_value`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1013527169


   This query will be rewritten to `select distinct(tags_value) from myTable`. Currently there is no MV support for distinct queries.
   The syntax for MV distinct is unclear: Should we break the MV into multiple individual values when calculating the distinct?
   Once we decide the syntax, we can add the support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] MeihanLi commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
MeihanLi commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1015913476


   I feel like that we can break the MV into multiple individual values, since that is the behavior for aggregation group by, "select tags_value, count(*) from myTable GROUP BY tags_value"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
atris commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1016776222


   > As @MeihanLi said, why don't we take the same behavior as the query **with** aggregations, namely `select tags_value, count(*) from myTable GROUP BY tags_value`?
   
   Yes, I believe that is the consensus here 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1012683735


   cc: @atris 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
atris commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1015730677


   > @atris For single column, we can do that. But if there are multiple MV columns in the distinct, should we return all the combinations (the same semantic for group-by right now)?
   
   Yes, IMO. Sorry, when I meant the grouping in my earlier comment, I meant this exact behaviour.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] MeihanLi edited a comment on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
MeihanLi edited a comment on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1015913476


   I feel like that we can break the MV into multiple individual values, since that is the behavior for aggregation group by, "select tags_value, count(*) from myTable GROUP BY tags_value" cc: @Jackie-Jiang @yupeng9 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
atris commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1014221353


   I believe it should be individual values and ORing the result together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8019: Group by multi-value columns without aggregation can not work

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8019:
URL: https://github.com/apache/pinot/issues/8019#issuecomment-1016704100


   @kishoreg What do you think about the distinct on MV column semantic? I don't see a sql standard for this. To follow the semantic of MV group-by, we can return all combinations of the column values, e.g. col1: [1, 2], col2: [3, 4], distinct(col1, col2) -> [1, 3], [1, 4], [2, 3], [2, 4]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org