Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/02/04 23:33:11 UTC

[GitHub] [drill] vdiravka commented on pull request #2364: DRILL-8037: Add V2 JSON Format Plugin based on EVF

vdiravka commented on pull request #2364:
URL: https://github.com/apache/drill/pull/2364#issuecomment-1030429509


   Hi @paul-rogers, I have rebased the branch onto master. In a separate new commit I removed the hack that hid the schema change in `HashAggTemplate` (in fact, with that hack one row was missing from the query result; the test case just doesn't check for it).
   Thanks for the explanation of how the vectors work; it helped me. It is clear now that the schema changes because the `RepeatedMapVector` [can't be obtained from the cache](https://github.com/apache/drill/blob/317f164791bbbe8f937eb452b49e92c34f1c0333/exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java#L220):
   ```
         // Don't get the map vector from the vector cache. Map vectors may
         // have content that varies from batch to batch. Only the leaf
         // vectors can be cached.
   ```
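   To make the effect concrete, here is a minimal, hypothetical model (the class and method names are invented for illustration and are not Drill's real `ResultSetLoader` or vector classes). It assumes a downstream operator that treats a batch as having a new schema whenever a top-level vector instance differs from the previous batch; Drill's actual schema-change detection is more involved, so this only sketches why an uncached map vector shows up as a change on every batch while cached leaf vectors do not.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   /** Hypothetical, simplified model (not Drill code) of leaf-only vector caching. */
   public class UncachedMapSketch {

     /** Stand-in for a value vector; only its identity matters here. */
     static final class Vector {
       final String name;
       Vector(String name) { this.name = name; }
     }

     /** Hypothetical cache: one vector instance per leaf column, reused across batches. */
     static final class LeafVectorCache {
       private final Map<String, Vector> leaves = new HashMap<>();
       Vector leafFor(String name) {
         return leaves.computeIfAbsent(name, Vector::new);
       }
     }

     /** Builds the top-level vectors of one batch; map columns bypass the cache. */
     static List<Vector> buildBatch(LeafVectorCache cache, List<String> leafCols, List<String> mapCols) {
       List<Vector> batch = new ArrayList<>();
       for (String col : leafCols) {
         batch.add(cache.leafFor(col));   // same instance every batch
       }
       for (String col : mapCols) {
         batch.add(new Vector(col));      // new instance every batch
       }
       return batch;
     }

     /** Hypothetical downstream check: any changed vector instance counts as a new schema. */
     static boolean vectorsChanged(List<Vector> prev, List<Vector> curr) {
       if (prev.size() != curr.size()) return true;
       for (int i = 0; i < prev.size(); i++) {
         if (prev.get(i) != curr.get(i)) return true;   // identity, not equality
       }
       return false;
     }

     /** Counts how many batch-to-batch transitions look like a schema change. */
     static int countChanges(int batches, List<String> leafCols, List<String> mapCols) {
       LeafVectorCache cache = new LeafVectorCache();
       List<Vector> prev = buildBatch(cache, leafCols, mapCols);
       int changes = 0;
       for (int b = 1; b < batches; b++) {
         List<Vector> curr = buildBatch(cache, leafCols, mapCols);
         if (vectorsChanged(prev, curr)) changes++;
         prev = curr;
       }
       return changes;
     }

     public static void main(String[] args) {
       System.out.println(countChanges(3, List.of("id"), List.of()));          // 0: leaf columns only
       System.out.println(countChanges(3, List.of("id"), List.of("items")));   // 2: map rebuilt per batch
     }
   }
   ```

   In this model, three batches with only a leaf column produce no changes, while adding one map column produces a "change" at each of the two transitions.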
   Obtaining the vector from the cache there instead, with `mapVector = (RepeatedMapVector) parent.vectorCache().vectorFor(mapColSchema.schema());`, leads to errors in this and other test cases:
   
   ```
   org.apache.drill.common.exceptions.UserRemoteException: EXECUTION_ERROR ERROR: null
   
   Read failed for reader: JsonBatchReader
   ....
   Caused by: java.lang.AssertionError: 
   	at org.apache.drill.exec.physical.resultSet.impl.TupleState$MapState.addOutputColumn(TupleState.java:475)
   	at org.apache.drill.exec.physical.resultSet.impl.ColumnState.buildOutput(ColumnState.java:321)
   	at org.apache.drill.exec.physical.resultSet.impl.TupleState.updateOutput(TupleState.java:206)
   	at org.apache.drill.exec.physical.resultSet.impl.TupleState.updateOutput(TupleState.java:217)
   	at org.apache.drill.exec.physical.resultSet.impl.TupleState$RowState.updateOutput(TupleState.java:430)
   	at org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.harvest(ResultSetLoaderImpl.java:716)
   ```
   So, as I see it, we need either to implement schema-change support in the HashAgg operator or to obtain the `RepeatedMapVector` from the cache. I lean towards the latter. What do you think?
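   A rough, hypothetical sketch of the second option (all names are invented for illustration; this is not Drill's `ResultSetCache`/`ColumnBuilder` API): the map vector is kept in the cache and reused, and only its per-batch values are cleared, so downstream operators keep seeing the same instance.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   /** Hypothetical sketch (not Drill code): cache map vectors and reset their contents per batch. */
   public class CachedMapSketch {

     /** Stand-in for a repeated map vector with named member value lists. */
     static final class MapVector {
       final String name;
       final Map<String, List<Object>> members = new LinkedHashMap<>();
       MapVector(String name) { this.name = name; }

       /** Drops values from the previous batch but keeps the instance and its member names. */
       void resetForNewBatch() {
         for (List<Object> values : members.values()) values.clear();
       }

       /** Returns the member's value list, adding the member if it is new. */
       List<Object> member(String memberName) {
         return members.computeIfAbsent(memberName, k -> new ArrayList<>());
       }
     }

     /** Hypothetical cache that, unlike the leaf-only cache, also holds map vectors. */
     static final class VectorCache {
       private final Map<String, MapVector> maps = new HashMap<>();
       MapVector mapFor(String name) {
         MapVector v = maps.computeIfAbsent(name, MapVector::new);
         v.resetForNewBatch();
         return v;
       }
     }

     public static void main(String[] args) {
       VectorCache cache = new VectorCache();

       MapVector batch1 = cache.mapFor("items");
       batch1.member("a").add(1);

       MapVector batch2 = cache.mapFor("items");
       batch2.member("b").add("x");   // a member that did not appear in batch 1

       System.out.println(batch1 == batch2);               // true: same instance reused
       System.out.println(batch2.member("a").isEmpty());   // true: old values cleared
       System.out.println(batch2.members.keySet());        // [a, b]
     }
   }
   ```

   The catch, which the comment in `ColumnBuilder` warns about, is that the members of a reused map can differ from batch to batch (here they grow from `[a]` to `[a, b]`), so a real implementation would still have to reconcile the member vectors, and consumers may still treat that as a schema change.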

