Posted to issues@drill.apache.org by "Jason Altekruse (JIRA)" <ji...@apache.org> on 2015/07/07 00:36:04 UTC

[jira] [Commented] (DRILL-3443) Flatten function raise exception when JSON files have different schema

    [ https://issues.apache.org/jira/browse/DRILL-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615784#comment-14615784 ] 

Jason Altekruse commented on DRILL-3443:
----------------------------------------

We have a number of known issues around handling changing schemas. Unfortunately, due to some current design limitations, a few of these evolving-schema cases, where a field exists in some files but not in others, are also known to have issues. We will be trying to fix the error messages in these cases (there are a number of JIRAs related to this root problem) and are looking into ways to solve the problem more generally soon.

> Flatten function raise exception when JSON files have different schema
> ----------------------------------------------------------------------
>
>                 Key: DRILL-3443
>                 URL: https://issues.apache.org/jira/browse/DRILL-3443
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.0.0
>         Environment: DRILL 1.0 Embedded (running on OSX with Java 8)
> DRILL 1.0 Deployed on MapR 4.1 Sandbox
>            Reporter: Tugdual Grall
>            Assignee: Jason Altekruse
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> I have 2 JSON documents:
> {code}
> {
>   "name" : "PPRODUCT_002",
>   "price" : 200.00,
>   "tags" : ["sports" , "cool", "ocean"]
> }
> {
>   "name" : "PPRODUCT_001",
>   "price" : 100.00
> }
> {code}
> And I execute this query:
> {code}
> SELECT name, flatten(tags)
> FROM dfs.`data/json_array/*.json`
> {code}
> If the JSON documents are located in 2 different files and the first file does not contain the "tags" field (product 001 in 001.json), the following exception is raised:
> {code}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: java.lang.ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.RepeatedValueVector Fragment 0:0 [Error Id: 4bb5b9e4-0de1-48e9-a0f3-956339608903 on 192.168.99.13:31010]
> {code}
> It works if:
> * All the JSON documents are in a single JSON file (order is not important)
> * The product with the tags attribute comes "first" on the file system, for example if you put product 002 in 000.json (which will be read before 001.json)
> This is similar to the [DRILL-3334] bug.
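
The two file layouts described above can be recreated with a small script. This is only a sketch for reproducing the report, not part of the original issue: the helper name write_repro_files and the file name 002.json are hypothetical, while 000.json and 001.json and both documents come from the description. It assumes, as the report suggests, that files are read in file-name order.

```python
import json
import tempfile
from pathlib import Path

# Documents copied from the issue description.
PRODUCT_001 = {"name": "PPRODUCT_001", "price": 100.00}
PRODUCT_002 = {
    "name": "PPRODUCT_002",
    "price": 200.00,
    "tags": ["sports", "cool", "ocean"],
}


def write_repro_files(directory, tags_first=False):
    """Write one JSON document per file; return the file names in sorted order.

    tags_first=False: 001.json (no "tags") sorts before 002.json,
                      the layout reported to raise the ClassCastException.
    tags_first=True:  000.json (has "tags") sorts before 001.json,
                      the layout reported to work.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # "002.json" is a hypothetical name; the issue only names 000.json and 001.json.
    tagged_name = "000.json" if tags_first else "002.json"
    (directory / "001.json").write_text(json.dumps(PRODUCT_001, indent=2) + "\n")
    (directory / tagged_name).write_text(json.dumps(PRODUCT_002, indent=2) + "\n")
    return sorted(p.name for p in directory.glob("*.json"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_repro_files(Path(tmp) / "failing"))
        print(write_repro_files(Path(tmp) / "working", tags_first=True))
```

Pointing the query from the description at each generated directory in turn (instead of data/json_array) should show whether the failure depends only on which file is read first.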



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)