Posted to issues@drill.apache.org by "Suresh Ollala (JIRA)" <ji...@apache.org> on 2016/01/18 06:13:40 UTC
[jira] [Updated] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suresh Ollala updated DRILL-3353:
---------------------------------
Reviewer: Chun Chang
> Non data-type related schema changes errors
> -------------------------------------------
>
> Key: DRILL-3353
> URL: https://issues.apache.org/jira/browse/DRILL-3353
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.0.0
> Reporter: Oscar Bernal
> Assignee: Steven Phillips
> Fix For: 1.5.0
>
> Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
>
>
> I'm having trouble querying a data set with a varying schema for the fields of a nested object. The majority of my data for a specific type of record has the following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615';
> Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File file.json
> Record 35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only return a subset of the fields, ignoring the others.
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +----------------------------------------------------+
> | EXPR$0 |
> +----------------------------------------------------+
> | {"logged":"no","wearable":"no","type":"xxxx"} |
> | {"logged":"no","wearable":"no","type":"xxxx"} |
> | {"logged":"no","wearable":"no","type":"xxxx"} |
> | {"logged":"no","wearable":"no","type":"xxxx"} |
> | {"logged":"no","wearable":"no","type":"xxxx"} |
> +----------------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with different schemas for different record types, and all other queries seem to run well.
> Is there something about how Drill infers schema that I might be missing here? Does it infer the schema from a sample of the data and then fail for records that were not taken into account during inference? I suspect I wouldn't have this error if I had hundreds of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it only vaguely implies that Drill doesn't fully support schema changes. I thought that applied mostly to data type changes, for which there are other well-documented issues.
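
A minimal sketch, not part of the original report, for locating the minority-schema records described above before querying them. It assumes the file holds one JSON object per line and that the nested object sits at event.attributes, as the queries suggest; the function name attribute_key_sets and the sample records are hypothetical, built from the two snippets in the report. It says nothing about how Drill itself infers schema.

```python
import json
from collections import Counter


def attribute_key_sets(lines, path=("event", "attributes")):
    """Count how many records carry each distinct set of keys under `path`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        node = json.loads(line)
        # Walk down the nested path; give up if a level is missing.
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            counts[frozenset(node)] += 1
    return counts


# Hypothetical sample records (assumption: wrapped in an "event" object).
common = '{"event":{"attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}}'
rare = '{"event":{"attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}}'

counts = attribute_key_sets([common, common, rare])
# List the rarest key sets first.
for keys, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(n, sorted(keys))
```

Running the same function over the real file (for example, passing an open file handle as `lines`) would show how many records use each variant of the nested object, which may help narrow down which records trigger the errors.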