Posted to issues@drill.apache.org by "Parth Chandra (JIRA)" <ji...@apache.org> on 2018/04/02 09:18:00 UTC
[jira] [Commented] (DRILL-6223) Drill fails on Schema changes
[ https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422042#comment-16422042 ]
Parth Chandra commented on DRILL-6223:
--------------------------------------
{quote}
To your point about compensation logic in the context of Schema Changes:
* Why do you think it is ok to dynamically include new columns?
* Yet it is not ok to exclude them?
{quote}
In real-world data with dynamically changing schemas, new columns are usually added rather than removed.
{quote}
* Consider a batch of 32k rows.
* A VV with null integer values will require 32 KB (bits) + 32 KB * 4 = 160 KB.
* Each missing column will require that much memory per mini-fragment.
{quote}
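The arithmetic in the quoted estimate can be checked with a short Java sketch. It assumes, as the estimate does, one byte per row for the null-indicator ("bits") vector and four bytes per row for INT data, takes "32k rows" to mean 32,768 rows, and ignores any allocator rounding. The class and helper names are hypothetical, not Drill APIs.

```java
class NullColumnCost {
    // Assumption taken from the quoted estimate: one byte per row in the
    // null-indicator ("bits") vector.
    static final int BITS_BYTES_PER_ROW = 1;
    // An INT slot occupies 4 bytes per row, even when the row is null.
    static final int INT_BYTES_PER_ROW = Integer.BYTES;

    /** Bytes needed for one all-null nullable INT column of rowCount rows. */
    static long nullableIntColumnBytes(int rowCount) {
        return (long) rowCount * (BITS_BYTES_PER_ROW + INT_BYTES_PER_ROW);
    }

    /** Hypothetical helper: cost of padding one batch with missingColumns all-null INT columns. */
    static long paddingBytes(int rowCount, int missingColumns) {
        return nullableIntColumnBytes(rowCount) * missingColumns;
    }

    public static void main(String[] args) {
        int rows = 32 * 1024; // "a batch of 32k rows"
        long perColumn = nullableIntColumnBytes(rows);
        System.out.println(perColumn / 1024 + " KiB per missing column");
        System.out.println(paddingBytes(rows, 10) / 1024 + " KiB for 10 missing columns");
    }
}
```

With these assumptions one padded column costs 163,840 bytes (160 KiB) per batch, and the cost scales linearly with the number of missing columns and with the number of minor fragments holding such a batch.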
One of the guarantees provided by value vectors is that elements can be accessed by index in constant time (or, for nested elements, in O(m) time, where m is the nesting depth). The representation is designed around this guarantee, and the additional memory it uses is a deliberate tradeoff.
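A minimal sketch of why such a layout gives constant-time access. The classes below are hypothetical, simplified stand-ins, not Drill's actual ValueVector classes: a fixed-width nullable INT vector computes a value's position as index * 4, and a variable-width VARCHAR vector reads two entries from an offsets array. Note that null rows still reserve their four data bytes, which is exactly the memory cost under discussion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class VectorAccessSketch {

    /** Hypothetical fixed-width nullable INT vector: 1 indicator byte + 4 data bytes per row. */
    static final class NullableIntVector {
        private final ByteBuffer bits;   // 1 = value set, 0 = null
        private final ByteBuffer values; // 4 bytes per row, reserved even for null rows

        NullableIntVector(int capacity) {
            bits = ByteBuffer.allocate(capacity); // zero-filled, so every row starts null
            values = ByteBuffer.allocate(capacity * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        }

        void set(int index, int value) {
            bits.put(index, (byte) 1);
            values.putInt(index * Integer.BYTES, value); // absolute write, O(1)
        }

        boolean isNull(int index) {
            return bits.get(index) == 0;
        }

        int get(int index) {
            // The position is pure arithmetic on the index: O(1) no matter
            // how many rows (null or not) precede it.
            return values.getInt(index * Integer.BYTES);
        }
    }

    /** Hypothetical variable-width VARCHAR vector: an offsets array locates each value in O(1). */
    static final class VarCharVector {
        private final int[] offsets; // value i occupies data[offsets[i] .. offsets[i + 1])
        private final byte[] data;

        VarCharVector(String... values) {
            offsets = new int[values.length + 1];
            byte[][] encoded = new byte[values.length][];
            int total = 0;
            for (int i = 0; i < values.length; i++) {
                encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
                total += encoded[i].length;
                offsets[i + 1] = total;
            }
            data = new byte[total];
            for (int i = 0; i < values.length; i++) {
                System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
            }
        }

        String get(int index) {
            // Two offset reads and no scan over earlier values; each additional
            // level of nesting would add one more offsets lookup (the O(m) case).
            int start = offsets[index];
            int end = offsets[index + 1];
            return new String(data, start, end - start, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        NullableIntVector ints = new NullableIntVector(4);
        ints.set(2, 42);
        System.out.println(ints.isNull(0) + " " + ints.get(2));

        VarCharVector strings = new VarCharVector("a", "bcd", "", "ef");
        System.out.println(strings.get(1) + "|" + strings.get(3));
    }
}
```

Packing only the non-null values would save memory, but locating row i would then require counting the set indicator entries before it (or maintaining an auxiliary index) instead of a single multiplication.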
{quote}This is unless (similarly to the implicit columns) we optimize the VV storage representation and/or push the column preservation to higher layers such as the client or foreman.{quote}
It would be wonderful to improve vectors to use much less memory while providing the same guarantees. A proposal would be welcome.
> Drill fails on Schema changes
> ------------------------------
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.10.0, 1.12.0
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> Drill query fails when selecting all columns from a complex nested data file set (Parquet). There are schema differences among the files:
> * The Parquet files exhibit differences both at the first level and within nested data types
> * A select * will not cause an exception, but using a limit clause will
> * Note also that this issue seems to happen only when multiple Drillbit minor fragments are involved (concurrency higher than one)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)