Posted to issues@drill.apache.org by "Parth Chandra (JIRA)" <ji...@apache.org> on 2015/04/20 20:00:00 UTC

[jira] [Updated] (DRILL-1781) For complex functions, don't return until schema is known

     [ https://issues.apache.org/jira/browse/DRILL-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Parth Chandra updated DRILL-1781:
---------------------------------
    Component/s: Metadata

> For complex functions, don't return until schema is known
> ---------------------------------------------------------
>
>                 Key: DRILL-1781
>                 URL: https://issues.apache.org/jira/browse/DRILL-1781
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>            Reporter: Steven Phillips
>            Priority: Blocker
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1781.patch, DRILL-1781.patch
>
>
> In the case of complex output functions, it is impossible to determine the output schema until the actual data is consumed. For example, with convert_from(VARCHAR, 'json'), unlike most other functions, it is not sufficient to know that the incoming data type is VARCHAR; we need to decode the contents of the record before we can determine whether the output type is a map, a list, or a primitive type.
> For fast schema return, we worked around this problem by simply assuming the type was Map, and if it happened to be different, there would be a schema change. This solution is not satisfactory, as it ends up breaking other functions, like flatten.
> The solution is to continue returning a schema whenever possible, but when it is not possible, Drill will wait until it is.
> For non-blocking operators, Drill will immediately consume the incoming batch, and thus will not return empty schema batches if there is data to consume. Blocking operators will return an empty schema batch. If a flatten function occurs downstream from a blocking operator, it will not be able to return a schema, and thus fast schema return will not happen in this case.
> In the cases where the complex function is not downstream from a blocking operator, fast schema return should continue to work.
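
The behavior described above can be illustrated with a small sketch. This is not Drill's implementation: it is plain Python, and the names infer_output_type and first_known_schema are hypothetical, introduced only for illustration. It assumes a batch is a list of JSON strings and that an empty list stands in for an empty schema batch. The point is that the output type is known only after a record is decoded, so an operator has to keep consuming until it sees data.

```python
import json


def infer_output_type(varchar_value):
    """Decode one JSON string and classify the result.

    Knowing the input is VARCHAR is not enough; the decoded value
    decides whether the output is a map, a list, or a primitive.
    """
    decoded = json.loads(varchar_value)
    if isinstance(decoded, dict):
        return "map"
    if isinstance(decoded, list):
        return "list"
    return "primitive"


def first_known_schema(batches):
    """Return the output type from the first non-empty batch, or None.

    Hypothetical stand-in for a non-blocking operator: empty batches
    carry no schema information, so it keeps consuming instead of
    guessing a type.
    """
    for batch in batches:
        if batch:  # skip empty batches, they reveal nothing
            return infer_output_type(batch[0])
    return None  # no data arrived, so the schema stays unknown
```

For example, first_known_schema([[], ['[1, 2]']]) skips the empty batch and reports "list", whereas assuming a map up front would have forced a schema change once the data arrived.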



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)