Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/09 15:02:00 UTC

[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins

    [ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245783#comment-16245783 ] 

ASF GitHub Bot commented on DRILL-5771:
---------------------------------------

Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1014
  
    @ilooner can you please review this?


> Fix serDe errors for format plugins
> -----------------------------------
>
>                 Key: DRILL-5771
>                 URL: https://issues.apache.org/jira/browse/DRILL-5771
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be successfully serialized / deserialized.
> Serialization and deserialization usually happen when a query has several major fragments.
> One way to check serde is to generate a physical plan (produced as JSON) and then submit it back to Drill.
> One example of the errors found is described in the first comment. Another example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on the format plugin configuration or its name.
> On Drill startup we load information about the available plugins (it is reloaded each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if we cannot find one, we try to [create one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but at other query execution stages we always assume that the [plugin exists based on configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node based on the format configuration.
> Then we sent a major fragment to a different node. When that node used this format configuration, it could not get a format plugin based on it, and deserialization failed.
> To fix this problem we need to create the format plugin during query deserialization if it is absent.
>   
> 2. Absent hash code and equals
> Format plugins are stored in a hash map where the key is the format plugin config.
> Since some format plugin configs did not override hashCode and equals, we could not find the format plugin based on its configuration.
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name, for configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. A user with admin privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example, 
> 1. A query is submitted.
> 2. A Parquet format plugin is created with the following configuration (autoCorrectCorruptDates=>true).
> 3. The named format plugin config is serialized with the name parquet.
> 4. A major fragment is sent to a different node.
> 5. An admin changes the configuration for the alias 'parquet' on all nodes to autoCorrectCorruptDates=>false.
> 6. The named format is deserialized on the different node into a Parquet format plugin with configuration (autoCorrectCorruptDates=>false).
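
Issues 1 and 2 in the quoted description can be illustrated with a minimal Java sketch. It is not Drill's code: SketchFormatConfig, SketchFormatPlugin and SketchPluginRegistry are hypothetical stand-ins, and the registry is assumed to be a plain HashMap keyed by config. The sketch shows why a config needs value-based equals and hashCode to be found after a serde round trip, and how a get-or-create lookup avoids failing when the plugin is absent on a node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SerdeSketch {
    public static void main(String[] args) {
        SketchPluginRegistry registry = new SketchPluginRegistry();
        SketchFormatConfig original = new SketchFormatConfig("parquet", true);
        // Simulates the config after a JSON round trip: equal by value, different object.
        SketchFormatConfig roundTripped = new SketchFormatConfig("parquet", true);

        SketchFormatPlugin first = registry.getOrCreate(original);
        SketchFormatPlugin second = registry.getOrCreate(roundTripped);
        System.out.println(first == second);  // true: the round-tripped key finds the same plugin
        System.out.println(registry.size());  // 1: no duplicate entry was created
    }
}

// Hypothetical stand-in for a format plugin config (not Drill's actual class).
final class SketchFormatConfig {
    final String type;
    final boolean autoCorrectCorruptDates;

    SketchFormatConfig(String type, boolean autoCorrectCorruptDates) {
        this.type = type;
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }

    // Value-based equality, so a deserialized copy matches the original map key.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchFormatConfig)) return false;
        SketchFormatConfig other = (SketchFormatConfig) o;
        return autoCorrectCorruptDates == other.autoCorrectCorruptDates
            && type.equals(other.type);
    }

    // Must agree with equals: equal configs produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(type, autoCorrectCorruptDates);
    }
}

// Hypothetical stand-in for a format plugin instance.
final class SketchFormatPlugin {
    final SketchFormatConfig config;

    SketchFormatPlugin(SketchFormatConfig config) {
        this.config = config;
    }
}

// Hypothetical per-node registry keyed by config.
final class SketchPluginRegistry {
    private final Map<SketchFormatConfig, SketchFormatPlugin> plugins = new HashMap<>();

    // Creates the plugin on demand instead of assuming it already exists.
    SketchFormatPlugin getOrCreate(SketchFormatConfig config) {
        return plugins.computeIfAbsent(config, SketchFormatPlugin::new);
    }

    int size() {
        return plugins.size();
    }
}
```

Without the equals and hashCode overrides, the two config objects would be distinct keys, and a plain lookup with the round-tripped config would miss the plugin registered under the original one.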

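The race described in steps 1 to 6 of the quoted description can be replayed in a small Java sketch. SketchAliasTable is a hypothetical stand-in for the alias configuration shared among drillbits, reduced to a single boolean per name. The comparison with a value captured at planning time is only an illustration of the difference, not a fix stated in the issue.

```java
import java.util.HashMap;
import java.util.Map;

class NamedConfigRaceSketch {
    public static void main(String[] args) {
        SketchAliasTable aliases = new SketchAliasTable();
        aliases.update("parquet", true);               // step 2: planned with autoCorrectCorruptDates=>true

        String serializedName = "parquet";             // step 3: only the name is serialized
        boolean valueAtPlanning = aliases.resolve("parquet"); // illustrative: value captured at planning time

        aliases.update("parquet", false);              // step 5: admin changes the alias

        // Step 6: resolving by name on the receiving node sees the new value.
        System.out.println(aliases.resolve(serializedName)); // false
        System.out.println(valueAtPlanning);                  // true
    }
}

// Hypothetical alias table: name -> autoCorrectCorruptDates setting.
final class SketchAliasTable {
    private final Map<String, Boolean> autoCorrectByName = new HashMap<>();

    void update(String name, boolean autoCorrectCorruptDates) {
        autoCorrectByName.put(name, autoCorrectCorruptDates);
    }

    // Returns whatever the alias holds at the moment of the call.
    // Throws NullPointerException for an unknown name (unboxing null).
    boolean resolve(String name) {
        return autoCorrectByName.get(name);
    }
}
```

The two printed values differ, which is the inconsistency the description warns about: the fragment runs with a configuration other than the one the query was planned with.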


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)