Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/09 15:02:00 UTC
[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins
[ https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245783#comment-16245783 ]
ASF GitHub Bot commented on DRILL-5771:
---------------------------------------
Github user priteshm commented on the issue:
https://github.com/apache/drill/pull/1014
@ilooner can you please review this?
> Fix serDe errors for format plugins
> -----------------------------------
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Minor
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be successfully serialized and deserialized.
> Serialization and deserialization usually happen when a query has several major fragments.
> One way to check serde is to generate a physical plan (generated as JSON) and then submit it back to Drill.
> One example of the errors found is described in the first comment. Another example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on its format plugin configuration or its name.
> On Drill startup we load information about the available plugins (it is reloaded each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if we cannot find one, we try to [create one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144]
> but at other query execution stages we always assume that the [plugin exists based on configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node based on the format configuration.
> Then we sent a major fragment to a different node. When that node used this format configuration, it could not get a format plugin based on it, and deserialization failed.
> To fix this problem, we need to create the format plugin during query deserialization if it is absent.
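
The proposed fix for issue 1 can be illustrated with a minimal sketch. All class names below (SketchFormatConfig, SketchFormatPlugin, SketchPluginRegistry) are hypothetical stand-ins and not Drill's actual classes; the sketch only assumes that each node keeps a map from format config to format plugin.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a format plugin config; not Drill's real class.
final class SketchFormatConfig {
    final String formatType;

    SketchFormatConfig(String formatType) {
        this.formatType = formatType;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchFormatConfig
            && formatType.equals(((SketchFormatConfig) other).formatType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(formatType);
    }
}

// Hypothetical stand-in for a format plugin built from its config.
final class SketchFormatPlugin {
    final SketchFormatConfig config;

    SketchFormatPlugin(SketchFormatConfig config) {
        this.config = config;
    }
}

// Hypothetical per-node registry of format plugins, keyed by config.
final class SketchPluginRegistry {
    private final Map<SketchFormatConfig, SketchFormatPlugin> plugins =
        new ConcurrentHashMap<>();

    // Lookup that assumes the plugin already exists; returns null otherwise.
    SketchFormatPlugin getExisting(SketchFormatConfig config) {
        return plugins.get(config);
    }

    // Lookup that creates the plugin from its config when it is absent.
    SketchFormatPlugin getOrCreate(SketchFormatConfig config) {
        return plugins.computeIfAbsent(config, SketchFormatPlugin::new);
    }
}

final class DeserializationSketch {
    public static void main(String[] args) {
        // A node that received a fragment but never created this plugin.
        SketchPluginRegistry remoteNode = new SketchPluginRegistry();
        SketchFormatConfig config = new SketchFormatConfig("parquet");

        System.out.println("existing lookup found plugin: "
            + (remoteNode.getExisting(config) != null));
        System.out.println("create-if-absent found plugin: "
            + (remoteNode.getOrCreate(config) != null));
    }
}
```

In this sketch, getExisting mirrors the stages that assume the plugin is already registered, while getOrCreate mirrors the create-if-absent behaviour the description asks for during deserialization.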
>
> 2. Missing hashCode and equals.
> Format plugins are stored in a hash map where the key is the format plugin config.
> Since some format plugin configs did not override hashCode and equals, we could not find the format plugin based on its configuration.
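
The effect described in issue 2 can be reproduced with plain Java collections. The two config classes below are hypothetical and only illustrate the general HashMap behaviour; they are not taken from Drill.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical config that inherits identity-based equals and hashCode.
final class ConfigWithoutEquals {
    final boolean autoCorrectCorruptDates;

    ConfigWithoutEquals(boolean autoCorrectCorruptDates) {
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }
}

// Hypothetical config that compares by field value.
final class ConfigWithEquals {
    final boolean autoCorrectCorruptDates;

    ConfigWithEquals(boolean autoCorrectCorruptDates) {
        this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ConfigWithEquals)) {
            return false;
        }
        return autoCorrectCorruptDates
            == ((ConfigWithEquals) other).autoCorrectCorruptDates;
    }

    @Override
    public int hashCode() {
        return Objects.hash(autoCorrectCorruptDates);
    }
}

final class EqualsHashCodeDemo {
    // Looks up a plugin using a fresh but field-equal config instance,
    // as happens after the config has been deserialized on another node.
    static boolean foundWithoutEquals() {
        Map<ConfigWithoutEquals, String> plugins = new HashMap<>();
        plugins.put(new ConfigWithoutEquals(true), "parquet plugin");
        return plugins.containsKey(new ConfigWithoutEquals(true));
    }

    static boolean foundWithEquals() {
        Map<ConfigWithEquals, String> plugins = new HashMap<>();
        plugins.put(new ConfigWithEquals(true), "parquet plugin");
        return plugins.containsKey(new ConfigWithEquals(true));
    }

    public static void main(String[] args) {
        System.out.println("without equals/hashCode: " + foundWithoutEquals());
        System.out.println("with equals/hashCode: " + foundWithEquals());
    }
}
```

A deserialized config is always a new object, so a key type that relies on identity equality can never match the instance that was originally stored in the map.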
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name, for configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. A user with admin privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example:
> 1. Query is submitted.
> 2. Parquet format plugin is created with the following configuration (autoCorrectCorruptDates=>true).
> 3. The named format plugin config is serialized with the name parquet.
> 4. Major fragment is sent to a different node.
> 5. Admin changes the parquet configuration for the alias 'parquet' on all nodes to autoCorrectCorruptDates=>false.
> 6. The named format is deserialized on the different node into a parquet format plugin with configuration (autoCorrectCorruptDates=>false).
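
The six steps above can be replayed deterministically in a small sketch. The per-node alias maps and method names are hypothetical simplifications, not Drill's implementation. The second method, which ships the resolved value instead of the alias, is included only for contrast and is an assumption; the description above does not state it as the chosen fix.

```java
import java.util.HashMap;
import java.util.Map;

final class NamedConfigRaceDemo {
    // Hypothetical per-node alias table: alias name -> autoCorrectCorruptDates.
    private static Map<String, Boolean> newNodeAliases() {
        Map<String, Boolean> aliases = new HashMap<>();
        aliases.put("parquet", true);
        return aliases;
    }

    // Returns true if the executing node sees the same setting the planner used.
    static boolean consistentWhenSendingNameOnly() {
        Map<String, Boolean> planningNode = newNodeAliases();
        Map<String, Boolean> executingNode = newNodeAliases();

        // Steps 1-3: plan with the current setting, serialize only the alias.
        boolean plannedValue = planningNode.get("parquet");
        String serializedAlias = "parquet";

        // Step 5: the admin changes the alias on all nodes mid-flight.
        planningNode.put("parquet", false);
        executingNode.put("parquet", false);

        // Step 6: the executing node resolves the alias against its own table.
        boolean executedValue = executingNode.get(serializedAlias);
        return plannedValue == executedValue;
    }

    // Contrast case (an assumption): ship the resolved setting itself.
    static boolean consistentWhenSendingFullConfig() {
        Map<String, Boolean> planningNode = newNodeAliases();
        Map<String, Boolean> executingNode = newNodeAliases();

        boolean plannedValue = planningNode.get("parquet");
        boolean serializedValue = plannedValue;

        planningNode.put("parquet", false);
        executingNode.put("parquet", false);

        // The executing node uses the shipped value and ignores its alias table.
        boolean executedValue = serializedValue;
        return plannedValue == executedValue;
    }

    public static void main(String[] args) {
        System.out.println("name only, consistent: "
            + consistentWhenSendingNameOnly());
        System.out.println("full config, consistent: "
            + consistentWhenSendingFullConfig());
    }
}
```

The sketch shows why resolving an alias at deserialization time is sensitive to configuration changes made between planning and execution.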
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)