Posted to commits@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2011/09/08 07:19:10 UTC

[jira] [Commented] (OOZIE-346) GH-558: Serialization/deserialization of WorkflowInstance

    [ https://issues.apache.org/jira/browse/OOZIE-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100016#comment-13100016 ] 

Hadoop QA commented on OOZIE-346:
---------------------------------

tucu00 remarked:
There are different options:

1. (As you say) use a VERSION field. This means that Oozie will be able to handle more than one version of WF state. It also means that Oozie must be able to read a blob from another version far enough to check the VERSION field. The current serialization probably won't work for this, as it is binary (Writable) and assumes the blob can be fully read, so we should change to a serialization format that allows inspecting the VERSION without having to deserialize the whole thing. Using JSON would do the trick.

2. Have a migration script that takes a DB from Oozie version N and copies/converts things into a new DB for Oozie version N+1.

IMO, we should aim for #1, but we'll have to do #2 once before getting to #1.
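
A minimal sketch of option #1, written in Python only to keep it short and not taken from the Oozie code base. It assumes a JSON envelope with a "version" key next to the state, and a hypothetical "priority" field added in version 2; the names serialize, deserialize and UPGRADES are illustrative. The blob is parsed into a generic dict, so the version can be checked without assuming any fixed field layout.

```python
import json

CURRENT_VERSION = 2  # hypothetical: the version this build writes


def serialize(state: dict) -> str:
    """Wrap the workflow state in an envelope that carries its version."""
    return json.dumps({"version": CURRENT_VERSION, "state": state})


def _upgrade_v1_to_v2(state: dict) -> dict:
    # Hypothetical change: version 2 added a "priority" field.
    upgraded = dict(state)
    upgraded.setdefault("priority", 0)
    return upgraded


# One upgrade step per old version; steps are chained up to CURRENT_VERSION.
UPGRADES = {1: _upgrade_v1_to_v2}


def deserialize(blob: str) -> dict:
    """Check the version first, then upgrade the state step by step."""
    envelope = json.loads(blob)
    version = envelope["version"]
    state = envelope["state"]
    if version > CURRENT_VERSION:
        raise ValueError(f"blob version {version} is newer than {CURRENT_VERSION}")
    while version < CURRENT_VERSION:
        if version not in UPGRADES:
            raise ValueError(f"no upgrade path from version {version}")
        state = UPGRADES[version](state)
        version += 1
    return state
```

The same chain of upgrade steps could also drive the one-time migration of option #2, by running every stored blob through it while copying the DB.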

> GH-558: Serialization/deserialization of WorkflowInstance
> ---------------------------------------------------------
>
>                 Key: OOZIE-346
>                 URL: https://issues.apache.org/jira/browse/OOZIE-346
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Hadoop QA
>
> The Oozie team at Yahoo has recently experienced multiple production issues after upgrading to a new Oozie version, attributed to modifications of the workflow tables' structure.
> More specifically, we added a new field to the workflow table. As a result, if a user submits a WF job under the earlier Oozie version and the job is still active after the upgrade, Oozie fails to deserialize the WFInstance object. In other words, the object was serialized using the old structure, whereas Oozie tries to deserialize it using the new structure after the upgrade, and therefore throws an exception.
> Some observations that came up from our internal discussion:
> 1. Is it required to store the blob in the table? Can't we create the object from the other fields of the table? I know it might not be that straightforward. However, other options might be worse than this.
> 2. If we want to keep the blob, new fields should be added at the end during serialization. However, if some fields are removed, how could we handle that? This might not be a flexible approach.
> 3. During serialization, we could write some type of version at the beginning, which would help to deserialize the object. This might make the code very ugly, depending on how many old versions we would like to support.
> 4. Since this is a very well-known problem, there should be some standard procedure. However, such procedures might not be easy either.
> Anyway, these are just initial thoughts. We haven't come to any conclusion yet.
> Please feel free to comment.
> Thanks,
> Mohammad
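
The failure described in the issue, and observation 3, can be illustrated with a small Python sketch. It is an analogy only, not Oozie's Writable code: the two struct layouts, the one-byte version tag, and the function names are all hypothetical. A positional binary reader built for the new layout cannot read an old blob, while a reader that checks a leading version tag can pick the matching layout and supply a default for the field added later.

```python
import struct

OLD_LAYOUT = ">iq"   # hypothetical old record: status code, start time
NEW_LAYOUT = ">iqi"  # hypothetical new record: adds a trailing int field


def read_new(blob: bytes) -> tuple:
    """Reader built for the new layout; it assumes every field is present."""
    return struct.unpack(NEW_LAYOUT, blob)


def read_versioned(blob: bytes) -> tuple:
    """Reader for blobs that start with a one-byte version tag."""
    version = blob[0]
    body = blob[1:]
    if version == 1:
        status, start = struct.unpack(OLD_LAYOUT, body)
        return (status, start, 0)  # default for the field added later
    if version == 2:
        return struct.unpack(NEW_LAYOUT, body)
    raise ValueError(f"unsupported version {version}")
```

Calling read_new on a blob packed with OLD_LAYOUT raises struct.error because the buffer is too short, which mirrors the exception seen after the upgrade. Each supported old version adds one more branch to read_versioned, which is the "ugly code" cost that observation 3 mentions.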

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira