Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2018/06/05 06:33:44 UTC

Re: Suitability of RelJson format for long-term storage

I think it would be useful to serialize the table’s row-type (column names and types, in order) as part of the RelJson. Then the reader could adapt if the table’s row-type has changed since the RelJson was written.

If the row-type of each leaf is known, then column names can be deduced throughout the tree, and a column reference by name carries the same information as a column reference by ordinal.
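
To illustrate the idea, here is a minimal sketch of how a reader might adapt ordinal references once the row-type is stored with the plan. It is not Calcite code: the function name `adapt_refs` and the representation of a row-type as a list of (name, type) pairs are assumptions made only for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a row-type is an ordered list of
# (column name, type name) pairs. This is not the RelJson layout.
RowType = List[Tuple[str, str]]


def adapt_refs(refs: List[int], stored: RowType, current: RowType) -> List[int]:
    """Translate ordinals written against `stored` into ordinals valid for `current`.

    Each stored ordinal is resolved to a column name via the stored
    row-type, then looked up by name in the current row-type.
    """
    by_name = {name: (i, typ) for i, (name, typ) in enumerate(current)}
    adapted = []
    for ordinal in refs:
        name, typ = stored[ordinal]
        if name not in by_name:
            raise LookupError(f"column {name!r} no longer exists in the table")
        new_ordinal, new_typ = by_name[name]
        if new_typ != typ:
            raise TypeError(f"column {name!r} changed type from {typ} to {new_typ}")
        adapted.append(new_ordinal)
    return adapted


# Row-type when the plan was written, and the table as it looks today
# (a column was inserted and two columns swapped places).
stored = [("id", "INTEGER"), ("name", "VARCHAR"), ("price", "DECIMAL")]
current = [("id", "INTEGER"), ("sku", "VARCHAR"), ("price", "DECIMAL"), ("name", "VARCHAR")]

adapted = adapt_refs([1, 2], stored, current)
```

With the stored row-type available, the references to `name` and `price` follow their columns to the new positions; a dropped or retyped column is reported instead of being silently misread.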

Julian


> On May 31, 2018, at 12:56 PM, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
> 
> 
> Thanks for the encouragement. After some further reflection, one concern I have for using the built-in serialization format for long-term storage is that column references are stored by their ordinal position rather than their name, which could mess things up if the underlying table's columns are changed over time.
> 
> This seems to be pretty deeply baked in, but please let me know if I am missing some option for encoding the column references by name rather than index.
> 
> 	-Marc
> 
> 
>> On May 31, 2018, at 11:50, Julian Hyde <jh...@apache.org> wrote:
>> 
>> I support the idea of making it stable. It will take some work: at a minimum, documentation and a version id, then later some transformers to convert version X to version Y.
>> 
>>> On May 31, 2018, at 8:16 AM, Michael Mior <mm...@apache.org> wrote:
>>> 
>>> AFAIK, no one is using this for long-term storage and no one is expecting
>>> the format to be stable. That said, I personally would be open to the idea of
>>> stabilizing the format. Given the format is fairly simple, one approach
>>> would be to use something like JSON Schema and then have some tests to
>>> validate that the output corresponds to the schema.
>>> 
>>> --
>>> Michael Mior
>>> mmior@apache.org
>>> 
>>> 
>>> 
>>> On Thu, May 31, 2018 at 11:09, Marc Prud'hommeaux <ma...@glimpse.io> wrote:
>>> 
>>>> 
>>>> I am developing an application that allows end users to interactively
>>>> construct and execute relational expressions that span multiple data
>>>> sources using Calcite. My current implementation utilizes my own relational
>>>> algebra JSON format which I then convert to a RelNode using a RelBuilder.
>>>> It would vastly simplify my project if I could just use Calcite's own
>>>> RelJson format to construct and persist relational expressions, but I am
>>>> concerned that the format is both undocumented, and, aside from
>>>> RelWriterTest.java, does not have much in the way of future guarantees that
>>>> the format will remain stable.
>>>> 
>>>> Is the RelJson format intended to be used for long-term storage? Are
>>>> there any known applications that are using this as a serialization format
>>>> for their relational expressions?
>>>> 
>>>> If the consensus is that this format should be stable, then I can do some
>>>> work towards documenting it, as well as implementing some additional test
>>>> cases to ensure that RelNodes that are round-tripped through JSON
>>>> serialization maintain fidelity.
>>>> 
>>>>      -Marc
>>>> 
>>>> 
>> 
>