Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2007/11/02 17:23:50 UTC

[jira] Resolved: (PIG-10) reduce encoding of intermediate results

     [ https://issues.apache.org/jira/browse/PIG-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-10.
---------------------------

    Resolution: Invalid

Pig does not require every tuple in a relation to share the same schema, so storing the schema once up front in intermediate results will not always be an option.  Even in the cases where the schema is known, the tuples could contain complex data types with no guaranteed schema (such as maps), and these would still require markers in the code.  We could optimize for the case where all tuples have the same schema and contain only atomic data, but it's not clear how we would know that to be the case.
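
To illustrate the point, here is a minimal sketch (not Pig's actual serialization code) of a self-describing encoding in which every value carries its own marker. The marker bytes, the length-prefix layout, and the names write_value, read_value and roundtrip are all assumptions made up for this example. It shows that rows of different shapes, including one holding a map, can be read back without any schema stored up front.

```python
import io
import struct

# Hypothetical one-byte type markers; the real marker values used by Pig
# are not given in this thread.
ATOM, MAP, TUPLE = b"A", b"M", b"T"


def write_value(out, value):
    """Write one value preceded by its type marker and a 4-byte length/count."""
    if isinstance(value, dict):
        out.write(MAP)
        out.write(struct.pack(">I", len(value)))
        for key, val in value.items():
            write_value(out, key)
            write_value(out, val)
    elif isinstance(value, tuple):
        out.write(TUPLE)
        out.write(struct.pack(">I", len(value)))
        for field in value:
            write_value(out, field)
    else:
        # Everything else is treated as an atom and stored as UTF-8 text.
        data = str(value).encode("utf-8")
        out.write(ATOM)
        out.write(struct.pack(">I", len(data)))
        out.write(data)


def read_value(inp):
    """Read one value back; the marker alone tells the reader what follows."""
    marker = inp.read(1)
    (count,) = struct.unpack(">I", inp.read(4))
    if marker == MAP:
        result = {}
        for _ in range(count):
            key = read_value(inp)
            result[key] = read_value(inp)
        return result
    if marker == TUPLE:
        return tuple(read_value(inp) for _ in range(count))
    if marker == ATOM:
        return inp.read(count).decode("utf-8")
    raise ValueError("unknown marker: %r" % marker)


def roundtrip(rows):
    """Serialize rows to a buffer and read them back without any schema."""
    buf = io.BytesIO()
    for row in rows:
        write_value(buf, row)
    buf.seek(0)
    return [read_value(buf) for _ in rows]


# Three rows with three different shapes; the second one holds a map.
rows = [
    ("alice", "30"),
    ("bob", {"city": "Oslo", "tags": ("a", "b")}),
    ("carol",),
]
decoded = roundtrip(rows)
```

A reader that only had a single schema written at the start of the file could not decode the second and third rows, since neither their arity nor the contents of the map are fixed in advance.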

> reduce encoding of intermediate results
> ---------------------------------------
>
>                 Key: PIG-10
>                 URL: https://issues.apache.org/jira/browse/PIG-10
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>
> Currently, in intermediate results, the data is written with a marker for every column in every row.  For instance, if
> we are writing a row that has a schema of bag, atom, we'll write:
> BAGMARKER BAGDATA ATOMMARKER ATOMDATA
> There's no reason to write the markers for every row.  It should be sufficient to write them once at the beginning of the
> file and then remember them for subsequent rows.
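
The two layouts described in the issue can be sketched as follows. This is only an illustration using the token names from the description; the function names encode_per_row and encode_header_once are hypothetical, and tokens are counted rather than bytes, so the saving shown is indicative only.

```python
def encode_per_row(schema, rows):
    """Current layout: a marker precedes every column of every row."""
    tokens = []
    for row in rows:
        for marker, data in zip(schema, row):
            tokens += [marker, data]
    return tokens


def encode_header_once(schema, rows):
    """Proposed layout: markers written once up front, then bare data.

    This assumes every row matches the schema exactly.
    """
    tokens = list(schema)
    for row in rows:
        tokens += list(row)
    return tokens


schema = ["BAGMARKER", "ATOMMARKER"]
rows = [("BAGDATA", "ATOMDATA")] * 3

per_row = encode_per_row(schema, rows)
header_once = encode_header_once(schema, rows)
```

With three rows the per-row layout emits 12 tokens and the header-once layout emits 8. The second layout is only decodable if every row really has the bag, atom shape.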
