Posted to dev@pig.apache.org by "Koji Noguchi (Jira)" <ji...@apache.org> on 2020/03/12 20:57:00 UTC

[jira] [Updated] (PIG-5400) OrcStorage dropping struct(tuple) when it only holds a single field inside a Bag(array)

     [ https://issues.apache.org/jira/browse/PIG-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5400:
------------------------------
    Summary: OrcStorage dropping struct(tuple) when it only holds a single field inside a Bag(array)  (was: OrcStorage dropping struct(tuple) when it only holds a single field)

> OrcStorage dropping struct(tuple) when it only holds a single field inside a Bag(array)
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-5400
>                 URL: https://issues.apache.org/jira/browse/PIG-5400
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>
> A user asked me about an inconsistent schema they were seeing when storing data with OrcStorage. Sample code:
> {code} 
> A = load 'input.txt' as (a0:long); 
> B = GROUP A by a0; 
> STORE B into 'filename' using OrcStorage(); 
> {code} 
> Pig's schema {{B: {group: long,A: bag: { tuple(a0: long)}}}}. 
> Expected Orc schema {{struct<group:bigint,A:array<struct<bigint>>>}} 
> Actual Orc schema {{struct<group:bigint,A:array<bigint>>}} 
> _This only happens when a tuple contains a single field._ 
> The current schema, without the struct(tuple) layer, is better for saving space, but it would be nice to have an option to keep the extra struct(tuple) layer for users who expect the schema to evolve by adding more fields to that tuple in the future.
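
The behavior described in the issue can be modeled in a few lines of Python. This is only an illustrative sketch, not Pig's OrcStorage code: the classes `Long`, `TupleType`, `Bag`, the function `to_orc`, and the `collapse_single_field` flag are all hypothetical names invented here, and the flag merely stands in for the kind of option the issue asks for. Unlike the abbreviated schema strings in the report, the sketch writes the inner field name (`a0`) explicitly.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Long:
    """Pig long, written as ORC bigint."""


@dataclass
class TupleType:
    """Pig tuple: an ordered list of (field name, field type) pairs."""
    fields: List[Tuple[str, Any]]


@dataclass
class Bag:
    """Pig bag: a collection of tuples that all share one tuple type."""
    element: TupleType


def to_orc(pig_type: Any, collapse_single_field: bool = True) -> str:
    """Render a toy Pig type as an ORC-style type string.

    With collapse_single_field=True (hypothetical flag), a bag whose tuple
    has exactly one field is written as array<field type>, which mirrors
    the "actual" schema in the report. With False, the struct layer is kept.
    """
    if isinstance(pig_type, Long):
        return "bigint"
    if isinstance(pig_type, TupleType):
        inner = ",".join(
            f"{name}:{to_orc(ftype, collapse_single_field)}"
            for name, ftype in pig_type.fields
        )
        return f"struct<{inner}>"
    if isinstance(pig_type, Bag):
        elem = pig_type.element
        if collapse_single_field and len(elem.fields) == 1:
            # Drop the single-field tuple wrapper inside the bag.
            only_type = elem.fields[0][1]
            return f"array<{to_orc(only_type, collapse_single_field)}>"
        return f"array<{to_orc(elem, collapse_single_field)}>"
    raise TypeError(f"unsupported type: {pig_type!r}")


# B: {group: long, A: bag: {tuple(a0: long)}}
B = TupleType([("group", Long()), ("A", Bag(TupleType([("a0", Long())])))])

collapsed = to_orc(B)
preserved = to_orc(B, collapse_single_field=False)
print(collapsed)
print(preserved)
```

In this model, a tuple with two or more fields keeps its struct layer under either setting, which matches the note that the problem only appears for single-field tuples.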



--
This message was sent by Atlassian Jira
(v8.3.4#803005)