Posted to dev@pig.apache.org by "Jonathan Packer (JIRA)" <ji...@apache.org> on 2012/06/25 16:12:42 UTC
[jira] [Created] (PIG-2767) Pig creates wrong schema after
dereferencing nested tuple fields
Jonathan Packer created PIG-2767:
------------------------------------
Summary: Pig creates wrong schema after dereferencing nested tuple fields
Key: PIG-2767
URL: https://issues.apache.org/jira/browse/PIG-2767
Project: Pig
Issue Type: Bug
Components: parser
Affects Versions: 0.10.0
Environment: Amazon EMR, patched to use Pig 0.10.0
Reporter: Jonathan Packer
The following script fails:
data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);
nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
DESCRIBE dereferenced;
uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
DESCRIBE uses_dereferenced;
The schema of "dereferenced" should be {f1: int, nested_tuple: (f2: int,
f3: int)}, but DESCRIBE reports {f1: int, f2: int} instead. When DUMP is
used, however, the data actually has the correct structure, e.g.:
(1,(2,3))
(5,(6,7))
...
This is not just a problem with DESCRIBE. Because the schema is incorrect,
the reference to "nested_tuple" in the "uses_dereferenced" statement is
considered to be invalid, and the script fails to run. The error is:
Invalid field projection. Projected field [nested_tuple] does not exist in schema: f1:int,f2:int.
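A possible way to sidestep the multi-field dereference while the bug is open is sketched below. It is an assumption, not something confirmed in this report, and it has not been verified against Pig 0.10.0: each field is dereferenced individually and the tuple is rebuilt with the built-in TOTUPLE, with the intended schema declared explicitly. The aliases "rebuilt" and "uses_rebuilt" are hypothetical names introduced for illustration.

```pig
-- Same input as the failing script; assumes test_data.txt holds rows of four
-- ints separated by tabs (the PigStorage default delimiter).
data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);
nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;

-- Untested sketch: avoid nested_tuple.(f2, f3) entirely. Dereference one field
-- at a time, rebuild the tuple with TOTUPLE, and state the schema we expect.
rebuilt = FOREACH nested GENERATE
    f1,
    TOTUPLE(nested_tuple.f2, nested_tuple.f3) AS nested_tuple: tuple(f2: int, f3: int);
DESCRIBE rebuilt;

-- If the declared schema is honored, this projection should now resolve.
uses_rebuilt = FOREACH rebuilt GENERATE nested_tuple.f3;
DESCRIBE uses_rebuilt;
```

Comparing the DESCRIBE output of "rebuilt" with that of "dereferenced" may also help confirm that the multi-field dereference is what produces the wrong schema.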
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2767) Pig creates wrong schema after
dereferencing nested tuple fields
Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem updated PIG-2767:
-------------------------------
Fix Version/s: (was: 0.11)
This will be fixed in a future release.
> Pig creates wrong schema after dereferencing nested tuple fields
> ----------------------------------------------------------------
>
> Key: PIG-2767
> URL: https://issues.apache.org/jira/browse/PIG-2767
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Environment: Amazon EMR, patched to use Pig 0.10.0
> Reporter: Jonathan Packer
> Assignee: Daniel Dai
> Attachments: test_data.txt
>
>
[jira] [Updated] (PIG-2767) Pig creates wrong schema after
dereferencing nested tuple fields
Posted by "Jonathan Packer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Packer updated PIG-2767:
---------------------------------
Attachment: test_data.txt
The 'test_data.txt' file used in the script.
> Pig creates wrong schema after dereferencing nested tuple fields
> ----------------------------------------------------------------
>
> Key: PIG-2767
> URL: https://issues.apache.org/jira/browse/PIG-2767
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Environment: Amazon EMR, patched to use Pig 0.10.0
> Reporter: Jonathan Packer
> Attachments: test_data.txt
>
>
[jira] [Assigned] (PIG-2767) Pig creates wrong schema after
dereferencing nested tuple fields
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai reassigned PIG-2767:
-------------------------------
Assignee: Daniel Dai
> Pig creates wrong schema after dereferencing nested tuple fields
> ----------------------------------------------------------------
>
> Key: PIG-2767
> URL: https://issues.apache.org/jira/browse/PIG-2767
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Environment: Amazon EMR, patched to use Pig 0.10.0
> Reporter: Jonathan Packer
> Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: test_data.txt
>
>
[jira] [Updated] (PIG-2767) Pig creates wrong schema after
dereferencing nested tuple fields
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-2767:
----------------------------
Fix Version/s: 0.11
> Pig creates wrong schema after dereferencing nested tuple fields
> ----------------------------------------------------------------
>
> Key: PIG-2767
> URL: https://issues.apache.org/jira/browse/PIG-2767
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Environment: Amazon EMR, patched to use Pig 0.10.0
> Reporter: Jonathan Packer
> Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: test_data.txt
>
>