Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2011/04/07 04:45:05 UTC
[jira] [Created] (PIG-1971) New Logical Plan messes up schemas in projections
New Logical Plan messes up schemas in projections
-------------------------------------------------
Key: PIG-1971
URL: https://issues.apache.org/jira/browse/PIG-1971
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Dmitriy V. Ryaboy
While dealing with PIG-1870, I found that when using the HBaseStorage load/storefunc, which implements projection pushdown and has a custom Caster, the caster is not getting used when the following script is executed:
a = load 'hbase://TESTTABLE_1' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B TESTCOLUMN_C',
'-loadKey -caster HBaseBinaryConverter')
as (rowKey:chararray,col_a:int, col_b:double, col_c:chararray);
b = FOREACH a GENERATE rowKey, col_a, col_b;
STORE b into 'TESTTABLE_2' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B','-caster HBaseBinaryConverter');
If a is stored directly, without the FOREACH, the HBaseBinaryConverter methods are invoked to convert fields as appropriate. If b gets stored, HBaseBinaryConverter is completely ignored. If newlogicalplan is turned off, everything works as expected.
Further evidence that something odd is afoot, though possibly unrelated: the field aliases are messed up in the new logical plan when b is EXPLAINed (col_a appears twice in the new logical plan, instead of the first column being called rowKey):
#-----------------------------------------------
# Logical Plan:
#-----------------------------------------------
fake: Store 1-18 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: Unknown
|
|---b: ForEach 1-17 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: bag
| |
| Project 1-14 Projections: [0] Overloaded: false FieldSchema: rowKey: chararray Type: chararray
| Input: a: Load 1-9
| |
| Project 1-15 Projections: [1] Overloaded: false FieldSchema: col_a: int Type: int
| Input: a: Load 1-9
| |
| Project 1-16 Projections: [2] Overloaded: false FieldSchema: col_b: double Type: double
| Input: a: Load 1-9
|
|---a: Load 1-9 Schema: {rowKey: chararray,col_a: int,col_b: double,col_c: chararray} Type: bag
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
fake: (Name: LOStore Schema: col_a#12:chararray,col_a#13:int,col_b#14:double)ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]
|
|---b: (Name: LOForEach Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
| |
| (Name: LOGenerate[false,false,false] Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
| | |
| | (Name: Cast Type: chararray Uid: 13)
| | |
| | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 0 Column: 0)
| | |
| | (Name: Cast Type: int Uid: 13)
| | |
| | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: 0)
| | |
| | (Name: Cast Type: double Uid: 14)
| | |
| | |---col_b:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: 0)
| |
| |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
| |
| |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
| |
| |---(Name: LOInnerLoad[1] Schema: col_b#14:bytearray)
|
|---a: (Name: LOLoad Schema: col_a#13:bytearray,col_b#14:bytearray)ColumnPrune:RequiredColumns=[0, 1, 2]ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]RequiredFields:[1, 2]
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-1971:
-----------------------------------
Description:
While dealing with PIG-1870, I found that when using the HBaseStorage load/storefunc, which implements projection pushdown and has a custom Caster, the caster is not getting used when the following script is executed:
a = load 'hbase://TESTTABLE_1' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B TESTCOLUMN_C',
'-loadKey -caster HBaseBinaryConverter')
as (rowKey:chararray,col_a:int, col_b:double, col_c:chararray);
b = FOREACH a GENERATE rowKey, col_a, col_b;
STORE b into 'TESTTABLE_2' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B','-caster HBaseBinaryConverter');
If a is stored directly, without the FOREACH, the HBaseBinaryConverter methods are invoked to convert fields as appropriate. If b gets stored, HBaseBinaryConverter is completely ignored. If newlogicalplan is turned off, everything works as expected.
Further evidence that something odd is afoot, though possibly unrelated: the field aliases are messed up in the new logical plan when b is EXPLAINed (col_a appears twice in the new logical plan, instead of the first column being called rowKey):
{noformat}
#-----------------------------------------------
# Logical Plan:
#-----------------------------------------------
fake: Store 1-18 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: Unknown
|
|---b: ForEach 1-17 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: bag
| |
| Project 1-14 Projections: [0] Overloaded: false FieldSchema: rowKey: chararray Type: chararray
| Input: a: Load 1-9
| |
| Project 1-15 Projections: [1] Overloaded: false FieldSchema: col_a: int Type: int
| Input: a: Load 1-9
| |
| Project 1-16 Projections: [2] Overloaded: false FieldSchema: col_b: double Type: double
| Input: a: Load 1-9
|
|---a: Load 1-9 Schema: {rowKey: chararray,col_a: int,col_b: double,col_c: chararray} Type: bag
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
fake: (Name: LOStore Schema: col_a#12:chararray,col_a#13:int,col_b#14:double)ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]
|
|---b: (Name: LOForEach Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
| |
| (Name: LOGenerate[false,false,false] Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
| | |
| | (Name: Cast Type: chararray Uid: 13)
| | |
| | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 0 Column: 0)
| | |
| | (Name: Cast Type: int Uid: 13)
| | |
| | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: 0)
| | |
| | (Name: Cast Type: double Uid: 14)
| | |
| | |---col_b:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: 0)
| |
| |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
| |
| |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
| |
| |---(Name: LOInnerLoad[1] Schema: col_b#14:bytearray)
|
|---a: (Name: LOLoad Schema: col_a#13:bytearray,col_b#14:bytearray)ColumnPrune:RequiredColumns=[0, 1, 2]ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]RequiredFields:[1, 2]
{noformat}
[jira] [Resolved] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy resolved PIG-1971.
------------------------------------
Resolution: Not A Problem
Closing as not-a-bug. I believe the documentation issue was solved in a different ticket.
[jira] [Commented] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017127#comment-13017127 ]
Olga Natkovich commented on PIG-1971:
-------------------------------------
Which release should this go to? 0.8? 0.9?
[jira] [Commented] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017125#comment-13017125 ]
Dmitriy V. Ryaboy commented on PIG-1971:
----------------------------------------
Thanks for the fast response, Daniel. I'll apply this to my fix for PIG-1870 and see if it helps.
[jira] [Updated] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1971:
----------------------------
Attachment: PIG-1971-0.patch
This is because HBaseStorage.pushProjection changes requiredFieldList, which is intended to be read-only. I attached an initial patch and added a comment to pushProjection. Dmitriy, can you test it?
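The hazard described above can be illustrated with a minimal sketch. It is not the attached PIG-1971-0.patch and it does not use Pig's real classes. A plain list of column indexes stands in for requiredFieldList, and the class and method names are hypothetical. The thread does not say exactly which change pushProjection makes, so dropping the first entry is an assumption chosen only for illustration. The point is that a loader which edits the list it was handed also changes what the caller sees, while a loader that copies the list first does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; NOT Pig's LoadPushDown or HBaseStorage classes.
class PushProjectionSketch {

    // Problematic pattern: edits the list that the caller (the planner) still holds.
    static List<Integer> pushProjectionMutating(List<Integer> requiredFields) {
        requiredFields.remove(0); // removes the entry at position 0 from the caller's list
        return requiredFields;
    }

    // Safer pattern: treat the argument as read-only and work on a private copy.
    static List<Integer> pushProjectionCopying(List<Integer> requiredFields) {
        List<Integer> local = new ArrayList<>(requiredFields);
        local.remove(0); // only the private copy changes
        return local;
    }

    public static void main(String[] args) {
        List<Integer> plannerView = new ArrayList<>(Arrays.asList(0, 1, 2));

        pushProjectionCopying(plannerView);
        System.out.println("after copying variant:  " + plannerView); // [0, 1, 2]

        pushProjectionMutating(plannerView);
        System.out.println("after mutating variant: " + plannerView); // [1, 2]
    }
}
```

In the sketch, the copying variant leaves the caller's list untouched, which is the behavior a read-only argument calls for.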
[jira] [Closed] (PIG-1971) New Logical Plan messes up schemas in projections
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy closed PIG-1971.
----------------------------------
> New Logical Plan messes up schemas in projections
> -------------------------------------------------
>
> Key: PIG-1971
> URL: https://issues.apache.org/jira/browse/PIG-1971
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Dmitriy V. Ryaboy
> Attachments: PIG-1971-0.patch
>
>
> While dealing with PIG-1870, I found that when using the HBaseStorage load/storefunc, which implements projection pushdown and has a custom Caster, the caster is not getting used when the following script is executed:
> a = load 'hbase://TESTTABLE_1' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B TESTCOLUMN_C',
> '-loadKey -caster HBaseBinaryConverter')
> as (rowKey:chararray,col_a:int, col_b:double, col_c:chararray);
> b = FOREACH a GENERATE rowKey, col_a, col_b;
> STORE b into 'TESTTABLE_2' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('TESTCOLUMN_A TESTCOLUMN_B','-caster HBaseBinaryConverter');
> If a is stored directly, without the FOREACH, the HBaseBinaryConverter methods are invoked to convert fields as appropriate. If b gets stored, HBaseBinaryConverter is completely ignored. If newlogicalplan is turned off, everything works as expected.
> Further evidence that something odd as afoot -- though possibly unrelated -- note that the field aliases are messed up in the new logical plan if b is EXPLAINed (col_a is repeated twice, instead of the first column being called rowkey, in the new logical plan):
> {noformat}
> #-----------------------------------------------
> # Logical Plan:
> #-----------------------------------------------
> fake: Store 1-18 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: Unknown
> |
> |---b: ForEach 1-17 Schema: {rowKey: chararray,col_a: int,col_b: double} Type: bag
> | |
> | Project 1-14 Projections: [0] Overloaded: false FieldSchema: rowKey: chararray Type: chararray
> | Input: a: Load 1-9
> | |
> | Project 1-15 Projections: [1] Overloaded: false FieldSchema: col_a: int Type: int
> | Input: a: Load 1-9
> | |
> | Project 1-16 Projections: [2] Overloaded: false FieldSchema: col_b: double Type: double
> | Input: a: Load 1-9
> |
> |---a: Load 1-9 Schema: {rowKey: chararray,col_a: int,col_b: double,col_c: chararray} Type: bag
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> fake: (Name: LOStore Schema: col_a#12:chararray,col_a#13:int,col_b#14:double)ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]
> |
> |---b: (Name: LOForEach Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
> | |
> | (Name: LOGenerate[false,false,false] Schema: col_a#13:chararray,col_a#13:int,col_b#14:double)
> | | |
> | | (Name: Cast Type: chararray Uid: 13)
> | | |
> | | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 0 Column: 0)
> | | |
> | | (Name: Cast Type: int Uid: 13)
> | | |
> | | |---col_a:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: 0)
> | | |
> | | (Name: Cast Type: double Uid: 14)
> | | |
> | | |---col_b:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: 0)
> | |
> | |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
> | |
> | |---(Name: LOInnerLoad[0] Schema: col_a#13:bytearray)
> | |
> | |---(Name: LOInnerLoad[1] Schema: col_b#14:bytearray)
> |
> |---a: (Name: LOLoad Schema: col_a#13:bytearray,col_b#14:bytearray)ColumnPrune:RequiredColumns=[0, 1, 2]ColumnPrune:InputUids=[12, 13, 14]ColumnPrune:OutputUids=[12, 13, 14]RequiredFields:[1, 2]
> {noformat}
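The alias problem described above is visible in the printed schemas themselves: in the LOForEach schema, uid 13 is attached to two fields. As a rough illustration only, the sketch below parses a schema string copied from the EXPLAIN output and reports any uid that occurs more than once. The helper names (parse_schema, duplicate_uids) are hypothetical and not part of Pig, and the regular expression assumes the simple alias#uid:type layout shown in this report.

```python
import re
from collections import defaultdict

# Matches one field of a new-logical-plan schema as printed by EXPLAIN,
# e.g. "col_a#13:chararray" -> alias "col_a", uid 13, type "chararray".
# Assumption: fields are flat (no nested bags or tuples).
FIELD_RE = re.compile(r"(?P<alias>\w+)#(?P<uid>\d+):(?P<type>\w+)")


def parse_schema(schema):
    """Return a list of (alias, uid, type) tuples from a schema string."""
    return [
        (m.group("alias"), int(m.group("uid")), m.group("type"))
        for m in FIELD_RE.finditer(schema)
    ]


def duplicate_uids(schema):
    """Map each uid that appears more than once to the field positions using it."""
    positions = defaultdict(list)
    for index, (_alias, uid, _type) in enumerate(parse_schema(schema)):
        positions[uid].append(index)
    return {uid: idx for uid, idx in positions.items() if len(idx) > 1}


# Schema strings copied from the plan in this report.
foreach_schema = "col_a#13:chararray,col_a#13:int,col_b#14:double"
store_schema = "col_a#12:chararray,col_a#13:int,col_b#14:double"

print(duplicate_uids(foreach_schema))  # {13: [0, 1]}
print(duplicate_uids(store_schema))    # {}
```

A check like this only flags the symptom (two output fields sharing one uid); it says nothing about why the caster is skipped.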
[jira] [Commented] (PIG-1971) New Logical Plan messes up schemas in
projections
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017133#comment-13017133 ]
Dmitriy V. Ryaboy commented on PIG-1971:
----------------------------------------
0.8, though if it is indeed a problem with HBaseStorage and not Pig, it'll just be part of 1870