Posted to user@pig.apache.org by lulynn_2008 <lu...@163.com> on 2011/08/23 12:20:02 UTC

question about pig commands implementation procedure and unit test result

Hello,
I have some thoughts on how Pig processes commands.
For example, take these Pig commands (from TestNewPlanLogToPhyTranslationVisitor.java):
        a = load 'd1.txt' as (id, c);
        b = load 'd2.txt' as (id, c);
        c = load 'd3.txt' as (id, c);
        d = join a by id, b by c;
        e = filter d by a::id==NULL AND b::c==NULL;
        f = join e by b::c, c by id;
        g = filter f by b::id==NULL AND c::c==NULL;
        store g into 'empty2';
Pig uses the buildPlan method to produce a LogicalPlan like this:
|
|---g: Filter scope-24 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
    |   |
    |   And scope-23 FieldSchema: boolean Type: boolean
    |   |
    |   |---Equal scope-19 FieldSchema: boolean Type: boolean
    |   |   |
    |   |   |---Project scope-17 Projections: [2] Overloaded: false FieldSchema: e::b::id: bytearray Type: bytearray
    |   |   |   Input: f: LOJoin scope-16
    |   |   |
    |   |   |---Const scope-18( null ) FieldSchema: bytearray Type: bytearray
    |   |
    |   |---Equal scope-22 FieldSchema: boolean Type: boolean
    |       |
    |       |---Project scope-20 Projections: [5] Overloaded: false FieldSchema: c::c: bytearray Type: bytearray
    |       |   Input: f: LOJoin scope-16
    |       |
    |       |---Const scope-21( null ) FieldSchema: bytearray Type: bytearray
    |
    |---f: LOJoin scope-16 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
        |   |
        |   Project scope-14 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
        |   Input: e: Filter scope-13
        |   |
        |   Project scope-15 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
        |   Input: c: Load scope-2
        |
        |---c: Load scope-2 Schema: {id: bytearray,c: bytearray} Type: bag
        |
        |---e: Filter scope-13 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
            |   |
            |   And scope-12 FieldSchema: boolean Type: boolean
            |   |
            |   |---Equal scope-8 FieldSchema: boolean Type: boolean
            |   |   |
            |   |   |---Project scope-6 Projections: [0] Overloaded: false FieldSchema: a::id: bytearray Type: bytearray
            |   |   |   Input: d: LOJoin scope-5
            |   |   |
            |   |   |---Const scope-7( null ) FieldSchema: bytearray Type: bytearray
            |   |
            |   |---Equal scope-11 FieldSchema: boolean Type: boolean
            |       |
            |       |---Project scope-9 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
            |       |   Input: d: LOJoin scope-5
            |       |
            |       |---Const scope-10( null ) FieldSchema: bytearray Type: bytearray
            |
            |---d: LOJoin scope-5 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
                |   |
                |   Project scope-3 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
                |   Input: a: Load scope-0
                |   |
                |   Project scope-4 Projections: [1] Overloaded: false FieldSchema: c: bytearray Type: bytearray
                |   Input: b: Load scope-1
                |
                |---a: Load scope-0 Schema: {id: bytearray,c: bytearray} Type: bag
                |
                |---b: Load scope-1 Schema: {id: bytearray,c: bytearray} Type: bag

I assume that command analysis and the storage of intermediate data are both based on HashMap. Is this correct?
I found that the expected results of some test cases depend on the order in which a HashMap returns its contents. In my opinion, such a test case should not have a single fixed expected output, because the iteration order of a HashMap is not guaranteed to be stable. Please let me know what you think. Thank you.
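
To illustrate the concern, here is a minimal sketch of one way a test could avoid depending on HashMap iteration order: sort the keys before comparing them with the expected result. The class name OrderInsensitiveCheck, the helper sortedKeys, and the sample map contents are hypothetical and are not taken from the Pig code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example, not Pig code: compare HashMap contents
// without relying on the order in which the map iterates.
public class OrderInsensitiveCheck {

    // Copies the keys into a list and sorts them, so the result is the
    // same no matter how the HashMap happens to order its entries.
    public static List<String> sortedKeys(Map<String, String> map) {
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        // Sample data loosely modelled on the aliases in the script above.
        Map<String, String> aliases = new HashMap<>();
        aliases.put("c", "d3.txt");
        aliases.put("a", "d1.txt");
        aliases.put("b", "d2.txt");

        List<String> expected = Arrays.asList("a", "b", "c");
        // The comparison holds regardless of the map's internal ordering.
        System.out.println(sortedKeys(aliases).equals(expected)); // prints "true"
    }
}
```

A test written this way has a single expected output again, because the sort removes the dependence on the map's ordering.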



Re: question about pig commands implementation procedure and unit test result

Posted by Daniel Dai <da...@hortonworks.com>.
Yes, we use HashMap in 0.8.1. In 0.9, we are using ArrayList, so you
might see fewer issues like this.

Daniel
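
The difference between the two collections can be sketched as follows. This is only an illustration of the general Java behaviour, assuming nothing about Pig's internals: the class IterationOrderDemo, its methods, and the sample operator names are hypothetical. An ArrayList iterates in insertion order, while HashMap makes no guarantee about the order of its keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example, not Pig code: contrast ArrayList and HashMap ordering.
public class IterationOrderDemo {

    // Stores the names in an ArrayList; iteration follows insertion order.
    public static List<String> viaList(String... names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(name);
        }
        return out;
    }

    // Stores the names as HashMap keys and reads them back; the order of
    // the returned keys is unspecified and may differ between JVMs.
    public static List<String> viaHashMap(String... names) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            map.put(names[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] ops = {"scope-24", "scope-16", "scope-13", "scope-5"};
        System.out.println("ArrayList: " + viaList(ops));
        System.out.println("HashMap:   " + viaHashMap(ops));
    }
}
```

The first line always lists the names in the order they were added. The second line contains the same names, but their order depends on the hash values and the HashMap implementation, which is why output derived from it can differ between environments.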
