Posted to dev@tajo.apache.org by camelia c <ca...@yahoo.com> on 2013/09/07 13:42:29 UTC

[GSoc2013] - Outer Join - a question about MergeJoinExec

Hello,

I am resending You an updated list of my questions. I have already found the answers to some of the earlier ones.

1) In MergeJoinExec, what is the purpose of innerTupleSlots and outerTupleSlots, and can You please give me an example of how they are filled, based on a dummy data set?

2) I understood from a talk that the MergeJoinExec has some issues and that Mr Jihoon is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?

3) Given a JoinNode anywhere in the logical query plan, how can we obtain the block name containing it?
Even for a single-block query, how do we determine that a JoinNode belongs to @ROOT, for example?

More precisely, in class OuterJoinRewriteRule, in method
   public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, Stack<LogicalNode> stack, Integer depth)

I tried calling
    plan.getBlock(joinNode).getName()
but I get a NullPointerException.



I look forward to receiving Your answer!

Yours sincerely,
Camelia
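
Regarding question 1, here is a minimal sketch of how a textbook sort-merge join buffers tuples. It assumes that innerTupleSlots and outerTupleSlots play the usual role of holding the run of tuples that share the current join key on each side; that MergeJoinExec works exactly this way is an assumption, not something confirmed in this thread. The class, the method, and the use of plain int keys are hypothetical and are not Tajo code.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSlotsSketch {
  // Joins two key arrays that are already sorted in ascending order.
  // Each result names the positions of one matching outer/inner pair.
  static List<String> mergeJoin(int[] outer, int[] inner) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < outer.length && j < inner.length) {
      if (outer[i] < inner[j]) {
        i++;                       // outer key too small, advance outer
      } else if (outer[i] > inner[j]) {
        j++;                       // inner key too small, advance inner
      } else {
        int key = outer[i];
        // Hypothetical analogue of outerTupleSlots / innerTupleSlots:
        // collect every tuple on each side that carries the current key.
        List<Integer> outerSlots = new ArrayList<>();
        List<Integer> innerSlots = new ArrayList<>();
        while (i < outer.length && outer[i] == key) {
          outerSlots.add(i++);
        }
        while (j < inner.length && inner[j] == key) {
          innerSlots.add(j++);
        }
        // Emit the cross product of the two buffered runs.
        for (int o : outerSlots) {
          for (int n : innerSlots) {
            out.add("outer[" + o + "]-inner[" + n + "]");
          }
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Dummy data set: key 2 appears twice on both sides, key 5 once.
    int[] outer = {1, 2, 2, 5};
    int[] inner = {2, 2, 3, 5};
    for (String pair : mergeJoin(outer, inner)) {
      System.out.println(pair);
    }
  }
}
```

With the dummy data above, the slots for key 2 hold two positions on each side, so that key contributes four joined pairs, and key 5 contributes one.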

Re: [GSoc2013] - Outer Join - 2 very important questions

Posted by Hyunsik Choi <hy...@apache.org>.
This is the answer to the second question. If you want to run the unit
tests of a specific class, such as TestLeftOuter_NLJoinExec, run the
following from the directory tajo-core/tajo-core-backend:

mvn test -Dtest=org.apache.tajo.engine.planner.physical.TestLeftOuter_NLJoinExec

The best way is to use an IDE that supports JUnit. IntelliJ and
Eclipse both support this feature.

Note that you should run 'mvn clean install' at the source root if
you changed any source code outside tajo-core/tajo-core-backend.

Best regards,
Hyunsik

On Sun, Sep 15, 2013 at 3:55 AM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> 1)
> I am still waiting for Your answer to this question.
> Let me restate the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again, my question is: what should be modified in order to allow null values residing in an INT4 column to be compared with numbers in a different table's INT4 column?
>
> 2) How can I run a specific unit test without having to run all the tests? I tried the different options suggested at http://tajo.incubator.apache.org/build.html, but without success.
> For example, say You want to perform only test   TestNLJoinExec.java
> What is the complete command that You type in the terminal?
>
>
>
> Please answer this message as soon as possible.
>
>
> Yours sincerely,
>
> Camelia
>
> ________________________________
>  From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 11:06 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
>
>
> PS:
>
> The problem is either there in BinaryEvalCtx, or in class
>
> public class Int4Datum extends NumericDatum.
>
>
> in the method
> public int compareTo(Datum datum) {
>
> ...
>
>
> @Override
> public int compareTo(Datum datum) {
>   switch (datum.type()) {
>     case INT2:
>       if (val < datum.asInt2()) {
>         return -1;
>       } else if (datum.asInt2() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT4:
>       if (val < datum.asInt4()) {
>         return -1;
>       } else if (datum.asInt4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT8:
>       if (val < datum.asInt8()) {
>         return -1;
>       } else if (datum.asInt8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT4:
>       if (val < datum.asFloat4()) {
>         return -1;
>       } else if (datum.asFloat4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT8:
>       if (val < datum.asFloat8()) {
>         return -1;
>       } else if (datum.asFloat8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     default:
>       throw new InvalidOperationException(datum.type());
>   }
> }
>
>
>
> Just a hint, maybe it helps You find and solve the issue more quickly.
>
> Camelia
>
>
>
>
> ________________________________
> From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 10:18 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
>
>
> Hello,
>
> I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems that it breaks when it reads NULL values and has to compare them to non-NULL values.
>
>
> The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.
>
>
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
>
> 13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.
>
> These classes exist beyond the outer join nodes' scope, but maybe they weren't tested for NULL values before. Some of the outer join processing now relies on these classes behaving correctly, which is why I am reporting the errors I find.
>
>
>
> Thank You in advance for Your answer and for Your understanding!
>
> Yours sincerely,
> Camelia
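
The failure reported above comes from evaluating an equality join condition when one operand is NULL. As a minimal sketch, assuming SQL semantics in which NULL never matches any value, a left outer nested-loop join could treat the predicate as shown below. The class, the Row type, and the string output format are hypothetical and are not Tajo code. The right-side rows reuse the values from the log above; the extra left key 20 is added only to show the NULL-padding case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeftOuterNLJoinSketch {
  // Hypothetical right-side row: a value column and a nullable join key.
  static final class Row {
    final double value;
    final Integer key;   // null stands for SQL NULL

    Row(double value, Integer key) {
      this.value = value;
      this.key = key;
    }
  }

  // Equality join predicate: a NULL on either side never matches.
  static boolean keysMatch(Integer left, Integer right) {
    return left != null && right != null && left.equals(right);
  }

  // For every left key, emit one line per matching right row,
  // or a single NULL-padded line when no right row matches.
  static List<String> join(List<Integer> leftKeys, List<Row> rightRows) {
    List<String> out = new ArrayList<>();
    for (Integer leftKey : leftKeys) {
      boolean matched = false;
      for (Row row : rightRows) {
        if (keysMatch(leftKey, row.key)) {
          out.add(leftKey + "," + row.value);
          matched = true;
        }
      }
      if (!matched) {
        out.add(leftKey + ",NULL");
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> left = Arrays.asList(10, 20);
    List<Row> right = Arrays.asList(
        new Row(333.0, 10),
        new Row(555.0, 10),
        new Row(777.0, null));
    for (String line : join(left, right)) {
      System.out.println(line);
    }
  }
}
```

Here the row with the NULL key is simply skipped by the predicate instead of raising an exception, and the left key without any match is padded with NULL.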

Re: [GSoc2013] - Outer Join

Posted by Hyunsik Choi <hy...@apache.org>.
I also appreciate your understanding. If it is hard for you to rebase
your source code, I can help you.

Please read this page: http://wiki.apache.org/tajo/HowToContribute
The HowToContribute page describes how to create a patch.

Then you can find a button named 'Attach Files' under 'More Actions'
in the Jira issue below:
https://issues.apache.org/jira/browse/TAJO-34

Thanks,
Hyunsik

Re: [GSoc2013] - Outer Join

Posted by camelia c <ca...@yahoo.com>.
Thank You for Your feedback!

It is mainly the outer join optimization part that was affected by TAJO's recent changes. I'll put in some more effort to figure out how to salvage that part as well.

Also, what is the command for creating a patch for JIRA?

Thank You very much,
Camelia




Re: [GSoc2013] - Outer Join

Posted by Hyunsik Choi <hy...@apache.org>.
I also think you successfully finished your job.

However, you need to know what the primary focus of a GSoC project is.
The most important thing is to learn how to participate in an open source
project. The final step of an open source contribution is to commit your
work to the source repository. It would be better if you made more
effort to submit your work as a patch to Jira.

In addition, source code conflicts are very common in open source
projects to which two or more people contribute. You need to rebase your
code frequently or submit a small completed piece as a patch to Jira
each time you finish a small part. I have already recommended several
times that you submit patches. Your last rebase was performed on September
5th. Twenty days is not a short time in an active open source project.

Anyway, congratulations on finishing the GSoC program!

Best regards,
Hyunsik Choi



[GSoc2013] - Outer Join

Posted by camelia c <ca...@yahoo.com>.
Hello,


Yes, indeed I finished the project. I last synchronized on September 5th.
Then I finished the project and ran 'mvn clean install' successfully!

As You should have known, September 16th was the soft pencils-down deadline, meaning that Google evaluates the source code written by that date. All source code modified afterwards counts only as further work, not toward accomplishing the GSoC project's aim.



Today, as You requested, I tried to synchronize again. 

Well, apart from some conflicts that I resolved for commit purposes, the biggest surprise was to discover that You had deleted the classes I worked with. You had also modified or deleted some methods that I was using. Unfortunately, this is not the first time that You have deleted classes while I was using them (just to remind You of #TAJO-87, #TAJO-121, and #TAJO-96, which caused me to start work all over again in mid-August, even though we had initially agreed on the Software Design Document).


I shall give some examples:

1) I use the class FromTable in several places (e.g. OuterJoinUtil, OuterJoinMeta, OuterJoinRewriteRule), but You deleted this class on September 20th, after the project's deadline. I am referring to Your commit bd1619de0c0a371363382540f70bd876c08ea765, named "TAJO-186: Improve column resolving method. (hyunsik)".


2) From class ScanNode, I use the method getTableId() (e.g. in OuterJoinMeta and FilterPushdownRule), which has not been available since September 16th, when You renamed it to getTableName in commit 1b1d1e8c1a6b82ccc5c3ce4daeb9e3daa309cde4, named "TAJO-184: Refactor GlobalPlanner and global plan data structure".

Afterwards, on September 20th, in commit bd1619de0c0a371363382540f70bd876c08ea765, named "TAJO-186: Improve column resolving method. (hyunsik)", You dropped the FromTable field from this class and used a TableDesc instead.


3) From class Column, I use the method getTableName() (e.g. in FilterPushdownRule), which is no longer available: on September 16th, in commit 1b1d1e8c1a6b82ccc5c3ce4daeb9e3daa309cde4, named "TAJO-184: Refactor GlobalPlanner and global plan data structure", You deleted this method.


4) From class SortNode, I use the constructor public SortNode(SortSpec[] sortKeys, Schema inSchema, Schema outSchema) (e.g. in PhysicalPlannerImpl), but yesterday, September 23rd, in commit fc018de823dd34d769eb73f3c42e089b0d992b81, named "TAJO-194: LogicalNode should have an identifier to distinguish each logical node instance", You deleted this constructor.


All these major changes, made after the project's September 16th deadline, can be handled in the future, depending on what You had planned when You decided to make them.



As all these changes occurred after the project's deadline, I consider the project successfully finished, and the proof is the source repository at https://github.com/camelia-c/incubator-tajo/tree/outerjoin_1.

The project status was continuously updated on the project's website, which You know (https://sites.google.com/site/gsoc2013tajo34/), so that the status was always visible to interested parties.


Looking forward to hearing from You soon.


Yours sincerely,
Camelia





Re: [GSoc2013] - Outer Join - 2 very important questions

Posted by Hyunsik Choi <hy...@apache.org>.
Hi camelia,

I'm sorry for the late response. The solution is simple. You can modify
the existing source code. I've changed line 121 of Int4Datum as
follows:

default:
  if (datum instanceof NullDatum) {
    return DatumFactory.createBool(false);
  } else {
    throw new InvalidOperationException();
  }

With this change, all unit tests of TestLeftOuter_NLJoinExec pass. The
other Datum classes need the same code.

Anyway, your work looks great.

Best regards,
Hyunsik Choi
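
The change described above can be sketched in a self-contained form as follows. This is only an illustration under stated assumptions: Datum, NullDatum, and Int4Datum here are hypothetical minimal stand-ins, not Tajo's real classes, equalsTo returns a plain boolean instead of the datum built by DatumFactory.createBool, and UnsupportedOperationException stands in for Tajo's InvalidOperationException.

```java
public class NullAwareEquals {
  // Hypothetical stand-ins for the Datum hierarchy; not the real Tajo classes.
  interface Datum {}

  static final class NullDatum implements Datum {
    static final NullDatum INSTANCE = new NullDatum();

    private NullDatum() {}
  }

  static final class Int4Datum implements Datum {
    final int val;

    Int4Datum(int val) {
      this.val = val;
    }

    // Compares with another datum. A NULL operand yields false instead of
    // an exception, mirroring the default branch shown in the message above.
    boolean equalsTo(Datum other) {
      if (other instanceof Int4Datum) {
        return val == ((Int4Datum) other).val;
      } else if (other instanceof NullDatum) {
        return false;
      } else {
        throw new UnsupportedOperationException("cannot compare with " + other);
      }
    }
  }

  public static void main(String[] args) {
    Int4Datum ten = new Int4Datum(10);
    System.out.println(ten.equalsTo(new Int4Datum(10)));
    System.out.println(ten.equalsTo(NullDatum.INSTANCE));
  }
}
```

Returning false for a NULL operand is the simplest way to keep the join condition from throwing. Strict SQL three-valued logic would yield UNKNOWN rather than false, which behaves the same way in a join predicate because only TRUE lets a pair through.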

On Sun, Sep 15, 2013 at 3:55 AM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> 1)
> I am still waiting for Your answer to this question.
> Let me restate the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again, my question is: what should be modified in order to allow null values residing in an INT4 column to be compared with numbers in a different table's INT4 column?
>
> 2) How can I run a specific unit test without having to run all the tests? I tried the different options suggested at http://tajo.incubator.apache.org/build.html, but without success.
> For example, say You want to perform only test   TestNLJoinExec.java
> What is the complete command that You type in the terminal?
>
>
>
> Please answer this message as soon as possible.
>
>
> Yours sincerely,
>
> Camelia
>
> ________________________________
>  From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 11:06 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
>
>
> PS:
>
> The problem is either there in BinaryEvalCtx, or in class
>
> public class Int4Datum extends NumericDatum.
>
>
> in the method
> public int compareTo(Datum datum) {
>
> ...
>
>
> @Override
> public int compareTo(Datum datum) {
>   switch (datum.type()) {
>     case INT2:
>       if (val < datum.asInt2()) {
>         return -1;
>       } else if (datum.asInt2() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT4:
>       if (val < datum.asInt4()) {
>         return -1;
>       } else if (datum.asInt4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT8:
>       if (val < datum.asInt8()) {
>         return -1;
>       } else if (datum.asInt8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT4:
>       if (val < datum.asFloat4()) {
>         return -1;
>       } else if (datum.asFloat4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT8:
>       if (val < datum.asFloat8()) {
>         return -1;
>       } else if (datum.asFloat8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     default:
>       throw new InvalidOperationException(datum.type());
>   }
> }
>
>
>
> Just a hint, maybe it helps You find and solve the issue more quickly.
>
> Camelia
>
>
>
>
> ________________________________
> From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 10:18 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
>
>
> Hello,
>
> I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems to break when it reads NULL values and has to compare them to non-NULL values.
>
>
> The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.
>
>
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
>
> 13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.
>
> These classes lie outside the scope of the outer join nodes, but maybe they weren't tested for NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm trying to flag errors when I find them.
>
>
>
> Thank You in advance for Your answer and for the understanding!
>
> Yours sincerely,
> Camelia

Re: [GSoc2013] - Outer Join - 2 very important questions

Posted by Hyunsik Choi <hy...@apache.org>.
According to your project site, you appear to have finished your GSoC project
one week ago. Could you give me a brief status of the project?

- hyunsik

On Sunday, September 15, 2013, camelia c wrote:

> Hello,
>
> 1)
> I'm still waiting for Your answer to this question, please.
> Let me restate the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key
> referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all
> its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again, my question is: what should be modified to allow
> comparing null values residing in an INT4 column with numbers in a
> different table's INT4 column?
>
> 2) How can I run a specific unit test without having to run all
> tests? I tried the different options suggested at
> http://tajo.incubator.apache.org/build.html, but without success.
> For example, say You want to run only the test TestNLJoinExec.java.
> What is the complete command that You type in the terminal?
>
>
>
> Please answer this message as soon as possible.
>
>
> Yours sincerely,
>
> Camelia
>
> ________________________________
>  From: camelia c <camelie_1985@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <dev@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 11:06 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
>
>
> PS:
>
> The problem is either there in BinaryEvalCtx, or in class
>
> public class Int4Datum extends NumericDatum.
>
>
> in the method
> public int compareTo(Datum datum) {
>
> ...
>
>
> @Override
> public int compareTo(Datum datum) {
>   switch (datum.type()) {
>     case INT2:
>       if (val < datum.asInt2()) {
>         return -1;
>       } else if (datum.asInt2() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT4:
>       if (val < datum.asInt4()) {
>         return -1;
>       } else if (datum.asInt4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT8:
>       if (val < datum.asInt8()) {
>         return -1;
>       } else if (datum.asInt8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT4:
>       if (val < datum.asFloat4()) {
>         return -1;
>       } else if (datum.asFloat4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT8:
>       if (val < datum.asFloat8()) {
>         return -1;
>       } else if (datum.asFloat8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     default:
>       throw new InvalidOperationException(datum.type());
>   }
> }
>
>
>
> Just a hint, maybe it helps You find and solve the issue more quickly.
>
> Camelia
>
>
>
>
> ________________________________
> From: camelia c <camelie_1985@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <dev@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 10:18 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
>
>
> Hello,
>
> I would like to kindly ask You to take a moment and explain to me how
> BinaryEvalCtx handles NULL values in fields. From an error I got
> recently, it seems to break when it reads NULL values and has to compare
> them to non-NULL values.
>
>
> The portion of code leading to the error is the same as in the original
> NLJoinExec and other join algorithms, where TAJO needs to verify the join
> condition and then project according to the output schema:
>
> if (joinQual != null) {
>         joinQual.eval(qualCtx, inSchema, frameTuple);
>         if (joinQual.terminate(qualCtx).asBool()) {
>
> Here the verification of the join condition fails if one operand is a NULL
> value and the other is a number, in this case number 10.
>
>
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
>
> 13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
>     at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
>     at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
>     at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
>     at org.apache.tajo.worker.Task.run(Task.java:378)
>     at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
>     at java.lang.Thread.run(Thread.java:662)
>
>
> It seems that at this moment, it takes the NULL as being Int4Datum,
> instead of NullDatum.
>
> These classes lie outside the scope of the outer join nodes, but maybe
> they weren't tested for NULL values before. Some of the outer join
> processing now relies on their correct behavior, which is why I'm
> trying to flag errors when I find them.
>
>
>
> Thank You in advance for Your answer and for the understanding!
>
> Yours sincerely,
> Camelia

Re: [GSoc2013] - Outer Join - 2 very important questions

Posted by camelia c <ca...@yahoo.com>.
Hello,

1)
I'm still waiting for Your answer to this question, please.
Let me restate the problem.

I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
The data in column fkA contains both numbers and null values (NullDatum). 


In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.

When faced with testing the join condition 


if (joinQual != null) {
        joinQual.eval(qualCtx, inSchema, frameTuple);
        if (joinQual.terminate(qualCtx).asBool()) {

there is a problem:

ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
    at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
    at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
    at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
    at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
    at org.apache.tajo.worker.Task.run(Task.java:378)
    at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
    at java.lang.Thread.run(Thread.java:662)


So, once again, my question is: what should be modified to allow comparing null values residing in an INT4 column with numbers in a different table's INT4 column?

2) How can I run a specific unit test without having to run all tests? I tried the different options suggested at http://tajo.incubator.apache.org/build.html, but without success.
For example, say You want to run only the test TestNLJoinExec.java.
What is the complete command that You type in the terminal?



Please answer this message as soon as possible.


Yours sincerely,

Camelia

________________________________
 From: camelia c <ca...@yahoo.com>
To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org> 
Sent: Wednesday, September 11, 2013 11:06 PM
Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
 

PS:

The problem is either there in BinaryEvalCtx, or in class

public class Int4Datum extends NumericDatum.


in the method
public int compareTo(Datum datum) {

...


@Override
public int compareTo(Datum datum) {
  switch (datum.type()) {
    case INT2:
      if (val < datum.asInt2()) {
        return -1;
      } else if (datum.asInt2() < val) {
        return 1;
      } else {
        return 0;
      }
    case INT4:
      if (val < datum.asInt4()) {
        return -1;
      } else if (datum.asInt4() < val) {
        return 1;
      } else {
        return 0;
      }
    case INT8:
      if (val < datum.asInt8()) {
        return -1;
      } else if (datum.asInt8() < val) {
        return 1;
      } else {
        return 0;
      }
    case FLOAT4:
      if (val < datum.asFloat4()) {
        return -1;
      } else if (datum.asFloat4() < val) {
        return 1;
      } else {
        return 0;
      }
    case FLOAT8:
      if (val < datum.asFloat8()) {
        return -1;
      } else if (datum.asFloat8() < val) {
        return 1;
      } else {
        return 0;
      }
    default:
      throw new InvalidOperationException(datum.type());
  }
}



Just a hint, maybe it helps You find and solve the issue more quickly.

Camelia




________________________________
From: camelia c <ca...@yahoo.com>
To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org> 
Sent: Wednesday, September 11, 2013 10:18 PM
Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx


Hello,

I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems to break when it reads NULL values and has to compare them to non-NULL values.


The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:

if (joinQual != null) {
        joinQual.eval(qualCtx, inSchema, frameTuple);
        if (joinQual.terminate(qualCtx).asBool()) {
          
Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.



13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)

13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
    at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
    at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
    at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
    at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
    at org.apache.tajo.worker.Task.run(Task.java:378)
    at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
    at java.lang.Thread.run(Thread.java:662)


It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.

These classes lie outside the scope of the outer join nodes, but maybe they weren't tested for NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm trying to flag errors when I find them.



Thank You in advance for Your answer and for the understanding!

Yours sincerely,
Camelia

Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)

Posted by camelia c <ca...@yahoo.com>.
PS:

The problem is either there in BinaryEvalCtx, or in class

public class Int4Datum extends NumericDatum.


 in the method
public int compareTo(Datum datum) {

...


@Override
public int compareTo(Datum datum) {
  switch (datum.type()) {
    case INT2:
      if (val < datum.asInt2()) {
        return -1;
      } else if (datum.asInt2() < val) {
        return 1;
      } else {
        return 0;
      }
    case INT4:
      if (val < datum.asInt4()) {
        return -1;
      } else if (datum.asInt4() < val) {
        return 1;
      } else {
        return 0;
      }
    case INT8:
      if (val < datum.asInt8()) {
        return -1;
      } else if (datum.asInt8() < val) {
        return 1;
      } else {
        return 0;
      }
    case FLOAT4:
      if (val < datum.asFloat4()) {
        return -1;
      } else if (datum.asFloat4() < val) {
        return 1;
      } else {
        return 0;
      }
    case FLOAT8:
      if (val < datum.asFloat8()) {
        return -1;
      } else if (datum.asFloat8() < val) {
        return 1;
      } else {
        return 0;
      }
    default:
      throw new InvalidOperationException(datum.type());
  }
}



Just a hint, maybe it helps You find and solve the issue more quickly.

Camelia




________________________________
 From: camelia c <ca...@yahoo.com>
To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org> 
Sent: Wednesday, September 11, 2013 10:18 PM
Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
 

Hello,

I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems to break when it reads NULL values and has to compare them to non-NULL values.


The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:

if (joinQual != null) {
        joinQual.eval(qualCtx, inSchema, frameTuple);
        if (joinQual.terminate(qualCtx).asBool()) {
          
Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.



13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)

13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
    at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
    at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
    at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
    at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
    at org.apache.tajo.worker.Task.run(Task.java:378)
    at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
    at java.lang.Thread.run(Thread.java:662)


It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.

These classes lie outside the scope of the outer join nodes, but maybe they weren't tested for NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm trying to flag errors when I find them.



Thank You in advance for Your answer and for the understanding!

Yours sincerely,
Camelia

Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx

Posted by camelia c <ca...@yahoo.com>.
Hello,

I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems to break when it reads NULL values and has to compare them to non-NULL values.


The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:

if (joinQual != null) {
        joinQual.eval(qualCtx, inSchema, frameTuple);
        if (joinQual.terminate(qualCtx).asBool()) {
          
Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.



13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)

13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)

13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
    at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
    at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
    at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
    at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
    at org.apache.tajo.worker.Task.run(Task.java:378)
    at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
    at java.lang.Thread.run(Thread.java:662)


It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.

These classes lie outside the scope of the outer join nodes, but maybe they weren't tested for NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm trying to flag errors when I find them.



Thank You in advance for Your answer and for the understanding!

Yours sincerely,
Camelia
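
For illustration, the behavior expected from the join condition in the log
above can be sketched with a tiny nested-loop left outer join. This is only
a sketch under assumptions: rows are plain Object arrays, a Java null
stands in for NullDatum, and the class and method names are hypothetical
rather than taken from Tajo. The left row with key 20 is invented here to
show the padding case; the other rows follow the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; Java null plays the role of a SQL NULL.
class LeftOuterNestedLoopSketch {

    // A NULL key never satisfies an equality join condition.
    static boolean keysMatch(Object left, Object right) {
        return left != null && right != null && left.equals(right);
    }

    static Object[] concat(Object[] a, Object[] b) {
        Object[] res = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, res, a.length, b.length);
        return res;
    }

    static List<Object[]> leftOuterJoin(List<Object[]> left, int leftKey,
                                        List<Object[]> right, int rightKey,
                                        int rightWidth) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            boolean matched = false;
            for (Object[] r : right) {
                if (keysMatch(l[leftKey], r[rightKey])) {
                    out.add(concat(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                // No partner found: pad the right side with NULLs.
                out.add(concat(l, new Object[rightWidth]));
            }
        }
        return out;
    }

    static List<Object[]> demo() {
        List<Object[]> left = Arrays.asList(new Object[]{10}, new Object[]{20});
        List<Object[]> right = Arrays.asList(
                new Object[]{333.0, 10},
                new Object[]{555.0, 10},
                new Object[]{777.0, null});
        return leftOuterJoin(left, 0, right, 1, 2);
    }

    public static void main(String[] args) {
        for (Object[] row : demo()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The row (777.0, NULL) is compared without any exception and simply never
matches, which is the behavior the comparison needs in order for the outer
join to proceed.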

Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by camelia c <ca...@yahoo.com>.
Hello,

This is my source repository
      https://github.com/camelia-c/incubator-tajo/tree/outerjoin_1

It's the same one that I wrote to You about in the past.


Thank You very much for Your kind help!

Camelia



________________________________
 From: Hyunsik Choi <hy...@apache.org>
To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c <ca...@yahoo.com> 
Sent: Monday, September 9, 2013 7:39 PM
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
 

Thank you for the more detailed information. Are these problems caused by
your working source?

If so, how can I access your recent working source? Is it on your github?

Actually, the recommended way to share a problem is as follows:

* create a Jira issue
* submit your patch or your github revision url
* describe your problem (your attached file already covers this)

Best regards,
Hyunsik Choi

On Mon, Sep 9, 2013 at 10:04 PM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> I send You an archive with the 3 problems encountered so far with the
> tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
>
> Please be so kind as to help me solve them.
>
> For each problem there is a separate folder in the archive, containing the
> query, the problem, the TAJO output, the logical plan of MasterLOG and the
> worker's log.
>
> To summarize:
> Problem 1) partial output and
>
> java.lang.NullPointerException
>     at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
>     at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
>     at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
>     at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
>
> , even if the physical operator's next method returns correct and complete
> results.
>
> Problem 2) incorrect values in tuples received from child nodes
>
> Problem 3) unexpected stop receiving values and
> ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
>
> The dataset is also concatenated in a separate data file in the archive.
>
>
> Thank You very much!
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi <hy...@apache.org>
> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
> <ca...@yahoo.com>
> Sent: Monday, September 9, 2013 3:52 AM
>
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi Camelia,
>
> Could you let me know the following? It would make it easier to investigate the
> problem.
>
> * your submitted SQL query
> * which physical operator (NLJoin or MergeJoin?)
> * (if possible) data sample that reproduces the problem
>
> Best regards,
> Hyunsik
>
>
> On Mon, Sep 9, 2013 at 7:30 AM, camelia c <ca...@yahoo.com> wrote:
>> A small addition to the previous message:
>>
>> The value obtained with
>>
>>    innerTuple = rightChild.next();
>>
>>
>> is in the join operator.
>>
>>
>> Camelia
>>
>>
>> ----- Forwarded Message -----
>> From: camelia c <ca...@yahoo.com>
>> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
>> Sent: Monday, September 9, 2013 1:25 AM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>>
>> Hello,
>>
>> Thank You very much for Your helpful answer yesterday!
>>
>> While testing, I encountered the following issue: the null values which
>> are read from files are sometimes randomly replaced by numbers such as 24 or
>> 29 or 30. This is a serious problem for the algorithms! Can You please
>> tell me why You think this happens and how it can be corrected?
>>
>>
>> Let me give You an example
>>
>> create external table emp1 (emp_id int, first_name text, last_name text,
>> dep_id int, salary float, job_id int) using csv with
>> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>>
>>
>>
>> I specify null values in file like this:
>>
>> 1000,Tom,Smith,10,333,100
>> 1001,Mary,Thompson,10,555,
>> 1002,Aron,Weber,,777,100
>> 1003,Susan,Carlson,,999,
>>
>> Both the internal nulls and the trailing nulls(those at the end of line)
>> are sometimes  randomly substituted with a small number; for example
>> (last_name, salary, emp_id, dep_id) was read from file with
>>
>> innerTuple = rightChild.next();
>>
>> obtaining values innerTuple.toString() as :
>>
>>
>> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>>
>>
>> Sometimes, in other queries the null value is correctly read as NULL.
>>
>>
>>
>> Thank You in advance!
>>
>> Yours sincerely,
>> Camelia
>>
>>
>>
>>
>> ________________________________
>>  From: Hyunsik Choi <hy...@apache.org>
>> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
>> <ca...@yahoo.com>
>> Sent: Saturday, September 7, 2013 6:00 PM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hi camelia,
>>
>> I'm sorry for the late response. I've just come back home from a family
>> meeting. I have left in-line comments on your questions.
>>
>> Best regards,
>> Hyunsik
>>
>>
>> On Sep 7, 2013, at 8:42 PM, camelia c <ca...@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I resend You an updated list of questions that I have. For some of the
>>> ancient ones, I found the answer already.
>>>
>>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and
>>> outerTupleSlots and can You please give me an example of how they are
>>> filled, based on a dummy data set ?
>>
>> Merge join advances through each relation in sorted order to find tuples
>> with the same join key. Each slot list keeps the tuples whose join keys are
>> the same. Consider the example below, where two relations are to be joined
>> and the first column of each relation is the join key.
>>
>> -----------------------------------
>> Two relations to be joined
>> -----------------------------------
>> Left      Right
>> (1, A)    (1, B)
>> (1, C)    (1, C)
>> (3, D)    (1, D)
>>           (2, E)
>>
>>
>> MergeJoin first finds all the tuples with the same key in each relation.
>> So the tuple slots contain the following:
>>
>> outerTupleSlots : (1, A), (1, C)
>> innerTupleSlots : (1, B), (1, C), (1, D)
>>
>> Then MergeJoin emits the joined tuples. In the above example,
>> MergeJoin produces 6 tuples (2 x 3).
>>
>>>
>>> 2) I understood from a talk that the MergeJoinExec has some issues and
>>> that Mr Jihoon is trying to fix them. Can I rely on the current version of
>>> MergeJoinExec to extend it for FullOuter_MergeJoinExec and
>>> RightOuter_MergeJoinExec?
>>
>> MergeJoinExec does not have any problem. It is correct. There was a
>> misunderstanding.
>>
>>>
>>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>>> the block name containing it?
>>> Even for a single-block query, how do we find for a JoinNode that it
>>> belongs to @ROOT, for example?
>>>
>>> More precisely, in class OuterJoinRewriteRule, in method
>>>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>> Stack<LogicalNode> stack, Integer depth)
>>>
>>> I tried to do
>>>    plan.getBlock(joinNode).getName()
>>> but I receive a Null Pointer Exception.
>>>
>>
>> The
>>  current API cannot what you want. The API needs to be improved for
>> supporting that. Probably, that is archived by modifying
>> BasicLogicalNodeVisitor's visitChild method to call visitXXXNode
>> method with some object including a current block name. I'll create a
>> jira issue for this improvement.
>>
>>
>>>
>>>
>>> I look forward to receiving Your answer!
>>>
>>> Yours sincerely,
>>> Camelia
>

Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by Hyunsik Choi <hy...@apache.org>.
Thank you for the more detailed information. Are these problems caused by
your working source?

If so, how can I access your recent working source? Is it on your GitHub?

Actually, the recommended way to share your problem is as follows:

* create a Jira issue
* submit your patch or your GitHub revision URL
* describe your problem (your attached file already satisfies this)

Best regards,
Hyunsik Choi


Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by camelia c <ca...@yahoo.com>.
Hello,

I am sending you an archive with the three problems I have encountered so far with
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java

Could you please help me solve them?

For each problem there is a separate folder in the archive, containing the query, a description of the problem, the Tajo output, the logical plan from the master log, and the worker's log.

To summarize:
Problem 1) partial output and the following exception, even though the physical operator's next method returns correct and complete results:

java.lang.NullPointerException
    at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
    at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
    at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
    at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)

Problem 2) incorrect values in tuples received from child nodes

Problem 3) the operator unexpectedly stops receiving values, and the log shows
ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space

The dataset is also included in the archive, concatenated into a separate data file.


Thank you very much!
Camelia





Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by Hyunsik Choi <hy...@apache.org>.
Hi Camelia,

Could you let me know the following? With this information, it will be easier to investigate the problem.

* your submitted SQL query
* which physical operator (NLJoin or MergeJoin?)
* (if possible) data sample that reproduces the problem

Best regards,
Hyunsik



[GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by camelia c <ca...@yahoo.com>.
A small addition to the previous message:

The value was obtained inside the join operator with

   innerTuple = rightChild.next();


Camelia



Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by camelia c <ca...@yahoo.com>.
Hello,

Thank you very much for your helpful answer yesterday!

While testing, I encountered the following issue: the null values read from files are sometimes replaced, seemingly at random, by numbers such as 24, 29, or 30. This is a serious problem for the algorithms! Can you please tell me why you think this happens and how it can be corrected?


Let me give you an example:

create external table emp1 (emp_id int, first_name text, last_name text, dep_id int, salary float, job_id int) using csv with ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';



I specify null values in the file like this:

1000,Tom,Smith,10,333,100
1001,Mary,Thompson,10,555,
1002,Aron,Weber,,777,100
1003,Susan,Carlson,,999,

Both the internal nulls and the trailing nulls (those at the end of a line) are sometimes replaced, seemingly at random, with a small number. For example, (last_name, salary, emp_id, dep_id) was read from the file with

innerTuple = rightChild.next();

and innerTuple.toString() returned:


(0=>Weber, 1=>777.0, 2=>1002, 3=>29)


Sometimes, in other queries the null value is correctly read as NULL.
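
As a point of comparison only, here is a minimal Java sketch of how empty CSV fields, including trailing ones, can be mapped to nulls. It is not Tajo's CSV scanner and not a diagnosis of the behaviour described above; the class and method names (CsvNullFieldsSketch, parseLine) are hypothetical. The one detail it relies on is that String.split drops trailing empty strings unless a negative limit is passed.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CsvNullFieldsSketch {

    /**
     * Splits one delimited line and maps every empty field to null.
     * Hypothetical helper for illustration; not part of Tajo.
     */
    static String[] parseLine(String line, char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        // A limit of -1 keeps trailing empty strings, so a line ending in
        // the delimiter still yields a (null) last field.
        String[] raw = line.split(regex, -1);
        String[] fields = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            fields[i] = raw[i].isEmpty() ? null : raw[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        // Row taken from the sample data above: dep_id and job_id are empty.
        String[] fields = parseLine("1003,Susan,Carlson,,999,", ',');
        System.out.println(Arrays.toString(fields));
    }
}
```

With the default split (no limit argument) the same line would produce only five fields, so the trailing job_id column would be missing rather than null.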



Thank You in advance!

Yours sincerely,
Camelia





Re: [GSoc2013] - Outer Join - a question about MergeJoinExec

Posted by Hyunsik Choi <hy...@apache.org>.
Hi camelia,

I'm sorry for the late response. I've just come back home from the family
meeting. I've left inline comments on your questions.

Best regards,
Hyunsik


On Sep 7, 2013, at 8:42 PM, camelia c <ca...@yahoo.com> wrote:

> Hello,
>
> I resend You an updated list of questions that I have. For some of the ancient ones, I found the answer already.
>
> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and outerTupleSlots and can You please give me an example of how they are filled, based on a dummy data set ?

Merge join advances through each relation in sorted order to find the
tuples that share the same join key. Each of the two tuple slots
(outerTupleSlots and innerTupleSlots) keeps a list of tuples whose join
keys are the same. Consider the example below, where two relations are to
be joined and the first column of each relation is the join key.

-----------------------------------
Two relations to be joined
-----------------------------------
Left              Right
(1, A)            (1, B)
(1, C)            (1, C)
(3, D)            (1, D)
                  (2, E)


MergeJoin first collects, for each relation, all the tuples that share
the current join key. So the tuple slots contain the following:

outerTupleSlots : (1, A), (1, C)
innerTupleSlots : (1, B), (1, C), (1, D)

Then MergeJoin produces the joined tuples by combining every outer tuple
with every inner tuple. In the above example, MergeJoin produces 6 tuples
(2 x 3).
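
The slot-filling step described above can be sketched in Java roughly as follows. This is a simplified illustration under stated assumptions, not the actual MergeJoinExec code: the Row class and the mergeJoin method are hypothetical stand-ins, tuples carry an integer key and one string column, and both inputs are assumed to be already sorted by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeJoinSlotsSketch {

    /** Simplified stand-in for a tuple: a join key plus one payload column. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Inner merge join over two inputs that are already sorted by key. */
    static List<String> mergeJoin(List<Row> outer, List<Row> inner) {
        List<String> joined = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) {
                o++; // outer side is behind: advance it
            } else if (outerKey > innerKey) {
                i++; // inner side is behind: advance it
            } else {
                // Keys match: collect every tuple with this key on both sides.
                List<Row> outerTupleSlots = new ArrayList<>();
                List<Row> innerTupleSlots = new ArrayList<>();
                while (o < outer.size() && outer.get(o).key == outerKey) {
                    outerTupleSlots.add(outer.get(o++));
                }
                while (i < inner.size() && inner.get(i).key == outerKey) {
                    innerTupleSlots.add(inner.get(i++));
                }
                // Emit the cross product of the two slots.
                for (Row left : outerTupleSlots) {
                    for (Row right : innerTupleSlots) {
                        joined.add("(" + left.key + ", " + left.value + ", " + right.value + ")");
                    }
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // The two relations from the example above.
        List<Row> left = Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
                new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        List<String> result = mergeJoin(left, right);
        System.out.println(result.size() + " joined tuples: " + result);
    }
}
```

For the example relations, the slots are filled once (for key 1) and the keys 2 and 3 find no partner, so an outer-join variant would additionally have to emit those unmatched tuples padded with nulls.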

>
> 2) I understood from a talk that the MergeJoinExec has some issues and that Mr Jihoon is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?

MergeJoinExec does not have any problem. It is correct. There was a
misunderstanding.

>
> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the block name containing it?
> Even for a single-block query, how do we find for a JoinNode that it belongs to @ROOT, for example?
>
> More precisely, in class OuterJoinRewriteRule, in method
>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, Stack<LogicalNode> stack, Integer depth)
>
> I tried to do
>     plan.getBlock(joinNode).getName()
> but I receive a Null Pointer Exception.
>

The current API cannot do what you want. The API needs to be improved to
support that. It can probably be achieved by modifying
BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
methods with some object that includes the current block name. I'll
create a Jira issue for this improvement.
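
One possible shape for such a context object is sketched below in Java. Everything here is an assumption made for illustration: VisitContext, Node, and this visitChild signature are hypothetical and do not reflect the existing BasicLogicalNodeVisitor API. The sketch only shows the idea of threading the current block name through the traversal so that a join visit can read it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BlockAwareVisitorSketch {

    /** Hypothetical context object carried through the traversal. */
    static final class VisitContext {
        private final Deque<String> blockNames = new ArrayDeque<>();

        void enterBlock(String name) { blockNames.push(name); }
        void leaveBlock() { blockNames.pop(); }
        String currentBlockName() { return blockNames.peek(); }
    }

    /** Minimal stand-in for a logical plan node. */
    static final class Node {
        final String type;      // e.g. "ROOT", "JOIN", "SCAN", "SUBQUERY"
        final String blockName; // non-null only for nodes that open a query block
        final Node[] children;

        Node(String type, String blockName, Node... children) {
            this.type = type;
            this.blockName = blockName;
            this.children = children;
        }
    }

    /** Visits a node and its children, recording the block of every join. */
    static void visitChild(VisitContext context, Node node, List<String> seen) {
        boolean opensBlock = node.blockName != null;
        if (opensBlock) {
            context.enterBlock(node.blockName);
        }
        if (node.type.equals("JOIN")) {
            // A real visitJoin could read the block name from the context here.
            seen.add("JOIN in " + context.currentBlockName());
        }
        for (Node child : node.children) {
            visitChild(context, child, seen);
        }
        if (opensBlock) {
            context.leaveBlock();
        }
    }

    public static void main(String[] args) {
        // "@ROOT" comes from the question; "sub1" is a made-up subquery block name.
        Node plan = new Node("ROOT", "@ROOT",
                new Node("JOIN", null,
                        new Node("SCAN", null),
                        new Node("SUBQUERY", "sub1",
                                new Node("JOIN", null,
                                        new Node("SCAN", null),
                                        new Node("SCAN", null)))));
        List<String> seen = new ArrayList<>();
        visitChild(new VisitContext(), plan, seen);
        System.out.println(seen);
    }
}
```

Keeping the block names on a stack means nested query blocks are handled naturally: the name is pushed when a block is entered and popped when the traversal leaves it.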


>
>
> I look forward to receiving Your answer!
>
> Yours sincerely,
> Camelia