Posted to dev@tajo.apache.org by camelia c <ca...@yahoo.com> on 2013/09/07 13:42:29 UTC
[GSoc2013] - Outer Join - a question about MergeJoinExec
Hello,
I am resending You an updated list of my questions. I have already found the answers to some of the earlier ones.
1) In MergeJoinExec, what is the purpose of innerTupleSlots and outerTupleSlots? Can You please give me an example of how they are filled, based on a dummy data set?
2) I understood from a talk that the MergeJoinExec has some issues and that Mr Jihoon is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?
3) Given a JoinNode anywhere in the logical query plan, how can we obtain the block name containing it?
Even for a single-block query, how do we find for a JoinNode that it belongs to @ROOT, for example?
More precisely, in class OuterJoinRewriteRule, in method
public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, Stack<LogicalNode> stack, Integer depth)
I tried
plan.getBlock(joinNode).getName()
but I get a NullPointerException.
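Regarding question 1: in a textbook sort-merge join, buffers of this kind usually hold the run of tuples from each input that share the current join key, so that the cross product of the two runs can be emitted. The Java sketch below illustrates that general technique on a dummy data set. It is not Tajo's MergeJoinExec: the class name MergeJoinSlotsSketch, the int[] row layout {joinKey, payload} and the sample values are assumptions made for illustration, and whether MergeJoinExec fills its innerTupleSlots and outerTupleSlots in exactly this way should be checked against the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sort-merge inner join; a hypothetical stand-in, not Tajo code. */
class MergeJoinSlotsSketch {

    /** Each row is {joinKey, payload}. Both inputs must already be sorted by joinKey. */
    static List<int[]> mergeJoin(int[][] outer, int[][] inner) {
        List<int[]> result = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.length && i < inner.length) {
            int outerKey = outer[o][0];
            int innerKey = inner[i][0];
            if (outerKey < innerKey) {
                o++;            // outer side is behind, advance it
            } else if (outerKey > innerKey) {
                i++;            // inner side is behind, advance it
            } else {
                // Keys are equal: fill both slot lists with the whole run of rows
                // that share this key, then emit the cross product of the two runs.
                List<int[]> outerTupleSlots = new ArrayList<>();
                List<int[]> innerTupleSlots = new ArrayList<>();
                while (o < outer.length && outer[o][0] == outerKey) {
                    outerTupleSlots.add(outer[o++]);
                }
                while (i < inner.length && inner[i][0] == outerKey) {
                    innerTupleSlots.add(inner[i++]);
                }
                for (int[] ot : outerTupleSlots) {
                    for (int[] it : innerTupleSlots) {
                        result.add(new int[] {outerKey, ot[1], it[1]});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Dummy data: key 2 appears twice on each side, key 4 once on each side.
        int[][] outer = {{1, 100}, {2, 200}, {2, 201}, {4, 400}};
        int[][] inner = {{2, 20}, {2, 21}, {3, 30}, {4, 40}};
        for (int[] row : mergeJoin(outer, inner)) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

With this data, both slot lists hold two rows while key 2 is processed, which yields four joined rows, and one row each for key 4, which yields a fifth. An outer-join variant would additionally have to emit null-padded rows in the two branches that currently just advance a cursor.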
I look forward to receiving Your answer!
Yours sincerely,
Camelia
Re: [GSoc2013] - Outer Join - 2 very important questions
Posted by Hyunsik Choi <hy...@apache.org>.
This is the answer to the second question. If you want to run the unit
tests for a specific class such as TestLeftOuter_NLJoinExec, type the
following in the directory tajo-core/tajo-core-backend:
mvn test -Dtest=org.apache.tajo.engine.planner.physical.TestLeftOuter_NLJoinExec
The best way is to use an IDE that supports JUnit. IntelliJ and
Eclipse both support this feature.
Note that you should execute 'mvn clean install' at the source root if
you changed any source code outside of tajo-core/tajo-core-backend.
Best regards,
Hyunsik
On Sun, Sep 15, 2013 at 3:55 AM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> 1)
> I am still waiting for Your answer to this question, please.
> Let me restate the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
>   joinQual.eval(qualCtx, inSchema, frameTuple);
>   if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
> at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
> at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
> at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
> at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
> at org.apache.tajo.worker.Task.run(Task.java:378)
> at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
> at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again, my question is: what should be modified in order to allow null values residing in an INT4 column to be compared with numbers in a different table's INT4 column?
>
> 2) How can I run a specific unit test without having to run all the tests? I tried the different options suggested at http://tajo.incubator.apache.org/build.html, but without success.
> For example, say You want to run only the test TestNLJoinExec.java.
> What is the complete command that You type in the terminal?
>
>
>
> Please answer this message as soon as possible.
>
>
> Yours sincerely,
>
> Camelia
>
> ________________________________
> From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 11:06 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
>
>
> PS:
>
> The problem is either there in BinaryEvalCtx, or in class
>
> public class Int4Datum extends NumericDatum.
>
>
> in the method
> public int compareTo(Datum datum) {
>
> ...
>
>
> @Override
> public int compareTo(Datum datum) {
>   switch (datum.type()) {
>     case INT2:
>       if (val < datum.asInt2()) {
>         return -1;
>       } else if (datum.asInt2() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT4:
>       if (val < datum.asInt4()) {
>         return -1;
>       } else if (datum.asInt4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case INT8:
>       if (val < datum.asInt8()) {
>         return -1;
>       } else if (datum.asInt8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT4:
>       if (val < datum.asFloat4()) {
>         return -1;
>       } else if (datum.asFloat4() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     case FLOAT8:
>       if (val < datum.asFloat8()) {
>         return -1;
>       } else if (datum.asFloat8() < val) {
>         return 1;
>       } else {
>         return 0;
>       }
>     default:
>       throw new InvalidOperationException(datum.type());
>   }
> }
>
>
>
> Just a hint, maybe it helps You find and solve the issue more quickly.
>
> Camelia
>
>
>
>
> ________________________________
> From: camelia c <ca...@yahoo.com>
> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
> Sent: Wednesday, September 11, 2013 10:18 PM
> Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
>
>
> Hello,
>
> I would like to kindly ask You to take a moment and explain to me how BinaryEvalCtx handles NULL values in fields. From an error I got recently, it seems that it breaks when it reads NULL values and has to compare them to non-NULL values.
>
>
> The portion of code leading to the error is the same as in the original NLJoinExec and other join algorithms, where TAJO needs to verify the join condition and then project according to the output schema:
>
> if (joinQual != null) {
>   joinQual.eval(qualCtx, inSchema, frameTuple);
>   if (joinQual.terminate(qualCtx).asBool()) {
>
> Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case number 10.
>
>
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
>
> 13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
>
> 13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
> at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
> at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
> at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
> at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
> at org.apache.tajo.worker.Task.run(Task.java:378)
> at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
> at java.lang.Thread.run(Thread.java:662)
>
>
> It seems that at this moment, it takes the NULL as being Int4Datum, instead of NullDatum.
>
> These classes exist beyond the scope of the outer join nodes, but maybe they
> weren't tested for NULL values before. Now some of the outer join processing relies on the correct behavior of these classes, which is why I am reporting the errors I find.
>
>
>
> Thank You in advance for Your answer and for the understanding!
>
> Yours sincerely,
> Camelia
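The failure described in the quoted message can be illustrated with a small left outer nested-loop join in which a NULL join key simply never matches instead of raising an error. The Java sketch below is an assumption-laden illustration, not Tajo's LeftOuter_NLJoinExec: the class and method names are hypothetical, rows are plain Java objects, and only the values 10, 333.0, 555.0 and 777.0/NULL are taken from the log above (the left key 20 is invented to show null padding).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative left outer nested-loop join; a hypothetical stand-in, not Tajo code. */
class LeftOuterNLJoinSketch {

    /** SQL-style equality for the join condition: a NULL operand never matches. */
    static boolean keysMatch(Integer left, Integer right) {
        return left != null && right != null && left.equals(right);
    }

    /**
     * Left rows are just a key; right rows are {value, key}, mirroring the
     * (0=>value, 1=>key) layout printed in the log. Output rows are {leftKey, value}.
     */
    static List<Object[]> leftOuterJoin(List<Integer> leftKeys, List<Object[]> rightRows) {
        List<Object[]> out = new ArrayList<>();
        for (Integer leftKey : leftKeys) {
            boolean matched = false;
            for (Object[] right : rightRows) {
                if (keysMatch(leftKey, (Integer) right[1])) {
                    out.add(new Object[] {leftKey, right[0]});
                    matched = true;
                }
            }
            if (!matched) {
                // No right row satisfied the condition: pad the right side with null.
                out.add(new Object[] {leftKey, null});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> leftKeys = Arrays.asList(10, 20);
        List<Object[]> rightRows = new ArrayList<>();
        rightRows.add(new Object[] {333.0, 10});
        rightRows.add(new Object[] {555.0, 10});
        rightRows.add(new Object[] {777.0, null}); // the row that triggered the exception
        for (Object[] row : leftOuterJoin(leftKeys, rightRows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The point of the design is that the null check lives in the comparison itself, so every join operator that evaluates the condition gets the same behavior without special-casing NULL in its own loop.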
Re: [GSoc2013] - Outer Join
Posted by Hyunsik Choi <hy...@apache.org>.
I also appreciate your understanding. If it is hard for you to rebase
your source code, I can help you.
Please read this page: http://wiki.apache.org/tajo/HowToContribute
The HowToContribute page describes how to create a patch.
Then you can use the 'Attach Files' button under 'More Actions'
in the Jira issue below.
https://issues.apache.org/jira/browse/TAJO-34
Thanks,
Hyunsik
On Tue, Sep 24, 2013 at 11:13 PM, camelia c <ca...@yahoo.com> wrote:
> Thank You for Your feedback!
>
> It is mainly the outer join optimization part that got affected by TAJO's
> recent changes. I'll make some more efforts to figure out how to save that
> part as well.
>
> Also, what is the command for creating a patch for JIRA?
>
> Thank You very much,
> Camelia
Re: [GSoc2013] - Outer Join
Posted by camelia c <ca...@yahoo.com>.
Thank You for Your feedback!
It is mainly the outer join optimization part that got affected by TAJO's recent changes. I'll make some more efforts to figure out how to save that part as well.
Also, what is the command for creating a patch for JIRA?
Thank You very much,
Camelia
________________________________
From: Hyunsik Choi <hy...@apache.org>
To: camelia c <ca...@yahoo.com>
Cc: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
Sent: Tuesday, September 24, 2013 3:52 PM
Subject: Re: [GSoc2013] - Outer Join
Re: [GSoc2013] - Outer Join
Posted by Hyunsik Choi <hy...@apache.org>.
I also think you successfully finished your job.
However, you need to know what the primary focus of a GSoC project is.
The most important thing is to learn how to participate in an open source
project. The final step of an open source contribution is to commit your
work to the source repository. It would be better if you made more
effort to submit your work as a patch to Jira.
In addition, source code conflicts are very common in open source
projects to which two or more people contribute. You need to rebase your
code frequently or submit a small completed piece as a patch to Jira
each time you finish a small part. I have already recommended several
times that you submit patches. Your last rebase was performed on September
5th. 20 days is not a short time in an active open source project.
Anyway, congratulations on finishing the GSoC program!
Best regards,
Hyunsik Choi
On Tue, Sep 24, 2013 at 9:04 PM, camelia c <ca...@yahoo.com> wrote:
[GSoc2013] - Outer Join
Posted by camelia c <ca...@yahoo.com>.
Hello,
Yes, indeed I finished the project. I last synchronized on September 5th.
Then I finished the project and ran mvn clean install successfully!
As You should know, September 16th was the soft "pencils down" deadline, meaning that Google evaluates the source code written by that date. All source code modified afterwards counts only as further work, not toward the accomplishment of the GSoC project's aims.
Today, as You requested, I tried to synchronize again.
Well, apart from some conflicts that I resolved for commit purposes, the biggest surprise was to discover that You had deleted the classes I worked with. You had also modified or deleted some methods that I was using. Unfortunately, this is not the first time that You have deleted classes while I was using them (let me just remind You of #TAJO-87, #TAJO-121 and #TAJO-96, which caused me to start my work all over again in mid-August, even though we had initially agreed on the Software Design Document).
I shall give some examples:
1) I use the class FromTable in several places (e.g. OuterJoinUtil, OuterJoinMeta, OuterJoinRewriteRule), but You deleted this class on September 20th, after the project's deadline. I am referring to Your commit bd1619de0c0a371363382540f70bd876c08ea765, named "TAJO-186: Improve column resolving method. (hyunsik)".
2) From class ScanNode, I use the method getTableId() (e.g. in OuterJoinMeta and FilterPushdownRule), which has not been available since September 16th, when You renamed it to getTableName in commit 1b1d1e8c1a6b82ccc5c3ce4daeb9e3daa309cde4, named "TAJO-184: Refactor GlobalPlanner and global plan data structure". Afterwards, on September 20th, in commit bd1619de0c0a371363382540f70bd876c08ea765, named "TAJO-186: Improve column resolving method. (hyunsik)", You dropped the FromTable field from this class and used a TableDesc instead.
3) From class Column, I use the method getTableName() (e.g. in FilterPushdownRule), which has not been available since September 16th, when You deleted it in commit 1b1d1e8c1a6b82ccc5c3ce4daeb9e3daa309cde4, named "TAJO-184: Refactor GlobalPlanner and global plan data structure".
4) From class SortNode, I use the constructor public SortNode(SortSpec[] sortKeys, Schema inSchema, Schema outSchema) (e.g. in PhysicalPlannerImpl), but yesterday, on September 23rd, You deleted it in commit fc018de823dd34d769eb73f3c42e089b0d992b81, named "TAJO-194: LogicalNode should have an identifier to distinguish each logical node instance".
All these major changes, made after the project's deadline of September 16th, can be handled in the future, depending on what You had planned when You decided to make them.
As all these changes occurred after the project's deadline, I consider the project successfully finished, and the proof is the source repository at https://github.com/camelia-c/incubator-tajo/tree/outerjoin_1.
The project status was continuously updated on the project's website, which You know (https://sites.google.com/site/gsoc2013tajo34/), so the status was always visible to interested parties.
I look forward to hearing from You soon.
Yours sincerely,
Camelia
________________________________
From: Hyunsik Choi <hy...@apache.org>
To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c <ca...@yahoo.com>
Sent: Sunday, September 15, 2013 12:49 PM
Subject: Re: [GSoc2013] - Outer Join - 2 very important questions
Re: [GSoc2013] - Outer Join - 2 very important questions
Posted by Hyunsik Choi <hy...@apache.org>.
Hi camelia,
I'm sorry for the late response. The solution is simple. You can modify
the existing source code. I've changed line 121 in Int4Datum as
follows:
default:
  if (datum instanceof NullDatum) {
    return DatumFactory.createBool(false);
  } else {
    throw new InvalidOperationException();
  }
Then all the unit tests of TestLeftOuter_NLJoinExec pass. The other
Datum classes also need the above code.
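The fix above can be seen in isolation with a small self-contained Java sketch. It uses simplified stand-ins that only borrow the names Datum, Int4Datum and NullDatum from this thread; they are not Tajo's real classes. In particular, equalsTo returns a plain boolean here instead of a datum created by DatumFactory, and the TextDatum type and the use of UnsupportedOperationException are assumptions added to show the remaining error path.

```java
/** Illustrative null-aware equality check; hypothetical stand-ins, not Tajo's classes. */
class NullAwareDatumSketch {

    /** Minimal datum abstraction with only the comparison under discussion. */
    interface Datum {
        boolean equalsTo(Datum other);
    }

    /** Represents SQL NULL; comparing anything with it is never true in this sketch. */
    static final class NullDatum implements Datum {
        static final NullDatum INSTANCE = new NullDatum();

        private NullDatum() {}

        @Override
        public boolean equalsTo(Datum other) {
            return false;
        }
    }

    /** Unrelated type, present only to exercise the unsupported-comparison branch. */
    static final class TextDatum implements Datum {
        final String val;

        TextDatum(String val) {
            this.val = val;
        }

        @Override
        public boolean equalsTo(Datum other) {
            return other instanceof TextDatum && val.equals(((TextDatum) other).val);
        }
    }

    static final class Int4Datum implements Datum {
        final int val;

        Int4Datum(int val) {
            this.val = val;
        }

        @Override
        public boolean equalsTo(Datum other) {
            if (other instanceof Int4Datum) {
                return val == ((Int4Datum) other).val;
            }
            if (other instanceof NullDatum) {
                // Same idea as the patched default branch: NULL yields false, not an error.
                return false;
            }
            throw new UnsupportedOperationException(
                "cannot compare INT4 with " + other.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        Datum ten = new Int4Datum(10);
        System.out.println(ten.equalsTo(new Int4Datum(10)));
        System.out.println(ten.equalsTo(NullDatum.INSTANCE));
    }
}
```

Returning false for a NULL operand is enough for a join predicate, where only rows that evaluate to true are kept. Full SQL three-valued logic would distinguish "unknown" from false, which this sketch deliberately does not model.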
Anyway, your work looks very good.
Best regards,
Hyunsik Choi
On Sun, Sep 15, 2013 at 3:55 AM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> 1)
> I'm still waiting for Your answer on this question please.
> I re-tell You the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
> joinQual.eval(qualCtx, inSchema, frameTuple);
> if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
> at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
> at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
> at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
> at
> org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
> at org.apache.tajo.worker.Task.run(Task.java:378)
> at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
> at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again my question is: what should be modified in order to allow the comparison of null values residents in an INT4 column to numbers in a different table's INT4 column.
>
> 2) How can I perform a specific unit test without having to perform all tests. I tried different choices suggested at http://tajo.incubator.apache.org/build.html, but no success.
> For example, say You want to perform only test TestNLJoinExec.java
> What is the complete command that You type in the terminal?
>
>
>
> Please answer me this message as soon as possible, please.
>
>
> Yours sincerely,
>
> Camelia
Re: [GSoc2013] - Outer Join - 2 very important questions
Posted by Hyunsik Choi <hy...@apache.org>.
According to your project site, you appear to have finished your GSoC project
one week ago. Could you give me a brief status of the project?
- hyunsik
On Sunday, September 15, 2013, camelia c wrote:
> Hello,
>
> 1)
> I'm still waiting for Your answer to this question, please.
> Let me restate the problem.
>
> I have a table A with a column fkA of type INT4 which is a foreign key
> referencing table B's primary key pkB.
> The data in column fkA contains both numbers and null values (NullDatum).
>
>
> In table B, column pkB is of type INT4 and it is the primary key => all
> its values are non-null. So in pkB we have only numbers.
>
> When faced with testing the join condition
>
>
> if (joinQual != null) {
> joinQual.eval(qualCtx, inSchema, frameTuple);
> if (joinQual.terminate(qualCtx).asBool()) {
>
> there is a problem:
>
> ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
> at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
> at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
> at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
> at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
> at org.apache.tajo.worker.Task.run(Task.java:378)
> at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
> at java.lang.Thread.run(Thread.java:662)
>
>
> So, once again, my question is: what should be modified to allow null
> values residing in an INT4 column to be compared with numbers in a
> different table's INT4 column?
>
> 2) How can I run a specific unit test without running all the tests?
> I tried the different options suggested at
> http://tajo.incubator.apache.org/build.html, but without success.
> For example, say You want to run only the test TestNLJoinExec.java.
> What is the complete command that You type in the terminal?
>
> Please answer this message as soon as possible.
>
>
> Yours sincerely,
>
> Camelia
Re: [GSoc2013] - Outer Join - 2 very important questions
Posted by camelia c <ca...@yahoo.com>.
Hello,
1)
I'm still waiting for Your answer to this question, please.
Let me restate the problem.
I have a table A with a column fkA of type INT4 which is a foreign key referencing table B's primary key pkB.
The data in column fkA contains both numbers and null values (NullDatum).
In table B, column pkB is of type INT4 and it is the primary key => all its values are non-null. So in pkB we have only numbers.
When faced with testing the join condition
if (joinQual != null) {
joinQual.eval(qualCtx, inSchema, frameTuple);
if (joinQual.terminate(qualCtx).asBool()) {
there is a problem:
ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
at org.apache.tajo.worker.Task.run(Task.java:378)
at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
at java.lang.Thread.run(Thread.java:662)
So, once again, my question is: what should be modified to allow null values residing in an INT4 column to be compared with numbers in a different table's INT4 column?
2) How can I run a specific unit test without running all the tests? I tried the different options suggested at http://tajo.incubator.apache.org/build.html, but without success.
For example, say You want to run only the test TestNLJoinExec.java.
What is the complete command that You type in the terminal?
Please answer this message as soon as possible.
Yours sincerely,
Camelia
________________________________
From: camelia c <ca...@yahoo.com>
To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
Sent: Wednesday, September 11, 2013 11:06 PM
Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
PS:
The problem is either in BinaryEvalCtx or in the class
public class Int4Datum extends NumericDatum
in the method compareTo:
@Override
public int compareTo(Datum datum) {
  switch (datum.type()) {
  case INT2:
    if (val < datum.asInt2()) {
      return -1;
    } else if (datum.asInt2() < val) {
      return 1;
    } else {
      return 0;
    }
  case INT4:
    if (val < datum.asInt4()) {
      return -1;
    } else if (datum.asInt4() < val) {
      return 1;
    } else {
      return 0;
    }
  case INT8:
    if (val < datum.asInt8()) {
      return -1;
    } else if (datum.asInt8() < val) {
      return 1;
    } else {
      return 0;
    }
  case FLOAT4:
    if (val < datum.asFloat4()) {
      return -1;
    } else if (datum.asFloat4() < val) {
      return 1;
    } else {
      return 0;
    }
  case FLOAT8:
    if (val < datum.asFloat8()) {
      return -1;
    } else if (datum.asFloat8() < val) {
      return 1;
    } else {
      return 0;
    }
  default:
    throw new InvalidOperationException(datum.type());
  }
}
Just a hint, maybe it helps You find and solve the issue more quickly.
Camelia
________________________________
From: camelia c <ca...@yahoo.com>
To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
Sent: Wednesday, September 11, 2013 10:18 PM
Subject: Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
Hello,
Could You please take a moment to explain how BinaryEvalCtx handles NULL values in fields? From an error I got recently, it seems to break when it reads a NULL value and has to compare it to a non-NULL value.
The portion of code leading to the error is the same as in the original NLJoinExec and the other join algorithms, where TAJO verifies the join condition and then projects according to the output schema:
if (joinQual != null) {
joinQual.eval(qualCtx, inSchema, frameTuple);
if (joinQual.terminate(qualCtx).asBool()) {
Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case the number 10.
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
at org.apache.tajo.worker.Task.run(Task.java:378)
at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
at java.lang.Thread.run(Thread.java:662)
It seems that at this point the NULL is treated as an Int4Datum instead of a NullDatum.
These classes are outside the scope of the outer join nodes, but maybe they weren't tested with NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm reporting the errors I find.
Thank You in advance for Your answer and for Your understanding!
Yours sincerely,
Camelia
Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx (PS)
Posted by camelia c <ca...@yahoo.com>.
PS:
The problem is either in BinaryEvalCtx or in the class
public class Int4Datum extends NumericDatum
in the method compareTo:
@Override
public int compareTo(Datum datum) {
  switch (datum.type()) {
  case INT2:
    if (val < datum.asInt2()) {
      return -1;
    } else if (datum.asInt2() < val) {
      return 1;
    } else {
      return 0;
    }
  case INT4:
    if (val < datum.asInt4()) {
      return -1;
    } else if (datum.asInt4() < val) {
      return 1;
    } else {
      return 0;
    }
  case INT8:
    if (val < datum.asInt8()) {
      return -1;
    } else if (datum.asInt8() < val) {
      return 1;
    } else {
      return 0;
    }
  case FLOAT4:
    if (val < datum.asFloat4()) {
      return -1;
    } else if (datum.asFloat4() < val) {
      return 1;
    } else {
      return 0;
    }
  case FLOAT8:
    if (val < datum.asFloat8()) {
      return -1;
    } else if (datum.asFloat8() < val) {
      return 1;
    } else {
      return 0;
    }
  default:
    throw new InvalidOperationException(datum.type());
  }
}
Just a hint, maybe it helps You find and solve the issue more quickly.
Camelia
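A type switch like the one quoted above throws for every type it does not list, so a NULL operand has to be caught before the switch is reached. The following is only a minimal sketch of that idea, assuming a simplified value hierarchy rather than Tajo's real Datum classes; the names NullAwareCompareSketch, Value, Type, equalsTo and matches are hypothetical. Note also that the stack trace points at Int4Datum.equalsTo, and it is an assumption here that equalsTo dispatches on the operand type in a similar way to compareTo.

```java
// Hypothetical miniature of a typed-value hierarchy; NOT Tajo's Datum classes.
final class NullAwareCompareSketch {

    enum Type { INT4, FLOAT8, NULL }

    // A value is either a number of some type or the single NULL marker.
    static final class Value {
        static final Value NULL = new Value(Type.NULL, 0);
        final Type type;
        final double num;

        private Value(Type type, double num) {
            this.type = type;
            this.num = num;
        }

        static Value int4(int v) { return new Value(Type.INT4, v); }
        static Value float8(double v) { return new Value(Type.FLOAT8, v); }
    }

    // SQL-style equality: TRUE, FALSE, or null (unknown) when either side is NULL.
    static Boolean equalsTo(Value left, Value right) {
        if (left.type == Type.NULL || right.type == Type.NULL) {
            return null; // unknown: comparing with NULL is never TRUE
        }
        switch (right.type) {
            case INT4:
            case FLOAT8:
                return left.num == right.num;
            default:
                // Mirrors the "unsupported type" branch of the quoted switch.
                throw new IllegalArgumentException("unsupported type: " + right.type);
        }
    }

    // A join qualifier keeps a pair only when the predicate is definitely TRUE.
    static boolean matches(Value left, Value right) {
        return Boolean.TRUE.equals(equalsTo(left, right));
    }

    public static void main(String[] args) {
        System.out.println(matches(Value.int4(10), Value.int4(10))); // true
        System.out.println(matches(Value.int4(10), Value.NULL));     // false
    }
}
```

The design choice in this sketch is that a comparison against NULL is unknown rather than an error, and the caller decides what unknown means; for a join condition it means "no match".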
Re: [GSoc2013] - Outer Join -Possible issue about BinaryEvalCtx
Posted by camelia c <ca...@yahoo.com>.
Hello,
Could You please take a moment to explain how BinaryEvalCtx handles NULL values in fields? From an error I got recently, it seems to break when it reads a NULL value and has to compare it to a non-NULL value.
The portion of code leading to the error is the same as in the original NLJoinExec and the other join algorithms, where TAJO verifies the join condition and then projects according to the output schema:
if (joinQual != null) {
joinQual.eval(qualCtx, inSchema, frameTuple);
if (joinQual.terminate(qualCtx).asBool()) {
Here the verification of the join condition fails if one operand is a NULL value and the other is a number, in this case the number 10.
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********leftChild.next() =(0=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>333.0, 1=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>333.0)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ******** a result matched padded tuple =(0=>10, 1=>555.0)
13/09/11 21:37:01 INFO physical.LeftOuter_NLJoinExec: ********rightChild.next() =(0=>777.0, 1=>NULL)
13/09/11 21:37:01 ERROR worker.Task: org.apache.tajo.datum.exception.InvalidOperationException
at org.apache.tajo.datum.Int4Datum.equalsTo(Int4Datum.java:122)
at org.apache.tajo.engine.eval.BinaryEval.terminate(BinaryEval.java:158)
at org.apache.tajo.engine.planner.physical.LeftOuter_NLJoinExec.next(LeftOuter_NLJoinExec.java:157)
at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:85)
at org.apache.tajo.worker.Task.run(Task.java:378)
at org.apache.tajo.worker.TaskRunner$2.run(TaskRunner.java:359)
at java.lang.Thread.run(Thread.java:662)
It seems that at this point the NULL is treated as an Int4Datum instead of a NullDatum.
These classes are outside the scope of the outer join nodes, but maybe they weren't tested with NULL values before. Some of the outer join processing now relies on their correct behavior, which is why I'm reporting the errors I find.
Thank You in advance for Your answer and for Your understanding!
Yours sincerely,
Camelia
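The failing case in the log above, a right tuple whose join key is NULL, can be reproduced outside Tajo with a small left outer nested-loop join. This is a minimal sketch under stated assumptions: rows are plain Object[] arrays, Java null stands in for SQL NULL, and LeftOuterNestedLoopSketch with its methods is a hypothetical name, not the LeftOuter_NLJoinExec code. The extra left row with key 20 is added only to show the NULL padding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; rows are Object[] and Java null stands in for SQL NULL.
final class LeftOuterNestedLoopSketch {

    // Equi-join predicate: a NULL key on either side never matches.
    static boolean keysMatch(Object leftKey, Object rightKey) {
        return leftKey != null && rightKey != null && leftKey.equals(rightKey);
    }

    // Emits every matching pair; a left row without any match is padded with NULLs.
    static List<Object[]> leftOuterJoin(List<Object[]> left, int leftKeyIdx,
                                        List<Object[]> right, int rightKeyIdx,
                                        int rightWidth) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            boolean matched = false;
            for (Object[] r : right) {
                if (keysMatch(l[leftKeyIdx], r[rightKeyIdx])) {
                    out.add(concat(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(concat(l, new Object[rightWidth])); // all-null right side
            }
        }
        return out;
    }

    static Object[] concat(Object[] a, Object[] b) {
        Object[] res = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, res, a.length, b.length);
        return res;
    }

    public static void main(String[] args) {
        List<Object[]> left = new ArrayList<>();
        left.add(new Object[] {10});
        left.add(new Object[] {20});

        List<Object[]> right = new ArrayList<>();
        right.add(new Object[] {333.0, 10});
        right.add(new Object[] {555.0, 10});
        right.add(new Object[] {777.0, null}); // NULL key: must not match, must not throw

        for (Object[] row : leftOuterJoin(left, 0, right, 1, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch the row (777.0, NULL) is simply skipped by the predicate, and the unmatched left key 20 comes out as (20, null, null).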
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by camelia c <ca...@yahoo.com>.
Hello,
This is my source repository
https://github.com/camelia-c/incubator-tajo/tree/outerjoin_1
It's the same one that I wrote to You about in the past.
Thank You very much for Your kind help!
Camelia
________________________________
From: Hyunsik Choi <hy...@apache.org>
To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c <ca...@yahoo.com>
Sent: Monday, September 9, 2013 7:39 PM
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Thank you for the more detailed information. Are these problems caused by
your working source?
If so, how can I access your recent working source? Is it on your GitHub?
The recommended way to share your problem is as follows:
* create a Jira issue
* submit your patch or your GitHub revision URL
* describe your problem (your attached file already covers this)
Best regards,
Hyunsik Choi
On Mon, Sep 9, 2013 at 10:04 PM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> I am sending You an archive with the 3 problems encountered so far with
> tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
>
> Please be so kind as to help me solve them.
>
> For each problem there is a separate folder in the archive, containing the
> query, the problem, the TAJO output, the logical plan of MasterLOG and the
> worker's log.
>
> To summarize:
> Problem 1) partial output and
>
> java.lang.NullPointerException
> at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
> at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
> at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
> at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
>
> even though the physical operator's next method returns correct and
> complete results.
>
> Problem 2) incorrect values in tuples received from child nodes
>
> Problem 3) values unexpectedly stop being received, and
> ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
>
> The dataset is also concatenated in a separate data file in the archive.
>
>
> Thank You very much!
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi <hy...@apache.org>
> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
> <ca...@yahoo.com>
> Sent: Monday, September 9, 2013 3:52 AM
>
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi Camelia,
>
> Could you let me know the following? That will make it easier to
> investigate the problem.
>
> * your submitted SQL query
> * which physical operator (NLJoin or MergeJoin?)
> * (if possible) data sample that reproduces the problem
>
> Best regards,
> Hyunsik
>
>
> On Mon, Sep 9, 2013 at 7:30 AM, camelia c <ca...@yahoo.com> wrote:
>> A small addition to the previous message:
>>
>> The value obtained with
>>
>> innerTuple = rightChild.next();
>>
>>
>> is in the join operator.
>>
>>
>> Camelia
>>
>>
>> ----- Forwarded Message -----
>> From: camelia c <ca...@yahoo.com>
>> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
>> Sent: Monday, September 9, 2013 1:25 AM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>>
>> Hello,
>>
>> Thank You very much for Your helpful answer yesterday!
>>
>> While testing, I encountered the following issue: null values that are
>> read from files are sometimes randomly replaced by numbers such as 24,
>> 29 or 30. This is a serious problem for the algorithms! Can You please
>> tell me why You think this happens and how it can be corrected?
>>
>>
>> Let me give You an example
>>
>> create external table emp1 (emp_id int, first_name text, last_name text,
>> dep_id int, salary float, job_id int) using csv with
>> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>>
>>
>>
>> I specify null values in file like this:
>>
>> 1000,Tom,Smith,10,333,100
>> 1001,Mary,Thompson,10,555,
>> 1002,Aron,Weber,,777,100
>> 1003,Susan,Carlson,,999,
>>
>> Both the internal nulls and the trailing nulls (those at the end of a
>> line) are sometimes randomly substituted with a small number; for example,
>> (last_name, salary, emp_id, dep_id) was read from the file with
>>
>> innerTuple = rightChild.next();
>>
>> obtaining innerTuple.toString() as:
>>
>>
>> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>>
>>
>> Sometimes, in other queries the null value is correctly read as NULL.
>>
>>
>>
>> Thank You in advance!
>>
>> Yours sincerely,
>> Camelia
>>
>>
>>
>>
>> ________________________________
>> From: Hyunsik Choi <hy...@apache.org>
>> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
>> <ca...@yahoo.com>
>> Sent: Saturday, September 7, 2013 6:00 PM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hi camelia,
>>
>> I'm sorry for the late response. I've just come back home from a family
>> meeting. I've left inline comments on your questions.
>>
>> Best regards,
>> Hyunsik
>>
>>
>> On Sep 7, 2013, at 8:42 PM, camelia c <ca...@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I am resending You an updated list of my questions. For some of the
>>> older ones, I have already found the answer.
>>>
>>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and
>>> outerTupleSlots and can You please give me an example of how they are
>>> filled, based on a dummy data set ?
>>
>> Merge join advances through each relation in order to find the tuples
>> that share the same join key. Each tuple slot keeps a list of tuples
>> whose join keys are the same. Consider the example below, where there
>> are two relations to be joined and the first column of each relation
>> is the join key.
>>
>> -----------------------------------
>> Two relations to be joined
>> -----------------------------------
>> Left      Right
>> (1, A)    (1, B)
>> (1, C)    (1, C)
>> (3, D)    (1, D)
>>           (2, E)
>>
>>
>> MergeJoin first finds all the tuples with the same key in each relation.
>> So, the tuple slots contain the following:
>>
>> outerTupleSlots : (1, A), (1, C)
>> innerTupleSlots : (1, B), (1, C), (1, D)
>>
>> Then, MergeJoin produces the joined tuples. In the above example,
>> MergeJoin results in 6 tuples (2 x 3).
>>
>>>
>>> 2) I understood from a talk that the MergeJoinExec has some issues and
>>> that Mr Jihoon is trying to fix them. Can I rely on the current version of
>>> MergeJoinExec to extend it for FullOuter_MergeJoinExec and
>>> RightOuter_MergeJoinExec?
>>
>> MergeJoinExec does not have any problem. It is correct. There was a
>> misunderstanding.
>>
>>>
>>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>>> the block name containing it?
>>> Even for a single-block query, how do we find for a JoinNode that it
>>> belongs to @ROOT, for example?
>>>
>>> More precisely, in class OuterJoinRewriteRule, in method
>>> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>> Stack<LogicalNode> stack, Integer depth)
>>>
>>> I tried to do
>>> plan.getBlock(joinNode).getName()
>>> but I receive a Null Pointer Exception.
>>>
>>
>> The current API cannot do what you want. The API needs to be improved
>> to support that. Probably, that can be achieved by modifying
>> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
>> methods with some object that includes the current block name. I'll
>> create a Jira issue for this improvement.
>>
>>
>>>
>>>
>>> I look forward to receiving Your answer!
>>>
>>> Yours sincerely,
>>> Camelia
>
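The tuple-slot explanation quoted above (outerTupleSlots and innerTupleSlots holding the run of equal-key tuples from each side) can be sketched as follows. This is a minimal illustration, assuming both inputs are already sorted by join key and tuples are reduced to a key plus one value; MergeJoinSlotsSketch, Row and join are hypothetical names, and the code is not MergeJoinExec itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Tajo's classes.
final class MergeJoinSlotsSketch {

    // A tuple reduced to a join key and a single payload column.
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inner merge join over two inputs that are already sorted by key.
    static List<String> join(List<Row> outer, List<Row> inner) {
        List<String> out = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) { o++; continue; } // advance the smaller side
            if (outerKey > innerKey) { i++; continue; }

            // Keys match: collect the run of equal-key tuples on each side.
            List<Row> outerTupleSlots = new ArrayList<>();
            List<Row> innerTupleSlots = new ArrayList<>();
            while (o < outer.size() && outer.get(o).key == outerKey) {
                outerTupleSlots.add(outer.get(o++));
            }
            while (i < inner.size() && inner.get(i).key == outerKey) {
                innerTupleSlots.add(inner.get(i++));
            }

            // Emit the cross product of the two slots.
            for (Row l : outerTupleSlots) {
                for (Row r : innerTupleSlots) {
                    out.add("(" + l.key + ", " + l.value + ", " + r.value + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
            new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        // Slots for key 1 hold 2 and 3 tuples, so 6 joined tuples are printed.
        for (String joined : join(left, right)) {
            System.out.println(joined);
        }
    }
}
```

With the example relations from the explanation, the slots for key 1 hold two and three tuples, so the sketch emits the 2 x 3 = 6 joined tuples; keys 2 and 3 have no partner and produce nothing in an inner join.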
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by Hyunsik Choi <hy...@apache.org>.
Thank you for the more detailed information. Are these problems caused by
your working source?
If so, how can I access your recent working source? Is it on your GitHub?
The recommended way to share your problem is as follows:
* create a Jira issue
* submit your patch or your GitHub revision URL
* describe your problem (your attached file already covers this)
Best regards,
Hyunsik Choi
On Mon, Sep 9, 2013 at 10:04 PM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> I am sending You an archive with the 3 problems encountered so far with
> tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
>
> Please be so kind as to help me solve them.
>
> For each problem there is a separate folder in the archive, containing the
> query, the problem, the TAJO output, the logical plan of MasterLOG and the
> worker's log.
>
> To summarize:
> Problem 1) partial output and
>
> java.lang.NullPointerException
> at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
> at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
> at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
> at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
>
> even though the physical operator's next method returns correct and
> complete results.
>
> Problem 2) incorrect values in tuples received from child nodes
>
> Problem 3) values unexpectedly stop being received, and
> ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
>
> The dataset is also concatenated in a separate data file in the archive.
>
>
> Thank You very much!
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi <hy...@apache.org>
> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
> <ca...@yahoo.com>
> Sent: Monday, September 9, 2013 3:52 AM
>
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi Camelia,
>
> Could you let me know the following? That will make it easier to
> investigate the problem.
>
> * your submitted SQL query
> * which physical operator (NLJoin or MergeJoin?)
> * (if possible) data sample that reproduces the problem
>
> Best regards,
> Hyunsik
>
>
> On Mon, Sep 9, 2013 at 7:30 AM, camelia c <ca...@yahoo.com> wrote:
>> A small addition to the previous message:
>>
>> The value obtained with
>>
>> innerTuple = rightChild.next();
>>
>>
>> is in the join operator.
>>
>>
>> Camelia
>>
>>
>> ----- Forwarded Message -----
>> From: camelia c <ca...@yahoo.com>
>> To: "dev@tajo.incubator.apache.org" <de...@tajo.incubator.apache.org>
>> Sent: Monday, September 9, 2013 1:25 AM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>>
>> Hello,
>>
>> Thank You very much for You helpful answer of yesterday!
>>
>> While testing, I encountered the following issue: the null values which
>> are read from files are sometimes randomly replaced by numbers such as 24 or
>> 29 or 30. This makes a serious problem for the algorithms! Can You please
>> tell me why do do think this happens and how can it be corrected?
>>
>>
>> Let me give You an example
>>
>> create external table emp1 (emp_id int, first_name text, last_name text,
>> dep_id int, salary float, job_id int) using csv with
>> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>>
>>
>>
>> I specify null values in file like this:
>>
>> 1000,Tom,Smith,10,333,100
>> 1001,Mary,Thompson,10,555,
>> 1002,Aron,Weber,,777,100
>> 1003,Susan,Carlson,,999,
>>
>> Both the internal nulls and the trailing nulls(those at the end of line)
>> are sometimes randomly substituted with a small number; for example
>> (last_name, salary, emp_id, dep_id) was read from file with
>>
>> innerTuple = rightChild.next();
>>
>> obtaining values innerTuple.toString() as :
>>
>>
>> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>>
>>
>> Sometimes, in other queries the null value is correctly read as NULL.
>>
>>
>>
>> Thank You in advance!
>>
>> Yours sincerely,
>> Camelia
>>
>>
>>
>>
>> ________________________________
>> From: Hyunsik Choi <hy...@apache.org>
>> To: tajo-dev <de...@tajo.incubator.apache.org>; camelia c
>> <ca...@yahoo.com>
>> Sent: Saturday, September 7, 2013 6:00 PM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hi camelia,
>>
>> I'm sorry for late response. I've just came back home from the family
>> meeting. I leave in-line comments on your question.
>>
>> Best regards,
>> Hyunsik
>>
>>
>> On Sep 7, 2013, at 8:42 PM, camelia c <ca...@yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I resend You an updated list of questions that I have. For some of the
>>> ancient ones, I found the answer already.
>>>
>>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and
>>> outerTupleSlots and can You please give me an example of how they are
>>> filled, based on a dummy data set ?
>>
>> Merge join forwards each relation in order
>> to find the same join key
>> tuples. Each of them keeps a list of tuples whose join keys are same.
>> Consider the below examples where there are two relations to be joined
>> and the first column of each relation is the join key.
>>
>> -----------------------------------
>> Two relations to be joined
>> -----------------------------------
>> Left Right
>> (1, A) (1, B)
>> (1, C) (1, C)
>> (3, D) (1, D)
>> (2, E)
>>
>>
>> MergeJoin first finds all the same key tuples for each relation. So,
>> each tuple slot contains as follows:
>>
>> outerTupleSlots : (1, A), (1,C)
>> innerTupleSlots : (1,B), (1, C), (1,D)
>>
>> Then, MergeJoin leads to joined tuples. In the above example,
>> MergeJoin
>> results in 6 tuples (2 x 3).
>>
>>>
>>> 2) I understood from a talk that the MergeJoinExec has some issues and
>>> that Mr Jihoon is trying to fix them. Can I rely on the current version of
>>> MergeJoinExec to extend it for FullOuter_MergeJoinExec and
>>> RightOuter_MergeJoinExec?
>>
>> MergeJoinExec does not have any problem. It is correct. There was a
>> misunderstood.
>>
>>>
>>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>>> the block name containing it?
>>> Even for a single-block query, how do we find for a JoinNode that it
>>> belongs to @ROOT, for example?
>>>
>>> More precisely, in class OuterJoinRewriteRule, in method
>>> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>> Stack<LogicalNode> stack, Integer depth)
>>>
>>> I tried to do
>>> plan.getBlock(joinNode).getName()
>>> but I receive a Null Pointer Exception.
>>>
>>
>> The
>> current API cannot what you want. The API needs to be improved for
>> supporting that. Probably, that is archived by modifying
>> BasicLogicalNodeVisitor's visitChild method to call visitXXXNode
>> method with some object including a current block name. I'll create a
>> jira issue for this improvement.
>>
>>
>>>
>>>
>>> I look forward to receiving Your answer!
>>>
>>> Yours sincerely,
>>> Camelia
>
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by camelia c <ca...@yahoo.com>.
Hello,
I am sending you an archive with the 3 problems I have encountered so far with
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
Please be so kind as to help me solve them.
For each problem there is a separate folder in the archive, containing the query, a description of the problem, the Tajo output, the logical plan from the master log, and the worker's log.
To summarize:
Problem 1) partial output and
java.lang.NullPointerException
at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
even though the physical operator's next() method returns correct and complete results.
Problem 2) incorrect values in tuples received from child nodes
Problem 3) unexpected stop receiving values and
ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
The dataset is also concatenated in a separate data file in the archive.
Thank You very much!
Camelia
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by Hyunsik Choi <hy...@apache.org>.
Hi Camelia,
Could you let me know the following? That would make it easier to investigate the problem.
* your submitted SQL query
* which physical operator (NLJoin or MergeJoin?)
* (if possible) data sample that reproduces the problem
Best regards,
Hyunsik
[GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by camelia c <ca...@yahoo.com>.
A small addition to the previous message:
The value obtained with
innerTuple = rightChild.next();
is in the join operator.
Camelia
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by camelia c <ca...@yahoo.com>.
Hello,
Thank you very much for your helpful answer yesterday!
While testing, I encountered the following issue: null values read from files are sometimes randomly replaced by numbers such as 24, 29, or 30. This is a serious problem for the algorithms! Can you please tell me why you think this happens and how it can be corrected?
Let me give you an example:
create external table emp1 (emp_id int, first_name text, last_name text, dep_id int, salary float, job_id int) using csv with ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
I specify null values in file like this:
1000,Tom,Smith,10,333,100
1001,Mary,Thompson,10,555,
1002,Aron,Weber,,777,100
1003,Susan,Carlson,,999,
Both the internal nulls and the trailing nulls (those at the end of a line) are sometimes randomly substituted with a small number. For example, (last_name, salary, emp_id, dep_id) was read from the file with
innerTuple = rightChild.next();
which gave the following innerTuple.toString() value:
(0=>Weber, 1=>777.0, 2=>1002, 3=>29)
Sometimes, in other queries the null value is correctly read as NULL.
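For reference, the mapping expected from the sample rows above can be sketched in plain Java. This is only an illustration under stated assumptions: the class CsvNullSketch and its parse method are hypothetical names, not part of Tajo, and the sketch does not diagnose why Tajo's scanner returns a number in place of a null. It shows one common pitfall with trailing empty fields: String.split with no limit drops them, while a limit of -1 keeps them.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not Tajo's CSV reader.
class CsvNullSketch {

  // Splits one line on the delimiter and maps every empty field to null.
  // The limit of -1 keeps trailing empty fields, which split(regex) alone drops.
  static String[] parse(String line, char delimiter) {
    String[] fields = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    for (int i = 0; i < fields.length; i++) {
      if (fields[i].isEmpty()) {
        fields[i] = null;
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    String line = "1003,Susan,Carlson,,999,";
    // Without a limit, the trailing empty field is silently removed.
    System.out.println(line.split(",").length);             // prints 5
    // With the helper, both the internal and the trailing null survive.
    System.out.println(Arrays.toString(parse(line, ',')));  // prints [1003, Susan, Carlson, null, 999, null]
  }
}
```

Comparing the number of parsed fields with the number of columns in the table definition is one quick way to check whether a trailing field was lost.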
Thank You in advance!
Yours sincerely,
Camelia
Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
Posted by Hyunsik Choi <hy...@apache.org>.
Hi camelia,
I'm sorry for the late response. I've just come back home from a family
meeting. I have left inline comments on your questions.
Best regards,
Hyunsik
On Sep 7, 2013, at 8:42 PM, camelia c <ca...@yahoo.com> wrote:
> Hello,
>
> I resend You an updated list of questions that I have. For some of the ancient ones, I found the answer already.
>
> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and outerTupleSlots and can You please give me an example of how they are filled, based on a dummy data set ?
Merge join advances through each relation to find the tuples with the
same join key. innerTupleSlots and outerTupleSlots each keep a list of
tuples whose join keys are the same. Consider the example below, where
two relations are to be joined and the first column of each relation is
the join key.
-----------------------------------
Two relations to be joined
-----------------------------------
Left      Right
(1, A)    (1, B)
(1, C)    (1, C)
(3, D)    (1, D)
          (2, E)
MergeJoin first collects all the tuples with the same key from each
relation. So, the tuple slots contain the following:
outerTupleSlots : (1, A), (1, C)
innerTupleSlots : (1, B), (1, C), (1, D)
MergeJoin then produces the joined tuples. In the above example,
MergeJoin produces 6 tuples (2 x 3).
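The slot filling described above can be sketched in a few lines of Java. This is a minimal illustration, not the MergeJoinExec source: the class MergeJoinSlotsSketch, the Row type, and the fillSlots, join, sampleLeft, and sampleRight methods are all hypothetical names, and both inputs are assumed to be already sorted by the join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the names here are hypothetical, not Tajo's classes.
class MergeJoinSlotsSketch {

  // A two-column tuple: the join key plus one payload value.
  static final class Row {
    final int key;
    final String value;
    Row(int key, String value) { this.key = key; this.value = value; }
    @Override public String toString() { return "(" + key + ", " + value + ")"; }
  }

  // Collects the run of consecutive rows, starting at 'from', that share one key.
  static List<Row> fillSlots(List<Row> sorted, int from) {
    List<Row> slots = new ArrayList<>();
    int key = sorted.get(from).key;
    for (int i = from; i < sorted.size() && sorted.get(i).key == key; i++) {
      slots.add(sorted.get(i));
    }
    return slots;
  }

  // Inner merge join over two inputs that are already sorted by key.
  static List<String> join(List<Row> left, List<Row> right) {
    List<String> out = new ArrayList<>();
    int l = 0;
    int r = 0;
    while (l < left.size() && r < right.size()) {
      int lk = left.get(l).key;
      int rk = right.get(r).key;
      if (lk < rk) {
        l++;                      // left key too small: advance left
      } else if (lk > rk) {
        r++;                      // right key too small: advance right
      } else {
        List<Row> outerTupleSlots = fillSlots(left, l);
        List<Row> innerTupleSlots = fillSlots(right, r);
        // Emit the cross product of the two slot lists.
        for (Row o : outerTupleSlots) {
          for (Row i : innerTupleSlots) {
            out.add(o + " + " + i);
          }
        }
        l += outerTupleSlots.size();
        r += innerTupleSlots.size();
      }
    }
    return out;
  }

  // The two relations from the example in this message.
  static List<Row> sampleLeft() {
    return Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
  }

  static List<Row> sampleRight() {
    return Arrays.asList(new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
  }

  public static void main(String[] args) {
    List<String> joined = join(sampleLeft(), sampleRight());
    System.out.println(joined.size());   // prints 6
    joined.forEach(System.out::println);
  }
}
```

For an outer join variant, the branches that currently just advance one side are where unmatched tuples would presumably be padded with nulls and emitted; the sketch leaves that out.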
>
> 2) I understood from a talk that the MergeJoinExec has some issues and that Mr Jihoon is trying to fix them. Can I rely on the current version of MergeJoinExec to extend it for FullOuter_MergeJoinExec and RightOuter_MergeJoinExec?
MergeJoinExec does not have any problems. It is correct. There was a
misunderstanding.
>
> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the block name containing it?
> Even for a single-block query, how do we find for a JoinNode that it belongs to @ROOT, for example?
>
> More precisely, in class OuterJoinRewriteRule, in method
> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, Stack<LogicalNode> stack, Integer depth)
>
> I tried to do
> plan.getBlock(joinNode).getName()
> but I receive a Null Pointer Exception.
>
The current API cannot do what you want. The API needs to be improved to
support that. Probably, that can be achieved by modifying
BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
methods with some object that includes the current block name. I'll create a
JIRA issue for this improvement.
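The idea above, a visitor whose central dispatch method hands every visit method a context object carrying the current block name, could look roughly like the sketch below. Everything in it is an assumption made for illustration: Node, VisitContext, Visitor, and blocksOfJoins are hypothetical stand-ins and do not reflect the real BasicLogicalNodeVisitor, LogicalPlan, or JoinNode signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Tajo's real classes.
class BlockAwareVisitorSketch {

  // A bare-bones plan node with a type tag and child nodes.
  static class Node {
    final String type;                       // e.g. "JOIN" or "SCAN"
    final List<Node> children = new ArrayList<>();
    Node(String type, Node... kids) {
      this.type = type;
      for (Node k : kids) {
        children.add(k);
      }
    }
  }

  // Context object handed to every visit method.
  static final class VisitContext {
    final String blockName;
    VisitContext(String blockName) { this.blockName = blockName; }
  }

  static class Visitor {
    final List<String> blocksSeenAtJoins = new ArrayList<>();

    // Central dispatch: every node is reached through here, so the context always travels along.
    void visitChild(VisitContext ctx, Node node) {
      if ("JOIN".equals(node.type)) {
        visitJoin(ctx, node);
      }
      for (Node child : node.children) {
        visitChild(ctx, child);
      }
    }

    // The join visit can now read the enclosing block name without a plan lookup.
    void visitJoin(VisitContext ctx, Node join) {
      blocksSeenAtJoins.add(ctx.blockName);
    }
  }

  // Walks the tree under 'root' and returns the block name recorded at each join.
  static List<String> blocksOfJoins(String blockName, Node root) {
    Visitor visitor = new Visitor();
    visitor.visitChild(new VisitContext(blockName), root);
    return visitor.blocksSeenAtJoins;
  }

  public static void main(String[] args) {
    Node plan = new Node("JOIN", new Node("SCAN"),
        new Node("JOIN", new Node("SCAN"), new Node("SCAN")));
    System.out.println(blocksOfJoins("@ROOT", plan));   // prints [@ROOT, @ROOT]
  }
}
```

Passing the block name down with the traversal avoids looking the node up in the plan afterwards, which is the step that fails with a NullPointerException in the question.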
>
>
> I look forward to receiving Your answer!
>
> Yours sincerely,
> Camelia