You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Navis Ryu <na...@nexr.com> on 2014/10/08 06:06:07 UTC

Review Request 26435: Self join may fail if one side has VCs and other doesn't

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26435/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-8186
    https://issues.apache.org/jira/browse/HIVE-8186


Repository: hive-git


Description
-------

{noformat}select t1.BLOCK__OFFSET__INSIDE__FILE,t2.BLOCK__OFFSET__INSIDE__FILE
from src t1 join src t2 on t1.key = t2.key;{noformat}
Passes
{noformat}select t2.BLOCK__OFFSET__INSIDE__FILE
from src t1 join src t2 on t1.key = t2.key;{noformat}
Fails.

The issue is that LazyBinarySerDe OI receives data intended for UnionStructObjectInspector.
Judging by the above it has something to do with scanning table once for two aliases.

I'll look tomorrow


Diffs
-----

  data/conf/hive-log4j.properties 7f5dfc4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java f624bf4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 3dc7c76 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d8698da 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 155002a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 311f6d6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 78d4d1f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java 90e4cad 
  ql/src/test/queries/clientpositive/join_vc.q 63b3da7 
  ql/src/test/results/clientpositive/join_vc.q.out 12004ca 

Diff: https://reviews.apache.org/r/26435/diff/


Testing
-------


Thanks,

Navis Ryu


Re: Review Request 26435: Self join may fail if one side has VCs and other doesn't

Posted by Navis Ryu <na...@nexr.com>.

> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 85
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line85>
> >
> >     is it possible to document those with a small comment; as well as MapOpMeta and MapOpCtx classes?

Sure.


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > data/conf/hive-log4j.properties, line 84
> > <https://reviews.apache.org/r/26435/diff/1/?file=715104#file715104line84>
> >
> >     are ObjectStore and Operator lines intended? they'd affect logging that might be useful for other tests

ObjectStore is a little noisy, IMHO. But ok, I'll revert that.


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 368
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line368>
> >
> >     why are the calles to SerDeUtils gone? Just asking

It's same with pd.getDeserializer(hconf);


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 467
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line467>
> >
> >     one child operator can be present for several paths or in several MapOpCtx?

Can be. But rowObjectInspector is handed over once and cannot be changed. So all OIs of MapOpCtxs for the operator should be all the same.


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 445
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line445>
> >
> >     is children.size() check no longer necessary?

I'm not sure on this a little. Previously, setChildren() made childrenOpToOpCtxMap only for operators handling first input path. Now it makes contexts for all operators in conf.getAliasToWork(). It's why children.size() == 0 check and extraChildrenToClose is removed.


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 425
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line425>
> >
> >     this creates a context and checks for its presence, but then puts the result of some call taking the context to map. Would it make sense to put context into map first and then do processing on it (that I assume the call does)? That would be less confusing.

Line 372 adds onefile-->List<MapOpCtx> mapping and context is added to List<MapOpCtx> (which is 'contexts' in source code).


> On Oct. 8, 2014, 6:44 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java, line 202
> > <https://reviews.apache.org/r/26435/diff/1/?file=715105#file715105line202>
> >
> >     nit: hash codes could be combined similar to how Java does it ( * prime + next)

hashCode and equals seemed not needed. I'll remove those two.


- Navis


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26435/#review55851
-----------------------------------------------------------


On Oct. 8, 2014, 4:06 a.m., Navis Ryu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26435/
> -----------------------------------------------------------
> 
> (Updated Oct. 8, 2014, 4:06 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8186
>     https://issues.apache.org/jira/browse/HIVE-8186
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> {noformat}select t1.BLOCK__OFFSET__INSIDE__FILE,t2.BLOCK__OFFSET__INSIDE__FILE
> from src t1 join src t2 on t1.key = t2.key;{noformat}
> Passes
> {noformat}select t2.BLOCK__OFFSET__INSIDE__FILE
> from src t1 join src t2 on t1.key = t2.key;{noformat}
> Fails.
> 
> The issue is that LazyBinarySerDe OI receives data intended for UnionStructObjectInspector.
> Judging by the above it has something to do with scanning table once for two aliases.
> 
> I'll look tomorrow
> 
> 
> Diffs
> -----
> 
>   data/conf/hive-log4j.properties 7f5dfc4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java f624bf4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 3dc7c76 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d8698da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 155002a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 311f6d6 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 78d4d1f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java 90e4cad 
>   ql/src/test/queries/clientpositive/join_vc.q 63b3da7 
>   ql/src/test/results/clientpositive/join_vc.q.out 12004ca 
> 
> Diff: https://reviews.apache.org/r/26435/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>


Re: Review Request 26435: Self join may fail if one side has VCs and other doesn't

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26435/#review55851
-----------------------------------------------------------



data/conf/hive-log4j.properties
<https://reviews.apache.org/r/26435/#comment96225>

    are ObjectStore and Operator lines intended? they'd affect logging that might be useful for other tests



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96232>

    is it possible to document those with a small comment; as well as MapOpMeta and MapOpCtx classes?



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96226>

    nit: hash codes could be combined similar to how Java does it ( * prime + next)



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96227>

    why are the calles to SerDeUtils gone? Just asking



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96228>

    this creates a context and checks for its presence, but then puts the result of some call taking the context to map. Would it make sense to put context into map first and then do processing on it (that I assume the call does)? That would be less confusing.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96229>

    is children.size() check no longer necessary?



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96230>

    path is never used



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/26435/#comment96233>

    one child operator can be present for several paths or in several MapOpCtx?


- Sergey Shelukhin


On Oct. 8, 2014, 4:06 a.m., Navis Ryu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26435/
> -----------------------------------------------------------
> 
> (Updated Oct. 8, 2014, 4:06 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8186
>     https://issues.apache.org/jira/browse/HIVE-8186
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> {noformat}select t1.BLOCK__OFFSET__INSIDE__FILE,t2.BLOCK__OFFSET__INSIDE__FILE
> from src t1 join src t2 on t1.key = t2.key;{noformat}
> Passes
> {noformat}select t2.BLOCK__OFFSET__INSIDE__FILE
> from src t1 join src t2 on t1.key = t2.key;{noformat}
> Fails.
> 
> The issue is that LazyBinarySerDe OI receives data intended for UnionStructObjectInspector.
> Judging by the above it has something to do with scanning table once for two aliases.
> 
> I'll look tomorrow
> 
> 
> Diffs
> -----
> 
>   data/conf/hive-log4j.properties 7f5dfc4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java f624bf4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 3dc7c76 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d8698da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 155002a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 311f6d6 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 78d4d1f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java 90e4cad 
>   ql/src/test/queries/clientpositive/join_vc.q 63b3da7 
>   ql/src/test/results/clientpositive/join_vc.q.out 12004ca 
> 
> Diff: https://reviews.apache.org/r/26435/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>


Re: Review Request 26435: Self join may fail if one side has VCs and other doesn't

Posted by Navis Ryu <na...@nexr.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26435/
-----------------------------------------------------------

(Updated Oct. 14, 2014, 2:36 a.m.)


Review request for hive.


Changes
-------

Addressed comments and modified the way handing over the table name to script operator


Bugs: HIVE-8186
    https://issues.apache.org/jira/browse/HIVE-8186


Repository: hive-git


Description
-------

{noformat}select t1.BLOCK__OFFSET__INSIDE__FILE,t2.BLOCK__OFFSET__INSIDE__FILE
from src t1 join src t2 on t1.key = t2.key;{noformat}
Passes
{noformat}select t2.BLOCK__OFFSET__INSIDE__FILE
from src t1 join src t2 on t1.key = t2.key;{noformat}
Fails.

The issue is that LazyBinarySerDe OI receives data intended for UnionStructObjectInspector.
Judging by the above it has something to do with scanning table once for two aliases.

I'll look tomorrow


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6a1a5f0 
  data/conf/hive-log4j.properties 7f5dfc4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java f624bf4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 3dc7c76 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d8698da 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 8228e09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 155002a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 311f6d6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 78d4d1f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java 90e4cad 
  ql/src/test/queries/clientpositive/join_vc.q 63b3da7 
  ql/src/test/results/clientpositive/join_vc.q.out 12004ca 

Diff: https://reviews.apache.org/r/26435/diff/


Testing
-------


Thanks,

Navis Ryu