You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Szehon Ho <sz...@cloudera.com> on 2014/03/07 23:12:23 UTC

Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
-----------------------------------------------------------

Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Repository: hive-git


Description
-------

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance it caches based on an array list, of all object inspectors of columns.  The second time the query is run, it is attempted to lookup cached value.  When the DeepParquetHiveMapInspector is gotten to, java calls .equals but it fails as the .equals method casted it to a "StandardParquetHiveInspector".

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, as if by luck another type of object inspector gets hashed to the same hashcode in the cache, java would call .equals against it to find it if it is the key.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
-------

Manual testing.


Thanks,

Szehon Ho


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36576
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
<https://reviews.apache.org/r/18925/#comment67582>

    nit: trim ws


- Brock Noland


On March 7, 2014, 10:12 p.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 7, 2014, 10:12 p.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance it caches based on an array list, of all object inspectors of columns.  The second time the query is run, it is attempted to lookup cached value.  When the DeepParquetHiveMapInspector is gotten to, java calls .equals but it fails as the .equals method casted it to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, as if by luck another type of object inspector gets hashed to the same hashcode in the cache, java would call .equals against it to find it if it is the key.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Szehon Ho <sz...@cloudera.com>.

> On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
> >

Thanks for the review, put some response and will do the refactor.


> On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java, line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> >     I don't think we should check the class equality here. A child class might inherit a parent class w/o any new identifier. In that case the child class can still be equal to the parent class. It's really up to the child class to decide the equality (by overriding the equal()).

Hm if you really wanted to achieve that, a child of this class would be able to override .equals.  So why would this check not be appropriate here?

Also I don't think its good practice to allow this choice  , as what you are suggesting would make parent return true for .equals(children), and you are giving a choice for children to choose whether .equals(parent) is true for them.  If its not consistent, it would break transitive property of .equals.

Imo, the standard for this should be two elements are equal only if they are same class, that is guaranteeing the transitive property of equals.


> On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java, line 164
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line164>
> >
> >     1. code style: we may need {} for else if block
> >     
> >     2. However, I think this might be refactored as follows for easier understanding:
> >     
> >     boolean keyEq = (keyInspector == null && other.keyInspector == null) || keyInspector.equal(other.keyInspector);
> >     boolean valEq = (ValueInspector == null && other.valueInspector == null) || valueInspector.equal(other.valueInspector);
> >     return keyEq && valEq;

Yea I'll refactor it.  This is the default generated one, which is not adhering to proper java-style.


- Szehon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36578
-----------------------------------------------------------


On March 7, 2014, 10:20 p.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 7, 2014, 10:20 p.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36578
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
<https://reviews.apache.org/r/18925/#comment67584>

    I don't think we should check the class equality here. A child class might inherit a parent class w/o any new identifier. In that case the child class can still be equal to the parent class. It's really up to the child class to decide the equality (by overriding the equal()).



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
<https://reviews.apache.org/r/18925/#comment67585>

    1. code style: we may need {} for else if block
    
    2. However, I think this might be refactored as follows for easier understanding:
    
    boolean keyEq = (keyInspector == null && other.keyInspector == null) || keyInspector.equal(other.keyInspector);
    boolean valEq = (ValueInspector == null && other.valueInspector == null) || valueInspector.equal(other.valueInspector);
    return keyEq && valEq;


- Xuefu Zhang


On March 7, 2014, 10:20 p.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 7, 2014, 10:20 p.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Xuefu Zhang <xz...@cloudera.com>.

> On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java, line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> >     I guess I wasn't clear. It's not inapproapriate, but goes beyond its responsibility. Equality implementation is within a context which is the class. The instance to be checked doesn't necessarily has the runtime class info. In fact, the context shouldn't be aware the runtime class of these instances, as child classes can be added any time. Doing getClass == other.getClass() goes beyond the current context.
> >     
> >     What's more appropriate to check type compatibility by calling if (other instanceof this.class). This is different from checking this.getClass() == other.getClass().
> >     
> >     Take Java ArrayList.equals() method as an example. (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29). This method doesn't do runtime class check. The implementation is saying, other.getClass() doesn't have to be ArrayList, but has to be an instance of ArrayList. It could be an instance of MyArrayList as long as MyArrayList is inherited from ArrayList.
> >     
> >     If we think it's more protical to do such a check, we'd expect that ArrayList.equals() would also check this.getClass() == other.getClass().
> >     
> >     Btw, I don't understand how it breaks transitivity by removing this check.
> >     
> >     I understand this check was there before your change. I missed it in my previous review.
> >
> 
> Szehon Ho wrote:
>     Hm I actually did not realize that Java's code has that for collections, thanks for pointing that out.  I suppose in list case, the semantic is the user doesn't care about list implementation, but about the contents. 
>     
>     What I meant about breaking the transitive property if you allow each class to decide:  Say we remove the check of RT class equality.  There is a subclass called 'A' which choose to override equal to return true only if 'other' is A.  Another subclass 'B' doesn't override .equals, and by inheritance can return true if 'other' is any subclass of parent (A or B).  A.equals(B) is false, B.equals(A) is true, breaking transitive.  Now that I think about it, this argument doesn't justify having the parent one way or another, all I meant is that a class cannot implement .equals just in its own context as you mentioned, all subclass must choose the same way to be consistent, and I felt that having this check at the parent would ensure that all the children followed it.
>     
>     But coming back down to this particular issue, I still don't think its safe to remove that check.  There are two subclass of AbstractParquetMapInspector, the Deep and Standard one depending on the type of map.  If we don't do this check, then Deep will be considered equal to Standard, and perhaps the wrong one may be returned from cache and used in the inspection, they are not interchangeable.  This is unlike java list,map, here the actual class matters more than the content.  At least that is my understanding looking at the code.

okay. Frankly, I don't know what's the difference between the two child class: the whole parquet code is very confusing. Since the code was there before this, it's fine to keep it as it is.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Szehon Ho <sz...@cloudera.com>.

> On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java, line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> >     I guess I wasn't clear. It's not inapproapriate, but goes beyond its responsibility. Equality implementation is within a context which is the class. The instance to be checked doesn't necessarily has the runtime class info. In fact, the context shouldn't be aware the runtime class of these instances, as child classes can be added any time. Doing getClass == other.getClass() goes beyond the current context.
> >     
> >     What's more appropriate to check type compatibility by calling if (other instanceof this.class). This is different from checking this.getClass() == other.getClass().
> >     
> >     Take Java ArrayList.equals() method as an example. (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29). This method doesn't do runtime class check. The implementation is saying, other.getClass() doesn't have to be ArrayList, but has to be an instance of ArrayList. It could be an instance of MyArrayList as long as MyArrayList is inherited from ArrayList.
> >     
> >     If we think it's more protical to do such a check, we'd expect that ArrayList.equals() would also check this.getClass() == other.getClass().
> >     
> >     Btw, I don't understand how it breaks transitivity by removing this check.
> >     
> >     I understand this check was there before your change. I missed it in my previous review.
> >

Hm I actually did not realize that Java's code has that for collections, thanks for pointing that out.  I suppose in list case, the semantic is the user doesn't care about list implementation, but about the contents. 

What I meant about breaking the transitive property if you allow each class to decide:  Say we remove the check of RT class equality.  There is a subclass called 'A' which choose to override equal to return true only if 'other' is A.  Another subclass 'B' doesn't override .equals, and by inheritance can return true if 'other' is any subclass of parent (A or B).  A.equals(B) is false, B.equals(A) is true, breaking transitive.  Now that I think about it, this argument doesn't justify having the parent one way or another, all I meant is that a class cannot implement .equals just in its own context as you mentioned, all subclass must choose the same way to be consistent, and I felt that having this check at the parent would ensure that all the children followed it.

But coming back down to this particular issue, I still don't think its safe to remove that check.  There are two subclass of AbstractParquetMapInspector, the Deep and Standard one depending on the type of map.  If we don't do this check, then Deep will be considered equal to Standard, and perhaps the wrong one may be returned from cache and used in the inspection, they are not interchangeable.  This is unlike java list,map, here the actual class matters more than the content.  At least that is my understanding looking at the code.


- Szehon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
<https://reviews.apache.org/r/18925/#comment67592>

    I guess I wasn't clear. It's not inapproapriate, but goes beyond its responsibility. Equality implementation is within a context which is the class. The instance to be checked doesn't necessarily has the runtime class info. In fact, the context shouldn't be aware the runtime class of these instances, as child classes can be added any time. Doing getClass == other.getClass() goes beyond the current context.
    
    What's more appropriate to check type compatibility by calling if (other instanceof this.class). This is different from checking this.getClass() == other.getClass().
    
    Take Java ArrayList.equals() method as an example. (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29). This method doesn't do runtime class check. The implementation is saying, other.getClass() doesn't have to be ArrayList, but has to be an instance of ArrayList. It could be an instance of MyArrayList as long as MyArrayList is inherited from ArrayList.
    
    If we think it's more protical to do such a check, we'd expect that ArrayList.equals() would also check this.getClass() == other.getClass().
    
    Btw, I don't understand how it breaks transitivity by removing this check.
    
    I understand this check was there before your change. I missed it in my previous review.
    


- Xuefu Zhang


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by justin coffey <j....@criteo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36644
-----------------------------------------------------------

Ship it!


go for r3 with the getClass (and no instanceof) check and {} formatting.

- justin coffey


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
-----------------------------------------------------------

(Updated March 8, 2014, 12:01 a.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
-------

Fixed the styling of generated equals code.  I looked at the proposed refactor, I think the simplication you suggested would hit NPE if this.keyInspector is null and other.keyInspector != null.  Given that you would need a few more boolean statements in that line to make it correct, it seems that the current one is probably the cleanest to look at.


Repository: hive-git


Description
-------

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
-------

Manual testing.


Thanks,

Szehon Ho


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
-----------------------------------------------------------

(Updated March 7, 2014, 10:20 p.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
-------

Thanks for the review.  Trimmed the whitespace.


Repository: hive-git


Description
-------

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
-------

Manual testing.


Thanks,

Szehon Ho


Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
-----------------------------------------------------------

(Updated March 7, 2014, 10:17 p.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Repository: hive-git


Description (updated)
-------

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object inspector.  For performance, there is a cache keyed on an array list, of all object inspectors of columns.  The second time the query is run, it attempts to lookup cached struct inspector.  But when the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals method casted the "other" to a "StandardParquetHiveInspector".

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if another class of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals against the other, which in this case is not of the same class.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
-------

Manual testing.


Thanks,

Szehon Ho