Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2010/07/08 09:40:50 UTC

[jira] Commented: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

    [ https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886272#action_12886272 ] 

He Yongqiang commented on HIVE-1452:
------------------------------------

Not sure what's happening here. It would be great if you could provide a test case that reproduces this.
The parameter "hive.mapjoin.cache.numrows" (default 25K) controls when to flush the in-memory hashmap (whose value object is MapJoinObjectValue). You may want to use a small value for this parameter in your test case.
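For example, adding something like the following at the top of the reproduction script should trigger the flush path much sooner (100 is just an illustrative value, not a recommended setting):
{noformat}
set hive.mapjoin.cache.numrows=100;
{noformat}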

My guess is that we may need to call
{noformat} 
out.flush();
{noformat} 
in MapJoinObjectValue's writeExternal method (MapJoinObjectValue.java line 131).
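Roughly, the change would look like the sketch below. The method body here is simplified and assumed for illustration, not copied from the actual MapJoinObjectValue source:
{code}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
  // ... existing serialization of the cached row values (assumed) ...

  // Proposed addition: flush so buffered bytes actually reach the
  // underlying stream before JDBM persists the record. Otherwise a
  // partially written record could later fail with EOFException on read.
  out.flush();
}
{code}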

> Mapside join on non partitioned table with partitioned table causes error
> -------------------------------------------------------------------------
>
>                 Key: HIVE-1452
>                 URL: https://issues.apache.org/jira/browse/HIVE-1452
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>
> I am running a script that joins two tables: one is dynamically partitioned and stored in RCFile format, and the other is stored as a TXT file.
> The TXT file is around 397 MB in size and contains around 24 million rows.
> {code}
> drop table joinquery;
> create external table joinquery (
>   id string,
>   type string,
>   sec string,
>   num string,
>   url string,
>   cost string,
>   listinfo array <map<string,string>>
> ) 
> STORED AS TEXTFILE
> LOCATION '/projects/joinquery';
> CREATE EXTERNAL TABLE idtable20mil (
>   id string
> )
> STORED AS TEXTFILE
> LOCATION '/projects/idtable20mil';
> insert overwrite table joinquery
>    select 
>       /*+ MAPJOIN(idtable20mil) */
>       rctable.id,
>       rctable.type,
>       rctable.map['sec'],
>       rctable.map['num'],
>       rctable.map['url'],
>       rctable.map['cost'],
>       rctable.listinfo
>     from rctable
>     JOIN  idtable20mil on (rctable.id = idtable20mil.id)
>     where
>     rctable.id is not null and
>     rctable.part='value' and
>     rctable.subpart='value' and
>     rctable.pty='100' and
>     rctable.uniqid='1000'
> order by id;
> {code}
> Result:
> Possible error:
>   Data file split:string,part:string,subpart:string,subsubpart:string> is corrupted.
> Solution:
>   Replace file. i.e. by re-running the query that produced the source table / partition.
> -----
> If I look at mapper logs.
> {noformat}
> Caused by: java.io.IOException: java.io.EOFException
> 	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> 	at org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> 	at org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
> 	at org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
> 	at org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
> 	at org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
> 	at org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
> 	at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
> 	at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
> 	... 11 more
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
> 	at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
> 	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
> 	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
> {noformat}
> I am trying to create a test case that demonstrates this error.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.