Posted to user@hive.apache.org by java8964 java8964 <ja...@hotmail.com> on 2013/03/20 21:45:09 UTC

Question about how to add the debug info into the hive core jar

Hi, 
I have Hadoop running in pseudo-distributed mode on my Linux box. Right now I face a problem with Hive: it throws an exception when querying a table whose data goes through my custom SerDe and InputFormat classes.
To help me trace the root cause, I need to modify the code of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe and add more debug logging, to understand why the exception happens.
After I modify the Hive code, I can compile it and generate a new hive-serde.jar file with the same name as the release version; only the size changes.
Now I put my new hive-serde.jar under the $HIVE_HOME/lib folder, replacing the old one, and rerun the query that failed. But after the failure, when I check $HADOOP_HOME/logs/user_logs/, the exception stack trace still looks like it was generated by the original hive-serde classes: the line numbers shown in the log don't match the new code I changed to add the debug information.
My question is: given this newly compiled hive-serde.jar file, where should I put it besides $HIVE_HOME/lib?
1) This is a pseudo-distributed environment. Everything (namenode, datanode, jobtracker and tasktracker) runs in one box.
2) After I replaced hive-serde.jar with my new jar, I even stopped all the Hadoop Java processes and restarted them.
3) But when I run the query in the Hive session, I still see the log generated by the old hive-serde.jar classes. Why? (One way to check where the class really comes from is sketched below.)
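To rule out a stale copy somewhere on the classpath, a minimal check of my own (a sketch, not code that exists anywhere yet; I could drop it temporarily into my SerDe's initialize() method):

    // Sketch: print the physical jar the JVM actually loaded the class from,
    // so a shadowing copy of hive-serde.jar shows up immediately in the logs.
    java.net.URL loc = org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.class
        .getProtectionDomain().getCodeSource().getLocation();
    System.err.println("LazySimpleSerDe loaded from: " + loc);

If this prints anything other than $HIVE_HOME/lib/hive-serde.jar, another copy of the jar is shadowing mine.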
Thanks for any help,
Yong

RE: Question about how to add the debug info into the hive core jar

Posted by java8964 java8964 <ja...@hotmail.com>.
I am not sure the existing logging information is enough for me.
The exception trace is as follows:
Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.get(ArrayList.java:382)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
It is Hive 0.9.0, and I looked into the source code of LazySimpleSerDe.java around line 485:
      List<? extends StructField> fields = soi.getAllStructFieldRefs();
      list = soi.getStructFieldsDataAsList(obj);
      if (list == null) {
        out.write(nullSequence.getBytes(), 0, nullSequence.getLength());
      } else {
        for (int i = 0; i < list.size(); i++) {
          if (i > 0) {
            out.write(separator);
          }
          serialize(out, list.get(i), fields.get(i).getFieldObjectInspector(),  // <-- line 485
              separators, level + 1, nullSequence, escaped, escapeChar,
              needsEscape);
        }
      }
For this exception to happen, the soi (which is my StructObjectInspector class) must be returning collections of different lengths for "fields" and "list". But I already added a logger in my StructObjectInspector, and it proves that getAllStructFieldRefs() and getStructFieldsDataAsList(Object) return collections of the same length. So I really don't know how this exception can happen in the Hive code.
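For reference, this is roughly the debug code I want to add just before the serialize loop (a sketch against the hive 0.9.0 source above; I am assuming the class's existing LOG logger is available here):

    // Sketch: log both sizes and the inspector type before the loop, so the
    // mismatch is visible in the task logs instead of only as an exception.
    if (fields.size() != list.size()) {
      LOG.error("Size mismatch: fields=" + fields.size()
          + ", list=" + list.size()
          + ", soi=" + soi.getClass().getName());
    }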
I have 2 options right now:
1) Change the above code to add more debug information at runtime, to check what is actually in the "fields" and "list" objects and understand why their lengths differ. But I still have the problem of getting my new jar loaded by Hadoop.
2) Enable remote debugging. There are very few examples on the internet of how to remote-debug the server-side MR jobs that Hive launches; some wiki pages claim it is doable, but give no concrete examples. (My best guess is sketched below.)
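For option 2, the closest I can get to a concrete example is this (a sketch, assuming Hadoop 1.x and that the job honors mapred.child.java.opts; the port and failing_query.sql are placeholders, and with suspend=y every task JVM waits for a debugger, so it only works cleanly with a single map/reduce task):

# suspend=y makes each task JVM wait until a debugger attaches on port 8000
#hive -hiveconf mapred.child.java.opts="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000" -f failing_query.sql

Then I would attach a remote debugger (e.g. Eclipse's "Remote Java Application" configuration) to the box on port 8000, with my modified hive-serde source on the debugger's source path.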
Thanks

From: ashettia@hortonworks.com
Subject: Re: Question about how to add the debug info into the hive core jar
Date: Wed, 20 Mar 2013 17:35:36 -0700
To: user@hive.apache.org

Hi Yong, 
Have you tried running the Hive query in debug mode? The Hive log level can be changed by passing the following conf while the hive client is running:

#hive -hiveconf hive.root.logger=ALL,console -e " DDL statement ;"
#hive -hiveconf hive.root.logger=ALL,console -f ddl.sql ;

Hope this helps

Thanks


Re: Question about how to add the debug info into the hive core jar

Posted by Abdelrhman Shettia <as...@hortonworks.com>.
Hi Yong, 

Have you tried running the Hive query in debug mode? The Hive log level can be changed by passing the following conf while the hive client is running:
 
#hive -hiveconf hive.root.logger=ALL,console -e " DDL statement ;"
#hive -hiveconf hive.root.logger=ALL,console -f ddl.sql ;  
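If the console output is too verbose to scroll through, the same switch can target Hive's file appender instead; this assumes your hive-log4j.properties still defines the stock DRFA appender, which writes to /tmp/<user>/hive.log by default:

#hive -hiveconf hive.root.logger=DEBUG,DRFA -f ddl.sql ;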
 
Hope this helps

 
Thanks

