Posted to mapreduce-user@hadoop.apache.org by Sriram Ramachandrasekaran <sr...@gmail.com> on 2012/07/27 09:54:38 UTC

Deserialization issue.

Hello,
I have an MR job that talks to HBase through Gora. Gora also provides a
couple of classes that can be extended to write Mappers and Reducers when
the Mappers read their input from an HBase store and the Reducers write
their output to one. That is why I use Gora.

Now, when I run my MR job, I get the exception below
(https://issues.apache.org/jira/browse/HADOOP-3093).

java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
    at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
    at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
    at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
    at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
    ... 11 more

I tried the following to work through this issue.
0. The stack trace indicates that, while setting up a new Mapper, it is
unable to deserialize something. (I could not work out where exactly it
fails.)
1. I looked around the forums and realized that the serialization options
were probably not getting passed, so I tried setting the io.serializations
config on the job.
   1.1. I am not setting "io.serializations" myself; I use
GoraMapReduceUtils.setIOSerializations() to do it. I verified that the
confs are getting the proper serializers.
2. I checked the job xml to see whether these confs had got through, and
they had. But it failed again.
3. I tried starting the hadoop job runner with debug options turned on and
in suspend mode (-XDebug suspend=y), and I also set the VM options for the
mapred child tasks via mapred.child.java.opts to see if I could debug the
newly spawned VM. Although I get a message on my stdout saying it is
opening port X and waiting, attaching a remote debugger to that port does
not work.

I understand that when SerializationFactory tries to deserialize
'something', it does not find an appropriate unmarshaller, and so it fails.
But I would like to know a way to find that 'something', and I would like
to get some idea of how (pseudo-)distributed MR jobs are generally
debugged. I tried searching but did not find anything useful.

Any help/pointers would be greatly useful.

Thanks!

-- 
It's just about how deep your longing is!

Re: Deserialization issue.

Posted by Harsh J <ha...@cloudera.com>.
Btw, do speak to Gora folks on fixing or at least documenting this
flaw. I can imagine others hitting the same issue :)

On Mon, Jul 30, 2012 at 9:22 PM, Harsh J <ha...@cloudera.com> wrote:
> I've mostly done it with logging, but this JIRA may interest you if
> you still wish to attach a remote debugger to tasks:
> https://issues.apache.org/jira/browse/MAPREDUCE-2637
>
> On Mon, Jul 30, 2012 at 7:28 PM, Sriram Ramachandrasekaran
> <sr...@gmail.com> wrote:
>> Harsh,
>> I was waiting to try it on my cluster before I came back to report if it
>> worked or not.
>> I tried it and it works. The site wide configuration worked.
>> The IOUtils.conf.addResource("job.xml") does the same thing as
>> GoraMapReduceUtils.setIOSerializations(), so it did not help.
>>
>> Thanks for the help. I still would like to know, what would be a better way
>> to debug distributed map reduce jobs.
>> I know I can debug stand-alone jobs quite easily, but, I would like to know
>> how folks do distributed map reduce jobs debugging.
>>
>> Thanks again!
>> -Sriram
>>
>>
>> On Sat, Jul 28, 2012 at 6:20 AM, Sriram Ramachandrasekaran
>> <sr...@gmail.com> wrote:
>>>
>>> aah! I always thought about setting io.serializations at the job level. I
>>> never thought about this. will try this site wide thing. thanks again.
>>>
>>> On 28 Jul 2012 06:16, "Harsh J" <ha...@cloudera.com> wrote:
>>>>
>>>> Ah, that may be because the core-site.xml has the property
>>>> io.serializations fully defined for Gora as well? You can do that as
>>>> an alternative fix: supply a core-site.xml across the tasktrackers that
>>>> also carries the serialization classes Gora requires. I failed to think
>>>> of that as a solution.
>>>>
>>>> On Sat, Jul 28, 2012 at 6:04 AM, Sriram Ramachandrasekaran
>>>> <sr...@gmail.com> wrote:
>>>> > okay. But this issue didn't present itself when run in standalone mode.
>>>> > :)
>>>> >
>>>> > On 28 Jul 2012 06:02, "Harsh J" <ha...@cloudera.com> wrote:
>>>> >>
>>>> >> I find it easier to run jobs via MRUnit (http://mrunit.apache.org,
>>>> >> TDD) first, or via LocalJobRunner, for debug purposes.
>>>> >>
>>>> >> On Sat, Jul 28, 2012 at 5:53 AM, Sriram Ramachandrasekaran
>>>> >> <sr...@gmail.com> wrote:
>>>> >> > hello harsh,
>>>> >> > thanks for your investigations. while we were debugging, I saw the
>>>> >> > exact thing. As you pointed out, we suspected it to be a problem.
>>>> >> > So, we set the job conf object directly on Gora's query object.
>>>> >> > It goes something like this,
>>>> >> > query.setConf..(job.getConfig..())
>>>> >> >
>>>> >> > And, then I saw that it was not getting into creating a new object
>>>> >> > at getOrCreate().
>>>> >> >
>>>> >> > OTOH, i've not tried the job.xml thing. I should give it a try n I
>>>> >> > shall keep the loop posted.
>>>> >> >
>>>> >> > I would also like to hear about standard practices for debugging
>>>> >> > distributed MR tasks.
>>>> >> >
>>>> >> > -----
>>>> >> > reply from a hh device. Pl excuse typos n lack of formatting.
>>>> >> >
>>>> >> > On 28 Jul 2012 03:30, "Harsh J" <ha...@cloudera.com> wrote:
>>>> >> >>
>>>> >> >> Hi Sriram,
>>>> >> >>
>>>> >> >> I suspect the following in Gora to somehow be causing this issue:
>>>> >> >>
>>>> >> >> IOUtils source:
>>>> >> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
>>>> >> >>
>>>> >> >> QueryBase source:
>>>> >> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup
>>>> >> >>
>>>> >> >> Notice that IOUtils.deserialize(…) calls expect a proper Configuration
>>>> >> >> object. If not passed (i.e., if null), they call the following.
>>>> >> >>
>>>> >> >> private static Configuration getOrCreateConf(Configuration conf) {
>>>> >> >>   if (conf == null) {
>>>> >> >>     if (IOUtils.conf == null) {
>>>> >> >>       IOUtils.conf = new Configuration();
>>>> >> >>     }
>>>> >> >>   }
>>>> >> >>   return conf != null ? conf : IOUtils.conf;
>>>> >> >> }
>>>> >> >>
>>>> >> >> Now QueryBase has, in its readFields method, some
>>>> >> >> IOUtils.deserialize(…) calls that seem to pass null for the
>>>> >> >> configuration object. The IOUtils.deserialize(…) method hence calls
>>>> >> >> the method above and initializes a whole new Configuration object,
>>>> >> >> as the passed conf object is null.
>>>> >> >>
>>>> >> >> If it does that, it would not be loading the "job.xml" file contents,
>>>> >> >> which is the job's config file (that's something the map task's
>>>> >> >> config set alone loads, and not a file that's loaded by default).
>>>> >> >> Hence, custom serializers will disappear the moment it begins using
>>>> >> >> this new Configuration object.
>>>> >> >>
>>>> >> >> This is what you'll want to investigate and fix, or notify the Gora
>>>> >> >> devs about (why QueryBase#readFields uses a null object, and whether
>>>> >> >> it can reuse some already-set conf object). As a cheap hack fix, maybe
>>>> >> >> doing the following will make it work in an MR environment?
>>>> >> >>
>>>> >> >> IOUtils.conf = new Configuration();
>>>> >> >> IOUtils.conf.addResource("job.xml");
>>>> >> >>
>>>> >> >> I haven't tried the above, but let us know how we can be of further
>>>> >> >> assistance. An ideal fix would be to only use the MapTask's provided
>>>> >> >> Configuration object everywhere, somehow, and never re-create one.
>>>> >> >>
>>>> >> >> P.s. If you want a thread ref link to share with other devs over at
>>>> >> >> Gora, here it is: http://search-hadoop.com/m/BXZA4dTUFC
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Harsh J
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Harsh J
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>
>>
>>
>>
>> --
>> It's just about how deep your longing is!
>>
>
>
>
> --
> Harsh J



-- 
Harsh J
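
The fallback behaviour described in the quoted analysis above can be sketched without Hadoop on the classpath. This is only an illustration under stated assumptions: plain maps stand in for Hadoop's Configuration, and the class name, the method names other than getOrCreateConf, and the serialization names are hypothetical, not Gora's or Hadoop's. It shows why a lookup that works with the job's conf can come back empty once a null conf is passed and a fresh default conf is silently used instead.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone sketch of the conf fallback pattern discussed in this thread.
 * Plain maps stand in for Hadoop's Configuration; names here are illustrative.
 */
public class ConfFallbackSketch {

    // Stand-in for the static IOUtils.conf field.
    private static Map<String, String> fallbackConf;

    // Same shape as getOrCreateConf: use the caller's conf if one is given,
    // otherwise lazily create a fresh conf that knows nothing about the job.
    public static Map<String, String> getOrCreateConf(Map<String, String> conf) {
        if (conf != null) {
            return conf;
        }
        if (fallbackConf == null) {
            fallbackConf = defaults();
        }
        return fallbackConf;
    }

    // Hypothetical defaults: only a built-in serialization is registered.
    public static Map<String, String> defaults() {
        Map<String, String> conf = new HashMap<>();
        conf.put("io.serializations", "WritableSerialization");
        return conf;
    }

    // Stand-in for a serialization lookup: returns null when nothing matches.
    public static String findSerialization(Map<String, String> conf, String wanted) {
        String registered = getOrCreateConf(conf).get("io.serializations");
        for (String name : registered.split(",")) {
            if (name.trim().equals(wanted)) {
                return name.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The job-level conf registers an extra, custom serialization.
        Map<String, String> jobConf = defaults();
        jobConf.put("io.serializations", "WritableSerialization,CustomSerialization");

        // With the job's conf, the custom serialization is found.
        System.out.println(findSerialization(jobConf, "CustomSerialization"));
        // With null, the fallback conf is used and the lookup returns null.
        System.out.println(findSerialization(null, "CustomSerialization"));
    }
}
```

A caller that dereferences the null result without checking would fail with a NullPointerException, which is consistent with the trace at the top of the thread. The sketch also suggests why a site-wide setting can help: a freshly created conf still picks up site-wide defaults, whereas job-only settings are lost.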

Re: Deserialization issue.

Posted by Harsh J <ha...@cloudera.com>.
I've mostly done it with logging, but this JIRA may interest you if
you still wish to attach a remote debugger to tasks:
https://issues.apache.org/jira/browse/MAPREDUCE-2637




-- 
Harsh J
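
For anyone who still wants to try the remote-debugger route, here is a minimal sketch of composing the child JVM options. It is an assumption-laden illustration, not a recipe from this thread: java.util.Properties stands in for the job configuration, the class and method names are hypothetical, and port 8000 is an arbitrary choice. The only names taken from the thread are the mapred.child.java.opts property and the idea of starting the task JVM suspended; the -agentlib:jdwp argument is the standard JVM debug agent option.

```java
import java.util.Properties;

/**
 * Sketch: append a JDWP debug agent option to mapred.child.java.opts.
 * Properties stands in for the job configuration in this illustration.
 */
public class ChildDebugOptsSketch {

    // Builds the JDWP agent argument; the port is whatever the caller picks.
    public static String jdwpOption(int port, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + port;
    }

    // Returns a copy of the properties with the debug option appended to
    // whatever mapred.child.java.opts already contains.
    public static Properties withChildDebug(Properties jobProps, int port) {
        Properties copy = new Properties();
        copy.putAll(jobProps);
        String existing = copy.getProperty("mapred.child.java.opts", "");
        String combined = (existing + " " + jdwpOption(port, true)).trim();
        copy.setProperty("mapred.child.java.opts", combined);
        return copy;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("mapred.child.java.opts", "-Xmx200m");
        System.out.println(
                withChildDebug(props, 8000).getProperty("mapred.child.java.opts"));
    }
}
```

Note that every child JVM started with these options tries to listen on the same port, so on a node that runs several tasks at once only the first can bind it. This approach therefore fits a pseudo-distributed setup running one task at a time better than a real cluster.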

Re: Deserialization issue.

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Harsh,
I was waiting to try it on my cluster before I came back to report if it
worked or not.
I tried it and it works. The site-wide configuration worked.
The IOUtils.conf.addResource("job.xml") does the same thing as
GoraMapReduceUtils.setIOSerializations(), so it did not help.

Thanks for the help. I would still like to know what a better way to debug
distributed MapReduce jobs would be.
I know I can debug stand-alone jobs quite easily, but I would like to know
how folks debug distributed MapReduce jobs.

Thanks again!
-Sriram


>> >> >> > at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
>> >> >> > ... 9 more
>> >> >> > Caused by: java.lang.NullPointerException
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
>> >> >> > at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
>> >> >> > at
>> >> >> >
>> org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>> >> >> > at
>> >> >> >
>> >> >> >
>> >> >> >
>> org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>> >> >> > at
>> >> >> >
>> >> >> >
>> org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
>> >> >> > at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
>> >> >> > ... 11 more
>> >> >> >
>> >> >> > I tried the following things to work through this issue.
>> >> >> > 0. The stack trace indicates that, when setting up a new Mapper,
>> it
>> >> >> > is
>> >> >> > unable to deserialize something. (I could not get to understand
>> where
>> >> >> > it
>> >> >> > fails).
>> >> >> > 1. I looked around the forums and realized that serialization
>> options
>> >> >> > are
>> >> >> > not getting passed, so, I tried setting up, io.serializations
>> config
>> >> >> > on
>> >> >> > the
>> >> >> > job.
>> >> >> >    1.1. I am not setting up the "io.serializations" myself, I use
>> >> >> > GoraMapReduceUtils.setIOSerializations() to do it. I verified
>> that,
>> >> >> > the
>> >> >> > confs are getting proper serializers.
>> >> >> > 2. I verified in the job xml to see if these confs have got
>> through,
>> >> >> > they
>> >> >> > were. But, it failed again.
>> >> >> > 3. I tried starting the hadoop job runner with debug options
>> turned
>> >> >> > on
>> >> >> > and
>> >> >> > in suspend mode, -XDebug suspend=y and I also set the VM options
>> for
>> >> >> > mapred
>> >> >> > child tasks, via the mapred.child.java.opts to see if I can debug
>> the
>> >> >> > VM
>> >> >> > that gets spawned newly. Although I get a message on my stdout
>> >> >> > saying,
>> >> >> > opening port X and waiting, when I try to attach a remote
>> debugger on
>> >> >> > that
>> >> >> > port, it does not work.
>> >> >> >
>> >> >> > I understand that, when SerializationFactory tries to deSerialize
>> >> >> > 'something', it does not find an appropriate unmarshaller and so
>> it
>> >> >> > fails.
>> >> >> > But, I would like to know a way to find that 'something' and I
>> would
>> >> >> > like to
>> >> >> > get some idea on how (pseudo) distributed MR jobs should be
>> generally
>> >> >> > debugged. I tried searching, did not find anything useful.
>> >> >> >
>> >> >> > Any help/pointers would be greatly useful.
>> >> >> >
>> >> >> > Thanks!
>> >> >> >
>> >> >> > --
>> >> >> > It's just about how deep your longing is!
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Harsh J
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>


-- 
It's just about how deep your longing is!

Re: Deserialization issue.

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Aah! I had only ever thought about setting io.serializations at the job
level; a site-wide setting never occurred to me. I will try it. Thanks again.
On 28 Jul 2012 06:16, "Harsh J" <ha...@cloudera.com> wrote:

> Ah, that may be cause the core-site.xml has the property
> io.serializations fully defined for Gora as well? You can do that as
> an alternative fix, supply a core-site.xml across tasktrackers that
> also carry the serialization class Gora requires. I failed to think of
> that as a solution.

Re: Deserialization issue.

Posted by Harsh J <ha...@cloudera.com>.
Ah, that may be because the core-site.xml there has the io.serializations
property fully defined for Gora as well? You can do that as an
alternative fix: supply a core-site.xml across the tasktrackers that
also carries the serialization class Gora requires. I failed to think of
that as a solution.
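
To illustrate why a site-wide entry could help where a job-level one does
not, here is a minimal sketch. It assumes a stand-in class (SketchConf,
hypothetical, not Hadoop's Configuration) and placeholder serialization
class names. It also drops the static caching of the real getOrCreateConf.
The only point is that a freshly created conf sees the site defaults but
not what was set on the job's conf.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; NOT Hadoop's Configuration.
final class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    // A fresh instance only sees the site-wide defaults handed to it.
    SketchConf(Map<String, String> siteDefaults) {
        props.putAll(siteDefaults);
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }
}

final class ConfFallbackSketch {
    static final String KEY = "io.serializations";

    // Same shape as the quoted getOrCreateConf, minus the static cache:
    // use the conf that was passed in, otherwise build a fresh one.
    static SketchConf getOrCreateConf(SketchConf passed, Map<String, String> siteDefaults) {
        return passed != null ? passed : new SketchConf(siteDefaults);
    }

    public static void main(String[] args) {
        Map<String, String> emptySite = new HashMap<>();

        // Job-level setting only (placeholder class name).
        SketchConf jobConf = new SketchConf(emptySite);
        jobConf.set(KEY, "job.level.Serialization");

        // Passing the job conf through keeps the job-level value.
        System.out.println(getOrCreateConf(jobConf, emptySite).get(KEY));

        // Passing null yields a fresh conf that never saw the job-level value.
        System.out.println(getOrCreateConf(null, emptySite).get(KEY));

        // With the value defined site-wide, even the fresh conf finds it.
        Map<String, String> site = new HashMap<>();
        site.put(KEY, "site.level.Serialization");
        System.out.println(getOrCreateConf(null, site).get(KEY));
    }
}
```

In the real setup, the site defaults would correspond to core-site.xml on
each tasktracker and the job-level value to what ends up in job.xml.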

On Sat, Jul 28, 2012 at 6:04 AM, Sriram Ramachandrasekaran
<sr...@gmail.com> wrote:
> okay. But this issue didn't present itself when run in standalone mode. :)



-- 
Harsh J

Re: Deserialization issue.

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Okay. But this issue didn't show up when the job was run in standalone mode. :)
On 28 Jul 2012 06:02, "Harsh J" <ha...@cloudera.com> wrote:

> I find it easier to run jobs via MRUnit (http://mrunit.apache.org,
> TDD) first, or via LocalJobRunner, for debug purposes.

Re: Deserialization issue.

Posted by Harsh J <ha...@cloudera.com>.
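I find it easier to run jobs via MRUnit (http://mrunit.apache.org, for
TDD) first, or via the LocalJobRunner, for debugging purposes. Both run
the map and reduce code in a single local JVM, so you can step through
it with an ordinary debugger.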
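
As a rough sketch of the LocalJobRunner route, assuming a Hadoop 1.x-era
setup (which the org.apache.hadoop.mapred.Child frames in the stack trace
suggest), the two properties below make the job run in-process against the
local filesystem. Where they are set (mapred-site.xml, core-site.xml, or
directly on the job's Configuration) is an assumption left to your
environment, and an HBase-backed Gora store still needs a reachable HBase.

```xml
<!-- Run the job in the client JVM via LocalJobRunner instead of a JobTracker. -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
<!-- Use the local filesystem rather than HDFS for job input and output. -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```

With these in place, breakpoints set in the mapper, reducer, or input
format should be hit directly from an IDE, with no remote attach needed.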




-- 
Harsh J

Re: Deserialization issue.

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Hello Harsh,
Thanks for your investigation. While we were debugging, I saw exactly the
same thing. As you pointed out, we suspected it to be the problem, so we set
the job conf object directly on Gora's query object.
It goes something like this:
query.setConf..(job.getConfig..())

After that, I saw that it no longer went into creating a new object in
getOrCreate().

OTOH, I've not tried the job.xml thing. I should give it a try, and I shall
keep the list posted.

I would also like to hear about standard practices for debugging
distributed MR tasks.

-----
Reply from a handheld device. Please excuse typos and lack of formatting.

Re: Deserialization issue.

Posted by Harsh J <ha...@cloudera.com>.
Hi Sriram,

I suspect the following in Gora to somehow be causing this issue:

IOUtils source:
http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
QueryBase source:
http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup

Notice that the IOUtils.deserialize(…) methods expect a proper
Configuration object. If none is passed (i.e., it is null), they fall
back to the following:

  private static Configuration getOrCreateConf(Configuration conf) {
    if(conf == null) {
      if(IOUtils.conf == null) {
        IOUtils.conf = new Configuration();
      }
    }
    return conf != null ? conf : IOUtils.conf;
  }

Now, QueryBase's readFields method makes some IOUtils.deserialize(…)
calls that seem to pass null for the configuration object.
IOUtils.deserialize(…) therefore calls the method above and, since the
passed conf object is null, initializes a whole new Configuration object.

If it does that, it will not load the contents of "job.xml", which is
the job's config file (that is something only the map task's own
Configuration loads; it is not a file that is loaded by default). Hence,
the custom serializers disappear the moment it begins using this new
Configuration object.
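
To make the failure mode concrete, here is a minimal stand-alone sketch in
plain Java. It does not use Hadoop or Gora: DemoConf, ConfFallbackDemo, and
the serializer name CustomSerialization are hypothetical stand-ins, and only
the shape of getOrCreateConf follows the Gora snippet. It shows that a null
conf silently falls back to a fresh object that knows nothing about settings
made on the job's conf.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configuration: built-in defaults plus per-job overrides. */
class DemoConf {
    private final Map<String, String> props = new HashMap<>();

    DemoConf() {
        // A freshly constructed conf only knows the built-in default.
        props.put("io.serializations", "WritableSerialization");
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

class ConfFallbackDemo {
    // Shared fallback, created lazily, mirroring the getOrCreateConf pattern.
    private static DemoConf fallback;

    static DemoConf getOrCreateConf(DemoConf conf) {
        if (conf != null) {
            return conf;
        }
        if (fallback == null) {
            fallback = new DemoConf(); // job-specific settings are NOT in here
        }
        return fallback;
    }

    /** Returns the serializer list that a deserialize call would end up seeing. */
    static String serializationsSeenBy(DemoConf conf) {
        return getOrCreateConf(conf).get("io.serializations");
    }

    public static void main(String[] args) {
        DemoConf jobConf = new DemoConf();
        // "CustomSerialization" is a made-up name standing in for a job-registered serializer.
        jobConf.set("io.serializations", "WritableSerialization,CustomSerialization");

        // Conf passed explicitly: the custom serializer is visible.
        System.out.println(serializationsSeenBy(jobConf));
        // Conf is null: the fallback only has the defaults.
        System.out.println(serializationsSeenBy(null));
    }
}
```

The second call is the analogue of readFields passing null: the lookup
succeeds, but against a configuration that never saw the job's settings.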

This is what you'll want to investigate and fix, or notify the Gora
devs about (why QueryBase#readFields passes a null conf, and whether it
can reuse a conf object that has already been set). As a cheap hack fix,
maybe doing the following will make it work in an MR environment?

IOUtils.conf = new Configuration();
IOUtils.conf.addResource("job.xml");

I haven't tried the above, but let us know how we can be of further
assistance. An ideal fix would be to only use the MapTask's provided
Configuration object everywhere, somehow, and never re-create one.

P.S. If you want a thread ref link to share with the devs over at Gora,
here it is: http://search-hadoop.com/m/BXZA4dTUFC




-- 
Harsh J