Posted to user@avro.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/12/07 17:39:20 UTC

NPE with Generic/SpecificDatumWriter in Avro 1.3.3

Hi,

We have an issue over in Nutch where we are trying to inject URLs into
an Avro-backed file store (which resides in Gora [0]). The schema we
are using to generate the Java classes that store the data can be found
here [1].

Currently, when I use the Nutch Inject tool [2] (an MR job which reads a
flat file of URLs, adds metadata, and then stores these in the file
store), I get the following stack trace:

java.lang.NullPointerException
	at org.apache.avro.specific.SpecificDatumWriter.getField(SpecificDatumWriter.java:48)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
	at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
	at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

So I guess I have the following questions:

1) Is the schema OK? Is there anything which should be changed?
2) If not then can someone please explain to me (if possible) how we
get the NPE and what field within the schema might relate to this?

I am keen to learn more about this, so any help would be greatly
appreciated.

Thanks, Lewis

[0] http://svn.apache.org/repos/asf/gora/trunk/gora-core/src/main/java/org/apache/gora/avro/store/DataFileAvroStore.java
[1] https://issues.apache.org/jira/secure/attachment/12559852/webpage.avsc
[2] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/crawl/InjectorJob.java
-- 
Lewis

Re: NPE with Generic/SpecificDatumWriter in Avro 1.3.3

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,
Fortunately, we discovered the flaw in the avsc.

The schema has been changed and is appended below. Note that the record
fields protocolStatus and parseStatus are now optional (each is declared
as a union with "null"). This solved the issue.

Best

Lewis

{"name": "WebPage",
 "type": "record",
 "namespace": "org.apache.nutch.storage",
 "fields": [
        {"name": "baseUrl", "type": ["null","string"] },
        {"name": "status", "type": "int"},
        {"name": "fetchTime", "type": "long"},
        {"name": "prevFetchTime", "type": "long"},
        {"name": "fetchInterval", "type": "int"},
        {"name": "retriesSinceFetch", "type": "int"},
        {"name": "modifiedTime", "type": "long"},
        {"name": "protocolStatus", "type": ["null", {
            "name": "ProtocolStatus",
            "type": "record",
            "namespace": "org.apache.nutch.storage",
            "fields": [
                {"name": "code", "type": "int"},
                {"name": "args", "type": {"type": "array", "items": "string"}},
                {"name": "lastModified", "type": "long"}
            ]
            }]},
        {"name": "content", "type": ["null","bytes"]},
        {"name": "contentType", "type": ["null","string"] },
        {"name": "prevSignature", "type": ["null","bytes"]},
        {"name": "signature", "type": ["null","bytes"]},
        {"name": "title", "type": ["null","string"] },
        {"name": "text", "type": ["null","string"] },
        {"name": "parseStatus", "type": ["null",{
            "name": "ParseStatus",
            "type": "record",
            "namespace": "org.apache.nutch.storage",
            "fields": [
                {"name": "majorCode", "type": "int"},
                {"name": "minorCode", "type": "int"},
                {"name": "args", "type": {"type": "array", "items": "string"}}
            ]
            }]},
        {"name": "score", "type": "float"},
        {"name": "reprUrl", "type": ["null","string"] },
        {"name": "headers", "type": {"type": "map", "values": "string"}},
        {"name": "outlinks", "type": {"type": "map", "values": "string"}},
        {"name": "inlinks", "type": {"type": "map", "values": "string"}},
        {"name": "markers", "type": {"type": "map", "values": "string"}},
        {"name": "metadata", "type": {"type": "map", "values": "bytes"}}
   ]
}
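
For anyone hitting a similar NPE, the check behind this fix can be
sketched as a small script that lists record-typed fields which are not
wrapped in a union with "null". This is only an illustration, not part
of Avro, Gora, or Nutch: the function names and the two trimmed-down
schemas are hypothetical, and the BEFORE schema is an assumption about
what a non-optional nested record looks like, not a copy of the
original webpage.avsc.

```python
import json


def is_nullable(field_type):
    """True if the type is a union (JSON list) that includes "null"."""
    return isinstance(field_type, list) and "null" in field_type


def is_record(field_type):
    """True if the type is an inline record definition."""
    return isinstance(field_type, dict) and field_type.get("type") == "record"


def required_record_fields(schema, path=""):
    """Return dotted names of record-typed fields that cannot hold null."""
    found = []
    for field in schema.get("fields", []):
        name = path + field["name"]
        ftype = field["type"]
        # Treat a bare type as a one-branch union so both cases share a loop.
        branches = ftype if isinstance(ftype, list) else [ftype]
        for branch in branches:
            if is_record(branch):
                if not is_nullable(ftype):
                    found.append(name)
                # Nested records may themselves contain record fields.
                found.extend(required_record_fields(branch, name + "."))
    return found


# Hypothetical, trimmed-down schema with a bare (non-optional) nested record.
BEFORE = json.loads("""
{"name": "WebPage", "type": "record", "fields": [
    {"name": "status", "type": "int"},
    {"name": "protocolStatus", "type": {
        "name": "ProtocolStatus", "type": "record",
        "fields": [{"name": "code", "type": "int"}]}}
]}
""")

# The same field declared as a union with "null", as in the schema above.
AFTER = json.loads("""
{"name": "WebPage", "type": "record", "fields": [
    {"name": "status", "type": "int"},
    {"name": "protocolStatus", "type": ["null", {
        "name": "ProtocolStatus", "type": "record",
        "fields": [{"name": "code", "type": "int"}]}]}
]}
""")

print(required_record_fields(BEFORE))
print(required_record_fields(AFTER))
```

Any field this reports must always be populated before the record is
written; if the job can leave it unset, declaring it as ["null", ...]
as in the schema above is one way to avoid the failure.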


-- 
Lewis