Posted to dev@nutch.apache.org by "Mike Baranczak (JIRA)" <ji...@apache.org> on 2012/10/11 03:13:03 UTC

[jira] [Created] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Mike Baranczak created NUTCH-1477:
-------------------------------------

             Summary: NPE when injecting with DataFileAvroStore
                 Key: NUTCH-1477
                 URL: https://issues.apache.org/jira/browse/NUTCH-1477
             Project: Nutch
          Issue Type: Bug
          Components: storage
    Affects Versions: 2.1
         Environment: Java 1.6.0_35
            Reporter: Mike Baranczak


Fresh installation of Nutch 2.1, configured to use DataFileAvroStore. Injection job throws NullPointerException, see below. No error when I switch to MemStore.

java.lang.NullPointerException
	at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:133)
	at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:176)
	at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:171)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
	at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
	at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Fwd: [jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by Julien Nioche <li...@gmail.com>.
Guys,

Any suggestions for the issue below? It would be great to be able to use
the DataFileAvroStore from Nutch.

Thanks

Julien

---------- Forwarded message ----------
From: Julien Nioche (JIRA) <ji...@apache.org>
Date: 25 October 2012 15:45
Subject: [jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore
To: dev@nutch.apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484169#comment-13484169 ]

Julien Nioche commented on NUTCH-1477:
--------------------------------------

Found a clue in https://issues.apache.org/jira/browse/NUTCH-842. Not sure
what the point of compile-avro-schema is but we need to compile the schemas
with gora and not just avro. The generated classes now compile fine.

Using the modified schema fails at compilation as the generated objects
don't have accessors e.g. getContentType()



> NPE when injecting with DataFileAvroStore
> -----------------------------------------
>
>                 Key: NUTCH-1477
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1477
>             Project: Nutch
>          Issue Type: Bug
>          Components: storage
>    Affects Versions: 2.1
>         Environment: Java 1.6.0_35
>            Reporter: Mike Baranczak
>            Assignee: Julien Nioche
>            Priority: Critical
>             Fix For: 2.2
>
>         Attachments: webpage.avsc
>
>

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

RE: [jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by Nishikawa, Alfonso <an...@indra.es>.
Why am I getting this email? I never subscribed to JIRA, and JIRA itself tells me it does not have my email address...


-----Original Message-----
From: Julien Nioche (JIRA) [mailto:jira@apache.org]
Sent: Thursday, 25 October 2012 16:45
To: dev@nutch.apache.org
Subject: [jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore


    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484169#comment-13484169 ]

Julien Nioche commented on NUTCH-1477:
--------------------------------------

Found a clue in https://issues.apache.org/jira/browse/NUTCH-842. Not sure what the point of compile-avro-schema is but we need to compile the schemas with gora and not just avro. The generated classes now compile fine.

Using the modified schema fails at compilation as the generated objects don't have accessors e.g. getContentType()




[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484908#comment-13484908 ] 

Lewis John McGibbney commented on NUTCH-1477:
---------------------------------------------

Hi Julien, can you confirm a few things for me please...
- Do you suggest we update the patch in NUTCH-842 with the correct package name for Gora in the Nutch build.xml file and remove the ant compile-avro-schema target?
- If no accessors are generated then is this not a problem with the Gora compiler? If so we should open a ticket over there and link the issues.
                

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1477:
---------------------------------

    Priority: Critical  (was: Major)
    

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485172#comment-13485172 ] 

Julien Nioche commented on NUTCH-1477:
--------------------------------------

Hi Lewis

> Do you suggest we update the patch in NUTCH-842 with the correct package name for Gora in the Nutch build.xml file and remove the ant compile-avro-schema target?

Yes, at least until someone can explain what that target is useful for.

> If no accessors are generated then is this not a problem with the Gora compiler? If so we should open a ticket over there and link the issues.

It is indeed. It looks like the Gora compiler can't deal with the ["string", "null"] union. I will create an issue in GORA land.


                

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1477:
---------------------------------

    Fix Version/s: 2.2
         Assignee: Julien Nioche
    

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484148#comment-13484148 ] 

Julien Nioche commented on NUTCH-1477:
--------------------------------------

I found in http://mail-archives.apache.org/mod_mbox/avro-user/200910.mbox/%3C4AE78503.50307@apache.org%3E that we probably need to explicitly allow for null values in the schema (see attachment). 

I tried recompiling the schemas with ant compile-avro-schema, but the generated classes do not compile and are nowhere near as complete as the original ones. More worryingly, the same is true with the original schema. I had assumed that the code in org.apache.nutch.storage could be generated from the schemas.

Any idea?
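
For readers unfamiliar with Avro, "explicitly allowing null" generally means declaring a field's type as a union that includes "null" rather than as a bare "string". The fragment below is only a minimal sketch of that idea and is not the attached webpage.avsc. The field name contentType is an assumption inferred from the getContentType() accessor mentioned elsewhere in this thread, and the real schema has many more fields.

```json
{
  "name": "WebPage",
  "type": "record",
  "namespace": "org.apache.nutch.storage",
  "fields": [
    {"name": "contentType", "type": ["string", "null"]}
  ]
}
```

With a bare "string" type the writer has no way to encode a missing value, which is consistent with the NullPointerException in BinaryEncoder.writeString shown in the report. With the union, a null value can be written as the "null" branch.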
                

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1477:
---------------------------------

    Attachment: webpage.avsc

Modified avro schema which allows fields to be null
                

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484169#comment-13484169 ] 

Julien Nioche commented on NUTCH-1477:
--------------------------------------

Found a clue in https://issues.apache.org/jira/browse/NUTCH-842. Not sure what the point of compile-avro-schema is but we need to compile the schemas with gora and not just avro. The generated classes now compile fine.

Using the modified schema fails at compilation as the generated objects don't have accessors e.g. getContentType()


                

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Mike Baranczak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473756#comment-13473756 ] 

Mike Baranczak commented on NUTCH-1477:
---------------------------------------

I tried upgrading the Avro library to the latest version (1.7.2), but then I just get a different error:

org.apache.gora.util.GoraException: org.apache.avro.AvroRuntimeException: Not a Specific class: class org.apache.nutch.storage.WebPage
	at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
	at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
	at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
	at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
Caused by: org.apache.avro.AvroRuntimeException: Not a Specific class: class org.apache.nutch.storage.WebPage
	at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:213)
	at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:154)
	at org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:62)
	at org.apache.gora.avro.PersistentDatumReader.setSchema(PersistentDatumReader.java:69)
	at org.apache.gora.avro.PersistentDatumReader.<init>(PersistentDatumReader.java:63)
	at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:87)
	at org.apache.gora.store.impl.FileBackedDataStoreBase.initialize(FileBackedDataStoreBase.java:63)
	at org.apache.gora.avro.store.AvroStore.initialize(AvroStore.java:80)
	at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
	at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)

                


[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479919#comment-13479919 ] 

Julien Nioche commented on NUTCH-1477:
--------------------------------------

Thanks Mike, I can confirm the issue.
Did you recompile the WebPage class from the Avro definitions when you switched to the latest version of Avro? This could be an incompatibility between the versions.
Going back to the original problem, I don't think it comes from Avro, as we would then see it with the other backends as well. As for the MemStore, I don't think it is used for anything other than tests.
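
One quick way to test the version-incompatibility theory is to look at the generated class by reflection. The sketch below assumes that Avro 1.7.x's SpecificData finds the schema through a static field named SCHEMA$ on the class and reports "Not a Specific class" when that field is missing; that matches my reading of the Avro source, but please verify it against the version you run. The class SchemaFieldCheck, its helper method, and the two stand-in classes are hypothetical and exist only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class SchemaFieldCheck {

    // Stand-in for a class generated by a compiler that emits SCHEMA$ (hypothetical).
    static class WithSchemaField {
        public static final String SCHEMA$ = "{\"type\":\"record\"}";
    }

    // Stand-in for a class whose schema field has another name (name is made up).
    static class WithoutSchemaField {
        public static final String _SCHEMA = "{\"type\":\"record\"}";
    }

    /** Returns true if the class itself declares a static field with the given name. */
    static boolean declaresStaticField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            return Modifier.isStatic(field.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // With Nutch on the classpath, pass org.apache.nutch.storage.WebPage as the argument.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : WithSchemaField.class;
        System.out.println(type.getName() + " declares SCHEMA$: "
                + declaresStaticField(type, "SCHEMA$"));
    }
}
```

If the check reports false for WebPage, the class was most likely generated by a compiler that does not match the Avro runtime, and regenerating it from the schema definitions would be the next thing to try.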
                
