You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/05/21 04:52:47 UTC

[jira] [Updated] (MAHOUT-546) Bug creating vector from Solr index with TrieFields

     [ https://issues.apache.org/jira/browse/MAHOUT-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-546:
-----------------------------

         Priority: Minor  (was: Major)
    Fix Version/s: 0.6

I am not sure that is the issue. The stack trace shows a different error. It's trying to serialize a NamedVector which has no name (and I don't know why).

I propose that we take a step to repair the issue by:

1. Write an empty string, not null, in this case, even if it shouldn't happen. It at least avoids a failure.
2. Make NamedVector never accept null. This may punt the NPE elsewhere, but will show its cause.

You both seem to understand the other issue with Solr and precision. It seems a little ugly to go so far as to work around with a command-line arg, but I don't know well enough to have a strong opinion. Is this something that anyone has any suggested patch for?

> Bug creating vector from Solr index with TrieFields
> ---------------------------------------------------
>
>                 Key: MAHOUT-546
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-546
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: G Fernandes
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.6
>
>
> A Solr index with an Id field of type TrieLong causes NPE when trying to extract vectors from the lucene index
> Command line used:
> ../bin/mahout lucene.vector -d ~/solr/data/index/ -o  vectors.vec -t d.dic -f text -i articleId
> where 'articleId' is a Stored numeric field declared as 'tlong' in Solr.  Output:
> no HADOOP_HOME set, running locally
> Nov 16, 2010 2:59:50 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Output File: moreover.vec
> Nov 16, 2010 2:59:51 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Nov 16, 2010 2:59:51 PM org.apache.hadoop.io.compress.CodecPool getCompressor
> INFO: Got brand-new compressor
> Exception in thread "main" java.lang.NullPointerException
> 	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:330)
> 	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> 	at org.apache.mahout.math.VectorWritable.write(VectorWritable.java:125)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> 	at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1128)
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
> 	at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:46)
> 	at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:226)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> The problem is at class LuceneIteratable line 130:
>   name = indexReader.document(doc, idFieldSelector).get(idField);
> It does not work with the way Solr stores numeric fields in the index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira