Posted to solr-user@lucene.apache.org by Tricia Williams <pg...@student.cs.uwaterloo.ca> on 2006/07/27 18:54:58 UTC

add/update index

Hi,

    I have created a process that uses XSL to convert my data into the form 
shown in the examples, so that it can be added to the index as the Solr 
tutorial describes:
<add>
   <doc>
     <field name="field">value</field>
     ...
   </doc>
</add>

    In some cases the XSL process creates a field element with no data 
(i.e. <field name="field"/>).  Is this considered bad input that will be 
rejected, or is it something Solr should deal with?  Currently, for each 
field element with no data, I receive this message:
<result status="1">java.lang.NullPointerException
  at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
  at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
  at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
  at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
  at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
  ...
</result>

    I'm curious whether the gurus out there think I should deal with the 
null values in my XSL process, or whether this can be handled in Solr itself.

Thanks,
Tricia

P.S.  Thanks for the timely fix for the UTF-8 issue!
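If changing the XSL is awkward, one alternative is to strip empty field elements from the generated <add> document before posting it. The sketch below uses only the JDK's standard DOM and Transform APIs; the class name DropEmptyFields and the sample field names are hypothetical, and none of this is part of Solr. It treats a field as empty only when its text content is the empty string, so a field holding only whitespace is kept (add a trim() if that should count as empty too).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: post-processes XSL output, it is not a Solr API.
public class DropEmptyFields {

    /** Returns the <add> XML with every <field> that has no text content removed. */
    public static String dropEmptyFields(String addXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(addXml)));

            // The NodeList is live, so iterate backwards while removing nodes.
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = fields.getLength() - 1; i >= 0; i--) {
                Element field = (Element) fields.item(i);
                // Both <field name="x"/> and <field name="x"></field> have "" as text content.
                if (field.getTextContent().isEmpty()) {
                    field.getParentNode().removeChild(field);
                }
            }

            // Serialize the cleaned tree back to a string, without an <?xml ...?> header.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process <add> XML", e);
        }
    }

    public static void main(String[] args) {
        String in = "<add><doc>"
                + "<field name=\"id\">42</field>"
                + "<field name=\"title\"/>"
                + "<field name=\"author\"></field>"
                + "</doc></add>";
        System.out.println(dropEmptyFields(in));
    }
}
```

Fixing the XSL itself (emitting the field element only when there is a value) avoids the extra parse and serialize pass; the filter is mainly useful when the transform cannot easily be changed.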

Re: add/update index

Posted by Tricia Williams <pg...@student.cs.uwaterloo.ca>.
Thanks Yonik,

    That's exactly what I needed to know.  I'll adapt my xsl process to 
omit null values.

Tricia

Re: add/update index

Posted by Yonik Seeley <yo...@apache.org>.
On 7/27/06, Tricia Williams <pg...@student.cs.uwaterloo.ca> wrote:
>     In some cases the xsl process will create a field element with no data.
> (ie <field name="field"/>)  Is this considered bad input and will not be
> accepted?

If the desired semantics are "the field doesn't exist" or "null value",
then yes.  There isn't a way to represent a field without a value in
Lucene except to not add the field for that document.  If it's totally
ignored, it probably shouldn't be in the XML.

Now, one might think we could drop fields with no value, but that's
problematic because it goes against the XML standard:

http://www.w3.org/TR/REC-xml/#sec-starttags
[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag. [Definition: An
empty-element tag takes a special form:]

So <a></a> and <a/> are supposed to be equivalent.  Given that, it
does look like Solr should treat <field name="val"/> like a
zero-length string (but that's not what you wanted, right?)

-Yonik
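The point that <a></a> and <a/> are equivalent can be checked with any conforming XML parser. The sketch below (hypothetical class name EmptyElementDemo) uses the JDK's standard DOM parser: both forms yield an element with no child nodes and an empty-string text content, which matches the "zero-length string" reading above. It only illustrates parser behavior; it does not show how Solr's own update code reads a field.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares the two spellings of an empty element.
public class EmptyElementDemo {

    /** Parses a single-element XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element selfClosing = parseRoot("<field name=\"val\"/>");
        Element startEnd = parseRoot("<field name=\"val\"></field>");

        // Neither form has child nodes, and both report "" (not null) as text content.
        System.out.println(selfClosing.hasChildNodes() + " " + startEnd.hasChildNodes());
        System.out.println("[" + selfClosing.getTextContent() + "] [" + startEnd.getTextContent() + "]");
    }
}
```

Running main prints "false false" on the first line and "[] []" on the second, so a parser gives no way to tell the two spellings apart.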