Posted to solr-user@lucene.apache.org by Gora Mohanty <go...@srijan.in> on 2010/01/19 19:41:05 UTC

Data storage, and textual analysis

Hi,

Another simple query. I have set up a field to hold phonetic
equivalents, with the relevant part of schema.xml looking like:
<analyzer>
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="0" catenateWords="1"
 catenateNumbers="0" catenateAll="0"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="com.srijan.search.solr.analysis.AspellFilterFactory"/>
</analyzer>

Here, com.srijan.search.solr.analysis.AspellFilterFactory is
a custom filter that provides a phonetic soundslike equivalent for
Indian languages transliterated into English. However, that is
irrelevant here, as the issue below holds even if I use the standard
solr.DoubleMetaphoneFilterFactory.

I have a data source where all text is upper-case, and from
various Solr-related discussions found through Google, I would have
thought that fields of this type would be stored as the lower-case,
soundslike equivalent. Instead the data (as seen through the Solr
admin. interface, or through a front-end search) seem to be stored
as is.

The Solr admin. analysis view does show the index and query
conversions as I would expect. Also, phonetic matches, and matches
with lower-case input work properly. I am just curious as to how
this works.

Regards,
Gora

Re: Data storage, and textual analysis

Posted by Gora Mohanty <go...@srijan.in>.
On Tue, 19 Jan 2010 12:02:27 -0800 (PST)
Otis Gospodnetic <ot...@yahoo.com> wrote:

> Gora,
> 
> What you are seeing are the *stored* values, which are the
> original, unchanged field values. Analysis is applied to text for
> *indexing* purposes.
[...]

Ah, of course. Seems obvious now, and I had misread this message
in the mailing thread here:
http://old.nabble.com/How-does-search-work-with-phonetic-filter-factory---td24643678.html

Thanks for the help.

Regards,
Gora

Re: Data storage, and textual analysis

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Gora,

What you are seeing are the *stored* values, which are the original, unchanged field values.
Analysis is applied to text for *indexing* purposes.
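
As a rough illustration of the stored/indexed distinction, here is a minimal schema.xml sketch. The names "text_phonetic" and "name_phonetic" are made up for this example and are not from the original schema, and the standard solr.DoubleMetaphoneFilterFactory stands in for the custom filter.

```xml
<!-- Hypothetical field type name, for illustration only. -->
<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false": keep only the phonetic codes as index terms. -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field name.
     indexed="true": the analyzer output is written to the index and is
     what queries are matched against.
     stored="true": the original input is kept verbatim and is what
     search results and the admin interface display. -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="true"/>
```

With a definition along these lines, an upper-case input value would still come back upper-case in results, while matching happens against the lower-cased phonetic terms. Setting stored="false" should leave the field searchable but not returnable.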


Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch


