Posted to users@jackrabbit.apache.org by rfr <er...@gmail.com> on 2013/02/26 14:16:51 UTC
Strange fulltext search behaviour
Hello!
In our application, some nodes have "record numbers" in the form
[A-Z]/([A-Z]|[0-9])+/[0-9]{9}
For example:
N/620/000002032
M/AKA/000000235
or
L/AMA/0000000100
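To sanity-check values against the format above, here is a minimal plain-Java sketch. The class and method names (RecordNumbers, isRecordNumber) are made up for illustration and are not part of Jackrabbit; the pattern is the one quoted above, with the alternation written as a single character class.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackrabbit: checks a value against the
// record-number format quoted in this message.
final class RecordNumbers {
    // One uppercase letter, a slash, one or more uppercase letters or
    // digits, a slash, then exactly nine digits.
    private static final Pattern FORMAT =
            Pattern.compile("[A-Z]/[A-Z0-9]+/[0-9]{9}");

    static boolean isRecordNumber(String value) {
        // matches() requires the whole string to fit the pattern.
        return value != null && FORMAT.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"N/620/000002032", "M/AKA/000000235", "L/ABC"};
        for (String s : samples) {
            System.out.println(s + " -> " + isRecordNumber(s));
        }
    }
}
```

This only validates the shape of a value before it is stored or searched; it says nothing about how the indexer tokenizes it.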
If I perform a full-text search on these values, the search finds my nodes.
If I try a search like N/*2032 or M/AK*235, I'm also able to retrieve my nodes.
But for an unknown reason, if I search for L/* or even
L/AMA/000000?00, the system does not find any node and does not even
seem to search for anything.
The record number is stored in a string property and I'm sure the
nodes are indexed, since other queries on the same nodes work for
other search tokens.
It is as if the "L/..." format is causing trouble for the indexer
or the search code.
Any pointers? Can someone test this behaviour to see if it is reproducible?
Thanks a lot!
Regards,
Fred
Fwd: Strange fulltext search behaviour
Posted by rfr <er...@gmail.com>.
Hello again!
It seems that I made a mistake in my evaluation.
The problem seems to occur when the slash is followed by a series
of letters only.
That is, my query works for:
M/AB2/....
L/A11/...
but not for
L/ABC...
M/ZTS...
I came across some messages on the internet regarding Lucene/Solr and
slashes, but I have absolutely no idea how I could solve this
problem...
Re: Strange fulltext search behaviour
Posted by rfr <er...@gmail.com>.
Replying to myself with further analysis.
The data I'm searching for is a "code" for a "file", so I assumed it
may be an analyzer problem.
I have therefore configured Lucene to index the property with the
KeywordAnalyzer in my indexConfiguration.xml file:
<analyzers>
  <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
    <property>gns:numeroLabel</property>
  </analyzer>
</analyzers>
After reindexing my content, I browsed my indexes with Luke and saw
that each code is indexed as a single token, which is what I expected.
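To illustrate why a single token matters for wildcard searches, here is a small plain-Java sketch. It is not Lucene code: keywordTokens and slashSplitTokens are hypothetical stand-ins for a keyword-style analyzer and for an analyzer that happens to break the value at slashes, and "L/AMA/000000640" is a made-up sample value. The assumption being modelled is that a wildcard term has to match one whole token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: models tokenization with plain strings, no Lucene.
final class TokenSketch {
    // Keyword-style: the whole value is kept as one token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Naive stand-in for an analyzer that breaks the value at slashes.
    static List<String> slashSplitTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("/")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Translate * and ? wildcards into a regex; other characters are literal.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Assumption: a wildcard term matches only if one single token
    // matches it from start to end.
    static boolean anyTokenMatches(List<String> tokens, String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        return tokens.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String value = "L/AMA/000000640"; // made-up sample value
        System.out.println(anyTokenMatches(keywordTokens(value), "L*640"));
        System.out.println(anyTokenMatches(slashSplitTokens(value), "L*640"));
    }
}
```

Under this model, L*640 matches the single keyword token but none of the three slash-separated pieces, which is the kind of difference the analyzer choice could make.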
I then tried to perform a search inside Luke, configuring it to
search on "gns:numeroLabel" and to use the KeywordAnalyzer.
I searched for L*640 or L\/*640 and Luke found some documents (note
that in Luke, you have to escape the forward slash).
I went back to my code, searched for L*640, and found no results :(
I think the slashes are really causing some problems, but I can't
identify where... The only thing I'm not sure of is whether Jackrabbit
correctly uses the KeywordAnalyzer to analyze my query, which is a
fullTextSearch() on the property.
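For experimenting with the escaping that Luke requires, a helper along these lines could be applied to the search term first. This is only a sketch: SlashEscaper and escapeSlashes are hypothetical names, and whether Jackrabbit's full-text query parsing honours a backslash-escaped slash is an assumption that has not been verified here.

```java
// Hypothetical helper, not part of Jackrabbit or Lucene.
final class SlashEscaper {
    // Prefix every forward slash with a backslash and leave everything
    // else, including the * and ? wildcards, untouched.
    // Limitation: slashes that are already escaped get escaped again.
    static String escapeSlashes(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '/') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Turns L/*640 into the escaped form used in Luke above.
        System.out.println(escapeSlashes("L/*640"));
    }
}
```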
Thanks for your help!
Regards,
Fred