Posted to users@jackrabbit.apache.org by rfr <er...@gmail.com> on 2013/02/26 14:16:51 UTC

Strange fulltext search behaviour

Hello!

In our application, some nodes have "record numbers" in the form
[A-Z]/([A-Z]|[0-9])+/[0-9]{9}

For example:

N/620/000002032
M/AKA/000000235

or

L/AMA/0000000100
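
Editor's note: a minimal sketch for checking sample values against the format given above. The class and method names are hypothetical; only the pattern and the three values come from the message. Run as written, it reports that the third value, which has ten trailing digits, does not fit the nine-digit pattern, so either the pattern or that sample may contain a typo.

```java
import java.util.regex.Pattern;

public class RecordNumberCheck {
    // Pattern copied verbatim from the message:
    // one letter, an alphanumeric segment, then exactly nine digits.
    static final Pattern RECORD =
            Pattern.compile("[A-Z]/([A-Z]|[0-9])+/[0-9]{9}");

    // True only if the whole value fits the pattern.
    static boolean isRecordNumber(String value) {
        return RECORD.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "N/620/000002032", "M/AKA/000000235", "L/AMA/0000000100"
        };
        for (String v : samples) {
            System.out.println(v + " -> " + isRecordNumber(v));
        }
    }
}
```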

If I perform a full-text search on these values, the search finds my nodes.

If I try a search like N/*2032 or M/AK*235, I'm also able to retrieve my nodes.

But for an unknown reason, if I search for L/* or even
L/AMA/000000?00, the system does not find any node and does not even
seem to search for anything.
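
Editor's note: the wildcard behaviour expected here can be sketched in plain Java, assuming each record number is matched as one whole term, with `*` standing for any run of characters and `?` for exactly one. This is an illustration of the expected semantics only, not Jackrabbit's or Lucene's implementation; `WildcardSketch` and the pattern `M/AKA/00000023?` are hypothetical.

```java
import java.util.regex.Pattern;

public class WildcardSketch {
    // Translate a wildcard expression into a regex:
    // '*' = any run of characters, '?' = exactly one character,
    // every other character is taken literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole value matches the wildcard expression.
    static boolean matches(String wildcard, String value) {
        return toRegex(wildcard).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("N/*2032", "N/620/000002032"));
        System.out.println(matches("M/AK*235", "M/AKA/000000235"));
        System.out.println(matches("L/*", "L/AMA/0000000100"));
        System.out.println(matches("M/AKA/00000023?", "M/AKA/000000235"));
    }
}
```

Under these assumptions all four calls report a match, so nothing in the wildcard semantics themselves would single out the "L/..." values.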

The record number is stored in a string property, and I'm sure the
nodes are indexed, since other queries on the same nodes work for
other search tokens.

It is as if the "L/..." format is causing trouble for the indexer
or the search code.

Any pointers? Can someone test this behaviour to see if it is reproducible?

Thanks a lot!

Regards,

Fred

Fwd: Strange fulltext search behaviour

Posted by rfr <er...@gmail.com>.
Hello again!

It seems that I made a mistake in my evaluation.

The problem seems to occur when the slash is followed by a series
of letters only.

It means my query works for:

M/AB2/....
L/A11/...

but not for

L/ABC...
M/ZTS...

I came across some messages on the internet regarding Lucene/Solr and
slashes, but I have absolutely no idea how I could solve this
problem...



Re: Strange fulltext search behaviour

Posted by rfr <er...@gmail.com>.
Replying to myself with further analysis.

The data I'm searching for is a "code" for a "file". So I assumed it
may be an analyzer problem.

I have therefore configured Lucene to index the property with the keyword
analyzer in my indexConfiguration.xml file:

<analyzers>
	<analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
		<property>gns:numeroLabel</property>
	</analyzer>
</analyzers>

After reindexing my content, I browsed my indexes with Luke and saw
that each code is indexed as a single token, which is what I expected.
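
Editor's note: to make "a single token" concrete, here is a rough sketch contrasting keyword-style indexing with a naive splitter. The splitter is a hypothetical stand-in that breaks on every non-alphanumeric character and lowercases; it does not reproduce the rules of any actual Lucene analyzer, which differ in detail. It only illustrates why a wildcard spanning the slashes needs the value to survive as one term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenSketch {
    // Keyword-style: the whole value is kept as one token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Naive stand-in for a tokenizing analyzer (an assumption, not Lucene):
    // split on non-alphanumeric characters and lowercase each part.
    static List<String> splitTokens(String value) {
        List<String> out = new ArrayList<>();
        for (String part : value.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String code = "M/AKA/000000235";
        System.out.println(keywordTokens(code)); // one token
        System.out.println(splitTokens(code));   // three tokens
    }
}
```

With the split form, no single term contains a slash, so a wildcard expression such as M/AK*235 has nothing to match against as a whole.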

I then tried to perform a search inside Luke, configuring the search
to run on "gns:numeroLabel" and to use the KeywordAnalyzer.

I searched for L*640 or L\/*640 and Luke found some documents (note
that in Luke, you have to escape the forward slash).

I went back into my code and searched for L*640, and found no results :(

I think the slashes are really causing some problems, but I can't
identify where... The one thing I'm not sure about is whether Jackrabbit
correctly uses the KeywordAnalyzer to analyze my query, which is a
fullTextSearch() on the property.
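
Editor's note: for anyone trying to reproduce this, a fullTextSearch() constraint in the JCR query object model corresponds to a CONTAINS constraint in JCR-SQL2. The sketch below only builds such a statement as a string so it can be pasted into a query tool; it does not execute anything. The node type nt:base, the selector name s, and the class name are assumptions, since the message does not show the actual query.

```java
public class ContainsStatement {
    // Wrap a value in a single-quoted JCR-SQL2 string literal,
    // doubling any embedded single quotes.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Build a JCR-SQL2 statement with a CONTAINS constraint on one property.
    static String build(String nodeType, String property, String expression) {
        return "SELECT * FROM [" + nodeType + "] AS s"
                + " WHERE CONTAINS(s.[" + property + "], "
                + quote(expression) + ")";
    }

    public static void main(String[] args) {
        // Property name and search expression taken from the message.
        System.out.println(build("nt:base", "gns:numeroLabel", "L*640"));
    }
}
```

Running the same expression against the same property in both Luke and Jackrabbit helps separate an indexing problem from a query-parsing one.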

Thanks for your help!


Regards,

Fred
