Posted to users@jackrabbit.apache.org by Alex Parvulescu <al...@gmail.com> on 2012/07/03 11:19:35 UTC

Re: jcr sql2 - contains() full text search not working

Hi Carl,

Which version of Jackrabbit are you on?

Next, are you sure you have the Tika extractors on the classpath? Maybe you
are seeing something along the lines of [0].
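
One quick way to check the classpath question is a small reflective probe run
inside the same web application. This is only a sketch: ClasspathProbe and
isOnClasspath are made-up names, and the two Tika class names are assumptions
about what a Jackrabbit 2.x deployment would need (org.apache.tika.Tika from
tika-core, org.apache.tika.parser.html.HtmlParser from tika-parsers). Adjust
them to whatever your setup actually uses.

```java
// Hypothetical helper (not part of Jackrabbit or Tika): reports whether a
// class can be located by the current thread's context class loader.
final class ClasspathProbe {

    static boolean isOnClasspath(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.tika.Tika",                   // assumed: tika-core
            "org.apache.tika.parser.html.HtmlParser"  // assumed: tika-parsers
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this from a servlet in the same container, rather than from a
standalone JVM, matters because Tomcat gives each web application its own
class loader.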

I would try to isolate the problem by taking Tomcat out of the setup. Build
a simple test, see how it works, then deploy on Tomcat and verify.
A good place to start is the unit test collection available in Jackrabbit
core [1].


best,
alex

[0] https://issues.apache.org/jira/browse/JCR-3287
[1]
http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup


On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl <Ca...@mlb.com> wrote:

> So given the below I tried 'inclu*' and 'include*' and still got no
> results, so I'm going to start looking into whether some of these reasons
> explain why:
>
> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>
> Of course it could just be that the parser is not parsing the '*'.
>
> Thanks again,
>
> Carl Furst
>
> On 6/27/12 1:59 PM, "Furst, Carl" <Ca...@mlb.com> wrote:
>
> >Thanks Torsten,
> >
> >So even using JQOM would not help here. I'll read up more on Lucene. My
> >main stumbling block was where the query was being executed: at the Derby
> >level or the Lucene level?
> >
> >This has cleared that part up for me as well.
> >
> >Thanks again,
> >
> >Carl Furst
> >
> >On 6/27/12 1:50 PM, "Torsten Stolpmann" <st...@verit.de> wrote:
> >
> >>Hi Carl,
> >>
> >>By default the underlying Lucene implementation does not match leading
> >>wildcards, for performance reasons. See also:
> >>
> >>https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
> >>
> >>So matching just '*' will not work, but e.g. 'i*' might give you the
> >>results you were looking for.
> >>
> >>Sadly, I did not find any reference to this in the Jackrabbit
> >>documentation.
> >>
> >>It took me quite a while to find that, too.
> >>
> >>Hope this helps,
> >>
> >>Torsten
> >>
> >>On 27.06.2012 17:19, Furst, Carl wrote:
> >>> I'm probably missing something here, but everything I've read so far
> >>> leads me to believe this should work.
> >>>
> >>> I have nodes in a repository of type nt:folder and nt:file. nt:file has
> >>> a child node jcr:content of type nt:resource, which has a property
> >>> called jcr:data.
> >>>
> >>> There are many cases where the jcr:data column has the word 'include'
> >>> in it. They are JSP files so, yes, I know this word exists in several
> >>> files.
> >>>
> >>> So here's the SQL I use:
> >>>
> >>> select * from [nt:resource] where  contains([jcr:data], 'include');
> >>>
> >>> Here's the SQL that is returned from q.getStatement():
> >>>
> >>> SELECT [nt:resource].* FROM [nt:resource] WHERE
> >>> CONTAINS([nt:resource].[jcr:data], 'include');
> >>>
> >>> Here is a sample of the text in jcr:data to search on:
> >>>
> >>> <%@ include file="..."
> >>>
> >>>
> >>> ... More jsp here..
> >>> <%/jsp:include...
> >>>
> >>> Yet it doesn't find it. I feel I'm missing something. Do I need to add
> >>> a "searchable" mixin or something?
> >>>
> >>> Any ideas why this is not being found?
> >>>
> >>> It used to be that the CND file for the Jackrabbit node types was
> >>> readily available from Apache. Does anyone know where I can find it?
> >>>
> >>> jcr:content is unstructured, but I explicitly make the type nt:resource
> >>> (otherwise the statement would not be parsed and the Query object would
> >>> throw an error like "table not found", right? Because the type is a
> >>> table). So the type is right, the field is right, and the search is not
> >>> working.
> >>>
> >>> I'm using Jackrabbit without any special configuration: just the WAR in
> >>> a simple Tomcat deployment. So it's sitting on top of Derby and Lucene.
> >>>
> >>> Any help would be appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Carl Furst
> >>>
> >>> **********************************************************
> >>>
> >>> MLB.com: Where Baseball is Always On
> >>>

Re: jcr sql2 - contains() full text search not working

Posted by "Furst, Carl" <Ca...@mlb.com>.
I tried the same query using XPath:

//mc_art_contest.inc.html

That worked!

Just FYI.


Carl Furst






Re: jcr sql2 - contains() full text search not working

Posted by "Furst, Carl" <Ca...@mlb.com>.
I even tried a simpler query based on findings with Luke.

In Luke I ran the following:

_\:LOCAL_NAME: mc_art_contest.inc.html

which is the name of one of the stored nodes, and Luke reported one record
found.

Then I tried

select * from [nt:file] where name = 'mc_art_contest.inc.html'

in Jackrabbit, and the result was:

found: 0 nodes

I'm not sure where the bug is, but text searches are not working with a
default Derby/Lucene/Jackrabbit deployment. I tried this as a servlet in
the same container as the WAR, and I tried it as an RMI/JCA application.
No luck. So that is that, and it's been fun.
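
A possible explanation for the zero hits on the name query, offered as a guess
rather than a diagnosis: in JCR-SQL2 a bare "name" in the WHERE clause is read
as a property called name, which nt:file nodes do not normally have. The node
name itself is exposed through the NAME() and LOCALNAME() functions. The sketch
below only builds such a statement; NameQuerySketch, byLocalName and the
selector alias f are illustrative names, not Jackrabbit API.

```java
// Hypothetical helper: builds a JCR-SQL2 statement that matches on the node's
// local name instead of on a property called "name".
final class NameQuerySketch {

    // JCR-SQL2 string literals are single-quoted; an embedded quote is doubled.
    static String quote(String literal) {
        return "'" + literal.replace("'", "''") + "'";
    }

    // Builds: SELECT * FROM [<nodeType>] AS f WHERE LOCALNAME(f) = '<localName>'
    static String byLocalName(String nodeType, String localName) {
        return "SELECT * FROM [" + nodeType + "] AS f WHERE LOCALNAME(f) = " + quote(localName);
    }

    public static void main(String[] args) {
        System.out.println(byLocalName("nt:file", "mc_art_contest.inc.html"));
    }
}
```

The resulting string would be passed to QueryManager.createQuery(statement,
Query.JCR_SQL2) in the usual way. If this variant finds the node, the index is
fine and only the original statement was at fault.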

Thanks,



Carl Furst







Re: jcr sql2 - contains() full text search not working

Posted by "Furst, Carl" <Ca...@mlb.com>.
Thanks for the help Torsten,

Unfortunately that didn't work. The output from my test is as follows:

mimetype for node we are looking for is: text/html
// Which was taken from the node, using the path. This is the text that is
stored in jcr:mimeType

text for node we are looking for is:
FanFest Art Contest Winners</b></span><br>
// this is a snippet of text from the document I was searching stored in
jcr:data




starting execute
executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
language JCR-SQL2
//This is the query as extracted from the Query object

And this is the result:

found: 0 nodes
executed test in 660 ms


So something is not right (the SQL, maybe?). Maybe the node iterator isn't
reporting the right count of nodes? Could it be that over RMI it's possible
to get the nodes but not the right node count?
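
On the question of counts: the JCR API allows NodeIterator.getSize() to return
-1 when the size is not known, so a test that prints getSize() can mislead.
Counting by walking the iterator avoids relying on that hint. This would not
explain a reported 0 by itself, but it rules the size hint out as a factor.
A minimal sketch, assuming nothing beyond java.util.Iterator (which
javax.jcr.NodeIterator extends); IteratorCount is a made-up name.

```java
// Hypothetical helper: counts results without trusting any size hint.
final class IteratorCount {

    // Counts elements by consuming the iterator.
    static long countByIteration(java.util.Iterator<?> it) {
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in data; in a real test this would be QueryResult.getNodes().
        java.util.Iterator<String> it = java.util.Arrays.asList("a", "b", "c").iterator();
        System.out.println("found: " + countByIteration(it) + " nodes");
    }
}
```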

Hmmm…. 


Carl Furst





On 7/11/12 2:32 PM, "Torsten Stolpmann" <st...@verit.de> wrote:

>Hi Carl,
>
>AFAIK the documentation still refers to jackrabbit 1.x.x - see [1] for
>details. Maybe [2] has the correct answer to your problem (explicitly
>setting the jcr:mimeType for your data node)?
>
>HTH,
>
>Torsten
>
>[1] https://issues.apache.org/jira/browse/JCR-1878
>[2]
>http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to
>-specify-extractors-td4534050.html
>
>On 11.07.2012 20:16, Furst, Carl wrote:
>> So after some investigation I'm at a loss as to which class to use for
>> text extraction (ie what to set textFilterClasses to in the
>>workspace.xml
>> file).  Which class is the default in 2.4.2? The Wiki I think is
>> incorrect... It states
>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>> default, but I don't see that class in the source code.
>> 
>> Possible candidates are:
>> Org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
>> indexer)
>> Org.apache.jackrabbit.core.query.lucene.BlockingParser
>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>> 
>> Any suggestions? I'll plug in the last two and see if things improve.
>> 
>> 
>> 
>> 
>> Thanks,
>> Carl Furst
>> 
>> 
>> 
>> 
>> 
>> On 7/11/12 1:36 PM, "Furst, Carl" <Ca...@mlb.com> wrote:
>>
>>> I'm on 2.4.2. Thanks for the references; I'll check out Tika and try a
>>> test.
>>>
>>> Thanks,
>>> Carl Furst

Re: jcr sql2 - contains() full text search not working

Posted by Torsten Stolpmann <st...@verit.de>.
Hi Carl,

AFAIK the documentation still refers to Jackrabbit 1.x.x; see [1] for
details. Maybe [2] has the correct answer to your problem (explicitly
setting the jcr:mimeType property on your data node)?

HTH,

Torsten

[1] https://issues.apache.org/jira/browse/JCR-1878
[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
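
A minimal sketch of the idea in [2], assuming that text extraction is
chosen from the jcr:mimeType property stored next to jcr:data: pick a
text media type for each file before it is saved. The class
MimeTypeHint, the method mimeTypeFor and the extension table are
hypothetical names for illustration, and mapping .jsp to text/plain is
an assumption, not something taken from the Jackrabbit documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MimeTypeHint {
    // Hypothetical extension table; extend it for the file types you store.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("jsp", "text/plain"); // assumption: index JSP source as plain text
        TYPES.put("txt", "text/plain");
        TYPES.put("html", "text/html");
        TYPES.put("xml", "application/xml");
    }

    /** Returns a media type for the file name, or application/octet-stream if unknown. */
    public static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream"; // no extension to go by
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("index.jsp"));
    }
}
```

With the JCR API the result would be written on the jcr:content node,
for example content.setProperty("jcr:mimeType",
MimeTypeHint.mimeTypeFor(name)), before session.save(). Nodes that were
stored earlier may need to be saved again or re-indexed before the
query finds them.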

On 11.07.2012 20:16, Furst, Carl wrote:
> So after some investigation I'm at a loss as to which class to use for
> text extraction (i.e. what to set textFilterClasses to in the
> workspace.xml file). Which class is the default in 2.4.2? I think the
> wiki is incorrect. It states
> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
> default, but I don't see that class in the source code.
>
> Possible candidates are:
> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
> indexer)
> org.apache.jackrabbit.core.query.lucene.BlockingParser
> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
> 
> Any suggestions? I'll plug in the last two and see if things improve.
> 
> 
> 
> 
> Thanks,
> Carl Furst
> 
> 
> 
> 
> 
> On 7/11/12 1:36 PM, "Furst, Carl"<Ca...@mlb.com>  wrote:
> 
>> 2.4.2 - Thanks for the references.. I'll check out Tika and try a test.
>>
>> Thanks,
>> Carl Furst
>>
>>
>>
>>
>>
>> On 7/3/12 5:19 AM, "Alex Parvulescu"<al...@gmail.com>  wrote:
>>
>>> Hi Carl,
>>>
>>> What version of Jackrabbit are you on?
>>>
>>> Next, are you sure you have the Tika extractors in the classpath? Maybe
>>> you are seeing something along the lines of [0].
>>>
>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>> A good place to start is the unit test collection available in
>>> Jackrabbit core [1].
>>>
>>>
>>> best,
>>> alex
>>>
>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>> [1]
>>> http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>
>>>
>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<Ca...@mlb.com>  wrote:
>>>
>>>> So given the below I tried to use
>>>>
>>>> 'inclu*' and 'include*' and still got no results, so I'm going to start
>>>> looking into some of these possible reasons why:
>>>>
>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>
>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>
>>>> Thanks again,
>>>>
>>>>
>>>>
>>>> Carl Furst
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 6/27/12 1:59 PM, "Furst, Carl"<Ca...@mlb.com>  wrote:
>>>>
>>>>> Thanks Torsten,
>>>>>
>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>> and find out more. My main stumbling block here was where the query
>>>>> was being executed: was it on the Derby level or the Lucene level?
>>>>>
>>>>> This has cleared that part of it up for me as well.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Carl Furst
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<st...@verit.de>  wrote:
>>>>>
>>>>>> Hi Carl,
>>>>>>
>>>>>> By default the underlying Lucene implementation does not match
>>>>>> leading wildcards for performance reasons. See also:
>>>>>>
>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>
>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>> results you were looking for.
>>>>>>
>>>>>> Sadly enough I did not find any reference to this in the Jackrabbit
>>>>>> documentation.
>>>>>>
>>>>>> Took me quite a while to find that too.
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Torsten
>>>>>>
>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>> I'm probably missing something here but everything I've read so far
>>>>>>> leads me to believe this should work.
>>>>>>>
>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>>>>>> has a child node jcr:content of type nt:resource, which has a child
>>>>>>> property called jcr:data.
>>>>>>>
>>>>>>> There are many cases where the jcr:data column has the word 'include'
>>>>>>> in it. They are JSP files so, yes, I know this word exists in several
>>>>>>> files.
>>>>>>>
>>>>>>> So here's the sql I use:
>>>>>>>
>>>>>>> select * from [nt:resource] where  contains([jcr:data], 'include');
>>>>>>>
>>>>>>> Here's the sql that is returned from q.getStatement() :
>>>>>>>
>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>
>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>
>>>>>>> <%@ include file="..."
>>>>>>>
>>>>>>>
>>>>>>> ... More jsp here..
>>>>>>> <%/jsp:include...
>>>>>>>
>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need to
>>>>>>> add a "searchable" mixin or something?
>>>>>>>
>>>>>>> Any ideas why this is not being found?
>>>>>>>
>>>>>>> It used to be that the CND file for the Jackrabbit node types was
>>>>>>> readily available from Apache. Does anyone know where I can find the
>>>>>>> CND file for the Jackrabbit node types?
>>>>>>>
>>>>>>> jcr:content is unstructured, but I explicitly make the type
>>>>>>> nt:resource (otherwise the statement would not be parsed; the Query
>>>>>>> object would throw an error like "table not found", right? Because
>>>>>>> the type is a table). So the type is right. The field is right. The
>>>>>>> search is not working.
>>>>>>>
>>>>>>>
>>>>>>> I'm using Jackrabbit without any special configuration. Just the
>>>>>>> war in a simple Tomcat deployment. So it's sitting on top of Derby
>>>>>>> and Lucene.
>>>>>>>
>>>>>>>
>>>>>>> Any help would be appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Carl Furst


Re: jcr sql2 - contains() full text search not working

Posted by "Furst, Carl" <Ca...@mlb.com>.
So after some investigation I'm at a loss as to which class to use for
text extraction (i.e. what to set textFilterClasses to in the
workspace.xml file). Which class is the default in 2.4.2? I think the
wiki is incorrect. It states
org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
default, but I don't see that class in the source code.

Possible candidates are:
org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
indexer)
org.apache.jackrabbit.core.query.lucene.BlockingParser
org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField

Any suggestions? I'll plug in the last two and see if things improve.




Thanks,
Carl Furst
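
One way to check which of these classes are actually present, and
whether Tika is on the classpath at all, is a small reflection probe.
This is only a sketch: ClasspathCheck and isOnClasspath are
hypothetical names, and org.apache.tika.Tika is used merely as a
marker class for "Tika is available". The other class names are the
ones mentioned in this message.

```java
final class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tika.Tika",
            "org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter",
            "org.apache.jackrabbit.core.query.lucene.SearchIndex"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isOnClasspath(name));
        }
    }
}
```

In a Tomcat deployment the probe would have to run inside the web
application (for example from a servlet or JSP) so that it uses the
same class loader as Jackrabbit; run from the command line it only
reports on the command-line classpath.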






Re: jcr sql2 - contains() full text search not working

Posted by "Furst, Carl" <Ca...@mlb.com>.
2.4.2 - Thanks for the references.. I'll check out Tika and try a test.

Thanks,
Carl Furst
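
For an isolated test like the one suggested in this thread, the
statement from the original question could be assembled as below. This
is only a sketch: FulltextStatement and containsQuery are hypothetical
names, and the helper only doubles single quotes for the JCR-SQL2
string literal. It does not validate or escape the full-text search
expression itself.

```java
final class FulltextStatement {
    /** Builds a JCR-SQL2 full-text query on the jcr:data property of nt:resource nodes. */
    public static String containsQuery(String searchTerm) {
        // A single quote inside a JCR-SQL2 string literal is written as two single quotes.
        String escaped = searchTerm.replace("'", "''");
        return "SELECT * FROM [nt:resource] WHERE CONTAINS([jcr:data], '" + escaped + "')";
    }

    public static void main(String[] args) {
        System.out.println(containsQuery("include"));
    }
}
```

The resulting string would then be passed to
session.getWorkspace().getQueryManager().createQuery(statement,
Query.JCR_SQL2) and executed, first against a standalone repository and
then against the Tomcat deployment.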




