Posted to users@jackrabbit.apache.org by Orlando Palis <or...@gmail.com> on 2013/06/07 02:58:25 UTC

jackrabbit 2.6.0 Full Text Search

Hi Folks,

I'm new to Jackrabbit and am trying out full-text search with Jackrabbit
2.6.0 (with Tika 1.3). I have a custom node type that lets me store some
custom properties and multiple HTML files (stored as binary). I have the
following configuration:

*workspace.xml:*

<?xml version="1.0" encoding="UTF-8"?>
<Workspace name="default">
        <!--
            virtual file system of the workspace:
            class: FQN of class implementing the FileSystem interface
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
            <param name="dataSourceName" value="ds1"/>
            <param name="schemaObjectPrefix" value="fs_${wsp.name}_"/>
        </FileSystem>
        <!--
            persistence manager of the workspace:
            class: FQN of class implementing the PersistenceManager interface
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
            <param name="dataSourceName" value="ds1"/>
            <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
        </PersistenceManager>
        <!--
            Search index and the file system it uses.
            class: FQN of class implementing the QueryHandler interface
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
            <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
            <param name="supportHighlighting" value="true"/>
            <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
        </SearchIndex>
</Workspace>


*tika-config.xml:*

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
    <parsers>
           <parser name="parse-html" class="org.apache.tika.parser.html.HtmlParser">
               <mime>text/html</mime>
               <mime>application/xhtml+xml</mime>
               <mime>application/x-asp</mime>
           </parser>
    </parsers>
</properties>
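
As a quick sanity check on a file like the one above, the following is a minimal sketch (JDK only, no Tika or Jackrabbit on the classpath) that confirms the XML is well-formed and lists which MIME types each <parser> element declares. The class name TikaConfigCheck and its method are hypothetical names chosen for illustration. The sketch only reads the XML structure shown in this post; it does not reproduce how Tika or Jackrabbit actually loads the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TikaConfigCheck {

    /**
     * Parses the given XML text and returns, for every <parser> element,
     * its "class" attribute mapped to the <mime> values nested inside it.
     * Throws IllegalArgumentException if the text is not well-formed XML.
     */
    public static Map<String, List<String>> parserMimeTypes(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Config is not well-formed XML", e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            List<String> mimes = new ArrayList<>();
            NodeList mimeNodes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimeNodes.getLength(); j++) {
                mimes.add(mimeNodes.item(j).getTextContent().trim());
            }
            result.put(parser.getAttribute("class"), mimes);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same structure as the tika-config.xml in this post.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<properties><parsers>"
                + "<parser name=\"parse-html\" class=\"org.apache.tika.parser.html.HtmlParser\">"
                + "<mime>text/html</mime>"
                + "<mime>application/xhtml+xml</mime>"
                + "<mime>application/x-asp</mime>"
                + "</parser></parsers></properties>";
        System.out.println(parserMimeTypes(xml));
    }
}
```

A check like this rules out a malformed config file, but it says nothing about whether the repository actually picks the file up from the tikaConfigPath location.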

*JCR-SQL2 queries tested:*

1) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This')

2) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This*')

3)
SELECT file.*, resource.* FROM [nt:file] AS file
INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
WHERE resource.[jcr:mimeType] = 'text/html'
AND CONTAINS(file.*, 'This')

4)
SELECT file.*, resource.* FROM [nt:file] AS file
INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
WHERE resource.[jcr:mimeType] = 'text/html'
AND CONTAINS(file.*, 'This*')
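
For anyone reproducing these tests with different search terms, here is a minimal sketch of building the two query shapes above as strings. The class and method names are hypothetical. The only escaping it handles is doubling single quotes inside the JCR-SQL2 string literal; it does not escape characters that have special meaning inside the full-text search expression itself. The resulting string would then be passed to the JCR QueryManager with the JCR-SQL2 language constant, which is not shown here.

```java
public class FullTextQueries {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Shape of queries 1 and 2: full-text match on nt:file nodes. */
    public static String fileQuery(String term) {
        return "SELECT * FROM [nt:file] AS file"
                + " WHERE CONTAINS(file.*, " + quote(term) + ")";
    }

    /** Shape of queries 3 and 4: join to the child nt:resource, filtered by MIME type. */
    public static String joinQuery(String mimeType, String term) {
        return "SELECT file.*, resource.* FROM [nt:file] AS file"
                + " INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)"
                + " WHERE resource.[jcr:mimeType] = " + quote(mimeType)
                + " AND CONTAINS(file.*, " + quote(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fileQuery("This"));              // query 1
        System.out.println(fileQuery("This*"));             // query 2
        System.out.println(joinQuery("text/html", "This")); // query 3
        System.out.println(joinQuery("text/html", "This*")); // query 4
    }
}
```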

*Result:*
None of the queries returns any rows. If I remove the CONTAINS() clause,
all four queries return rows, and for queries #3 and #4 I can see, when I
dump the results to the log file, that resource.[jcr:data] contains the
text ("This") I am searching for. I have also tried deleting the index
folder to force the repository to re-index, but full-text search still
returns nothing.
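
The index reset mentioned above can be sketched as follows. This is a hedged illustration using only the JDK: the class name IndexReset is hypothetical, the "index" subdirectory is taken from the ${wsp.home}/index path in workspace.xml, and the workspace home directory passed in is an assumption that must match the actual installation. It should only be run while the repository is shut down.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IndexReset {

    /**
     * Recursively deletes the "index" directory under the given workspace
     * home and returns how many files and directories were removed.
     * Returns 0 if no index directory exists.
     */
    public static int deleteIndex(Path workspaceHome) {
        Path index = workspaceHome.resolve("index");
        if (!Files.exists(index)) {
            return 0;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(index)) {
                // Reverse order puts children before their parent directory.
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            return paths.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The workspace home is supplied by the caller; no default is assumed.
        if (args.length != 1) {
            System.err.println("usage: IndexReset <workspace-home>");
            return;
        }
        System.out.println("Deleted entries: " + deleteIndex(Paths.get(args[0])));
    }
}
```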

What am I missing? Also, is there any documentation on how to configure
Tika (tika-config.xml)?


Thanks and Regards,
Orlando