
Posted to users@jackrabbit.apache.org by Malligarjunan Sidduraj <Ma...@webMethods.com> on 2007/03/12 05:42:59 UTC

Problem with content search

My Query

// Assumes an open javax.jcr.Session (session) and a search term (keyWord).
String queryString = "/jcr:root//element(*, nt:resource)[(jcr:contains(., '"
                    + keyWord + "'))]";
Workspace workspace = session.getWorkspace();
QueryManager qm = workspace.getQueryManager();
Query query = qm.createQuery(queryString, Query.XPATH);
QueryResult queryResult = query.execute();


My Configuration 

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.obj.ObjectPersistenceManager">
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
        </SearchIndex>
    </Workspace>

    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version" />
        </FileSystem>

        <PersistenceManager class="org.apache.jackrabbit.core.persistence.obj.ObjectPersistenceManager">
        </PersistenceManager>
    </Versioning>

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
        <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    </SearchIndex>


I have uploaded a document file (hello.doc, which contains the word "hello").

The above query returns an empty result. Why?
Is anything missing in the configuration?

Note: This query works fine for .txt, .java, and .wsdl files.

Re: Problem with content search

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 3/12/07, Malligarjunan Sidduraj
<Ma...@webmethods.com> wrote:
> I have uploaded a document file (hello.doc, which contains the word "hello").
>
> The above query returns an empty result. Why?
> Is anything missing in the configuration?

What are the contents of the jcr:content/jcr:mimeType property of the
uploaded document? Jackrabbit uses that property to determine which
document parser to use for indexing the document. If it's something
like application/octet-stream, then the document will not be indexed.
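
To illustrate the point about jcr:mimeType, here is a minimal sketch of choosing a MIME type from the file name at upload time, so that the property is not left at application/octet-stream. The class name, the helper guessMimeType, and the extension table are hypothetical and not part of Jackrabbit. Only application/msword is confirmed in this thread; the other entries are the usual registered types, and whether each text filter accepts them is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeTypes {
    // Hypothetical extension-to-MIME-type table; extend it for your own content.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("doc", "application/msword");
        BY_EXTENSION.put("xls", "application/vnd.ms-excel");
        BY_EXTENSION.put("ppt", "application/vnd.ms-powerpoint");
        BY_EXTENSION.put("pdf", "application/pdf");
    }

    /** Returns a MIME type for the file name, or application/octet-stream if unknown. */
    static String guessMimeType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream";
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(guessMimeType("hello.doc")); // prints application/msword
        System.out.println(guessMimeType("data.bin"));  // prints application/octet-stream
    }
}
```

In the upload code, the result would then be stored on the nt:resource node, for example with contentNode.setProperty("jcr:mimeType", MimeTypes.guessMimeType("hello.doc")), where contentNode stands for the jcr:content node of the uploaded file.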

BR,

Jukka Zitting

Re: Problem with content search

Posted by Marcel Reutegger <ma...@gmx.net>.

stefan81 wrote:
> When documents are stored under the old formats the indexing works. Are the
> new formats supported by newer JR versions?

no, jackrabbit does not support docx or xlsx.

> Are there any efforts?

no, not currently. but if you create a jira issue someone might start to work on it ;)

regards
  marcel

Re: Problem with content search

Posted by stefan81 <st...@process-relations.com>.

I found that documents saved in the new Office 2007 formats (docx,
xlsx) are not recognized by the indexer (at least in JR version 1.3). Maybe
this is the cause of your problem. My JBoss log then shows:

[NodeIndexer] Exception while indexing binary property: java.io.IOException:
Invalid header signature; read 1688935826934608, expected
-2226271756974174256
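
The two numbers in that message can be decoded to see what the parser found. The sketch below assumes they are the first eight bytes of the file read as a little-endian long (an assumption about the parser, not something stated in the log). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HeaderSignature {
    /** Renders a long as the eight bytes it would occupy in a little-endian file, in hex. */
    static String toLittleEndianHex(long value) {
        byte[] bytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Value that was read from the docx file.
        System.out.println(toLittleEndianHex(1688935826934608L));     // prints 50 4B 03 04 14 00 06 00
        // Value the parser expected.
        System.out.println(toLittleEndianHex(-2226271756974174256L)); // prints D0 CF 11 E0 A1 B1 1A E1
    }
}
```

The bytes that were read start with 50 4B 03 04 ("PK" followed by 03 04), which is the signature of a ZIP local file header. The expected bytes D0 CF 11 E0 A1 B1 1A E1 are the signature of the OLE2 compound document format used by the old binary .doc and .xls files. So the file handed to the parser is a ZIP-based container, not an old-style Office document.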

When documents are stored under the old formats the indexing works. Are the
new formats supported by newer JR versions? Are there any efforts?

Regards

Jukka Zitting wrote:
> 
> Hi,
> 
> On 3/13/07, Malligarjunan Sidduraj
> <Ma...@webmethods.com> wrote:
>> Content of the jcr:content/jcr:mimeType property is application/msword
> 
> That should be OK. It could be that the MS Word indexer is having some
> trouble parsing the document. Do you see some errors being logged by
> the repository?
> 
> BR,
> 
> Jukka Zitting
> 
> 
