Posted to users@jackrabbit.apache.org by Michael Wechner <mi...@wyona.com> on 2007/11/26 21:38:26 UTC
Indexing properties with InputStream as value [WAS: Re: Indexing
of properties setProperty(String, InputStream)]
Marcel Reutegger wrote:
> Hi Michael,
>
> these are rather questions for the user list, but anyway...
>
> Michael Wechner wrote:
>
>> I am using setProperty(String, InputStream), i.e.
>> setProperty("content", new InputStream(...)), in order to save XHTML
>> and other "bigger" content.
>> Also I am using the TransientRepository implementation.
>>
>> When I am searching with xpath, something like //*[@content] then I
>> don't receive any results whereas properties being set with
>> setProperty(String, String) are being found.
>>
>> Now I am very sure the "content" properties do exist, because I read
>> and write to them without a problem.
>>
>> So my guess is that properties being set through setProperty(String,
>> InputStream) are not being indexed by default, because it could be
>> any kind of data, right?
>
>
> that's correct. the JCR specification says that binary properties are
> not indexed. basically because of the reason you mentioned. it can be
> anything...
>
>> But I can get them indexed?
>
>
> yes, if you store the binary as a nt:resource node. this will give
> jackrabbit the required information how to index the binary (mime-type
> and encoding). furthermore you need to configure text extractors in
> the configuration.
> http://jackrabbit.apache.org/doc/components/text-extractors.html
Thanks for these pointers. Based on your hints I have also found:
http://www.mail-archive.com/dev@jackrabbit.apache.org/msg04145.html
http://repo1.maven.org/maven2/org/apache/jackrabbit/jackrabbit-text-extractors/1.3.3/
http://www.nabble.com/Jackrabbit-performance-with-large-binaries-t2778091.html
I think it would make sense to combine these into a "tutorial" that is
more detailed than the individual resources.
Would you be interested in something like this for the Jackrabbit
documentation?
Cheers
Michael
>
>> Shall I rather use
>> setProperty(String, Value, int) and set the type to String and use
>> Value.getStream() ?
>
>
> that's an alternative, but then you will get matches for tag names as
> well. while you are probably only interested in the text between the
> elements and attribute values.
>
> regards
> marcel
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
michael.wechner@wyona.com michi@apache.org
+41 44 272 91 61
Re: Indexing properties with InputStream as value [WAS: Re: Indexing
of properties setProperty(String, InputStream)]
Posted by Marcel Reutegger <ma...@gmx.net>.
Michael Wechner wrote:
> Based on your hints I have also found
>
> http://www.mail-archive.com/dev@jackrabbit.apache.org/msg04145.html
>
> http://repo1.maven.org/maven2/org/apache/jackrabbit/jackrabbit-text-extractors/1.3.3/
>
>
> http://www.nabble.com/Jackrabbit-performance-with-large-binaries-t2778091.html
>
>
> I think it would make sense to combine these into a "tutorial" that is
> more detailed than the individual resources.
>
> Would you be interested in something like this for the Jackrabbit
> documentation?
that would be great. feel free to create a wiki page [1]. I'd be happy to review
it and make additions.
regards
marcel
[1] http://wiki.apache.org/jackrabbit/
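
For readers following the thread, the nt:resource approach Marcel describes might look roughly like the sketch below. It is only an illustration against the JCR 1.0 API (javax.jcr), not code from either poster. The class name BinaryIndexingSketch, the helper names, and the "text/html" / "UTF-8" values are assumptions; it also assumes that the parent node is already persisted and that a text extractor handling the chosen MIME type is configured for the workspace.

```java
import java.io.InputStream;
import java.util.Calendar;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;

public class BinaryIndexingSketch {

    /** Hypothetical helper: builds an XPath full-text query over nt:resource nodes. */
    static String fullTextQuery(String term) {
        // Double any single quote so the term stays inside the XPath string literal.
        return "//element(*, nt:resource)[jcr:contains(., '"
                + term.replace("'", "''") + "')]";
    }

    /**
     * Stores the stream as nt:file/nt:resource instead of a bare binary property,
     * so the repository knows the MIME type and encoding of the data.
     */
    static Node storeXhtml(Node parent, String name, InputStream xhtml)
            throws RepositoryException {
        Node file = parent.addNode(name, "nt:file");
        Node resource = file.addNode("jcr:content", "nt:resource");
        resource.setProperty("jcr:mimeType", "text/html");   // assumed MIME type
        resource.setProperty("jcr:encoding", "UTF-8");        // assumed encoding
        resource.setProperty("jcr:data", xhtml);
        resource.setProperty("jcr:lastModified", Calendar.getInstance());
        parent.getSession().save();
        return file;
    }

    /** Runs the full-text query and returns the matching nt:resource nodes. */
    static NodeIterator search(Session session, String term)
            throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(fullTextQuery(term), Query.XPATH);
        return query.execute().getNodes();
    }
}
```

The nodes returned by search(...) are the jcr:content children, so getParent() on each hit gives the nt:file node. Compared with storing the markup as a String property, this route relies on the configured extractor to decide which text is indexed.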