Posted to users@jackrabbit.apache.org by Michael Wechner <mi...@wyona.com> on 2007/11/26 21:38:26 UTC

Indexing properties with InputStream as value [WAS: Re: Indexing of properties setProperty(String, InputStream)]

Marcel Reutegger wrote:

> Hi Michael,
>
> these are rather questions for the user list, but anyway...
>
> Michael Wechner wrote:
>
>> I am using setProperty(String, InputStream), i.e.
>> setProperty("content", new InputStream(...)), to save XHTML
>> and other "bigger" content.
>> Also I am using the TransientRepository implementation.
>>
>> When I search with XPath, e.g. //*[@content], I
>> don't get any results, whereas properties set with
>> setProperty(String, String) are found.
>>
>> I am sure the "content" properties exist, because I can read
>> and write them without a problem.
>>
>> So my guess is that properties set through setProperty(String,
>> InputStream) are not indexed by default, because they could contain
>> any kind of data, right?
>
>
> that's correct. the JCR specification says that binary properties are
> not indexed, basically for the reason you mentioned: they can contain
> anything...
>
>> But can I get them indexed?
>
>
> yes, if you store the binary as an nt:resource node. this gives
> jackrabbit the information it needs to index the binary (mime-type
> and encoding). in addition, you need to configure text extractors in
> the repository configuration:
> http://jackrabbit.apache.org/doc/components/text-extractors.html
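To illustrate why the nt:resource node matters, here is a minimal stand-alone sketch using only the JDK (no Jackrabbit classes). An nt:resource node stores jcr:mimeType and jcr:encoding next to the binary jcr:data; the hypothetical helper ResourceIndexingSketch.indexableText below assumes a simple mime-type-to-extractor map to show how the mime type can select an extractor and why the encoding is needed to decode the bytes. It does not reproduce Jackrabbit's actual extractor interfaces or configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ResourceIndexingSketch {

    // Hypothetical registry: mime type -> function turning decoded text into indexable text.
    // In Jackrabbit the real text extractors are set up in the repository configuration;
    // this map only illustrates the idea.
    static final Map<String, Function<String, String>> EXTRACTORS = Map.of(
            "text/plain", text -> text);

    // Returns the text that would be indexed, or empty if no extractor handles the mime type.
    static Optional<String> indexableText(byte[] data, String mimeType, String encoding) {
        Function<String, String> extractor = EXTRACTORS.get(mimeType);
        if (extractor == null) {
            return Optional.empty(); // unknown binary: nothing to index
        }
        // The declared encoding decides how the raw bytes become characters.
        String decoded = new String(data, Charset.forName(encoding));
        return Optional.of(extractor.apply(decoded));
    }

    public static void main(String[] args) {
        byte[] data = "Gr\u00fcezi".getBytes(StandardCharsets.UTF_8);
        // Correct encoding: the original word is recovered.
        System.out.println(indexableText(data, "text/plain", "UTF-8"));
        // Wrong encoding: the umlaut is decoded as two unrelated characters.
        System.out.println(indexableText(data, "text/plain", "ISO-8859-1"));
        // Unknown mime type: the binary is skipped.
        System.out.println(indexableText(data, "application/octet-stream", "UTF-8"));
    }
}
```

Without a mime type the sketch cannot pick an extractor, and with a wrong encoding it indexes garbled words, which mirrors the two pieces of information the nt:resource node provides.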


thanks for these pointers. Based on your hints I have also found

http://www.mail-archive.com/dev@jackrabbit.apache.org/msg04145.html

http://repo1.maven.org/maven2/org/apache/jackrabbit/jackrabbit-text-extractors/1.3.3/

http://www.nabble.com/Jackrabbit-performance-with-large-binaries-t2778091.html

I think it would make sense to combine these into a
"tutorial" that goes into more detail than the individual resources.

Would you be interested in something like this for the Jackrabbit 
documentation?

Cheers

Michael


>
>> Should I rather use
>> setProperty(String, Value, int), set the type to String and use
>> Value.getStream()?
>
>
> that's an alternative, but then you will get matches for tag names as
> well, while you are probably only interested in the text between the
> elements and in attribute values.
>
> regards
>  marcel
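Marcel's point about tag names can be illustrated with a small, self-contained sketch. It is not Jackrabbit's own HTML/XML text extractor; it is a hypothetical helper (XhtmlTextSketch.extractText) that uses the JDK's DOM parser to keep only text content and attribute values, so element names such as body or p never reach the index. Storing the same XHTML as a String property, by contrast, means the raw markup, tag names included, is what gets indexed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XhtmlTextSketch {

    // Walks the DOM and collects text nodes and attribute values, skipping element names.
    static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(text).append(' ');
            }
            return;
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(attrs.item(i).getNodeValue()).append(' ');
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Hypothetical extractor: parses well-formed XHTML (no DOCTYPE) and returns its words.
    static String extractText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p title=\"greeting\">Hello <b>world</b></p></body></html>";
        // Indexing the raw String: tag names like "body" are part of the text.
        System.out.println(xhtml);
        // Extracted text: only the attribute value and the words between the tags.
        System.out.println(extractText(xhtml));
    }
}
```

For the sample document the extracted text is "greeting Hello world", while a search over the raw String would also match "body" or "p".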



-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61


Re: Indexing properties with InputStream as value [WAS: Re: Indexing of properties setProperty(String, InputStream)]

Posted by Marcel Reutegger <ma...@gmx.net>.
Michael Wechner wrote:
> Based on your hints I have also found
> 
> http://www.mail-archive.com/dev@jackrabbit.apache.org/msg04145.html
> 
> http://repo1.maven.org/maven2/org/apache/jackrabbit/jackrabbit-text-extractors/1.3.3/
>
> http://www.nabble.com/Jackrabbit-performance-with-large-binaries-t2778091.html
>
> I think it would make sense to combine these into a
> "tutorial" that goes into more detail than the individual resources.
> 
> Would you be interested in something like this for the Jackrabbit 
> documentation?

that would be great. feel free to create a wiki page [1]. I'd be happy to review 
it and make additions.


regards
  marcel


[1] http://wiki.apache.org/jackrabbit/