Posted to users@jackrabbit.apache.org by Phillip Rhodes <sp...@rhoderunner.com> on 2007/04/18 22:11:34 UTC

BinaryValue does not get indexed

I am adding BinaryValue properties to my nodes.  It appears that jackrabbit is not indexing the values of the BinaryValue even if the contents represent a string.  If I add the String value as a StringValue, the value is indexed and picked up in a contains search.

I have 2 issues with this:

1) String property values have a limit of around 16000 characters because the SimpleDBPersistence adapter will store the value in a BLOB field.  I get MySQL data truncation errors unless I chop the data down to 16000 characters.  In addition, I am doubling my space requirements.  Not only do I have to store my binary content, but also its string representation in the node.

2) I use a byte[] array throughout my application as a means to store PDF files, image files, text files, etc.  It is a "common denominator for all content": PDF files, image files, wiki entries, etc. can all be stored, passed around, and retrieved as a byte[] array.  I would like to figure out how to get jackrabbit to index the byte[] array properly.

3) Not an issue, but a question. How does jackrabbit know that a node is a pdf document?  It must figure it out somehow because I see that there is support in the SearchIndex to configure pdf extractions.  Do I add "jcr:mimeType" property of application/pdf to my pdf node and that will do it?  Will this solve the first 2 issues??

I appreciate your thoughts on this!


My Code:

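String contentText = "this is a unique piece of text";
byte[] bytes = contentText.getBytes();
// Binary copy of the content; this property is not picked up by jcr:contains.
node.setProperty("content", new BinaryValue(bytes));
// Truncate to stay under the ~16000 character limit described above.
if (contentText.length() > 16000) {
    contentText = contentText.substring(0, 16000);
}
// String copy of the content; this property is indexed.
node.setProperty("worksproperty", new StringValue(contentText));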



This is my xpath query:
//*[jcr:contains(.,'unique')]



Re: Is jcr:contains supported only for text/plain type content?

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 5/7/07, Nithya Mani <ni...@webmethods.com> wrote:
> Hi,
>
> I have added an xsd file as binary property value. The node structure I
> followed is,
>
> /Schema (nt:file)
>         /jcr:content (nt:resource)
>                 -jcr:mimetype (text/xsd)
>                 -jcr:lastModified
>                 -jcr:data (sample.xsd which contains a line
> <xsd:simpleType name="uddiKey" scc14n:caseMapKind="fold">)
>
> I tried to make contains search on this file using the following query,
>
> /jcr:root/Schema//element(*, nt:resource)[jcr:contains(.,
> '*caseMapKind*')]
>
> It returns no results. If I drop the contains part
> {/jcr:root/Schema//element(*, nt:resource)}, it works fine. The same
> query with contains works if I set the mime type to text/plain, and
> partially works on a text/xml type file. Is jcr:contains not supported
> for all mime types? Is it applicable only to text/plain?
>

According to the spec:

[quote]
Support for the jcr:contains() (CONTAINS() in SQL) function is
not required for any property types in particular. It is however
required to work at the node level. In that case it applies to those
properties of the node for which the implementation maintains an
index. Which properties those are is an implementation issue. See
6.6.5.2 jcr:contains Function and 8.5.4.5 CONTAINS.
[/quote]

So most probably you haven't configured any additional text indexers
that would index the content of your xsd.
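
To illustrate why the MIME type matters here, below is a rough sketch of an indexer that picks a text extractor by MIME type. It is only a model of the idea: the class and method names (MimeTypeIndexSketch, register, extract) are made up for this example and are not Jackrabbit's API, and it assumes extraction is selected purely from the declared MIME type. With only text/plain registered, content declared as text/xsd produces no indexable text, which would match a contains query that returns nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical illustration only; these are not Jackrabbit classes. */
final class MimeTypeIndexSketch {

    // Maps a MIME type to a function that turns raw bytes into indexable text.
    private final Map<String, Function<byte[], String>> extractors = new LinkedHashMap<>();

    /** Registers an extractor for one MIME type (compared case-insensitively). */
    void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(mimeType.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the text to index, or empty when no extractor claims the MIME type. */
    Optional<String> extract(String mimeType, byte[] data) {
        Function<byte[], String> extractor = extractors.get(mimeType.toLowerCase(Locale.ROOT));
        return extractor == null ? Optional.empty() : Optional.of(extractor.apply(data));
    }

    public static void main(String[] args) {
        MimeTypeIndexSketch index = new MimeTypeIndexSketch();
        // Only plain text is registered, mirroring a setup with no extra text indexers.
        index.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] xsd = "<xsd:simpleType name=\"uddiKey\" scc14n:caseMapKind=\"fold\">"
                .getBytes(StandardCharsets.UTF_8);

        // The same bytes are indexable or not depending on the declared MIME type.
        System.out.println("text/plain indexed: " + index.extract("text/plain", xsd).isPresent());
        System.out.println("text/xsd indexed:   " + index.extract("text/xsd", xsd).isPresent());
    }
}
```

In this model the fix is either to declare a MIME type that an installed extractor handles, or to register an extractor for the type you actually use.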

./alex
--
.w( the_mindstorm )p.
_____________________________________
  Alexandru Popescu, OSS Evangelist
TestNG/Groovy/AspectJ/WebWork/more...
  Information Queue ~ www.InfoQ.com


> Thanks,
>
> Nithya Mani
> Senior Developer
>
> webMethods
>         Mobile: 9841825337
> Email: nithya@webmethods.com
> IM: nithya_infravio (Yahoo)
>

Is jcr:contains supported only for text/plain type content?

Posted by Nithya Mani <ni...@webmethods.com>.
Hi,

I have added an xsd file as binary property value. The node structure I
followed is,
 
/Schema (nt:file)
	/jcr:content (nt:resource)
    		-jcr:mimetype (text/xsd)
		-jcr:lastModified
		-jcr:data (sample.xsd which contains a line
<xsd:simpleType name="uddiKey" scc14n:caseMapKind="fold">)

I tried to make contains search on this file using the following query,

/jcr:root/Schema//element(*, nt:resource)[jcr:contains(.,
'*caseMapKind*')]

It returns no results. If I drop the contains part
{/jcr:root/Schema//element(*, nt:resource)}, it works fine. The same
query with contains works if I set the mime type to text/plain, and
partially works on a text/xml type file. Is jcr:contains not supported
for all mime types? Is it applicable only to text/plain?

Thanks,

Nithya Mani
Senior Developer

webMethods
	Mobile: 9841825337
Email: nithya@webmethods.com
IM: nithya_infravio (Yahoo)

Re: BinaryValue does not get indexed

Posted by Stefan Guggisberg <st...@gmail.com>.
hi phillip,

On 4/18/07, Phillip Rhodes <sp...@rhoderunner.com> wrote:
>
> I am adding BinaryValue properties to my nodes.  It appears that jackrabbit is not indexing the values of the BinaryValue even if the contents represent a string.  If I add the String value as a StringValue, the value is indexed and picked up in a contains search.
>
> I have 2 issues with this:
>
> 1) String property values have a limit of around 16000 characters because the SimpleDBPersistence adapter will store the value in a BLOB field.  I get MySQL data truncation errors unless I chop the data down to 16000 characters.  In addition, I am doubling my space requirements.  Not only do I have to store my binary content, but also its string representation in the node.

there's been a related jira issue:
https://issues.apache.org/jira/browse/JCR-760

the issue has been resolved and will be included in the upcoming 1.3 release.

for the time being you could either change the table definitions
directly in mysql or change the mysql.ddl file and let jackrabbit
recreate the tables.

>
> 2) I use a byte[] array throughout my application as a means to store PDF files, image files, text files, etc.  It is a "common denominator for all content": PDF files, image files, wiki entries, etc. can all be stored, passed around, and retrieved as a byte[] array.  I would like to figure out how to get jackrabbit to index the byte[] array properly.

see below

>
> 3) Not an issue, but a question. How does jackrabbit know that a node is a pdf document?  It must figure it out somehow because I see that there is support in the SearchIndex to configure pdf extractions.  Do I add "jcr:mimeType" property of application/pdf to my pdf node and that will do it?  Will this solve the first 2 issues??

in order to be indexed the binary data must be stored in the jcr:data
property of a node of type nt:resource (or a subtype thereof).

e.g.

    Node node = parent.addNode("jcr:content", "nt:resource");
    node.setProperty("jcr:mimeType", "application/pdf");
    node.setProperty("jcr:data", new ByteArrayInputStream(pdfBytes));
    node.setProperty("jcr:lastModified", Calendar.getInstance());
    session.save();

once the resource nodes have gone through the text filters you can
search binary content using the jcr:contains function:

//element(*, nt:resource)[jcr:contains(., 'foo')]
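
if the search term comes from user input, the query string has to be assembled carefully. here is a minimal sketch of a helper that builds the query above. ContainsQuery and its methods are hypothetical names, not part of the JCR or jackrabbit API. it only doubles apostrophes so the term fits inside a single-quoted XPath string literal, and it does not deal with the syntax of the full-text search expression itself.

```java
/** Hypothetical helper, not part of the JCR or Jackrabbit API. */
final class ContainsQuery {

    private ContainsQuery() {
    }

    /** Doubles apostrophes so the term can sit inside a single-quoted XPath literal. */
    static String escapeLiteral(String term) {
        return term.replace("'", "''");
    }

    /** Builds //element(*, nodeType)[jcr:contains(., 'term')] for the given node type. */
    static String forNodeType(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + escapeLiteral(term) + "')]";
    }

    public static void main(String[] args) {
        // Reproduces the query shown above for the term 'foo'.
        System.out.println(forNodeType("nt:resource", "foo"));
    }
}
```

the resulting string would then be passed to QueryManager.createQuery(statement, Query.XPATH) and executed as usual.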

cheers
stefan

>
> I appreciate your thoughts on this!
>
>
> My Code:
>
> String contentText= "this is a unique piece of text";
> byte[] bytes = contentText.getBytes();
> node.setProperty("content", new BinaryValue(bytes));
> if (contentText.length() > 16000) {
>         contentText= contentText.substring(0, 16000);
> }
> node.setProperty("worksproperty", new StringValue(contentText));
>
>
>
> This is my xpath query:
> //*[jcr:contains(.,'unique')]
>
>
>