Posted to users@jackrabbit.apache.org by Paco Avila <pa...@git.es> on 2008/03/28 07:43:11 UTC

Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
> Hi,
> 
> On Thu, Mar 27, 2008 at 7:59 PM, Dave Brosius <db...@mebigfatguy.com> wrote:
> > Maybe add a reference property to your TIFF node, that references
> > a node that contains the text?
> 
> Or just a normal string property with the text to be indexed.
> 
> BR,
> 
> Jukka Zitting

But in that case the query can't be:

  /jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]

and would instead have to be something like this (if I store the text in
the my:docText property):

  /jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]

because Lucene is not indexing the "document text version". By the way,
can I get the text generated by the text extractors, or is it only used
by the Lucene engine?
-- 
Paco Avila <pa...@git.es>
GIT Consultors
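The two queries above (and the alternatives suggested later in the thread, jcr:contains(., ...) and jcr:contains(jcr:content, ...)) differ only in the first argument of jcr:contains. A minimal sketch of a hypothetical helper that builds these statements as strings. It assumes XPath 2.0 style escaping of single quotes inside the search literal (an embedded ' is doubled), which is worth confirming against your repository's query parser. Executing the statement still needs a JCR session and its QueryManager, which is not shown here.

```java
/**
 * Hypothetical helper that assembles the XPath full-text queries discussed
 * in this thread. It only builds strings; running them needs a JCR session.
 */
final class ContainsQuery {

    private ContainsQuery() {}

    /**
     * Wraps text in a single-quoted XPath string literal.
     * Assumption: XPath 2.0 style escaping, where an embedded ' becomes ''.
     */
    public static String literal(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    /**
     * Builds /jcr:root//element(*,nodeType)[jcr:contains(scope, 'terms')].
     * scope can be "." (the node's own properties), a property name such as
     * "my:docText", or a child node name such as "jcr:content".
     */
    public static String build(String nodeType, String scope, String terms) {
        return "/jcr:root//element(*," + nodeType + ")[jcr:contains("
                + scope + ", " + literal(terms) + ")]";
    }
}
```

For example, ContainsQuery.build("my:document", "my:docText", "hola mundo") yields /jcr:root//element(*,my:document)[jcr:contains(my:docText, 'hola mundo')]; passing "." or "jcr:content" as the scope gives the other two variants.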


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Philipp Koch <ph...@gmail.com>.
> What do you mean with "document version"? the text from the TIFF image?
I was referring to my:docText in your example.

You could then call jcr:contains(jcr:content, 'hola mundo').

regards,
philipp


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Paco Avila <pa...@git.es>.
What do you mean by "document version"? The text from the TIFF image?

On Fri, 28-03-2008 at 08:19 +0100, Philipp Koch wrote:
> Why don't you just add a mixin node type (one that would contain the
> document version property) to the nt:resource node while uploading, and
> store the doc version as an additional property on the nt:resource node?
> This would solve your problem, if I understood your use case the right
> way.
> 
> regards,
> philipp
> 
-- 
Paco Avila <pa...@git.es>
GIT Consultors


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Philipp Koch <ph...@gmail.com>.
Why don't you just add a mixin node type (one that would contain the
document version property) to the nt:resource node while uploading, and
store the doc version as an additional property on the nt:resource node?
This would solve your problem, if I understood your use case the right
way.

regards,
philipp


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Paco Avila <pa...@git.es>.
On Fri, 28-03-2008 at 09:44 +0200, Jukka Zitting wrote:
> Hi,
> 
> On Fri, Mar 28, 2008 at 9:39 AM, Paco Avila <pa...@git.es> wrote:
> >  On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
> >  > You could use jcr:contains(., 'hola mundo') that looks in all
> >  > properties of a node.
> >
> >  But nt:resource is a subnode and my:docText is a property. Will
> >  jcr:contains(., 'hola mundo') search all properties and
> >  subnodes?
> 
> Ah, yes. Put the extra text property in the nt:resource node (with an
> appropriate mixin or subtype as mentioned by Philipp). Then
> jcr:contains(jcr:content,'hola mundo') should match the content.


Ok, I understand it now. Thanks!
-- 
Paco Avila <pa...@git.es>
GIT Consultors


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Mar 28, 2008 at 9:39 AM, Paco Avila <pa...@git.es> wrote:
>  On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
>  > You could use jcr:contains(., 'hola mundo') that looks in all
>  > properties of a node.
>
>  But nt:resource is a subnode and my:docText is a property. Will
>  jcr:contains(., 'hola mundo') search all properties and
>  subnodes?

Ah, yes. Put the extra text property in the nt:resource node (with an
appropriate mixin or subtype as mentioned by Philipp). Then
jcr:contains(jcr:content,'hola mundo') should match the content.

BR,

Jukka Zitting

Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Paco Avila <pa...@git.es>.
On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
> Hi,
> 
> On Fri, Mar 28, 2008 at 8:43 AM, Paco Avila <pa...@git.es> wrote:
> > On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
> >  > Or just a normal string property with the text to be indexed.
> >
> >  But, in this case, the query can't be:
> >
> >   /jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]
> >
> >  and should be something like (if I store the text in my:docText
> >  property):
> >
> >   /jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]
> >
> >  because Lucene is not indexing the "document text version".
> 
> You could use jcr:contains(., 'hola mundo') that looks in all
> properties of a node.

But nt:resource is a subnode and my:docText is a property. Will
jcr:contains(., 'hola mundo') search all properties and subnodes?

> Alternatively, you could also put the text in a TIFF comment and
> implement a custom TextExtractor class that pulls that comment for
> Jackrabbit to index as the text version of the TIFF file.
> 
> >  By the way, can I get the text generated by text-extractors or
> >  it is only used by Lucene engine?
> 
> No, it's only used for Lucene. But of course you can instantiate and
> run the text extractors manually on any binary property you like.

Thanks.


Re: how can I say to jackrabbit to index a text when I put a TIFF in the repository?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Mar 28, 2008 at 8:43 AM, Paco Avila <pa...@git.es> wrote:
> On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
>  > Or just a normal string property with the text to be indexed.
>
>  But, in this case, the query can't be:
>
>   /jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]
>
>  and should be something like (if I store the text in my:docText
>  property):
>
>   /jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]
>
>  because Lucene is not indexing the "document text version".

You could use jcr:contains(., 'hola mundo'), which looks in all
properties of a node.

Alternatively, you could also put the text in a TIFF comment and
implement a custom TextExtractor class that pulls that comment for
Jackrabbit to index as the text version of the TIFF file.
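The core of such an extractor is reading the comment back out of the TIFF bytes. A minimal sketch, assuming the comment is stored as ASCII/Latin-1 text in the ImageDescription tag (270) of the first image directory. The class name is hypothetical, and it deliberately does not implement Jackrabbit's extractor interface so that it compiles without Jackrabbit on the classpath; its getContentTypes/extractText methods only mirror the shape of the Jackrabbit 1.x org.apache.jackrabbit.extractor.TextExtractor interface. Check that interface and how extractors are registered (e.g. the search index's textFilterClasses parameter) against your Jackrabbit version before wiring it in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical TIFF comment extractor (sketch, not a Jackrabbit class). */
final class TiffCommentExtractor {
    private static final int IMAGE_DESCRIPTION = 270; // TIFF ImageDescription tag
    private static final int TYPE_ASCII = 2;          // TIFF field type ASCII

    public String[] getContentTypes() {
        return new String[] {"image/tiff"};
    }

    /** Reads the whole stream, closes it, and returns the comment as a Reader. */
    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        try (InputStream in = stream) {
            return new StringReader(readDescription(in.readAllBytes()));
        }
    }

    /** Returns the first IFD's ImageDescription, or "" if absent or malformed. */
    public static String readDescription(byte[] data) {
        if (data.length < 8) return "";
        ByteBuffer buf = ByteBuffer.wrap(data);
        // Byte order marker: "II" = little-endian, "MM" = big-endian
        if (data[0] == 'I' && data[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (data[0] == 'M' && data[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            return "";
        }
        if ((buf.getShort(2) & 0xFFFF) != 42) return ""; // TIFF magic number
        long ifd = buf.getInt(4) & 0xFFFFFFFFL;            // offset of first IFD
        if (ifd + 2 > data.length) return "";
        int entries = buf.getShort((int) ifd) & 0xFFFF;
        for (int i = 0; i < entries; i++) {
            int entry = (int) ifd + 2 + 12 * i;             // each IFD entry is 12 bytes
            if (entry + 12 > data.length) break;
            int tag = buf.getShort(entry) & 0xFFFF;
            int fieldType = buf.getShort(entry + 2) & 0xFFFF;
            long count = buf.getInt(entry + 4) & 0xFFFFFFFFL;
            if (tag != IMAGE_DESCRIPTION || fieldType != TYPE_ASCII) continue;
            // Values of up to 4 bytes sit inline in the entry; longer ones are at an offset
            long start = count <= 4 ? entry + 8 : buf.getInt(entry + 8) & 0xFFFFFFFFL;
            if (start + count > data.length) return "";
            int end = (int) start;
            while (end < start + count && data[end] != 0) end++; // stop at NUL terminator
            return new String(data, (int) start, end - (int) start, StandardCharsets.ISO_8859_1);
        }
        return "";
    }
}
```

Only the first image directory is inspected; multi-page TIFFs or comments stored in other tags (e.g. XMP) would need more work.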

>  By the way, can I get the text generated by text-extractors or
>  it is only used by Lucene engine?

No, it's only used for Lucene. But of course you can instantiate and
run the text extractors manually on any binary property you like.
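Running an extractor by hand mostly means feeding it the binary stream and draining the Reader it returns. A minimal sketch under stated assumptions: Extractor is a hypothetical interface that only mirrors the extractText(InputStream, String, String) shape of the Jackrabbit 1.x extractors, and PLAIN_TEXT is a trivial stand-in that just decodes bytes. With a real extractor you would pass the binary property's stream, for example from Property.getStream() in JCR 1.0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ManualExtraction {
    private ManualExtraction() {}

    /** Hypothetical stand-in with the same method shape as a Jackrabbit 1.x extractor. */
    public interface Extractor {
        Reader extractText(InputStream stream, String type, String encoding) throws IOException;
    }

    /** Trivial stand-in extractor: decodes bytes with the given encoding, UTF-8 if null. */
    public static final Extractor PLAIN_TEXT = (stream, type, encoding) ->
            new InputStreamReader(stream,
                    encoding == null ? StandardCharsets.UTF_8 : Charset.forName(encoding));

    /** Runs an extractor on a binary stream and collects the extracted text. */
    public static String extractToString(Extractor extractor, InputStream binary,
                                         String type, String encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = extractor.extractText(binary, type, encoding)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }
}
```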

BR,

Jukka Zitting