Posted to xindice-users@xml.apache.org by Eric Thoman <ju...@manjug.org> on 2002/04/26 04:38:53 UTC

Text search with Xindice. XML from FrameMaker 7

Hi,

I'm looking at Xindice to store literary documents that will be created
with FrameMaker 7.

The user will enter keywords to search the entire text of all of the
XML documents and return individual chapters with their titles, the book
titles and the contents of the chapters.

How well might Xindice work against a 4000-page literary work? And can
anyone suggest a good approach? Sample code is always neat :)

Last, if anyone has any comments about FrameMaker's XML ability vis-a-vis
Xindice, I would be most appreciative.

Thanks,
Eric Thoman


Re: Text search with Xindice. XML from FrameMaker 7

Posted by "Mark J. Stang" <ma...@earthlink.net>.
Jeff,
Yes, it would; that is why I pointed him at the XPath tutorial.
The problem that he would have run into, and that I ran into,
was that I wanted to search entire collections for a particular word.
That would have meant looking in practically
every element.   I decided to wait for Tom's enhancements.

Mark

Jeff Greif wrote:

> Mark,
> You mentioned 'contains' causing a search of the entire document.  Shouldn't
> it just search the content of the part of the document picked out by the
> rest of the XPath query?  I'm pretty sure I remember this from a quick
> source code tour a few months ago.
> Jeff


Re: Text search with Xindice. XML from FrameMaker 7

Posted by Jeff Greif <jg...@alumni.princeton.edu>.
Mark,
You mentioned 'contains' causing a search of the entire document.  Shouldn't
it just search the content of the part of the document picked out by the
rest of the XPath query?  I'm pretty sure I remember this from a quick
source code tour a few months ago.
Jeff
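
[Editor's sketch] Jeff's point can be illustrated without Xindice: the path steps pick the nodes, and the substring test only looks at the text under each picked node. The sample document and its element names (book, chapter, title, para) are assumptions made up for this example, not FrameMaker output, and the code uses Python's standard library instead of an XPath engine to mimic a query such as /book/chapter[contains(., 'word')].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumptions for illustration only.
BOOK = """
<book>
  <title>Whale Tales</title>
  <chapter><title>Loomings</title><para>Call me Ishmael.</para></chapter>
  <chapter><title>The Chart</title><para>The whale was charted.</para></chapter>
</book>
"""

def chapters_containing(root, word):
    """Return titles of chapters whose own text contains `word` (case-sensitive)."""
    hits = []
    # Only the selected chapter elements are examined, not the whole document.
    for chapter in root.findall("chapter"):
        text = "".join(chapter.itertext())
        if word in text:
            hits.append(chapter.findtext("title"))
    return hits

root = ET.fromstring(BOOK)
print(chapters_containing(root, "whale"))
```

Searching for "Whale" finds nothing here, because the only capitalised occurrence is in the book title, which lies outside the selected chapter elements.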


Re: Text search with Xindice. XML from FrameMaker 7

Posted by Eric Thoman <ju...@manjug.org>.
Tom,
That sounds great. I'm currently working on a net project where I could try
it out for you. Or, just do private beta testing before you post it:)
Eric

Tom Bradford wrote:

> On Friday, April 26, 2002, at 09:38  AM, Mark J. Stang wrote:
> > Tom Bradford is the one working on the full text indexer.   You will have
> > to check with him to see what his time-frame is for delivery.   As far as the
> > "contains", it is part of the XPath Query.   Try:
> >
> >  http://www.zvon.org/xxl/XPathTutorial/General/examples.html
> >
> > for a good introduction to XPath Queries.   Also, check out
> > the W3C XPath definition.   My code is designed around
> > attribute searching, not text.   Maybe someone has a sample, I
> > don't know what is available in the docs.
>
> The contains() function will not be the place where full text searching
> is executed.  Contains is a substring search function only, and is not
> meant for full text searching where stemming and case rolling are
> applied.
>
> What likely will be the case is that I extend our XPath implementation
> to support a special full text searching function.  Other DBs use an
> operator like ~ for full text searching, but I'd rather not deviate from
> the spec.
>
> Likely, the initial usage scenario would be something like:
>
> /root[child[search(.,"words")]]
>
> Where everything is just implicitly ANDed.  Eventually, I'll add
> explicit ANDing as well as ORing, NOT, etc...
>
> The indexer and stemmer are done, it's just a matter of updating the
> XPath implementation.
>
> --
> Tom Bradford - http://www.tbradford.org
> Architect - XQRL (XQuery Engine) - http://www.xqrl.com
> Apache Xindice (XML Database) - http://xml.apache.org/xindice
> Labrador (Web Services Hub) - http://www.notdotnet.org/labrador


Re: Text search with Xindice. XML from FrameMaker 7

Posted by Tom Bradford <to...@xqrl.com>.
On Friday, April 26, 2002, at 09:38  AM, Mark J. Stang wrote:
> Tom Bradford is the one working on the full text indexer.   You will have
> to check with him to see what his time-frame is for delivery.   As far as the
> "contains", it is part of the XPath Query.   Try:
>
>  http://www.zvon.org/xxl/XPathTutorial/General/examples.html
>
> for a good introduction to XPath Queries.   Also, check out
> the W3C XPath definition.   My code is designed around
> attribute searching, not text.   Maybe someone has a sample, I
> don't know what is available in the docs.

The contains() function will not be the place where full text searching 
is executed.  Contains is a substring search function only, and is not 
meant for full text searching where stemming and case rolling are 
applied.
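
[Editor's sketch] The difference between a substring test and a full text match can be shown with a toy example. Everything below is an assumption for illustration: the suffix stripper is deliberately crude and is not the stemmer Tom describes, and the function names are made up.

```python
import re

def crude_stem(token):
    """Toy suffix stripper; a real stemmer is far more careful."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def terms(text):
    """Lower-case the text, split it into words, and stem each word."""
    return {crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())}

def substring_match(text, needle):
    """What contains() does: a raw, case-sensitive substring test."""
    return needle in text

def full_text_match(text, query):
    """Match when every stemmed query word occurs as a stemmed word in the text."""
    return terms(query) <= terms(text)

text = "The Whales were sighted at dawn."
print(substring_match(text, "whale"))           # case differs, so no substring hit
print(substring_match(text, "hale"))            # partial-word hit
print(full_text_match(text, "whale sighting"))  # matches after folding and stemming
print(full_text_match(text, "hale"))            # whole words only
```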

What likely will be the case is that I extend our XPath implementation 
to support a special full text searching function.  Other DBs use an 
operator like ~ for full text searching, but I'd rather not deviate from 
the spec.

Likely, the initial usage scenario would be something like:

/root[child[search(.,"words")]]

Where everything is just implicitly ANDed.  Eventually, I'll add 
explicit ANDing as well as ORing, NOT, etc...
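
[Editor's sketch] "Implicitly ANDed" can be read as: a node matches only if it contains every word in the query. A minimal inverted index shows the idea. This is an assumption about the semantics, not Xindice's indexer; the function names and the sample chapter keys are hypothetical, and no stemming is applied.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each word to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for term in tokenize(text):
            index[term].add(key)
    return index

def search_all_terms(index, query):
    """Return keys of documents containing every query word (implicit AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)

docs = {
    "ch1": "Call me Ishmael",
    "ch2": "The white whale surfaced",
    "ch3": "Ishmael saw the whale",
}
index = build_index(docs)
print(sorted(search_all_terms(index, "whale ishmael")))
```

Explicit OR would take the union of the posting sets instead of the intersection, and NOT would subtract a posting set from the candidates.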

The indexer and stemmer are done, it's just a matter of updating the 
XPath implementation.

--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (XML Database) - http://xml.apache.org/xindice
Labrador (Web Services Hub) - http://www.notdotnet.org/labrador


Re: Text search with Xindice. XML from FrameMaker 7

Posted by "Mark J. Stang" <ma...@earthlink.net>.
Eric,
Tom Bradford is the one working on the full text indexer.   You will have
to check with him to see what his time-frame is for delivery.   As for
"contains", it is a function in XPath queries.   Try:

 http://www.zvon.org/xxl/XPathTutorial/General/examples.html

for a good introduction to XPath queries.   Also, check out
the W3C XPath definition.   My code is designed around
attribute searching, not text.   Maybe someone has a sample; I
don't know what is available in the docs.

HTH,
Mark

Unknown wrote:

> Mark,
> Thanks for the reply. Would you know when the 'full text indexer' will be
> implemented? And, would you have any sample code for doing an XPath query with
> 'contains'?
> Thanks again,
> Eric


Re: Text search with Xindice. XML from FrameMaker 7

Posted by Grainne Reilly <gr...@attbi.com>.
Eric,

here is an example of using contains from an earlier posting to the list 
(http://marc.theaimsgroup.com/?l=xindice-users&m=101472543220142&w=2):

<snip>

I search for all "names" of "things" which contain the searchstring in one
of their nodes. It looks like this:
"/things/name[contains(translate(../descendant-or-self::*,
'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ','abcdefghijklmnopqrstuvwxyzäöü'),
translate('" + searchstring + "',
'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ','abcdefghijklmnopqrstuvwxyzäöü'))]";

Case sensitive, it would simply be:
"/things/name[contains(../descendant-or-self::*, '" + searchstring + "')]"

<snip>
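
[Editor's sketch] Building that query by string concatenation breaks as soon as the search string contains a quote, because XPath 1.0 string literals have no escape syntax. The helper below is one possible way to assemble the same /things/name query safely; the function names are hypothetical, and it only produces the query string, it does not talk to Xindice.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ"
LOWER = "abcdefghijklmnopqrstuvwxyzäöü"

def xpath_literal(value):
    """Quote `value` as an XPath 1.0 string literal."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote kinds present: join the apostrophe-free pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"

def contains_query(searchstring, ignore_case=True):
    """Build the /things/name query from the posting quoted above."""
    target = "../descendant-or-self::*"
    needle = xpath_literal(searchstring)
    if ignore_case:
        # translate() maps each upper-case letter to its lower-case partner.
        target = f"translate({target}, '{UPPER}', '{LOWER}')"
        needle = f"translate({needle}, '{UPPER}', '{LOWER}')"
    return f"/things/name[contains({target}, {needle})]"

print(contains_query("whale", ignore_case=False))
print(contains_query("Ahab's"))
```

Note that in a string context XPath 1.0 uses only the first node of ../descendant-or-self::* in document order, which is the parent element, so the test effectively runs against all of the text under the parent.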

Grainne.

At 11:45 AM 4/26/2002 -0400, Eric Thoman wrote:
>Mark,
>Thanks for the reply. Would you know when the 'full text indexer' will be
>implemented? And, would you have any sample code for doing an XPath query with
>'contains'?
>Thanks again,
>Eric



Re: Text search with Xindice. XML from FrameMaker 7

Posted by Eric Thoman <ju...@manjug.org>.
Mark,
Thanks for the reply. Would you know when the 'full text indexer' will be
implemented? And, would you have any sample code for doing an XPath query with
'contains'?
Thanks again,
Eric

"Mark J. Stang" wrote:

> Eric,
> Xindice would work, but the current version might be a bit slow.   There
> are plans for a full text indexer.   However, currently, you would have to
> do an XPath query using "contains".   That will search the entire
> document.   Probably not as fast as you would like.   Once the
> full text indexer comes out, then it should work fine.   It depends on
> your time frame.
>
> Xindice is well suited to returning individual chapters and all the
> other items, if you break it up into the right set of documents and
> collections.   I haven't used FrameMaker's XML, so I don't know
> how it stores the documents.
>
> HTH,
>
> Mark


Re: Text search with Xindice. XML from FrameMaker 7

Posted by "Mark J. Stang" <ma...@earthlink.net>.
Eric,
Xindice would work, but the current version might be a bit slow.   There
are plans for a full text indexer.   However, currently, you would have to
do an XPath query using "contains".   That will search the entire
document.   Probably not as fast as you would like.   Once the
full text indexer comes out, then it should work fine.   It depends on
your time frame.

Xindice is well suited to returning individual chapters and all the
other items, if you break it up into the right set of documents and
collections.   I haven't used FrameMaker's XML, so I don't know
how it stores the documents.
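
[Editor's sketch] One way to read "the right set of documents" is one document per chapter, each carrying its book title, so that a hit can be returned as a chapter without fetching the whole book. The element names, the attribute names and the key scheme below are assumptions for illustration; how FrameMaker 7 actually structures its XML is not known from this thread, and storing the results in a Xindice collection is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumptions for illustration only.
BOOK = (
    "<book><title>Whale Tales</title>"
    "<chapter><title>Loomings</title><para>Call me Ishmael.</para></chapter>"
    "<chapter><title>The Chart</title><para>The whale was charted.</para></chapter>"
    "</book>"
)

def split_into_chapters(book_xml):
    """Yield (key, xml_string) pairs, one per chapter, tagged with the book title."""
    book = ET.fromstring(book_xml)
    book_title = book.findtext("title", default="")
    for number, chapter in enumerate(book.findall("chapter"), start=1):
        # New root element for the stand-alone chapter document.
        doc = ET.Element("chapter", {"book": book_title, "number": str(number)})
        doc.extend(list(chapter))  # carry over the chapter's title and body
        yield f"chapter-{number:03d}", ET.tostring(doc, encoding="unicode")

docs = dict(split_into_chapters(BOOK))
for key, xml_text in docs.items():
    print(key, xml_text)
```

Each pair could then be stored as its own document in a per-book collection, which keeps every query result small and self-describing.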

HTH,

Mark

Eric Thoman wrote:

> Hi,
>
> I'm looking at Xindice to store literary documents that will be created
> with FrameMaker 7.
>
> The user will enter keywords to search the entire text of all of the
> XML documents and return individual chapters with their titles, the book
> titles and the contents of the chapters.
>
> How well might Xindice work against a 4000-page literary work? And can
> anyone suggest a good approach? Sample code is always neat :)
>
> Last, if anyone has any comments about FrameMaker's XML ability vis-a-vis
> Xindice, I would be most appreciative.
>
> Thanks,
> Eric Thoman