Posted to xindice-users@xml.apache.org by Dan Barron <db...@mail.acponline.org> on 2002/07/16 18:43:49 UTC

Indexer Patterns

I'm trying to create indexers for a database of journal citations. One of the fields is a unique identifier called PMID, stored at the XPath /MedlineCitation/PMID. However, other parts of the document contain references to other articles that also have a PMID tag (lower down in the hierarchy).

In the documentation, all the patterns seem to be simple element names. However, I don't want my PMID indexer to pick up documents where PMID occurs anywhere other than directly under the root tag.

So the question is, is this a valid pattern (/MedlineCitation/PMID)? Or do I have to use just PMID and filter out the unwanted records later? I tried using this pattern and when I add documents, the indexer does not grow, which seems to me to mean that something is wrong with the definition.
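
For illustration, the fallback of matching plain PMID and filtering afterwards can be sketched outside Xindice. This is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The sample citation and its nested element names (CommentsCorrections, CommentOn) are assumptions made up for the example, and the sketch says nothing about which patterns Xindice itself accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation: the nested element names are assumptions for this sketch.
SAMPLE = """
<MedlineCitation>
  <PMID>11111111</PMID>
  <CommentsCorrections>
    <CommentOn>
      <PMID>22222222</PMID>
    </CommentOn>
  </CommentsCorrections>
</MedlineCitation>
""".strip()

def all_pmids(root):
    """Every PMID anywhere in the document, in document order."""
    return [el.text for el in root.iter("PMID")]

def top_level_pmid(root):
    """Only the PMID that is a direct child of the root (/MedlineCitation/PMID)."""
    if root.tag != "MedlineCitation":
        return None
    el = root.find("PMID")  # a bare tag name in find() matches direct children only
    return el.text if el is not None else None

root = ET.fromstring(SAMPLE)
print(all_pmids(root))       # both the citation's own PMID and the referenced one
print(top_level_pmid(root))  # only the citation's own PMID
```

The second function is the "filter out the unwanted records later" step: it keeps a hit only when the PMID sits directly under the root element.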

Any help would be appreciated.

dan


____________________________________________________________________
Daniel W. Barron
Senior Systems Analyst/Application Developer
American College of Physicians-American Society of Internal Medicine
Tel: (215) 351-2617     Tel: (800) 523-1546 x2617
Fax: (215) 351-2644    E-mail: dbarron@mail.acponline.org

Re: Indexes, attributes and XPATH usage

Posted by "Dr. Klemens Waldhör" <Wa...@t-online.de>.

Hi,

After some tests I have the impression that Xindice's XPath queries do
not descend into nested collections; they only search the collection
that was specified.

I find the matching entries when I fully qualify the collection, but
when I run the same query against a collection higher up in the tree,
I no longer find the match.

Is this assumption correct? And does it apply to the indexing
mechanism too?
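
If queries really are limited to a single collection, one client-side workaround is to run the same query against each nested collection in turn. The following is a rough Python sketch of that idea only. The Collection class is a hypothetical in-memory stand-in, not a Xindice or XML:DB class, and the collection and document names are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Collection:
    """Hypothetical in-memory stand-in for a collection; not a Xindice class."""
    name: str
    documents: dict = field(default_factory=dict)  # document key -> XML string
    children: list = field(default_factory=list)   # nested Collection objects

def query_one_level(coll, path):
    """Return keys of matching documents stored directly in `coll`."""
    hits = []
    for key, xml_text in coll.documents.items():
        root = ET.fromstring(xml_text)
        if root.findall(path):
            hits.append(f"{coll.name}/{key}")
    return hits

def query_recursive(coll, path):
    """Query `coll` itself, then every nested collection below it."""
    hits = query_one_level(coll, path)
    for child in coll.children:
        hits.extend(query_recursive(child, path))
    return hits

# Made-up layout: one document in a sub collection, none in the parent.
sub = Collection("/db/test/sub", {"a.xml": "<tmx><body><tu tuid='1'/></body></tmx>"})
top = Collection("/db/test", children=[sub])

print(query_one_level(top, ".//tu[@tuid='1']"))  # parent only: finds nothing
print(query_recursive(top, ".//tu[@tuid='1']"))  # walks into the sub collection
```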

(I am using Windows XP).

Klemens

-----Original Message-----
From: Dr. Klemens Waldhör [mailto:Waldhoer@t-online.de]
Sent: Wednesday, 17 July 2002 07:56
To: xindice-users@xml.apache.org
Subject: Indexes, attributes and XPATH usage


Hi,

I am importing a lot of documents into Xindice.

The documents look like this:

<?xml version='1.0' encoding='UTF-8' ?>
<tmx version='1.3'>
<header
	creationtool='tool'
	creationtoolversion='blabla'
	creationdate='20020716T153854Z'
	datatype='plaintext'
	segtype='segment'
	adminlang='EN-US'
	srclang='en'
	o-tmf='xxxxx'>
</header>
<body>
	<tu tuid='1' creationid='20020716T153854Z#1#790893209'>
		<prop type='sourceFile'>xxx.htm</prop>
		<prop type='targetFile'>yyy.htm</prop>
		<prop type='sourceSegNumber'>1</prop>
		<prop type='targetSegNumber'>1</prop>
		<tuv xml:lang="en"
creationid='20020716T153854Z#1#790893209.en'>
			<seg>This is a segment</seg>
		</tuv>
		<tuv xml:lang="de"
creationid='20020716T153854Z#1#790893209.de'>
			<seg>Das ist ein Satz.</seg>
		</tuv>
	</tu>
	....
</body>
</tmx>

The following commands create a huge number of sub collections, with
documents in them.

call xindiceadmin.bat dc -c /db -n test
call xindiceadmin.bat ac -c /db -n /test
call xindiceadmin.bat import -c /db/test -f dir -e xml

I am now using the following indexing commands:

call xindiceadmin.bat ai -c /db/test -n testtuid -p tu[@tuid] -t int
call xindiceadmin.bat ai -c /db/test -n testtucreationid -p tu[@creationid] -t string
call xindiceadmin.bat ai -c /db/test -n testtuvcreationid -p tuv[@creationid] -t string

When searching I am using:

xindice.bat xpath_query -c /db/test -q "tu[@creationid='20020716T153854Z#1#790893209']"
or
xindice.bat xpath_query -c /db/test -q "/tmx/body/tu[@creationid='20020716T153854Z#1#790893209']"
Etc.

But I never get any results back, although the entries are there: I
can see them in the Xindice browser. Does xpath_query search through
sub collections? Or is the XPath statement wrong?

Any idea what's wrong?

And is there a command that lists the keys/names of all the documents
in a collection and/or its sub collections?

Thanks for your help!

Klemens

Indexes, attributes and XPATH usage

Posted by "Dr. Klemens Waldhör" <Wa...@t-online.de>.

Hi,

I am importing a lot of documents into Xindice.

The documents look like this:

<?xml version='1.0' encoding='UTF-8' ?>
<tmx version='1.3'>
<header
	creationtool='tool'
	creationtoolversion='blabla'
	creationdate='20020716T153854Z'
	datatype='plaintext'
	segtype='segment'
	adminlang='EN-US'
	srclang='en'
	o-tmf='xxxxx'>
</header>
<body>
	<tu tuid='1' creationid='20020716T153854Z#1#790893209'>
		<prop type='sourceFile'>xxx.htm</prop>
		<prop type='targetFile'>yyy.htm</prop>
		<prop type='sourceSegNumber'>1</prop>
		<prop type='targetSegNumber'>1</prop>
		<tuv xml:lang="en"
creationid='20020716T153854Z#1#790893209.en'>
			<seg>This is a segment</seg>
		</tuv>
		<tuv xml:lang="de"
creationid='20020716T153854Z#1#790893209.de'>
			<seg>Das ist ein Satz.</seg>
		</tuv>
	</tu>
	....
</body>
</tmx>

The following commands create a huge number of sub collections, with
documents in them.

call xindiceadmin.bat dc -c /db -n test
call xindiceadmin.bat ac -c /db -n /test
call xindiceadmin.bat import -c /db/test -f dir -e xml

I am now using the following indexing commands:

call xindiceadmin.bat ai -c /db/test -n testtuid -p tu[@tuid] -t int
call xindiceadmin.bat ai -c /db/test -n testtucreationid -p tu[@creationid] -t string
call xindiceadmin.bat ai -c /db/test -n testtuvcreationid -p tuv[@creationid] -t string

When searching I am using:

xindice.bat xpath_query -c /db/test -q "tu[@creationid='20020716T153854Z#1#790893209']"
or
xindice.bat xpath_query -c /db/test -q "/tmx/body/tu[@creationid='20020716T153854Z#1#790893209']"
Etc.

But I never get any results back, although the entries are there: I
can see them in the Xindice browser. Does xpath_query search through
sub collections? Or is the XPath statement wrong?
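
One way to separate the two questions is to test the XPath expression on a single document outside the database. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree on a trimmed copy of the sample document. ElementTree supports only a subset of XPath and does not accept absolute paths on an element, so the paths are written relative to the <tmx> root. How Xindice chooses its context node is not verified here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from this message.
TMX = """<tmx version='1.3'>
<header srclang='en'></header>
<body>
  <tu tuid='1' creationid='20020716T153854Z#1#790893209'>
    <tuv xml:lang="en" creationid='20020716T153854Z#1#790893209.en'>
      <seg>This is a segment</seg>
    </tuv>
    <tuv xml:lang="de" creationid='20020716T153854Z#1#790893209.de'>
      <seg>Das ist ein Satz.</seg>
    </tuv>
  </tu>
</body>
</tmx>"""

CREATION_ID = "20020716T153854Z#1#790893209"
root = ET.fromstring(TMX)

# Counterpart of //tu[@creationid='...']: search at any depth.
anywhere = root.findall(f".//tu[@creationid='{CREATION_ID}']")

# Counterpart of /tmx/body/tu[@creationid='...'], relative to the <tmx> root.
by_path = root.findall(f"body/tu[@creationid='{CREATION_ID}']")

# A bare relative step only looks at direct children of the context node.
direct = root.findall(f"tu[@creationid='{CREATION_ID}']")

print(len(anywhere), len(by_path), len(direct))
```

In this sketch the full path and the any-depth form both find the <tu> element, while the bare tu[...] step finds nothing because <tu> is not a direct child of the context node. That suggests the full-path expression is well formed for this document, which leaves the sub collection question open.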

Any idea what's wrong?

And is there a command that lists the keys/names of all the documents
in a collection and/or its sub collections?

Thanks for your help!

Klemens

Re: Indexer Patterns

Posted by Heinrich Götzger <go...@gmx.net>.

Dan,

If I understand you correctly, you are using PMID as the unique key for
your documents.

If you retrieve your documents by that unique PMID key, you don't need
an index, do you?

Am I following you correctly?

On Tue, 16 Jul 2002, Dan Barron wrote:

>I'm trying to create indexers for a database of journal citations. One of
>the fields is a unique identifier called PMID, stored at the XPath
>/MedlineCitation/PMID. However, other parts of the document contain
>references to other articles that also have a PMID tag (lower down in the
>hierarchy).
>
>In the documentation, all the patterns seem to be simple element names.
>However, I don't want my PMID indexer to pick up documents where PMID
>occurs anywhere other than directly under the root tag.
>
>So the question is, is this a valid pattern (/MedlineCitation/PMID)? Or
>do I have to use just PMID and filter out the unwanted records later? I
>tried using this pattern and when I add documents, the indexer does not
>grow, which seems to me to mean that something is wrong with the
>definition.
>
>Any help would be appreciated.
>
>dan
>
>
>____________________________________________________________________
>Daniel W. Barron
>Senior Systems Analyst/Application Developer
>American College of Physicians-American Society of Internal Medicine
>Tel: (215) 351-2617     Tel: (800) 523-1546 x2617
>Fax: (215) 351-2644    E-mail: dbarron@mail.acponline.org
>
>
regards

Heinrich
--
http://www.xmlBlaster.org