You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Malcolm Clark <ma...@btinternet.com> on 2005/10/16 12:44:22 UTC

Lucene/Digester

Hi all,
I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm trying to obtain advice but it's hard to say whether the problem is Lucene or Digester.
Firstly:
I am trying to index the INEX collection but when I try to index repetitive elements only the last one is indexed. For example: 
<Book>
<Name>
<Title>
<Chapter></Chapter>
<Chapter></Chapter>
<Chapter></Chapter> //this is the only one indexed
</Title>
</Name>
</Book>
only the last Chapter element will be indexed and it will skip the first two. 
Secondly:
When using the Digester/Lucene with XML does each file have to contain e.g 
<!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is 
there a way around it?
 I have tried to use the sample line from the Digester API 
digester.register("-//Example Dot Com //DTD Sample Example//EN",  "assets/sample.dtd");
but to no avail.

Thanks very much. I really appreciate any possible solutions as I'm stumped.
Malcolm
Scotland

Re: Lucene/Digester

Posted by MALCOLM CLARK <ma...@btinternet.com>.

Okay I'll do that. Thanks very much for the advice as it's much appreciated.

Malcolm Clark

Re: Lucene/Digester

Posted by Grant Ingersoll <gs...@syr.edu>.

I would suggest isolating Digester from Lucene and running a unit test 
on just the Digester part to ensure you are getting the right things 
back from Digester.  Once you are sure your digester setup is correct, 
then look at how it feeds into Lucene and post back here if it still 
doesn't work

Oren Shir wrote:

>Hi,
>About your first question:
>How do you know that only the last "Chapter" element is stored? could it be
>that you are only getting the last one? Try using getValues() instead of
>get.
>
>Oren
>
>On 10/16/05, Malcolm Clark <ma...@btinternet.com> wrote:
>  
>
>>Hi all,
>>I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm
>>trying to obtain advice but it's hard to say whether the problem is Lucene
>>or Digester.
>>Firstly:
>>I am trying to index the INEX collection but when I try to index
>>repetitive elements only the last one is indexed. For example:
>><Book>
>><Name>
>><Title>
>><Chapter></Chapter>
>><Chapter></Chapter>
>><Chapter></Chapter> //this is the only one indexed
>></Title>
>></Name>
>></Book>
>>only the last Chapter element will be indexed and it will skip the first
>>two.
>>Secondly:
>>When using the Digester/Lucene with XML does each file have to contain e.g
>><!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is
>>there a way around it?
>>I have tried to use the sample line from the Digester API
>>digester.register("-//Example Dot Com //DTD Sample Example//EN",
>>"assets/sample.dtd");
>>but to no avail.
>>
>>Thanks very much. I really appreciate any possible solutions as I'm
>>stumped.
>>Malcolm
>>Scotland
>>
>>
>>    
>>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene/Digester

Posted by MALCOLM CLARK <ma...@btinternet.com>.

Hi

I used Luke to check the content of the index and they are not there.

cheers,

MC

Re: Lucene/Digester

Posted by Oren Shir <sh...@gmail.com>.

Hi,
About your first question:
How do you know that only the last "Chapter" element is stored? could it be
that you are only getting the last one? Try using getValues() instead of
get.

Oren

On 10/16/05, Malcolm Clark <ma...@btinternet.com> wrote:
>
> Hi all,
> I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm
> trying to obtain advice but it's hard to say whether the problem is Lucene
> or Digester.
> Firstly:
> I am trying to index the INEX collection but when I try to index
> repetitive elements only the last one is indexed. For example:
> <Book>
> <Name>
> <Title>
> <Chapter></Chapter>
> <Chapter></Chapter>
> <Chapter></Chapter> //this is the only one indexed
> </Title>
> </Name>
> </Book>
> only the last Chapter element will be indexed and it will skip the first
> two.
> Secondly:
> When using the Digester/Lucene with XML does each file have to contain e.g
> <!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is
> there a way around it?
> I have tried to use the sample line from the Digester API
> digester.register("-//Example Dot Com //DTD Sample Example//EN",
> "assets/sample.dtd");
> but to no avail.
>
> Thanks very much. I really appreciate any possible solutions as I'm
> stumped.
> Malcolm
> Scotland
>
>