You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Harun Altay <ha...@indiana.edu> on 2002/01/12 21:42:35 UTC

I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Hello Friends,

I want to search on BOTH --> (1) "XML" data and (2) "Text" data.


(1). "Text Data" --> mostly consist of HTML pages, residing on the server...
example : hundreds of HTML, TXT file, etc...


(2). "XML Data" --> for example, Articles that was stored in XML format, lets say like this :

<article>
<article code>  ....   </article code>
<article title>   ....  </article title>
<author>  .... </author>
<date> ... </date>
<etc> ... </etc>

<body of th eTEXT>
.
.......................... the article body, TEXT ......
.
.
.
.
</body of th eTEXT>

</article>

In this type of search, we need to search this "XML-based author file" in two different ways :
    2.a. First Way of searching : Searching XML file through its KEYWORDS, like : date = "Jan-01-2002" and author = "George Washington"
    2.b. Second Way of Searching : Free search on the article body. For example : All the articles, whose body has the word 'Hello', or the sentence 'Hello Mr. President!' 


Note-1:

XML file may reside either Operating System level, or in a XML-supporting DATABASE, as well.


Note-2:

If I need to have them, I can write extra java classes to support some more functionality, if possible...


Thank you very much,
Harun.






Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Chantal Ackermann <ch...@web.de>.
oh, what stupidity, I'm sorry. I confused some mails! I hope you've done some 
grinning on this, at least.

Chantal

Am Montag, 21. Januar 2002 08:36 schrieben Sie:
> hello Harun,
>
> if your often doing searching, maybe you'd like to index all the files. Try
> out Lucene (Jakarta Project). It's a pure Java written indexing framework
> (to be integrated in your java program).
>
> Chantal
>
> Am Samstag, 12. Januar 2002 21:42 schrieben Sie:
> > Hello Friends,
> >
> > I want to search on BOTH --> (1) "XML" data and (2) "Text" data.
> >
> >
> > (1). "Text Data" --> mostly consist of HTML pages, residing on the
> > server... example : hundreds of HTML, TXT file, etc...
> >
> >
> > (2). "XML Data" --> for example, Articles that was stored in XML format,
> > lets say like this :
> >
> > <article>
> > <article code>  ....   </article code>
> > <article title>   ....  </article title>
> > <author>  .... </author>
> > <date> ... </date>
> > <etc> ... </etc>
> >
> > <body of th eTEXT>
> > .
> > .......................... the article body, TEXT ......
> > .
> > .
> > .
> > .
> > </body of th eTEXT>
> >
> > </article>
> >
> > In this type of search, we need to search this "XML-based author file" in
> > two different ways : 2.a. First Way of searching : Searching XML file
> > through its KEYWORDS, like : date = "Jan-01-2002" and author = "George
> > Washington" 2.b. Second Way of Searching : Free search on the article
> > body. For example : All the articles, whose body has the word 'Hello', or
> > the sentence 'Hello Mr. President!'
> >
> >
> > Note-1:
> >
> > XML file may reside either Operating System level, or in a XML-supporting
> > DATABASE, as well.
> >
> >
> > Note-2:
> >
> > If I need to have them, I can write extra java classes to support some
> > more functionality, if possible...
> >
> >
> > Thank you very much,
> > Harun.

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Chantal Ackermann <ch...@biomax.de>.
hello Harun,

if your often doing searching, maybe you'd like to index all the files. Try 
out Lucene (Jakarta Project). It's a pure Java written indexing framework (to 
be integrated in your java program).

Chantal

Am Samstag, 12. Januar 2002 21:42 schrieben Sie:
> Hello Friends,
>
> I want to search on BOTH --> (1) "XML" data and (2) "Text" data.
>
>
> (1). "Text Data" --> mostly consist of HTML pages, residing on the
> server... example : hundreds of HTML, TXT file, etc...
>
>
> (2). "XML Data" --> for example, Articles that was stored in XML format,
> lets say like this :
>
> <article>
> <article code>  ....   </article code>
> <article title>   ....  </article title>
> <author>  .... </author>
> <date> ... </date>
> <etc> ... </etc>
>
> <body of th eTEXT>
> .
> .......................... the article body, TEXT ......
> .
> .
> .
> .
> </body of th eTEXT>
>
> </article>
>
> In this type of search, we need to search this "XML-based author file" in
> two different ways : 2.a. First Way of searching : Searching XML file
> through its KEYWORDS, like : date = "Jan-01-2002" and author = "George
> Washington" 2.b. Second Way of Searching : Free search on the article body.
> For example : All the articles, whose body has the word 'Hello', or the
> sentence 'Hello Mr. President!'
>
>
> Note-1:
>
> XML file may reside either Operating System level, or in a XML-supporting
> DATABASE, as well.
>
>
> Note-2:
>
> If I need to have them, I can write extra java classes to support some more
> functionality, if possible...
>
>
> Thank you very much,
> Harun.

-- 
Chantal Ackermann M.A.

Biomax Informatics AG
Lochhamer Str. 11
D - 82152 Martinsried
Tel: (0 89) 89 55 74 - 857

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I know :(
About 2 more weeks, Peter said.
Toes crossed.

Otis

--- Kelvin Tan <ke...@relevanz.com> wrote:
> > You can also look on www.mail-archive.com and search this list's
> > archive for some related discussions.  Try searching for Philip
> Ogren
> > (I think I got the name right), he sent some code that lets you go
> from
> > XML -> Lucene Document quickly, I think.
> >
> > Otis
> 
> Ugh. I hate that we have to rely on the mailing list archive to point
> to
> user contributions and/or resources. Can't wait for the
> "Contributions"
> section of Lucene to get up.
> 
> Regards,
> Kelvin
> 
> 
> --
> To unsubscribe, e-mail:  
> <ma...@jakarta.apache.org>
> For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> 


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Kelvin Tan <ke...@relevanz.com>.
> You can also look on www.mail-archive.com and search this list's
> archive for some related discussions.  Try searching for Philip Ogren
> (I think I got the name right), he sent some code that lets you go from
> XML -> Lucene Document quickly, I think.
>
> Otis

Ugh. I hate that we have to rely on the mailing list archive to point to
user contributions and/or resources. Can't wait for the "Contributions"
section of Lucene to get up.

Regards,
Kelvin


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Harun Altay <ha...@indiana.edu>.
I am very new in Lucene. Your comments are really helpfull for me. Thank
you.
Harun.


----- Original Message -----
From: <ca...@bookandhammer.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, January 12, 2002 5:41 PM
Subject: Re: I want to search on BOTH --> (1) "XML" data and (2) "Text"
data.


> Hi,
> I don't know how much you know about Lucene, but Otis is exactly right.
> You can create two (or more) different Lucene Documents and retrieve the
> information you want and put it into fields.
>
> I am in the midst of creating a Document Repository that I haven't
> gotten up yet.
>
> Here are some links to different documents that might help
>
>
> XML Based Document Implementations
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html
> http://marc.theaimsgroup.com/?l=lucene-dev&m=100723333506246&w=2
>
> The best HTML Document, is the one used in the lucene demo.
>
> I hope this helps.
>
>
> --Peter
>
>
>
> On Saturday, January 12, 2002, at 01:54 PM, Otis Gospodnetic wrote:
>
> > Hello,
> >
> > You could write an XML parser (see http://xml.apache.org/ for some XML
> > tools) and store XML elements as Fields in Lucene Documents.
> > To search for 'Hello' and 'Hello Mr. President!' you can store the
> > whole article body as a Text (or maybe UnStored) Field.
> > You can also look on www.mail-archive.com and search this list's
> > archive for some related discussions.  Try searching for Philip Ogren
> > (I think I got the name right), he sent some code that lets you go from
> > XML -> Lucene Document quickly, I think.
> >
> > Otis
> >
> >
> > --- Harun Altay <ha...@indiana.edu> wrote:
> >> Hello Friends,
> >>
> >> I want to search on BOTH --> (1) "XML" data and (2) "Text" data.
> >>
> >>
> >> (1). "Text Data" --> mostly consist of HTML pages, residing on the
> >> server...
> >> example : hundreds of HTML, TXT file, etc...
> >>
> >>
> >> (2). "XML Data" --> for example, Articles that was stored in XML
> >> format, lets say like this :
> >>
> >> <article>
> >> <article code>  ....   </article code>
> >> <article title>   ....  </article title>
> >> <author>  .... </author>
> >> <date> ... </date>
> >> <etc> ... </etc>
> >>
> >> <body of th eTEXT>
> >> .
> >> .......................... the article body, TEXT ......
> >> .
> >> .
> >> .
> >> .
> >> </body of th eTEXT>
> >>
> >> </article>
> >>
> >> In this type of search, we need to search this "XML-based author
> >> file" in two different ways :
> >>     2.a. First Way of searching : Searching XML file through its
> >> KEYWORDS, like : date = "Jan-01-2002" and author = "George
> >> Washington"
> >>     2.b. Second Way of Searching : Free search on the article body.
> >> For example : All the articles, whose body has the word 'Hello', or
> >> the sentence 'Hello Mr. President!'
> >>
> >>
> >> Note-1:
> >>
> >> XML file may reside either Operating System level, or in a
> >> XML-supporting DATABASE, as well.
> >>
> >>
> >> Note-2:
> >>
> >> If I need to have them, I can write extra java classes to support
> >> some more functionality, if possible...
> >>
> >>
> >> Thank you very much,
> >> Harun.
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Send FREE video emails in Yahoo! Mail!
> > http://promo.yahoo.com/videomail/
> >
> > --
> > To unsubscribe, e-mail:   <mailto:lucene-user-
> > unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-user-
> > help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by ca...@bookandhammer.com.
Hi,
I don't know how much you know about Lucene, but Otis is exactly right.
You can create two (or more) different Lucene Documents and retrieve the 
information you want and put it into fields.

I am in the midst of creating a Document Repository that I haven't 
gotten up yet.

Here are some links to different documents that might help


XML Based Document Implementations
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html
http://marc.theaimsgroup.com/?l=lucene-dev&m=100723333506246&w=2

The best HTML Document, is the one used in the lucene demo.

I hope this helps.


--Peter



On Saturday, January 12, 2002, at 01:54 PM, Otis Gospodnetic wrote:

> Hello,
>
> You could write an XML parser (see http://xml.apache.org/ for some XML
> tools) and store XML elements as Fields in Lucene Documents.
> To search for 'Hello' and 'Hello Mr. President!' you can store the
> whole article body as a Text (or maybe UnStored) Field.
> You can also look on www.mail-archive.com and search this list's
> archive for some related discussions.  Try searching for Philip Ogren
> (I think I got the name right), he sent some code that lets you go from
> XML -> Lucene Document quickly, I think.
>
> Otis
>
>
> --- Harun Altay <ha...@indiana.edu> wrote:
>> Hello Friends,
>>
>> I want to search on BOTH --> (1) "XML" data and (2) "Text" data.
>>
>>
>> (1). "Text Data" --> mostly consist of HTML pages, residing on the
>> server...
>> example : hundreds of HTML, TXT file, etc...
>>
>>
>> (2). "XML Data" --> for example, Articles that was stored in XML
>> format, lets say like this :
>>
>> <article>
>> <article code>  ....   </article code>
>> <article title>   ....  </article title>
>> <author>  .... </author>
>> <date> ... </date>
>> <etc> ... </etc>
>>
>> <body of th eTEXT>
>> .
>> .......................... the article body, TEXT ......
>> .
>> .
>> .
>> .
>> </body of th eTEXT>
>>
>> </article>
>>
>> In this type of search, we need to search this "XML-based author
>> file" in two different ways :
>>     2.a. First Way of searching : Searching XML file through its
>> KEYWORDS, like : date = "Jan-01-2002" and author = "George
>> Washington"
>>     2.b. Second Way of Searching : Free search on the article body.
>> For example : All the articles, whose body has the word 'Hello', or
>> the sentence 'Hello Mr. President!'
>>
>>
>> Note-1:
>>
>> XML file may reside either Operating System level, or in a
>> XML-supporting DATABASE, as well.
>>
>>
>> Note-2:
>>
>> If I need to have them, I can write extra java classes to support
>> some more functionality, if possible...
>>
>>
>> Thank you very much,
>> Harun.
>>
>>
>>
>>
>>
>>
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-
> unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-
> help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: I want to search on BOTH --> (1) "XML" data and (2) "Text" data.

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

You could write an XML parser (see http://xml.apache.org/ for some XML
tools) and store XML elements as Fields in Lucene Documents.
To search for 'Hello' and 'Hello Mr. President!' you can store the
whole article body as a Text (or maybe UnStored) Field.
You can also look on www.mail-archive.com and search this list's
archive for some related discussions.  Try searching for Philip Ogren
(I think I got the name right), he sent some code that lets you go from
XML -> Lucene Document quickly, I think.

Otis


--- Harun Altay <ha...@indiana.edu> wrote:
> Hello Friends,
> 
> I want to search on BOTH --> (1) "XML" data and (2) "Text" data.
> 
> 
> (1). "Text Data" --> mostly consist of HTML pages, residing on the
> server...
> example : hundreds of HTML, TXT file, etc...
> 
> 
> (2). "XML Data" --> for example, Articles that was stored in XML
> format, lets say like this :
> 
> <article>
> <article code>  ....   </article code>
> <article title>   ....  </article title>
> <author>  .... </author>
> <date> ... </date>
> <etc> ... </etc>
> 
> <body of th eTEXT>
> .
> .......................... the article body, TEXT ......
> .
> .
> .
> .
> </body of th eTEXT>
> 
> </article>
> 
> In this type of search, we need to search this "XML-based author
> file" in two different ways :
>     2.a. First Way of searching : Searching XML file through its
> KEYWORDS, like : date = "Jan-01-2002" and author = "George
> Washington"
>     2.b. Second Way of Searching : Free search on the article body.
> For example : All the articles, whose body has the word 'Hello', or
> the sentence 'Hello Mr. President!' 
> 
> 
> Note-1:
> 
> XML file may reside either Operating System level, or in a
> XML-supporting DATABASE, as well.
> 
> 
> Note-2:
> 
> If I need to have them, I can write extra java classes to support
> some more functionality, if possible...
> 
> 
> Thank you very much,
> Harun.
> 
> 
> 
> 
> 
> 


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>