Posted to java-user@lucene.apache.org by Jagdip Singh <jx...@cs.rit.edu> on 2003/07/08 07:40:37 UTC
converting text/doc to XML
Hi,
How can I convert text/doc to XML?
Please help.
Regards,
Jagdip
RE: converting text/doc to XML
Posted by "Nader S. Henein" <ns...@bayt.net>.
We read from the database and convert the data into valid XML, then
hand the XML file over to Lucene, which in turn digests it and indexes
the information.
N.
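The post does not show how the XML is consumed before indexing. As a minimal sketch, assuming a small document of the <doc> form discussed in this thread, the JDK's built-in DOM parser can pull each child element into a name/value map that could then be turned into index fields. The class name XmlFields and the method parse are hypothetical, and the Lucene indexing step itself is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of Lucene or of the original poster's code.
class XmlFields {

    // Parse an XML string and return the text of each direct child element
    // of the root, keyed by element name, in document order.
    static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = dom.getDocumentElement();

        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes between elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }
}
```

Each entry in the returned map (for example file_name, title, content) would correspond to one field handed to the indexer.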
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
RE: converting text/doc to XML
Posted by Jagdip Singh <jx...@cs.rit.edu>.
Hi Nader,
You mentioned using Lucene for your http://www.bayt.com web site.
Do you convert CVs or any other documents to XML format before
submitting them to Lucene for indexing?
Regards,
Jagdip
RE: converting text/doc to XML
Posted by Jagdip Singh <jx...@cs.rit.edu>.
I will try coding this.
Thanks,
Jagdip
RE: converting text/doc to XML
Posted by "Nader S. Henein" <ns...@bayt.net>.
XML is an organized, standardized format, so let's say your document has
the following characteristics:
File name : foobar.doc
First line (title) : Foo Bar
File content :
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
Then you have to read the file (a simple file read; Java can do this in
about ten different ways, pick one).
Put each of the file's characteristics in a variable,
and then write them out as valid XML:
<doc doc_id="1">
<file_name>foobar.doc</file_name>
<title>Foo Bar</title>
<content>
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
</content>
</doc>
There are probably packages that will do this for you, but it's so simple
you could pull it off in under a hundred lines. It's also a good exercise
to familiarize yourself with XML (if you haven't played around with it
before).
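The steps described above (read the file, keep each characteristic in a variable, write out XML) could be sketched roughly as follows. This is only an illustration under some assumptions: the input is plain text with the title on its first line (a binary Word .doc would need a text-extraction step first, which is not shown), the file is UTF-8, and the names DocToXml, escape, toXml and fileToXml are made up for this example. Characters that are special in XML are escaped so the output stays well-formed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical example class, not taken from any library.
class DocToXml {

    // Escape the characters that are special in XML text and attribute values.
    // '&' must be replaced first so the other entities are not double-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build the <doc> element from the file's characteristics.
    static String toXml(int docId, String fileName, String title, String content) {
        StringBuilder sb = new StringBuilder();
        sb.append("<doc doc_id=\"").append(docId).append("\">\n");
        sb.append("<file_name>").append(escape(fileName)).append("</file_name>\n");
        sb.append("<title>").append(escape(title)).append("</title>\n");
        sb.append("<content>\n").append(escape(content)).append("\n</content>\n");
        sb.append("</doc>\n");
        return sb.toString();
    }

    // Read a plain-text file: the first line is the title, the rest is content.
    static String fileToXml(int docId, Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        String title = lines.isEmpty() ? "" : lines.get(0);
        String content = lines.size() > 1
                ? String.join("\n", lines.subList(1, lines.size()))
                : "";
        return toXml(docId, file.getFileName().toString(), title, content);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocToXml <text-file>");
            return;
        }
        System.out.print(fileToXml(1, Paths.get(args[0])));
    }
}
```

Building the string by hand keeps the example short; for anything larger, one of the JDK's XML writers would take care of escaping automatically.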