
Posted to java-user@lucene.apache.org by Scott Farquhar <sc...@atlassian.com> on 2003/07/15 17:39:49 UTC

Lucene 1.3 Release Schedule

Morning all!

Is there a document where I can find the release schedule for Lucene 
1.3, or a list of issues that are still remaining to be fixed?

I would like to upgrade our product (JIRA) to a later Lucene version, 
and I was waiting for 1.3 to be released.  However, I see that it has 
been in rc1 for 4 months - so I was curious to hear if I should wait for 
a final release.

Thanks for your time.  If there is a url / document answering my 
concerns - please feel free to just point me at that.

Cheers,
Scott
-- 

ATLASSIAN - http://www.atlassian.com
Expert J2EE Software, Services and Support
-------------------------------------------------------
Need a simple, powerful way to track and manage issues?
Try JIRA - http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: about PDF / HTML index

Posted by Ben Litchfield <be...@csh.rit.edu>.

PDFBox comes with the class
org.pdfbox.searchengine.lucene.LucenePDFDocument, which shows how to
parse and index a PDF document.

Ben


On Tue, 15 Jul 2003, alvaro z wrote:

>
> I'm using Lucene with TXT and HTML files, and it's working.
>
> The only problem with HTML files is that I have to index them as TXT first, before indexing them as HTML.
>
> Has anyone tried to index PDF files?
>
> I'm trying PDFBox. Are there any samples for indexing PDF files with any of the parsers (PDFBox, JPedal, etc.)? I can't find any.
>
> thanks for helping,
>
> Alvaro. from Lima - Peru


Re: about PDF / HTML index

Posted by Peter Becker <pb...@dstc.edu.au>.

Hi Alvaro,

There are some examples in our code here. They work with an interface 
slightly similar to that of the Ant task in the Lucene contributions.

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/source/org/tockit/docco/indexer/documenthandler/

The actual step of turning it into a Lucene Document happens here:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/source/org/tockit/docco/indexer/DocumentProcessingFactory.java?rev=1.30&content-type=text/vnd.viewcvs-markup

This code is still a work in progress, but it does work -- we run it on 
a few tens of thousands of documents from time to time. Both PDFBox and 
Multivalent fail to read some PDF documents in the collection, but so 
does Acrobat Reader. We still have to do a more formal test to see which 
one does a better job; at the moment we are still coding the core bits, 
and then we will test properly.
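
For readers who cannot reach the CVS links, the handler-per-file-type idea 
can be sketched roughly as below. This is only an illustration, not the 
Docco code: the names DocumentHandler, PlainTextHandler, NaiveHtmlHandler 
and HandlerRegistry are hypothetical, the HTML handler is deliberately 
naive, and the step that would place the extracted text into a Lucene 
Document is left out so the sketch needs only the JDK.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical handler interface: turns raw file content into plain text for indexing. */
interface DocumentHandler {
    String extractText(String rawContent);
}

/** Plain text needs no conversion. */
class PlainTextHandler implements DocumentHandler {
    public String extractText(String rawContent) {
        return rawContent;
    }
}

/** Naive HTML handler: drops tags with a regex. A real handler would use an HTML parser. */
class NaiveHtmlHandler implements DocumentHandler {
    public String extractText(String rawContent) {
        return rawContent.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}

/** Maps file extensions to handlers. A PDF handler wrapping a PDF parser would be registered the same way. */
class HandlerRegistry {
    private final Map<String, DocumentHandler> handlers = new HashMap<>();

    void register(String extension, DocumentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler for the file's extension, or null if none is registered. */
    DocumentHandler handlerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return handlers.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

With handlers registered for "txt" and "html", an indexer would call 
handlerFor(fileName) for each file, skip files for which it returns null, 
and add the extracted text to the index as the document contents.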

HTH,
    Peter

alvaro z wrote:

>I'm using Lucene with TXT and HTML files, and it's working.
>
>The only problem with HTML files is that I have to index them as TXT first, before indexing them as HTML.
>
>Has anyone tried to index PDF files?
>
>I'm trying PDFBox. Are there any samples for indexing PDF files with any of the parsers (PDFBox, JPedal, etc.)? I can't find any.
>
>thanks for helping,
>
>Alvaro. from Lima - Peru


about PDF / HTML index

Posted by alvaro z <al...@yahoo.com>.

I'm using Lucene with TXT and HTML files, and it's working.

The only problem with HTML files is that I have to index them as TXT first, before indexing them as HTML.

Has anyone tried to index PDF files?

I'm trying PDFBox. Are there any samples for indexing PDF files with any of the parsers (PDFBox, JPedal, etc.)? I can't find any.

thanks for helping,

Alvaro. from Lima - Peru


Re: Lucene 1.3 Release Schedule

Posted by Matt Tucker <ma...@jivesoftware.com>.

Scott,

Not sure on the official release schedule, but I don't think there's any 
reason to wait to upgrade. We've been using 1.3 CVS builds and then the 
RC in Jive Forums for many, many months and it's a very solid 
improvement over 1.2.

-Matt

Scott Farquhar wrote:
> Morning all!
> 
> Is there a document where I can find the release schedule for Lucene 
> 1.3, or a list of issues that are still remaining to be fixed?
> 
> I would like to upgrade our product (JIRA) to a later Lucene version, 
> and I was waiting for 1.3 to be released.  However, I see that it has 
> been in rc1 for 4 months - so I was curious to hear if I should wait for 
> a final release.
> 
> Thanks for your time.  If there is a url / document answering my 
> concerns - please feel free to just point me at that.
> 
> Cheers,
> Scott
