Posted to java-user@lucene.apache.org by Ben Litchfield <be...@csh.rit.edu> on 2003/04/21 05:12:26 UTC

[ANN]PDFBox-0.6.2

I am proud to announce the latest version of PDFBox.  This version comes
with some exciting changes to the project.  Bob Dickinson has joined the
PDFBox development team and brings 20 years of programming expertise to the
project.  Bob is responsible for all changes in the 0.6.2 release.

PDFBox has also moved to SourceForge, which offers many features that
people have been requesting, such as bug tracking and mailing lists.

Version 0.6.2 fixes many issues with text extraction, and it is
recommended that all Lucene users who use PDFBox upgrade to this latest
version.

PDFBox homepage
http://www.pdfbox.org

PDFBox Sourceforge site
http://www.sourceforge.net/projects/pdfbox

PDFBox-0.6.2 release notes
-Modified build so that build.properties settings are no longer required
-Added required libraries to CVS
-Added log4j logging
-Significant text extraction work
-Added automatic handling of files encrypted with the empty password
-Added automated tests and test data for text extraction
-Removed unimplemented decoders from filters test
-Fixed several LZW decode bugs introduced after 0.5.6
-Fixed bugs relating to processing out-of-spec PDFs with bad # escaping in
    the name ("java.io.IOException: Error: expected hex number" bug)
-Fixed Lucene UID generation bug
-Fixed GetFontWidths null pointer exception bug
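For context on the # escaping fix: in a PDF name object, '#' followed by two hexadecimal digits stands for a single byte, so /Adobe#20Green is the name "Adobe Green". The Java sketch below is a hypothetical helper, not PDFBox's actual code or API. It assumes the name body is available as a Latin-1 string (one char per byte) and shows one lenient policy for out-of-spec files: a '#' that is not followed by two hex digits is kept as a literal character instead of raising an "expected hex number" error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not PDFBox's implementation.
final class PdfNameDecoder {

    private static boolean isHexDigit(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
    }

    /** Decodes a name body (the text after '/'), assuming each char holds one byte (Latin-1). */
    static String decodeName(String raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length()
                    && isHexDigit(raw.charAt(i + 1)) && isHexDigit(raw.charAt(i + 2))) {
                // Well-formed escape: "#xx" becomes the single byte with hex value xx.
                bytes.write(Character.digit(raw.charAt(i + 1), 16) * 16
                        + Character.digit(raw.charAt(i + 2), 16));
                i += 3;
            } else {
                // Ordinary character, or a malformed escape whose '#' is kept literally.
                bytes.write(c);
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeName("Adobe#20Green"));           // Adobe Green
        System.out.println(decodeName("paired#28#29parentheses")); // paired()parentheses
        System.out.println(decodeName("bad#zzEscape"));            // bad#zzEscape
    }
}
```

Whether a real parser should keep, drop, or reject a malformed '#' is a policy choice; keeping it is simply the least surprising option for text extraction.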


Peace,
Ben


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: JavaCC

Posted by dunpanic <du...@mailhost.net>.
Hi, René,

	Thanks for the pointer, but I have tried that already.  The following is an
extract from my default.properties:

	# Home directory of JavaCC
	javacc.home = d:/javacc-3.0/bin
	javacc.zip.dir = ${javacc.home}/lib
	javacc.zip = ${javacc.zip.dir}/javacc.zip

	I have also made the following changes:
	- renamed javacc.jar to javacc.zip (JavaCC 3.0 is distributed as a .jar file)
	- changed the <available> check in the ant build.xml file to:
		<available
			property="javacc.present"
			classname="org.netbeans.javacc.parser.Main"
			classpath="${javacc.zip}"
		/>
	This is because JavaCC from experimentalstuff.com is, for some reason,
packaged under org.netbeans.javacc.*, and this change lets me get past the
javacc.present check.

	However, I still get
		java.lang.NoClassDefFoundError: COM/sun/labs/javacc/Main
	at exactly the line where the javacc ant task is located.


	So the question is, is there a way I can tell ant's javacc task to look for
org.netbeans.javacc.parser.Main instead?  Otherwise, I have to continue to
hunt for the JavaCC version that is packaged as COM.sun.labs.javacc.Main.
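Ant's <available classname="..."> condition simply checks whether a class can be loaded from the given classpath. The small Java program below is a hypothetical diagnostic, not part of Lucene's build, that does the same thing for the two main-class names mentioned in this thread, so you can confirm which one a given JavaCC jar actually contains before editing build.xml. Run it with the jar in question on the classpath.

```java
// Hypothetical diagnostic: mimics what <available classname="..."> checks.
final class JavaccMainLocator {

    // Only the names reported in this thread; verify against your own JavaCC jar.
    static final String[] CANDIDATES = {
        "COM.sun.labs.javacc.Main",        // class the ant javacc task tried to load
        "org.netbeans.javacc.parser.Main"  // package used by the experimentalstuff.com download
    };

    /** Returns the first class name that can be loaded, or null if none can. */
    static String findFirstAvailable(ClassLoader loader, String... names) {
        for (String name : names) {
            try {
                // initialize=false: only check that the class can be found, do not run static code.
                Class.forName(name, false, loader);
                return name;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable from this classpath; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String found = findFirstAvailable(JavaccMainLocator.class.getClassLoader(), CANDIDATES);
        System.out.println(found == null
                ? "No known JavaCC main class on the classpath"
                : "Found " + found);
    }
}
```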

Choong Yong, Koh


-----Original Message-----
From: prolog_tutor@gmx.de [mailto:prolog_tutor@gmx.de]
Sent: Sunday, May 04, 2003 12:22 AM
To: Lucene Users List
Subject: Re: JavaCC


Choong Yong,

Just read BUILD.txt, Step 3, and either create your own properties file as
described there, or modify default.properties at about line 60 so that the
javacc.home variable points to the binaries.

HTH,
René

--
+++ GMX - Mail, Messaging & more  http://www.gmx.net +++
Smile, please! Online photo gallery with GMX, no homepage of your own needed!


Re: JavaCC

Posted by pr...@gmx.de.
Choong Yong,

Just read BUILD.txt, Step 3, and either create your own properties file as
described there, or modify default.properties at about line 60 so that the
javacc.home variable points to the binaries.

HTH,
René



JavaCC

Posted by dunpanic <du...@mailhost.net>.
Hi,

	I have downloaded Lucene 1.3 RC1 and was attempting to build it.  The ant
build script told me that I would need JavaCC 2.0.  I followed the URL
provided in the echo message to WebGain, but WebGain no longer offers JavaCC
for download.  I was then able to download JavaCC 3.0 from
experimentalstuff.com, but the package name in JavaCC 3.0 differs from
JavaCC 2.0, and the ant javacc task cannot load the JavaCC 3.0 classes.

	Can anyone pass me a copy of JavaCC 2.0, point me to somewhere I can
download it, or tell me how I can modify the ant javacc task to look for
the JavaCC 3.0 packages instead?

	Thanks.

Choong Yong, Koh



Re: Thread question

Posted by Tatu Saloranta <ta...@hypermall.net>.
On Monday 21 April 2003 07:30, Rob Outar wrote:
...
> they would like to search.  Then there is a RepositorySearcher class (one
> instance for each repository) which is the class RepositoryFile calls to do
  ^^^^^^^^^^^^^^^^^^^^^^^^^

...
> doing the same.  The methods in the middle tier which calls the Index
> Manager tier are all synchronized yet I am getting requests for concurrent
> write/delete operations.  I am getting several IOExceptions saying the
> index is locked for writing (the write.lock IOExceptions Lucene throws).  I
> figured by synchronizing the methods in
> the RepositorySearcher class
          ^^^^^^^^^^^^^^^^^^^^^^^

> calls Index Manager class to modify the index) it would make the program
> thread safe.  I am new to the whole multithreading environment, am I doing
> something wrong here?

Apologies if I misunderstood the scenario, but if not, synchronizing there
won't give you a critical section. Each RepositorySearcher has its own lock
(one lock per object), so two searchers can still modify the index at the
same time. The exception is if you make the method(s) static, in which case
there is a single lock per class (alternatively you could add a static member
there and synchronize on that, instead of using the synchronized method
modifier or synchronized (this)).
A synchronized instance method is equivalent to wrapping all the code in the
method in 'synchronized (this) { ... }'.
What you need is a central lock: either create a lock object (any Object will
do) and share it with all RepositorySearchers, or use a static Object to lock on.
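To make the difference concrete, here is a minimal, hypothetical Java sketch (the class names are invented and no Lucene classes are involved). An IndexManager stand-in records how many threads are inside modifyIndex() at once. With one lock per searcher instance the updates can overlap; with a single lock object shared by all searchers, at most one update runs at a time.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LockDemo {

    /** Hypothetical stand-in for the Index Manager tier; it records how many calls overlap. */
    static final class IndexManager {
        private final AtomicInteger active = new AtomicInteger();
        private final AtomicInteger peak = new AtomicInteger();

        void modifyIndex() {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(50); // pretend to write/delete documents
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
            }
        }

        int peakConcurrentWriters() {
            return peak.get();
        }
    }

    /** A synchronized method locks 'this', so every searcher instance has its own lock. */
    static final class PerInstanceSearcher {
        private final IndexManager manager;

        PerInstanceSearcher(IndexManager manager) {
            this.manager = manager;
        }

        synchronized void update() {
            manager.modifyIndex();
        }
    }

    /** Every searcher is handed the same lock object, so updates are serialized across instances. */
    static final class SharedLockSearcher {
        private final IndexManager manager;
        private final Object indexLock; // a private static final Object would work the same way

        SharedLockSearcher(IndexManager manager, Object indexLock) {
            this.manager = manager;
            this.indexLock = indexLock;
        }

        void update() {
            synchronized (indexLock) {
                manager.modifyIndex();
            }
        }
    }

    /** Starts one thread per searcher, each doing a single update, and returns the peak overlap. */
    static int run(boolean sharedLock, int searchers) {
        IndexManager manager = new IndexManager();
        Object indexLock = new Object();
        Thread[] threads = new Thread[searchers];
        for (int i = 0; i < searchers; i++) {
            Runnable task;
            if (sharedLock) {
                SharedLockSearcher searcher = new SharedLockSearcher(manager, indexLock);
                task = searcher::update;
            } else {
                PerInstanceSearcher searcher = new PerInstanceSearcher(manager);
                task = searcher::update;
            }
            threads[i] = new Thread(task);
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for updates", e);
            }
        }
        return manager.peakConcurrentWriters();
    }

    public static void main(String[] args) {
        System.out.println("one lock per searcher, peak concurrent writers: " + run(false, 4));
        System.out.println("one shared lock, peak concurrent writers: " + run(true, 4));
    }
}
```

With the shared lock the peak is always 1; with per-instance locks it will usually be higher, though the exact number depends on thread scheduling. Java locks like these only coordinate threads within one JVM.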

> 	Also, I assume it is ok to query the index while a write/delete is
> occurring?

Not sure, but I wouldn't count on it. For that to work, the whole contents of
the index would have to be kept in memory (so that an unmodified copy exists
at all times), or the file system would need atomic updates of a set of files
(this could be done if the index were in a DB).
Alternatively, the Lucene core would need to hold off read access while a
write is going on, without throwing an exception.
I'm sure someone else can give a definite answer though.

In the system I'm working on, I actually duplicate the existing index each
time an update is needed, so existing read access (queries) keeps using the old
index until the new one is ready to be used. When all threads still reading
the old index are done with it, it can be deleted; new accesses go straight to
the newest index.
I'm sure this is not the most efficient way to do it, but it works reliably
and (from the query perspective) gives the lowest latency. This is because
write access does not block reads, and the system is a multi-CPU machine that
(hopefully) can spread read and write threads properly across CPUs.
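The copy-then-swap approach above boils down to reference counting index copies. The Java sketch below is hypothetical: the class and method names are invented, and actually copying, opening, and deleting a Lucene index is left to the caller. Queries pin whichever copy is current when they start, the updater publishes a finished copy, and a replaced copy is handed to a delete callback only once its last query has released it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; T identifies one copy of the index (for example its directory path). */
final class IndexGenerations<T> {

    /** One copy of the index plus the number of queries still using it. */
    static final class Generation<I> {
        final I index;
        private int readers = 0;
        private boolean retired = false;

        Generation(I index) {
            this.index = index;
        }
    }

    // Deletes a copy nobody can use any more; it runs while this object's lock is held.
    private final Consumer<T> deleter;
    private Generation<T> current;

    IndexGenerations(T initial, Consumer<T> deleter) {
        this.current = new Generation<>(initial);
        this.deleter = deleter;
    }

    /** A query calls this first and keeps using the returned copy until release(). */
    synchronized Generation<T> acquire() {
        current.readers++;
        return current;
    }

    /** A query calls this when done; a replaced copy is deleted after its last reader leaves. */
    synchronized void release(Generation<T> generation) {
        generation.readers--;
        if (generation.retired && generation.readers == 0) {
            deleter.accept(generation.index);
        }
    }

    /** The updater calls this once the new copy is completely written; later queries use it. */
    synchronized void publish(T newIndex) {
        Generation<T> old = current;
        current = new Generation<>(newIndex);
        old.retired = true;
        if (old.readers == 0) {
            deleter.accept(old.index);
        }
    }

    public static void main(String[] args) {
        List<String> deleted = new ArrayList<>();
        IndexGenerations<String> generations = new IndexGenerations<>("index-1", deleted::add);

        Generation<String> slowQuery = generations.acquire(); // starts on index-1
        generations.publish("index-2");                        // update finished
        Generation<String> newQuery = generations.acquire();  // goes straight to index-2

        System.out.println(slowQuery.index + " / " + newQuery.index); // index-1 / index-2
        System.out.println(deleted);                                  // [] (index-1 still in use)
        generations.release(slowQuery);
        System.out.println(deleted);                                  // [index-1]
        generations.release(newQuery);
    }
}
```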

-+ Tatu +-



Thread question

Posted by Rob Outar <ro...@ideorlando.org>.
	This is more of a Java language question, but it does involve Lucene as
well.  I have a 3-tier index structure.  I have a RepositoryFile class which
extends java.io.File; folks create those for the directory in which they
would like to search.  Then there is a RepositorySearcher class (one
instance for each repository), which is the class RepositoryFile calls to do
the querying.  For manipulating the index there is an Index Manager class
that adds and removes stuff from the index.  The problem I am encountering
is that threads are trying to write/delete from the index while another
thread is doing the same.  The methods in the middle tier, which call the
Index Manager tier, are all synchronized, yet I am still getting concurrent
write/delete requests.  I am getting several IOExceptions saying the index
is locked for writing (the write.lock IOExceptions Lucene throws).  I figured
that synchronizing the methods in the RepositorySearcher class (which calls
the Index Manager class to modify the index) would make the program thread
safe.  I am new to the whole multithreading environment; am I doing
something wrong here?

	Also, I assume it is ok to query the index while a write/delete is
occurring?

	Let me know.

Thanks,

Rob

