Posted to pylucene-dev@lucene.apache.org by Junjie Wei <jw...@nyu.edu> on 2017/03/15 21:25:16 UTC

pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Hi,

When I was trying to build pylucene-6.4.1 in Cygwin on Windows, "$ make
build" exited with errors complaining that some jar files could not be
opened. It seems this is because some of the jars under lucene-java-6.4.1
are symbolic links with a size of 1k instead of real files. Here is the
list I found with the find command:

$ find ./lucene-java-6.4.1/ -name *.jar -size 1k
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/icu/lib/icu4j-56.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-fsa-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-polish-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/phonetic/lib/commons-codec-1.10.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/Tagger-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/uimaj-core-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/WhitespaceTokenizer-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/antlr4-runtime-4.5.1-1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-5.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-commons-5.1.jar
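
A size test matches small files in general, so a check on the file type may be more direct. A minimal sketch, assuming GNU find (the one Cygwin ships) and its -printf action; the function name list_jar_links is only illustrative:

```shell
# Hypothetical helper: print every symlinked .jar under a directory together
# with the target it points to. The pattern is quoted so the shell does not
# expand it before find sees it.
list_jar_links() {
    # %p is the path of the link, %l is the link target (GNU find).
    find "${1:-./lucene-java-6.4.1}" -type l -name '*.jar' -printf '%p -> %l\n'
}
```

Called without an argument it looks under ./lucene-java-6.4.1; pass another directory to search elsewhere.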


After I downloaded lucene-java-6.4.1 from
https://archive.apache.org/dist/lucene/java/6.4.1/ and replaced the
bundled copy with it, everything built fine.

Is this an issue in the release, or did I miss a step before building?

Thanks,

Junjie

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Andi Vajda <va...@apache.org>.
On Fri, 17 Mar 2017, Ruediger Meier wrote:

> On Friday 17 March 2017, Andi Vajda wrote:
>> Now, several people, including yourself, have proposed
>> python 3 ports. I still have to figure a way to package this all up
>> into a release that works with both. I need some time to integrate
>> the three python 3 ports,
>
> FYI I have the other two ports also imported into my github repo which
> makes it easy to compare again.
>
> $ git ls-remote  https://github.com/rudimeier/jcc |cut -f2
>
> refs/heads/master        <<< my final one, works for py2 and py3
> refs/heads/py3-old-orig  <<< old svn, pylucene/branches/python_3
> refs/heads/py3-tommykoch <<< from https://gist.github.com/tommykoch
> refs/tags/v2.23

Thank you. I got started on this today and I'm now starting to look at the 
three ports. So far, I've got jcc split into two parts (still one module, 
one egg) to work with both python2 and python3 but keeping the code 
separate. It's too much of a mess to keep both versions together in the same 
file and I don't expect the python2 version to change too much since jcc has 
been quite stable...

Andi..

>
>
> cu,
> Rudi
>

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Ruediger Meier <sw...@gmx.de>.
On Friday 17 March 2017, Andi Vajda wrote:
> Now, several people, including yourself, have proposed
> python 3 ports. I still have to figure a way to package this all up
> into a release that works with both. I need some time to integrate
> the three python 3 ports,

FYI I have the other two ports also imported into my github repo which 
makes it easy to compare again.

$ git ls-remote  https://github.com/rudimeier/jcc |cut -f2

refs/heads/master        <<< my final one, works for py2 and py3
refs/heads/py3-old-orig  <<< old svn, pylucene/branches/python_3
refs/heads/py3-tommykoch <<< from https://gist.github.com/tommykoch
refs/tags/v2.23


cu,
Rudi

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Andi Vajda <va...@apache.org>.
> On Mar 16, 2017, at 20:34, Ruediger Meier <sw...@gmx.de> wrote:
> 
>> On Friday 17 March 2017, Andi Vajda wrote:
>>> On Thu, 16 Mar 2017, Ruediger Meier wrote:
>>>> On Thursday 16 March 2017, Andi Vajda wrote:
>>>> Indeed, this is a bug of mine.
>>>> What would you prefer:
>>>>   - include the actual .jar files in the distribution archive
>>>> (tell tar to follow the symlinks when I build the PyLucene
>>>> distribution) - or exclude the symlinks (tell tar to exclude
>>>> symlinks); your running build would then use ivy to fetch them
>>> 
>>> Usually my opinion is that tarballs should have the least possible
>>> dependencies. But in this case where all the deps are hosted on the
>>> same source (apache.org) I would not include it but download on
>>> build time (if user has not downloaded it manually already).
>> 
>> +1, I'm leaning towards not including these .jar files as well.
>> It saves about 20Mb on the pylucene distribution tar file and they
>> can be obtained from ivy anyway.
>> 
>>> Maybe we could even enhance the Makefile to automatically find an
>>> already installed lucene or download the latest minor version. IMO
>>> it makes no sense that pylucene users by default always use a
>>> non-bugfixed outdated lucene. And I saw on this mailing list how
>>> difficult it can be to get enough votes for a pylucene minor
>>> update.
>> 
>> There is no such thing as a bugfixed Lucene. Each Lucene release has
>> new bug fixes but also new bugs, such is software development. Lucene
>> also breaks things on a regular basis in spite of being quite careful
>> about backwards compatibility, thus PyLucene unit tests have to be
>> checked for each release.
>> 
>> The problem you're referring to would not be much of an issue if it
>> was easier to garner votes for a PyLucene release. A new release
>> would happen in lock step with each Lucene release, as was the case
>> in the past, a few years ago. There is a Lucene 6.5 release being
>> talked about and I intend to release a PyLucene 6.5 shortly
>> thereafter.
> 
> Well, I was speaking about the minor maintenance updates like 6.4.2 but 
> you know surely better about the quality of lucene updates.
> 
>>> The same goes for the jcc python package which the user has to
>>> install manually anyways. We don't need to ship it with pylucene. I
>>> guess jcc would be far more famous if it would be hosted decoupled
>>> of pylucene. IMO jcc is a really amazing good working thing.
>>> pylucene is just a nice example how easy you can use java libs via
>>> python.
>> 
>> Thank you for the kind words. JCC is already available without
>> PyLucene from Python's PyPI: https://pypi.python.org/pypi/JCC/2.23
>> JCC gets released on PyPI at the same time as the main Apache
>> PyLucene release.
>> 
>> I agree that PyLucene is just an example of JCC usage but it's the
>> main one and PyLucene has been driving the features of JCC.
> 
> Yep, jcc only exists because of pylucene. And good that pylucene's 
> development and user base guarantees that jcc will be well maintained 
> in future too. On the other hand pylucene may be some kind of show 
> stopper for jcc. Why wasn't the old experimental jcc/py3 port released 
> quickly on PyPI 7 years ago?

Because it was an experimental branch that was never finished.

> Is there any chance to get the recent 
> jcc/py3 port released soon even pylucene still cares for stable py2 
> only?

I don't think PyLucene cares either way. I have not had enough time in a long while to produce a releasable version of jcc with python 3 support. Now, several people, including yourself, have proposed python 3 ports. I still have to figure out a way to package this all up into a release that works with both.
I need some time to integrate the three python 3 ports, update the result to do proper string conversions and package it so that it works with both python 2 and 3 (these can be different sets of sources, with possible overlaps, but in the same source egg).

Andi..

> I mean releasing jcc for py3 cannot break any existing project. 
> No need to wait for the right time to test it more carefully.
> 
> Cheers,
> Rudi


Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Ruediger Meier <sw...@gmx.de>.
On Friday 17 March 2017, Andi Vajda wrote:
> On Thu, 16 Mar 2017, Ruediger Meier wrote:
> > On Thursday 16 March 2017, Andi Vajda wrote:
> >> Indeed, this is a bug of mine.
> >> What would you prefer:
> >>    - include the actual .jar files in the distribution archive
> >> (tell tar to follow the symlinks when I build the PyLucene
> >> distribution) - or exclude the symlinks (tell tar to exclude
> >> symlinks); your running build would then use ivy to fetch them
> >
> > Usually my opinion is that tarballs should have the least possible
> > dependencies. But in this case where all the deps are hosted on the
> > same source (apache.org) I would not include it but download on
> > build time (if user has not downloaded it manually already).
>
> +1, I'm leaning towards not including these .jar files as well.
> It saves about 20Mb on the pylucene distribution tar file and they
> can be obtained from ivy anyway.
>
> > Maybe we could even enhance the Makefile to automatically find an
> > already installed lucene or download the latest minor version. IMO
> > it makes no sense that pylucene users by default always use a
> > non-bugfixed outdated lucene. And I saw on this mailing list how
> > difficult it can be to get enough votes for a pylucene minor
> > update.
>
> There is no such thing as a bugfixed Lucene. Each Lucene release has
> new bug fixes but also new bugs, such is software development. Lucene
> also breaks things on a regular basis in spite of being quite careful
> about backwards compatibility, thus PyLucene unit tests have to be
> checked for each release.
>
> The problem you're referring to would not be much of an issue if it
> was easier to garner votes for a PyLucene release. A new release
> would happen in lock step with each Lucene release, as was the case
> in the past, a few years ago. There is a Lucene 6.5 release being
> talked about and I intend to release a PyLucene 6.5 shortly
> thereafter.

Well, I was speaking about minor maintenance updates like 6.4.2, but
you surely know better about the quality of lucene updates.

> > The same goes for the jcc python package which the user has to
> > install manually anyways. We don't need to ship it with pylucene. I
> > guess jcc would be far more famous if it would be hosted decoupled
> > of pylucene. IMO jcc is a really amazing good working thing.
> > pylucene is just a nice example how easy you can use java libs via
> > python.
>
> Thank you for the kind words. JCC is already available without
> PyLucene from Python's PyPI: https://pypi.python.org/pypi/JCC/2.23
> JCC gets released on PyPI at the same time as the main Apache
> PyLucene release.
>
> I agree that PyLucene is just an example of JCC usage but it's the
> main one and PyLucene has been driving the features of JCC.

Yep, jcc only exists because of pylucene. And it is good that pylucene's
development and user base guarantee that jcc will be well maintained
in the future too. On the other hand, pylucene may be something of a show
stopper for jcc. Why wasn't the old experimental jcc/py3 port released
quickly on PyPI 7 years ago? Is there any chance of getting the recent
jcc/py3 port released soon, even if pylucene still cares about stable py2
only? I mean, releasing jcc for py3 cannot break any existing project.
There is no need to wait for the right time to test it more carefully.

Cheers,
Rudi

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Andi Vajda <va...@apache.org>.
On Thu, 16 Mar 2017, Ruediger Meier wrote:

> On Thursday 16 March 2017, Andi Vajda wrote:
>
>> Indeed, this is a bug of mine.
>> What would you prefer:
>>    - include the actual .jar files in the distribution archive (tell
>> tar to follow the symlinks when I build the PyLucene distribution) -
>> or exclude the symlinks (tell tar to exclude symlinks); your running
>> build would then use ivy to fetch them
>
> Usually my opinion is that tarballs should have the least possible
> dependencies. But in this case where all the deps are hosted on the
> same source (apache.org) I would not include it but download on build
> time (if user has not downloaded it manually already).

+1, I'm leaning towards not including these .jar files as well.
It saves about 20 MB on the pylucene distribution tar file and they can be
obtained from ivy anyway.

> Maybe we could even enhance the Makefile to automatically find an
> already installed lucene or download the latest minor version. IMO it
> makes no sense that pylucene users by default always use a non-bugfixed
> outdated lucene. And I saw on this mailing list how difficult it can be
> to get enough votes for a pylucene minor update.

There is no such thing as a bugfixed Lucene. Each Lucene release has new bug
fixes but also new bugs; such is software development. Lucene also breaks
things on a regular basis in spite of being quite careful about backwards
compatibility, so the PyLucene unit tests have to be checked for each release.

The problem you're referring to would not be much of an issue if it was 
easier to garner votes for a PyLucene release. A new release would happen in 
lock step with each Lucene release, as was the case in the past, a few years 
ago. There is a Lucene 6.5 release being talked about and I intend to 
release a PyLucene 6.5 shortly thereafter.

> The same goes for the jcc python package which the user has to install
> manually anyways. We don't need to ship it with pylucene. I guess jcc
> would be far more famous if it would be hosted decoupled of pylucene.
> IMO jcc is a really amazing good working thing. pylucene is just a nice
> example how easy you can use java libs via python.

Thank you for the kind words. JCC is already available without PyLucene from 
Python's PyPI: https://pypi.python.org/pypi/JCC/2.23
JCC gets released on PyPI at the same time as the main Apache PyLucene release.

I agree that PyLucene is just an example of JCC usage but it's the main one 
and PyLucene has been driving the features of JCC.

Andi..

>
> cheers,
> Rudi
>

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Ruediger Meier <sw...@gmx.de>.
On Thursday 16 March 2017, Andi Vajda wrote:

> Indeed, this is a bug of mine.
> What would you prefer:
>    - include the actual .jar files in the distribution archive (tell
> tar to follow the symlinks when I build the PyLucene distribution) -
> or exclude the symlinks (tell tar to exclude symlinks); your running
> build would then use ivy to fetch them

Usually my opinion is that tarballs should have the fewest possible
dependencies. But in this case, where all the deps are hosted on the
same source (apache.org), I would not include them but download them at
build time (if the user has not already downloaded them manually).

Maybe we could even enhance the Makefile to automatically find an
already installed lucene or download the latest minor version. IMO it
makes no sense that pylucene users by default always use an outdated
lucene without the latest bug fixes. And I saw on this mailing list how
difficult it can be to get enough votes for a pylucene minor update.

The same goes for the jcc python package, which the user has to install
manually anyway. We don't need to ship it with pylucene. I guess jcc
would be far better known if it were hosted separately from pylucene.
IMO jcc is a really amazing thing that works well. pylucene is just a nice
example of how easily you can use java libs via python.

cheers,
Rudi

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Andi Vajda <va...@apache.org>.
On Wed, 15 Mar 2017, Ruediger Meier wrote:

> On Wednesday 15 March 2017, Junjie Wei wrote:
>> Hi,
>>
>> When I was trying to build pylucene-6.4.1 in Cygwin on Windows, the
>> "$ make build" exit with errors complaining that some jar files
>> cannot be open. It seems because some of the jars under
>> lucene-java-6.4.1 are symbolic links with size of 1k instead of
>> concrete ones. Here is a list that I located them with find command:
>>
>> $ find ./lucene-java-6.4.1/ -name *.jar -size 1k
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/icu/lib/icu4j-56.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-fsa-2.1.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-polish-2.1.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/phonetic/lib/commons-codec-1.10.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/Tagger-2.3.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/uimaj-core-2.3.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/WhitespaceTokenizer-2.3.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/antlr4-runtime-4.5.1-1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-5.1.jar
>> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-commons-5.1.jar
>>
>>
>> After downloaded and replaced lucene-java-6.4.1 from
>> https://archive.apache.org/dist/lucene/java/6.4.1/, things went all
>> good.
>>
>> Is it an issue in the release, or I have missed something before
>> built?
>
> Yes, this is a minor but annoying issue in this release. Some dead
> links were packaged that point into Andi's home directory, like this one:
>
> ./lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar -> /Users/vajda/.ivy2/cache/org.carrot2/morfologik-stemming/bundles/morfologik-stemming-2.1.1.jar
>
> Maybe the "make release/distrib" target has a bug, or these links were
> committed to svn by mistake.
>
> BTW this is no real issue on a real POSIX system. Cygwin seems to make
> this worse as it has to emulate symlinks somehow. I guess instead of
> downloading lucene manually you could have fixed it by just removing
> all the bad links.

Indeed, this is a bug of mine.
What would you prefer:
   - include the actual .jar files in the distribution archive (tell tar to
     follow the symlinks when I build the PyLucene distribution)
   - or exclude the symlinks (tell tar to exclude symlinks); your
     running build would then use ivy to fetch them
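
The two options can be sketched as follows, assuming GNU tar and find. The function names and the directory/archive arguments are placeholders, not the actual PyLucene release targets:

```shell
# Option 1 (hypothetical helper): follow symlinks so the archive contains the
# real .jar files. -h is tar's --dereference option; it needs the link targets
# to exist on the machine that builds the archive.
make_dist_with_jars() {      # usage: make_dist_with_jars SRC_DIR OUT.tar.gz
    tar -czhf "$2" "$1"
}

# Option 2 (hypothetical helper): leave symlinks out entirely by handing tar
# only regular files. Names are passed NUL-separated so odd file names survive.
make_dist_without_links() {  # usage: make_dist_without_links SRC_DIR OUT.tar.gz
    find "$1" -type f -print0 | tar -czf "$2" --null -T -
}
```

The second variant keeps the archive small but also drops empty directories, since only regular files are listed.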

Andi..

>
> cu,
> Rudi
>
>

Re: pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory

Posted by Ruediger Meier <sw...@gmx.de>.
On Wednesday 15 March 2017, Junjie Wei wrote:
> Hi,
>
> When I was trying to build pylucene-6.4.1 in Cygwin on Windows, the
> "$ make build" exit with errors complaining that some jar files
> cannot be open. It seems because some of the jars under
> lucene-java-6.4.1 are symbolic links with size of 1k instead of
> concrete ones. Here is a list that I located them with find command:
>
> $ find ./lucene-java-6.4.1/ -name *.jar -size 1k
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/icu/lib/icu4j-56.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-fsa-2.1.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-polish-2.1.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/phonetic/lib/commons-codec-1.10.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/Tagger-2.3.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/uimaj-core-2.3.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/WhitespaceTokenizer-2.3.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/antlr4-runtime-4.5.1-1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-5.1.jar
> ./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-commons-5.1.jar
>
>
> After downloaded and replaced lucene-java-6.4.1 from
> https://archive.apache.org/dist/lucene/java/6.4.1/, things went all
> good.
>
> Is it an issue in the release, or I have missed something before
> built?

Yes, this is a minor but annoying issue in this release. Some dead
links were packaged that point into Andi's home directory, like this one:

./lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar -> /Users/vajda/.ivy2/cache/org.carrot2/morfologik-stemming/bundles/morfologik-stemming-2.1.1.jar

Maybe the "make release/distrib" target has a bug, or these links were
committed to svn by mistake.

BTW this is no real issue on a real POSIX system. Cygwin seems to make 
this worse as it has to emulate symlinks somehow. I guess instead of 
downloading lucene manually you could have fixed it by just removing 
all the bad links.
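
A minimal sketch of that cleanup, assuming GNU find (its -xtype test and -delete action) and that the dead links are the only dangling .jar symlinks in the tree; the function name is made up for illustration:

```shell
# Hypothetical helper: list and then delete dangling .jar symlinks.
# With GNU find, -xtype l matches symlinks whose target cannot be resolved.
remove_dead_jar_links() {
    # Show what is about to be removed.
    find "${1:-./lucene-java-6.4.1}" -xtype l -name '*.jar' -print
    # Delete only the dangling links.
    find "${1:-./lucene-java-6.4.1}" -xtype l -name '*.jar' -delete
}
```

Real .jar files and symlinks whose targets exist are left untouched.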

cu,
Rudi