Posted to pylucene-dev@lucene.apache.org by Erik Groeneveld - Seecr <er...@seecr.nl> on 2020/11/26 11:30:42 UTC

Multiple .so extension sharing classes

L.S.,

We have used PyLucene and JCC from the beginning and are very satisfied with
them, but we have run into a problem. It appeared while using Lucene, but we
have reduced it to the following.

(Everything is at https://github.com/ejgroene/jcc-howto.git, clone and run
go.sh.)

We have an extension pkg1 with:

package pkg1;

public class Wtf {}

and a second extension that uses the first:

package pkg2;

import pkg1.Wtf;

public class MyWtf {
    public void f(Wtf a) {}
}

NB: pkg2 uses a class from pkg1 as method argument type.

The Python program is:

import pkg1
pkg1.initVM()

import pkg2
pkg2.initVM()

c = pkg1.Wtf()
s = pkg2.MyWtf()

# should work, but it gives InvalidArgs
s.f(c)

It will not call s.f because, although c is a pkg1.Wtf, it is not the same
Java class (not ==).
It seems we have duplicate wrappers for the same class.

The build looks like:

python3 -m jcc \
        --shared \
        --jar build/pkg1.jar \
        --python pkg1 \
        --build \
        --install \
        --root dist

python3 -m jcc \
        --shared \
        --import pkg1 \
        --package pkg1 \
        --jar build/pkg2.jar \
        --python pkg2 \
        --build \
        --install \
        --root dist

The whole setup is available at https://github.com/ejgroene/jcc-howto.git.
Clone it and run go.sh.

It seems to me that there is a contradiction between --import and --package.
--import is intended for sharing wrappers across extensions, as per

"When more than one JCC-built extension module is going to be used in the
same Python VM and these extension modules share Java classes, only one
extension module should be generated with wrappers for these shared
classes. The other extension modules must be built by importing the one
with the shared classes by using the --import command line parameter. This
ensures that only one copy of the wrappers for the shared classes are
generated and that they are compatible among all extension modules sharing
them."

However, we also need --package for JCC to recognize the type of the
argument and generate code for f, as per

"JCC generates wrappers for all public classes that are requested by name
on the command line or via the --jar command line argument. It generates
wrapper methods for all public methods and fields on these classes whose
return type and parameter types are found in one of the following ways:

   - the type is one of the requested classes
   - the type is one of the requested classes' superclass or implemented
     interfaces
   - the type is available from one of the packages listed via the
     --package command line argument"
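
As a reading aid, the three rules quoted above can be modelled as a small
predicate. This is a toy sketch of the documentation text only, not JCC's
implementation: the names package_of and type_is_known are hypothetical, and
the sketch does not model what --import contributes.

```python
def package_of(class_name):
    """Return the package part of a fully qualified Java class name."""
    return class_name.rpartition(".")[0]


def type_is_known(type_name, requested, supertypes, packages):
    """Toy model of the three documented rules; not JCC's actual code.

    requested  -- class names requested by name or via --jar
    supertypes -- superclasses/interfaces of the requested classes
    packages   -- package names listed via --package
    """
    return (
        type_name in requested
        or type_name in supertypes
        or package_of(type_name) in packages
    )


# pkg2.MyWtf.f(pkg1.Wtf): under this model the parameter type pkg1.Wtf is
# only recognized once pkg1 is listed via --package.
requested = {"pkg2.MyWtf"}
supertypes = {"java.lang.Object"}
without_package = type_is_known("pkg1.Wtf", requested, supertypes, set())
with_package = type_is_known("pkg1.Wtf", requested, supertypes, {"pkg1"})
```

Under these assumptions, f would only get a wrapper in the second case, which
is why --package pkg1 looked necessary from the documentation alone.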

(Originally MyWtf extended Wtf, making the wrapper generation for Wtf (in
pkg2) implicit, so we could do without --package.)

I looked at all the documentation as well as the (generated) source code,
but I could not find a solution.

What am I doing wrong?

Best regards,
Erik Groeneveld

--
Erik Groeneveld  ♦ seecr.nl ♦ +31 624 584 029

-- 
Seecr helps information professionals consistently integrate and connect
decentralized metadata so that they can focus entirely on the content.
Want to know more? Visit seecr.nl <https://seecr.nl>.


Re: Multiple .so extension sharing classes

Posted by Andi Vajda <va...@apache.org>.
Thank you for the detailed description and code to reproduce. You may have
found a bug here; I need to investigate.
To be continued...

Andi..


Re: Multiple .so extension sharing classes

Posted by Andi Vajda <va...@apache.org>.
  Hi Erik,

On Wed, 16 Dec 2020, Erik Groeneveld - Seecr wrote:

> Hi Andy,
>
> Sorry for my late response.
>
> I tried your suggestions with the latest JCC (trunk) and now it works. We
> are using Java 8 in this case, and the concatenation of classpaths still
> works. That is very fortunate, because that makes our Python extensions
> independent.

If you upgrade to a more recent Java version, you just need to call initVM()
once, after loading all the extensions you need for that run, and pass it
their concatenated classpaths.

Andi..

>
> Thank you for your efforts and the amount of time you put in JCC and
> PyLucene!
>
> Best regards,
> Erik

Re: Multiple .so extension sharing classes

Posted by Erik Groeneveld - Seecr <er...@seecr.nl>.
Hi Andy,

Sorry for my late response.

I tried your suggestions with the latest JCC (trunk) and now it works. We
are using Java 8 in this case, and the concatenation of classpaths still
works. That is very fortunate, because that makes our Python extensions
independent.

Thank you for your efforts and the amount of time you put in JCC and
PyLucene!

Best regards,
Erik

--
Erik Groeneveld  ♦ seecr.nl ♦ +31 624 584 029


On Sat, Nov 28, 2020 at 4:22 AM Andi Vajda <va...@apache.org> wrote:

>
> I now made it return an error when calling initVM() a second time and
> updating the VM's classpath failed because the system class loader is not
> an instance of java.net.URLClassLoader.
>
> Instead, call initVM() only once but with all the module.CLASSPATH strings
> set into its classpath keyword argument:
>
>    import os, mod1, mod2, mod3
>
>    mod3.initVM(classpath=os.pathsep.join([mod1.CLASSPATH, mod2.CLASSPATH,
>                                           mod3.CLASSPATH]))
>
> Andi..
>



Re: Multiple .so extension sharing classes

Posted by Andi Vajda <va...@apache.org>.
I have now made initVM() return an error when it is called a second time and
updating the VM's classpath fails because the system class loader is not an
instance of java.net.URLClassLoader.

Instead, call initVM() only once but with all the module.CLASSPATH strings
set into its classpath keyword argument:

   import os, mod1, mod2, mod3

   mod3.initVM(classpath=os.pathsep.join([mod1.CLASSPATH, mod2.CLASSPATH,
                                          mod3.CLASSPATH]))
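
For readers who load a varying set of extensions, the same idea can be wrapped
in a small helper. This is only a sketch: combined_classpath is a hypothetical
name, not part of JCC, and the SimpleNamespace objects stand in for JCC-built
modules so that the example runs without JCC installed. Dropping repeated
entries is an extra convenience, not something the advice above requires.

```python
import os
from types import SimpleNamespace


def combined_classpath(modules):
    """Join the CLASSPATH strings of several modules, dropping repeated entries."""
    seen = []
    for module in modules:
        for entry in module.CLASSPATH.split(os.pathsep):
            if entry and entry not in seen:
                seen.append(entry)
    return os.pathsep.join(seen)


# Stand-ins for JCC-built extension modules, used only so the sketch runs
# without JCC; real modules expose CLASSPATH as shown in the message above.
mod1 = SimpleNamespace(CLASSPATH=os.pathsep.join(["a.jar", "shared.jar"]))
mod2 = SimpleNamespace(CLASSPATH=os.pathsep.join(["shared.jar", "b.jar"]))

classpath = combined_classpath([mod1, mod2])
# With real modules: mod2.initVM(classpath=classpath), called exactly once.
```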

Andi..

Re: Multiple .so extension sharing classes

Posted by Andi Vajda <va...@apache.org>.
  Hi,

I found several bugs here:

   - JCC with Python 3 using --import is broken. The code handling it uses
     a Python 2 function, os.path.walk(), that doesn't exist in Python 3.
     I fixed this in JCC's trunk just now.

   - JCC dynamically adds paths to the classpath by calling
     URLClassLoader's protected addURL() method and assumes that
     ClassLoader.getSystemClassLoader() returns a URLClassLoader. This is a
     bit of a hack but has worked fine until at least Java 8.
     In the version of Java I'm using, Java 11, this is no longer the case.
     Thus --import is broken, as is calling initVM() multiple times. I fixed
     --import, but I still need to find a way to support calling initVM()
     multiple times (on the second and subsequent calls, it just updates the
     classpath). In the meantime, there is a workaround you can use now: call
     initVM() once, passing all the classpaths you're going to need from all
     the modules you're loading.
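
To illustrate the first bug: os.path.walk() took a callback and was removed in
Python 3, where os.walk() is the usual replacement. The sketch below is not the
actual JCC fix; find_files and the ".h" suffix are hypothetical and only show
the general shape of such a port.

```python
import os
import tempfile


def find_files(root, suffix):
    """Collect paths under root ending in suffix, using os.walk (Python 3)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# os.path.walk(top, func, arg) existed in Python 2 only.
has_old_walk = hasattr(os.path, "walk")

# Build a small throwaway tree so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pkg1", "sub"))
    for rel in ("pkg1/a.h", "pkg1/sub/b.h", "pkg1/c.txt"):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.relpath(p, root) for p in find_files(root, ".h")]
```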

I modified your wtf.py file accordingly:

   import os, pkg1, pkg2
   pkg2.initVM(classpath=os.path.pathsep.join([pkg1.CLASSPATH, pkg2.CLASSPATH]))

   c = pkg1.Wtf()
   s = pkg2.MyWtf()

   # works fine
   s.f(c)

You don't need to use --package with --import; I modified your go.sh file
accordingly:

   #!/bin/bash

   # clean up
   rm -rf build dist
   mkdir build dist
   rm -rf *.egg-info

   # compile pkg1 and pkg2 source
   javac -d build $(find src -name "*.java")

   # make jars
   (cd build; jar -c pkg1 > pkg1.jar)
   (cd build; jar -c pkg2 > pkg2.jar)

   # make pkg1 extension .so
   ../_install3/bin/python -m jcc --shared --jar build/pkg1.jar --python pkg1 --build --install

   # make pkg2 extension .so
   # We need --import to find the wrappers from pkg1.
   ../_install3/bin/python -m jcc --shared --import pkg1 --jar build/pkg2.jar --python pkg2 --build --install

   ../_install3/bin/python wtf.py

To be continued...

Andi..
