Posted to dev@samza.apache.org by Jonathan Poltak Samosir <jo...@gmail.com> on 2014/02/08 20:54:50 UTC

Unable to call external library classes from within Samza

Hello,

This is a follow-up to a previous thread of mine, "org.apache.hadoop.util.Shell$ExitCodeException whenever Samza container launches".

I am not sure whether this is a bug, something I am doing incorrectly, or intended behaviour, but whenever I call Class.forName() on an external library class from within Samza, I get a ClassNotFoundException. My Maven deps are set up correctly, and the exact same code works without any issues when invoked directly through a main() method rather than through Samza.

The reason I want to do this is to test that JDBC driver classes can be found and used. I understand that the Class.forName() call is no longer needed as of JDBC 4.0, and that drivers should be loaded automatically if they are on the classpath, but that isn't happening either: an SQLException with "No suitable driver found" is thrown if I leave out the Class.forName() call, which points to the same underlying problem. The same code works fine and loads the appropriate JDBC driver when invoked directly through a main() method.
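
For anyone wanting to reproduce the check, here is a minimal sketch of the kind of probe described above. DriverProbe and its method names are made up for illustration, and com.mysql.jdbc.Driver is only an example driver class (an assumption, not something taken from the linked code); substitute whichever driver you expect to find.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Samza or hello-samza.
final class DriverProbe {

    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only check visibility, do not run static initializers.
            Class.forName(className, false, DriverProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Class names of the JDBC drivers currently registered with DriverManager. */
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            names.add(d.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Example driver class name; pass your own as the first argument.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " loadable: " + isLoadable(driver));
        System.out.println("registered drivers: " + registeredDrivers());
    }
}
```

Running this once from a plain main() and once from inside the container should show whether the two environments see different sets of classes.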

I have also tried this with a number of different external libraries, both pulled in using Maven and manually linked as .jar files on my disk, with the same result: the classes can be found from a standard Java main() method, but not when running in a Samza container.

Has anyone encountered the same issue, or does anyone know why this could be happening?

I hope all that made sense, let me know if it didn't and I'll try to rephrase.

Here is the class I am trying to do this in, in case anyone wants to see what I'm talking about:
https://github.com/poltak/hello-samza/blob/database-reader/samza-wikipedia/src/main/java/samza/examples/databasereader/system/DatabaseReaderConsumer.java

Anyway, thanks for your time,
Jonathan

Re: Unable to call external library classes from within Samza

Posted by Jonathan Poltak Samosir <jo...@gmail.com>.
Hi Chris,

Lovely! It all makes sense now, and it works (until my next problem...).

Thanks for explaining the basics of the Maven assembly plugin; now I understand what that file was for.

Just a note, no need to follow up on it if you don't know why it happened (I doubt it's Samza-related, anyhow):
My Samza code package's pom.xml already had a dependency on the relevant jar files, and the Samza package was already being included via the Maven assembly plugin's src.xml (I am currently working off the hello-samza project's "wikipedia" package). The problem is that even though useTransitiveFiltering was set to true, the dependencies defined in my package's pom.xml were not being included transitively. The strange thing is that other dependencies defined in my package's pom.xml, such as the jackson and slf4j libraries, do seem to be pulled in transitively. Maybe I'm just delusional...

Anyway, this is less of a problem for me now that I understand how the assembly plugin works, though hopefully it will be of help if anyone else encounters a similar problem in the future.

Thank you,
Jonathan

------------------------------------------------------
From: Chris Riccomini criccomini@linkedin.com
Reply: dev@samza.incubator.apache.org dev@samza.incubator.apache.org
Date: 10 February 2014 at 16:01:56
To: dev@samza.incubator.apache.org dev@samza.incubator.apache.org
Subject:  Re: Unable to call external library classes from within Samza

>
> Hey Jonathan,
>
> Yep, you should put your jar into the lib directory of your Samza job
> package.
>
> This can be done using Maven's assembly plugin. You can have the plugin
> put all runtime dependencies (including transitive dependencies) into your
> lib directory. Then you just need to add a dependency, and it'll get
> sucked in as part of the Maven build. This is what hello-samza does.
>
> Here's hello-samza's assembly file:
>
> https://github.com/linkedin/hello-samza/blob/master/samza-job-package/src/main/assembly/src.xml?source=c
>
> The most important part is the "dependencySets" section.
>
> <dependencySet>
>   <outputDirectory>lib</outputDirectory>
>   <includes>
>     <include>org.apache.samza:samza-core_2.8.1</include>
>     <include>org.apache.samza:samza-kafka_2.8.1</include>
>     <include>org.apache.samza:samza-serializers_2.8.1</include>
>     <include>org.apache.samza:samza-yarn_2.8.1</include>
>     <include>org.slf4j:slf4j-log4j12</include>
>     <include>samza:samza-wikipedia</include>
>     <include>org.apache.kafka:kafka_2.8.1</include>
>   </includes>
>   <useTransitiveFiltering>true</useTransitiveFiltering>
> </dependencySet>
>
> You can see that we're including transitive dependencies, and depending on
> the Samza libraries, as well as samza-wikipedia. You can either add a new
> <include> block here for your MySQL jars, OR you can have your Samza code
> package's pom.xml depend on the MySQL jar, and it will get sucked in
> automatically (due to useTransitiveFiltering).
>
> Cheers,
> Chris


Re: Unable to call external library classes from within Samza

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Jonathan,

Yep, you should put your jar into the lib directory of your Samza job
package.

This can be done using Maven's assembly plugin. You can have the plugin
put all runtime dependencies (including transitive dependencies) into your
lib directory. Then you just need to add a dependency, and it'll get
sucked in as part of the Maven build. This is what hello-samza does.

Here's hello-samza's assembly file:

https://github.com/linkedin/hello-samza/blob/master/samza-job-package/src/main/assembly/src.xml?source=c


The most important part is the "dependencySets" section.

<dependencySet>
  <outputDirectory>lib</outputDirectory>
  <includes>
    <include>org.apache.samza:samza-core_2.8.1</include>
    <include>org.apache.samza:samza-kafka_2.8.1</include>
    <include>org.apache.samza:samza-serializers_2.8.1</include>
    <include>org.apache.samza:samza-yarn_2.8.1</include>
    <include>org.slf4j:slf4j-log4j12</include>
    <include>samza:samza-wikipedia</include>
    <include>org.apache.kafka:kafka_2.8.1</include>
  </includes>
  <useTransitiveFiltering>true</useTransitiveFiltering>
</dependencySet>

You can see that we're including transitive dependencies, and depending on
the Samza libraries, as well as samza-wikipedia. You can either add a new
<include> block here for your MySQL jars, OR you can have your Samza code
package's pom.xml depend on the MySQL jar, and it will get sucked in
automatically (due to useTransitiveFiltering).
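
After rebuilding, one way to confirm that the driver jar really landed in the package's lib directory is to scan the jars there for the class in question. The following is only a rough sketch: JarFinder is a made-up helper, and the directory and class name you pass in depend on where you extracted the job tarball and which driver you use.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Samza or hello-samza.
final class JarFinder {

    /** Returns the jars directly under libDir that contain the given class. */
    static List<File> jarsContaining(File libDir, String className) throws IOException {
        // com.example.Foo is stored in a jar as com/example/Foo.class
        String entryName = className.replace('.', '/') + ".class";
        List<File> hits = new ArrayList<>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entryName) != null) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarFinder <lib-dir> <fully.qualified.ClassName>");
            return;
        }
        List<File> hits = jarsContaining(new File(args[0]), args[1]);
        System.out.println(hits.isEmpty() ? "not found in any jar" : "found in: " + hits);
    }
}
```

If the class is not found in any jar under lib, the dependency was not picked up by the assembly, and the <include> list or the pom.xml dependency is the place to look.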


Cheers,
Chris

On 2/10/14 3:49 PM, "Jonathan Poltak Samosir" <jo...@gmail.com>
wrote:

>Yep, sure enough the jar I need is not in my classpath (as per the YARN
>container's stdout).
>
>To fix this issue, and for future reference, what is the best way of
>adding external libraries to the classpath for Samza?
>
>Going by how the samza/bin/run-class.sh script is written, it seems like
>it would probably work if I place the jars I need in samza/lib directory.
>This seems rather hacky though, and I would have to do it everytime I run
>`mvn clean package`.
>
>So is there something I'm missing in the Maven pom.xml that is placing
>all the needed libraries, apart from the new ones I've added as
>dependencies, into the samza/lib directory of the tarball? (sorry if this
>is something obvious...)
>
>Thanks,
>Jonathan


Re: Unable to call external library classes from within Samza

Posted by Jonathan Poltak Samosir <jo...@gmail.com>.
Yep, sure enough the jar I need is not in my classpath (as per the YARN container's stdout).

To fix this issue, and for future reference, what is the best way of adding external libraries to the classpath for Samza? 

Going by how the samza/bin/run-class.sh script is written, it seems like it would probably work if I place the jars I need in the samza/lib directory. This seems rather hacky though, and I would have to do it every time I run `mvn clean package`.

So is there something I'm missing in the Maven pom.xml that is placing all the needed libraries, apart from the new ones I've added as dependencies, into the samza/lib directory of the tarball? (sorry if this is something obvious...)

Thanks,
Jonathan


------------------------------------------------------
From: Chris Riccomini criccomini@linkedin.com
Reply: dev@samza.incubator.apache.org dev@samza.incubator.apache.org
Date: 10 February 2014 at 14:16:04
To: dev@samza.incubator.apache.org dev@samza.incubator.apache.org
Subject:  Re: Unable to call external library classes from within Samza

>
> Hey Jonathan,
>
> Can you post your classpath? You can usually find this in your YARN
> container's stdout file, if you're running with YARN. If you're running
> with LocalJobFactory, it should print to STDOUT.
>
> Cheers,
> Chris


Re: Unable to call external library classes from within Samza

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Jonathan,

Can you post your classpath? You can usually find this in your YARN
container's stdout file, if you're running with YARN. If you're running
with LocalJobFactory, it should print to STDOUT.
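
If the logged classpath is hard to find, a small sketch like the one below can print it from inside the JVM. It reads the standard java.class.path system property; ClasspathDump is a made-up name for illustration, not something Samza provides.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Samza.
final class ClasspathDump {

    /** Splits a classpath string on the given separator, skipping empty entries. */
    static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        // One entry per line, flagging entries that do not exist on disk.
        for (String entry : entries(cp, File.pathSeparator)) {
            System.out.println(entry + (new File(entry).exists() ? "" : "  (missing)"));
        }
    }
}
```

Calling ClasspathDump.main from the start of the task, or logging the same property directly, makes it easy to compare the container's classpath with the one used by a plain main() run.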

Cheers,
Chris

On 2/8/14 11:54 AM, "Jonathan Poltak Samosir" <jo...@gmail.com>
wrote:

>Hello,
>
>This is a bit of a follow on on a previous thread of mine,
>"org.apache.hadoop.util.Shell$ExitCodeException whenever Samza container
>launches".
>
>So I am not sure whether this is a bug, or if I am not doing something
>correct, or it is intended, but whenever I attempt to call
>Class.forName() on an external library class from within Samza, I am
>running into a ClassNotFoundException. My Maven deps are set up
>correctly, and the exact same code works without any issues if invoked
>manually through a main() method, for example, as opposed to running it
>through Samza.
>
>The reason I want to do this, is to test that classes for JDBC drivers
>can be found and used. I fully understand that this method is no longer
>needed as of JDBC 4.0, and that the drivers should be automatically
>loaded if found in the class path, although this isn't happening either
>(an SQLException is thrown with "No suitable driver found" if I leave out
>the Class.forName() call, which leads to the same problem). Just so you
>know, this same code also works fine and loads the appropriate JDBC
>driver if invoked directly through a main() method.
>
>I have also tried this with a number of different external libraries,
>both pulled in using Maven and manually linking .jar files on my disk,
>and the same result; those classes can be found fine from a standard Java
>main() method call, but cannot be found running in a Samza container.
>
>Has anyone encountered the same issue, or know of why this could be
>happening?
>
>I hope all that made sense, let me know if it didn't and I'll try to
>rephrase.
>
>Here is the class I am trying to do this in, in case anyone wants to see
>what I'm talking about:
>https://github.com/poltak/hello-samza/blob/database-reader/samza-wikipedia
>/src/main/java/samza/examples/databasereader/system/DatabaseReaderConsumer
>.java
>
>Anyway, thanks for your time,
>Jonathan