Posted to user@storm.apache.org by Navin Ipe <na...@searchlighthealth.com> on 2016/05/04 09:47:59 UTC

How do you store database connections in a Spout or Bolt without serialization problems?

Hi,

I know that if a MySQL database connection is instantiated in the
constructor of a Spout or Bolt, it won't work. It should be instantiated in
open() or prepare().

The problem is when I store this database connection as a member of a class
which is itself a member of a bolt. E.g.:

    public class MongoIteratorBolt extends BaseRichBolt {
        private S1Table s1;
    }

    public class S1Table implements Serializable {
        private Connection connRef;
        private Statement stmt;
        private ResultSet rs;

        public S1Table(Connection conn, final String tableName) {
            try {
                this.connRef = conn;
                this.stmt = conn.createStatement();
                // ...

I get an error like this:

    8811 [main] ERROR o.a.s.s.o.a.z.s.NIOServerCnxnFactory - Thread Thread[main,5,main] died
    java.lang.IllegalStateException: Bolt 'mongoBolt' contains a non-serializable field of type com.mysql.jdbc.SingleByteCharsetConverter, which was instantiated prior to topology creation. com.mysql.jdbc.SingleByteCharsetConverter should be instantiated within the prepare method of 'mongoBolt at the earliest.
        at org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:127) ~[MyStorm.jar:?]
        at com.slh.Mystorm.MyStorm.main(MyStorm.java:76) ~[MyStorm.jar:?]
    Caused by: java.lang.RuntimeException: java.io.NotSerializableException: com.mysql.jdbc.SingleByteCharsetConverter
        at org.apache.storm.utils.Utils.javaSerialize(Utils.java:167) ~[MyStorm.jar:?]
        at org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:122) ~[MyStorm.jar:?]
        ... 1 more
    Caused by: java.io.NotSerializableException: com.mysql.jdbc.SingleByteCharsetConverter
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) ~[?:1.8.0_73]

I assume it is because one of these fields isn't getting serialized:

    private Connection connRef;
    private Statement stmt;
    private ResultSet rs;

So if you can't declare them as class members because they don't get
serialized, how do you declare them so that the entire class has access
to them and I don't have to keep creating a new connection for every query?
I'm quite sure that declaring and initializing them inside prepare() won't
let the rest of the class's methods access them.

-- 
Regards,
Navin

Re: How do you store database connections in a Spout or Bolt without serialization problems?

Posted by Navin Ipe <na...@searchlighthealth.com>.
Hmm... OK, thanks. In this case I need to preserve state, so I can't use
transient.
Anyway, I redesigned the classes to keep the connection strings elsewhere,
and now everything is working fine.
Thanks a lot!
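Navin doesn't show the redesigned classes, so the following is only a minimal sketch of one possible shape, assuming the goal is to keep S1Table's state while avoiding the serialization error: ordinary fields hold serializable values (a JDBC URL and the table name), while the JDBC objects are transient and are opened on the worker from prepare() or open(). The class S1TableSketch, its methods and the example URL are hypothetical; the code uses only java.io and java.sql from the JDK and never actually connects to a database.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical redesign of S1Table: state that must survive serialization lives in
// ordinary fields; JDBC objects are transient and created on the worker.
class S1TableSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String jdbcUrl;      // connection string: plain data, serialized with the object
    private final String tableName;    // other state to preserve
    private transient Connection conn; // skipped by serialization
    private transient Statement stmt;  // skipped by serialization

    S1TableSketch(String jdbcUrl, String tableName) {
        this.jdbcUrl = jdbcUrl;
        this.tableName = tableName;
    }

    // Call this from the bolt's prepare() (or a spout's open()), i.e. on the worker.
    // Needs the JDBC driver (e.g. MySQL Connector/J) on the worker's classpath.
    void open() throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
        stmt = conn.createStatement();
    }

    boolean isOpen() { return conn != null; }
    String jdbcUrl() { return jdbcUrl; }
    String tableName() { return tableName; }

    // Java-serializes and deserializes the object, roughly what happens to a component
    // between topology submission and task start.
    static S1TableSketch roundTrip(S1TableSketch table) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(table);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (S1TableSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because only conn and stmt are transient, the rest of the object's state survives serialization; the bolt would call open() once from prepare() and reuse the statement in execute().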

-- 
Regards,
Navin

Re: How do you store database connections in a Spout or Bolt without serialization problems?

Posted by Jungtaek Lim <ka...@gmail.com>.
Declare them as class fields, optionally marked transient, and initialize
them in prepare() or open().

Leaving them uninitialized until prepare() or open() is called doesn't cause
any problem, because of the task lifecycle in Apache Storm: prepare() or
open() always runs before execute() or nextTuple().
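Why transient is "not mandatory" can be seen with plain Java serialization, which is what the stack trace in the original question shows failing (Utils.javaSerialize() ending in ObjectOutputStream.writeObject0()). This is a minimal JDK-only sketch; FakeDbHandle and the holder classes are hypothetical, with FakeDbHandle standing in for a non-serializable JDBC object such as com.mysql.jdbc.SingleByteCharsetConverter. A field that already holds such an object fails, a transient field is skipped, and a non-transient field that is still null (because it is only assigned later, in prepare()) also serializes fine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a JDBC object; like SingleByteCharsetConverter, it is not Serializable.
class FakeDbHandle {
}

// Like the original S1Table: the handle is created before the topology is built.
class EagerHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private FakeDbHandle handle = new FakeDbHandle();
}

// Transient field: serialization skips it even though it is set.
class TransientHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient FakeDbHandle handle = new FakeDbHandle();
}

// Non-transient, but still null at serialization time (it would be assigned in prepare()).
class LazyHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private FakeDbHandle handle; // null until prepare() runs on the worker
}

class SerializationCheck {
    // Returns null if obj serializes; otherwise the NotSerializableException message,
    // which names the offending class.
    static String failure(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

What matters is that the connection does not exist yet when the topology is built; marking the field transient additionally protects against it being assigned too early by mistake.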

-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Re: How do you store database connections in a Spout or Bolt without serialization problems?

Posted by Navin Ipe <na...@searchlighthealth.com>.
Yes, I know they should be initialized in open() or prepare(). But I'm
referring to the declaration. If I do this:

    @Override
    public void prepare(Map map, TopologyContext tc, OutputCollector oc) {
        Connection connRef;   // local variables: visible only inside prepare()
        Statement stmt;
        ResultSet rs;
    }

Then connRef, stmt and rs won't be available to execute() or nextTuple(),
right? So how should I declare them to avoid the serialization error? That's
what I'm asking.

-- 
Regards,
Navin

RE: How do you store database connections in a Spout or Bolt without serialization problems?

Posted by "Sinnema, Remon" <re...@emc.com>.
Hi Navin,

A DB connection goes from one machine to another, so how do you expect to share it between spouts and/or bolts that run on multiple machines? You should really set up the connection in open() or prepare(), so that it is specific to the machine that the spout or bolt runs on.


Thanks,
Ray



Re: How do you store database connections in a Spout or Bolt without serialization problems?

Posted by Jungtaek Lim <ka...@gmail.com>.
Navin,

The lifecycle of spouts and bolts guarantees that fields initialized in
prepare() (bolts) or open() (spouts) can safely be used in execute(), or in
nextTuple(), ack() and fail(). In other words, prepare() or open() is always
called before the other methods. So please declare them as transient fields
and initialize them in prepare().
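The lifecycle described above can be simulated without Storm on the classpath. In this minimal sketch, all names are hypothetical: LookupBoltSketch has methods named after the bolt's prepare() and execute() but with simplified signatures, and FakeConnection stands in for a JDBC Connection. The connection is a transient class field, the bolt is Java-serialized and deserialized as if shipped to a worker, prepare() runs first, and execute() then uses the field set in prepare().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for java.sql.Connection; like the real thing, it is not Serializable.
class FakeConnection {
    String lookup(String table, String key) {
        return table + "/" + key;
    }
}

// Shaped like a bolt, but without Storm: prepare() and execute() are named after the
// bolt methods, with simplified signatures.
class LookupBoltSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String tableName;        // plain state, serialized with the bolt
    private transient FakeConnection conn; // class field, skipped by serialization

    LookupBoltSketch(String tableName) {
        this.tableName = tableName;        // runs where the topology is built: no connection yet
    }

    // Called once on the worker, before any execute().
    void prepare() {
        conn = new FakeConnection();
    }

    // Sees the field assigned in prepare(), because prepare() always runs first.
    String execute(String key) {
        return conn.lookup(tableName, key);
    }

    boolean isConnected() {
        return conn != null;
    }

    // Simulates submission: Java-serialize where the topology is built, deserialize on a worker.
    static LookupBoltSketch shipToWorker(LookupBoltSketch bolt) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bolt);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LookupBoltSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real bolt the same field would be typed java.sql.Connection and assigned in prepare(Map, TopologyContext, OutputCollector).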

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)
