Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/08/08 23:02:55 UTC

Writable vs Externalizable

Hi,

can someone please tell me what is the technical difference between
org.apache.nutch.io.Writable and java.io.Externalizable?

For me that looks very similar and Externalizable is available since  
jdk 1.1.
What do I miss?

Thanks for any hints.
Stefan
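
For comparison, the two contracts can be put side by side. The sketch
below is illustrative only: SimpleWritable is a hypothetical local
stand-in for org.apache.nutch.io.Writable (its write/readFields method
names are an assumption here, not quoted from this thread), and PageScore
is an invented example class. Because java.io.ObjectOutput extends
DataOutput and ObjectInput extends DataInput, one class can satisfy both
interfaces with the same field-writing code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.nutch.io.Writable (method names assumed).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Invented example class that implements both contracts.
final class PageScore implements SimpleWritable, Externalizable {
    String url = "";
    float score;

    // Externalizable requires a public no-argument constructor.
    public PageScore() {}

    PageScore(String url, float score) {
        this.url = url;
        this.score = score;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeFloat(score);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        score = in.readFloat();
    }

    // ObjectOutput is a DataOutput, so the Writable-style method can be reused.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        write(out);
    }

    // ObjectInput is a DataInput, so the same applies when reading.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readFields(in);
    }

    // Round trip through plain Data streams, with no object stream involved.
    static PageScore copyViaDataStreams(PageScore original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PageScore copy = new PageScore();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The visible difference in this sketch is the stream type each method
receives and the public no-argument constructor that Externalizable
needs; the field-level code is identical.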

Re: Writable vs Externalizable

Posted by Stefan Groschupf <sg...@media-style.com>.

> What do others think?

I don't think RMI is a good idea; I wasted a lot of time with it. I
like the Nutch RPC very much.
However, I think using Externalizable is a good idea. First, it is a
very small change.
Second, many users use Nutch for very custom things, and using
Externalizable would make customization easier:
for example, using caching frameworks (which in some cases makes a lot
of sense), pre- or post-processing data, experimental data stores, etc.

Such a change is certainly low priority, but I would love to see it.

Stefan

RE: Writable vs Externalizable

Posted by Chirag Chaman <de...@filangy.com>.

In our experience, we use flavors of Nutch RPC, RMI and Externalizable.

RMI has been easy to implement when only one server needs to be accessed
(such as a status check) and the class has many methods.

The Nutch RPC is excellent for distribution -- yes, one needs to serialize by
hand and create the OP_CODE, but when distributing you don't want the
classes to be very heavy. We've created a few other distributed servers that
use the Nutch RPC as the distribution mechanism.
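
Serializing by hand with an op code might look roughly like the
following. This is only a sketch of the general technique: the op code
values, the OpCodeFraming class, and its methods are all hypothetical and
do not reproduce the actual Nutch RPC wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class OpCodeFraming {
    // Hypothetical op codes, invented for this example.
    static final byte OP_PING = 1;
    static final byte OP_GET_STATUS = 2;

    // Client side: one op code byte, then the arguments written by hand.
    static byte[] encodeGetStatus(String host) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeByte(OP_GET_STATUS);
            out.writeUTF(host);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side: read the op code first, then only the fields that op expects.
    static String dispatch(byte[] request) {
        try {
            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(request));
            byte op = in.readByte();
            switch (op) {
                case OP_PING:
                    return "pong";
                case OP_GET_STATUS:
                    return "status of " + in.readUTF();
                default:
                    throw new IOException("unknown op code: " + op);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing beyond the op code and the raw fields goes on the wire here,
which is the kind of lightness described above; the cost is that every
new operation needs its own encode and dispatch code.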

Java's RMI implementation, while improved over the years, is still
heavier than Nutch RPC and in our tests took slightly longer. While we wanted
to use one or the other -- we got a lot better performance/mileage by
evaluating which would be better for the particular subsystem. For
distributed, homogeneous systems we use Nutch RPC. For more fluid,
complex/vertical systems we started with plain RMI (as it's a lot faster to
develop/test) and then externalized once functionality was solidified.

This is just our experience, though; as (and if) things get more complicated
it may make sense to look at RMI again. I personally feel that if you're going
to do RMI and then write all the externalization code, why not just stick with
the simplified RPC -- the work involved is pretty much the same, and the
latter gives you more control with better speed.

-----Original Message-----
From: Doug Cutting [mailto:cutting@nutch.org] 
Sent: Monday, August 08, 2005 5:29 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Writable vs Externalizable

Stefan Groschupf wrote:
> can someone please tell me what is the technical difference between 
> org.apache.nutch.io.Writable and java.io.Externalizable?
> 
> For me that looks very similar and Externalizable is available since 
> jdk 1.1.
> What do I miss?

You don't miss much!

I avoided using Java's built-in Serialization and RMI when first writing
Nutch as I wanted close control of how objects are written and of the
client/server architecture (how it connects, how many connections, what
happens when things fail, etc).  I felt that it might be difficult to use
parts of Serialization and RMI without getting tangled in the rest.

Yes, we could easily switch to using java.io.Externalizable in place of
org.apache.nutch.io.Writable.  We would also then need to switch to using
ObjectInput and ObjectOutput in place of DataInput and DataOutput. 
But how should we implement writeObject() and readObject()?  I'm hesitant
to use ObjectInputStream and ObjectOutputStream, since these have a lot of
other baggage, but maybe I'm just paranoid.

That said, in org.apache.nutch.io.ObjectWritable (mapred branch) I have now
recreated much of object serialization, so perhaps it is time to seriously
reconsider this decision.

In general I try to not adopt libraries into the core that include a lot of
complex functionality that we don't intend to use.  Java's Serialization
provides a lot of features needed for RMI that I don't think that Nutch
requires.

What do others think?

Doug

Re: Writable vs Externalizable

Posted by Doug Cutting <cu...@nutch.org>.

Stefan Groschupf wrote:
> can someone please tell me what is the technical difference between
> org.apache.nutch.io.Writable and java.io.Externalizable?
> 
> For me that looks very similar and Externalizable is available since  
> jdk 1.1.
> What do I miss?

You don't miss much!

I avoided using Java's built-in Serialization and RMI when first writing 
Nutch as I wanted close control of how objects are written and of the 
client/server architecture (how it connects, how many connections, what 
happens when things fail, etc).  I felt that it might be difficult to 
use parts of Serialization and RMI without getting tangled in the rest.

Yes, we could easily switch to using java.io.Externalizable in place of 
org.apache.nutch.io.Writable.  We would also then need to switch to 
using ObjectInput and ObjectOutput in place of DataInput and DataOutput. 
But how should we implement writeObject() and readObject()?  I'm 
hesitant to use ObjectInputStream and ObjectOutputStream, since these 
have a lot of other baggage, but maybe I'm just paranoid.
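
One way to get a feel for that baggage is to write the same fields
through both kinds of stream and compare the results. The following is a
minimal sketch, assuming an invented Link class and invented helper names
(none of this is Nutch code). ObjectOutputStream adds a stream header and
a class descriptor around the data that writeExternal() produces, so its
output is larger than the bare DataOutput bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Invented example class; field layout chosen only for illustration.
final class Link implements Externalizable {
    String url = "";
    int depth;

    // Externalizable requires a public no-argument constructor.
    public Link() {}

    Link(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }

    void writeFields(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(depth);
    }

    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        depth = in.readInt();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        writeFields(out); // ObjectOutput extends DataOutput
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readFields(in); // ObjectInput extends DataInput
    }
}

final class StreamOverhead {
    // Only the fields, written through a plain DataOutputStream.
    static byte[] viaDataOutput(Link link) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            link.writeFields(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The same object written with ObjectOutputStream.writeObject().
    static byte[] viaObjectOutput(Link link) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(link);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a Link back with ObjectInputStream.readObject().
    static Link readObject(byte[] bytes) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Link) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For new Link("http://example.org/", 3), viaDataOutput() yields 25 bytes
(a 2-byte length, 19 bytes of URL, and a 4-byte int). viaObjectOutput()
yields more, and how much more depends on the class name and the stream
protocol.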

That said, in org.apache.nutch.io.ObjectWritable (mapred branch) I have 
now recreated much of object serialization, so perhaps it is time to 
seriously reconsider this decision.

In general I try to not adopt libraries into the core that include a lot 
of complex functionality that we don't intend to use.  Java's 
Serialization provides a lot of features needed for RMI that I don't 
think that Nutch requires.

What do others think?

Doug