Posted to dev@gora.apache.org by Damien Raude-Morvan <dr...@drazzib.com> on 2013/08/12 01:37:05 UTC

Issue with IOUtils static SerializationFactory field

Hi folks,

I think I might have found an issue in the Gora IOUtils class.

Right now, IOUtils keeps a *static* reference to a SerializationFactory,
which is initialized on the first call to writeObject() with a Configuration
instance. The given Configuration is also stored in a static field of the
same class for later use.

But in fact each call to IOUtils.writeObject() can receive a different
Configuration instance than the previous one. In my own use case, I have
multiple M/R jobs which use the Gora M/R feature to process Persistent
objects, but each job can work with a different datastore configuration (for
instance, the name of the table/collection/column family).

If we keep a static reference to SerializationFactory (and so to its
Configuration reference), QueryBase#readFields will then create a DataStore
with the wrong Configuration (i.e. using the first DataStore/Configuration
instead of the new one).
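
To make the failure mode concrete, here is a minimal, self-contained sketch
of the pattern. It deliberately does not use the real Gora or Hadoop classes:
FakeConfiguration, FakeSerializationFactory, StaticCachingIOUtils and the
"table" key are hypothetical stand-ins invented for illustration, and the
real IOUtils code may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a plain key/value map.
class FakeConfiguration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for SerializationFactory: it remembers the
// configuration it was constructed with.
class FakeSerializationFactory {
    final FakeConfiguration conf;
    FakeSerializationFactory(FakeConfiguration conf) { this.conf = conf; }
}

// Mimics the pattern described above: the factory is created on first use
// and then cached in a static field for every later call.
class StaticCachingIOUtils {
    private static FakeSerializationFactory factory;

    // Returns the table name that serialization would actually work with.
    static String tableUsedFor(FakeConfiguration conf) {
        if (factory == null) {
            // Only the first caller's configuration is ever captured.
            factory = new FakeSerializationFactory(conf);
        }
        return factory.conf.get("table");
    }
}

class StaticFactoryDemo {
    public static void main(String[] args) {
        FakeConfiguration jobA = new FakeConfiguration();
        jobA.set("table", "webpage");
        FakeConfiguration jobB = new FakeConfiguration();
        jobB.set("table", "hostdb");

        // prints "webpage"
        System.out.println(StaticCachingIOUtils.tableUsedFor(jobA));
        // also prints "webpage": jobB's "hostdb" setting is silently ignored
        System.out.println(StaticCachingIOUtils.tableUsedFor(jobB));
    }
}
```

The second call is the interesting one: it is handed jobB's configuration but
still answers with the table from jobA, because the cached factory wins.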

I've started working on this issue and have come up with a possible fix:
https://github.com/drazzib/gora/compare/apache-gora-0.2.1...ioutils_static_conf
- remove the static SerializationFactory from IOUtils (it will be recreated
every time)
- in PartitionQueryImpl and QueryBase, now pass the *current* configuration
to deserialize
One linked fix is that the Gora "drivers" need to be updated to define the
Configuration instance in PartitionQueryImpl (like this:
https://github.com/drazzib/gora/commit/395f2e2ad50d524f42ecc563104c165fa0fa6f39
).
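
The shape of the proposed fix can be sketched as follows. This is only an
illustration under stated assumptions, not the code from the linked branch:
SimpleConfiguration, SimpleSerializationFactory, PerCallIOUtils, SimpleQuery
and the "table" key are hypothetical names standing in for the real Hadoop
and Gora classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration.
class SimpleConfiguration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for SerializationFactory.
class SimpleSerializationFactory {
    final SimpleConfiguration conf;
    SimpleSerializationFactory(SimpleConfiguration conf) { this.conf = conf; }
}

// No static factory: every call builds one from the configuration it is given.
class PerCallIOUtils {
    static String tableUsedFor(SimpleConfiguration conf) {
        SimpleSerializationFactory factory = new SimpleSerializationFactory(conf);
        return factory.conf.get("table");
    }
}

// Hypothetical stand-in for a query object. The "driver" must set the
// current configuration on it before it is deserialized.
class SimpleQuery {
    private SimpleConfiguration conf;

    void setConf(SimpleConfiguration conf) { this.conf = conf; }

    // Stands in for readFields(): deserializes using the *current* configuration.
    String readFields() {
        if (conf == null) {
            throw new IllegalStateException("Configuration must be set before deserializing");
        }
        return PerCallIOUtils.tableUsedFor(conf);
    }
}

class PerCallFactoryDemo {
    public static void main(String[] args) {
        SimpleConfiguration jobA = new SimpleConfiguration();
        jobA.set("table", "webpage");
        SimpleConfiguration jobB = new SimpleConfiguration();
        jobB.set("table", "hostdb");

        SimpleQuery queryA = new SimpleQuery();
        queryA.setConf(jobA);
        SimpleQuery queryB = new SimpleQuery();
        queryB.setConf(jobB);

        System.out.println(queryA.readFields()); // prints "webpage"
        System.out.println(queryB.readFields()); // prints "hostdb"
    }
}
```

The trade-off is that a factory is built on every call instead of once. If
that turned out to be costly, caching per Configuration instance rather than
globally would be another option, but that is not what the branch above does.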

What do you think about this issue?
If you need it, I can produce a reduced test case to help you understand
it.

Cheers,
-- 
Damien

Re: Issue with IOUtils static SerializationFactory field

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Damien,

It's really nice that there are people out there helping out with the Gora
MapReduce stuff, thanks! (:
I think you are right about the problem with the static reference; for
some use cases it is certainly not suitable. And as you have already started
on this, I think it totally makes sense.
I have been looking at the MapReduce classes (GoraInputFormat,
GoraInputSplit, and others) these last few days, and I totally understand
what you are talking about. Maybe you'd like to open a JIRA for this, and if
you could put a patch up I'd be happy to push it, Damien.
Now, about PartitionQueryImpl: I saw you are also using a single
partition, as you are not using sharded MongoDB, but how would you envision
using this if you were? I am asking because I am trying to fix the Hadoop
support for Cassandra, but I don't have a clear idea of how to do it yet.
Every data store is different, and having a standard approach would probably
help other modules get this right.
Thanks Damien!


Renato M.

