Posted to dev@gora.apache.org by Furkan KAMACI <fu...@gmail.com> on 2015/09/02 16:36:34 UTC

Re: Unnecessary Variable

Also, here is the issue for the general serialization mechanism for MapReduce:
https://issues.apache.org/jira/browse/HADOOP-1986

Here is the main usage of that parameter:
https://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/generic/GenericDatumReader.html#read(D, org.apache.avro.io.Decoder)

I've opened a Jira issue for this:
https://issues.apache.org/jira/browse/GORA-431.

@Renato, I've checked the Hadoop source code, and I see that we have to find
an appropriate way to pass that property. We do not instantiate that class in
Gora. What do you suggest?
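One possible direction, sketched under explicit assumptions: if I read Hadoop correctly, SerializationFactory creates the classes listed under `io.serializations` by reflection, and `ReflectionUtils.newInstance` hands the Configuration to any instance that implements `Configurable`. So instead of a constructor argument, `setIOSerializations` could record `reuseObjects` in the job configuration, and the reflectively created class could read it back. The Java sketch below models that flow without Hadoop on the classpath. `SimpleConf`, `ConfAware`, `HypotheticalPersistentSerialization`, `ReuseConfigDemo` and the key `gora.mapreduce.reuse.objects` are all hypothetical stand-ins, not existing Gora or Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a plain string map.
class SimpleConf {
    private final Map<String, String> values = new HashMap<>();

    void setBoolean(String key, boolean value) {
        values.put(key, Boolean.toString(value));
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

// Hypothetical analogue of Hadoop's Configurable: settings arrive after
// construction because the framework only calls a no-arg constructor.
interface ConfAware {
    void setConf(SimpleConf conf);
}

// Hypothetical serialization class; Gora never calls its constructor itself.
class HypotheticalPersistentSerialization implements ConfAware {
    // Hypothetical property name, not an existing Gora key.
    static final String REUSE_KEY = "gora.mapreduce.reuse.objects";

    // Default matches the value that is always passed today.
    private boolean reuseObjects = true;

    @Override
    public void setConf(SimpleConf conf) {
        reuseObjects = conf.getBoolean(REUSE_KEY, true);
    }

    boolean isReuseObjects() {
        return reuseObjects;
    }
}

class ReuseConfigDemo {
    // Sketch of setIOSerializations honouring the flag by storing it in the conf.
    static void setIOSerializations(SimpleConf conf, boolean reuseObjects) {
        conf.setBoolean(HypotheticalPersistentSerialization.REUSE_KEY, reuseObjects);
    }

    // Mimics a framework factory: create by class name, then inject the conf.
    static Object newInstance(String className, SimpleConf conf) {
        try {
            Object obj = Class.forName(className).getDeclaredConstructor().newInstance();
            if (obj instanceof ConfAware) {
                ((ConfAware) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        setIOSerializations(conf, false);
        HypotheticalPersistentSerialization s = (HypotheticalPersistentSerialization)
            newInstance(HypotheticalPersistentSerialization.class.getName(), conf);
        System.out.println("reuseObjects = " + s.isReuseObjects());
    }
}
```

Defaulting to true when the key is absent keeps the current behaviour for jobs that never set it, so such a change would only affect callers that explicitly pass false.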


On Thu, Aug 27, 2015 at 7:03 AM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Yes, it doesn't get used in GoraMapReduceUtils. I vaguely remember using it
> in some code a while ago, but I looked for it and didn't find anything. The
> other thing I found was PersistentDeserializer, which also uses it [1] and
> gets created in GoraMapReduceUtils. But we always pass a <true> value, so
> maybe we should add it as a configuration parameter, or review whether it
> makes any difference anymore.
>
> [1] https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentDeserializer.java#L70
>
> 2015-08-18 0:09 GMT+02:00 Furkan KAMACI <fu...@gmail.com>:
>
> > They are all passed to
> >
> > GoraMapReduceUtils.setIOSerializations(conf, reuseObjects);
> >
> > in those two examples, am I right? However, it is not used there.
> > On 18 Aug 2015 00:47, "Renato Marroquín Mogrovejo" <
> > renatoj.marroquin@gmail.com> wrote:
> >
> > > Sorry, I pressed Enter too fast there.
> > > As pointed out in the JIRA issue, it exists to avoid creating too many
> > > objects while mapping or reducing. You can also find it in GoraMapper,
> > > where it is used here:
> > >
> > > https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/GoraMapper.java#L62
> > >
> > > 2015-08-17 14:45 GMT-07:00 Renato Marroquín Mogrovejo <
> > > renatoj.marroquin@gmail.com>:
> > >
> > > > What about ...
> > > >
> > > > https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/GoraReducer.java#L47
> > > >
> > > > 2015-08-17 14:38 GMT-07:00 Furkan KAMACI <fu...@gmail.com>:
> > > >
> > > >> Whether you set *reuseObjects* to true or false, it is not taken into
> > > >> account in the source code.
> > > >>
> > > >> On Tue, Aug 18, 2015 at 12:35 AM, Furkan KAMACI <furkankamaci@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > A job conf is passed to Spark to store RDD results. I needed a
> > > >> > similar piece of code in my implementation (i.e. storing something
> > > >> > in a data store via Apache Gora). When I checked the code, I thought
> > > >> > that the *reuseObjects* variable is not necessary for the
> > > >> > setIOSerializations method in Apache Gora, and I wanted to be sure
> > > >> > that my implementation stays compatible with Gora.
> > > >> >
> > > >> > On Tue, Aug 18, 2015 at 12:27 AM, Henry Saputra <henry.saputra@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> I think it dates from way back; I don't remember why.
> > > >> >>
> > > >> >> Why do you need to remove it to support Spark?
> > > >> >>
> > > >> >> - Henry
> > > >> >>
> > > >> >> On Mon, Aug 17, 2015 at 1:37 PM, Furkan KAMACI <furkankamaci@gmail.com>
> > > >> >> wrote:
> > > >> >> > Hi All,
> > > >> >> >
> > > >> >> > There is this method in GoraMapReduceUtils.java:
> > > >> >> >
> > > >> >> >     public static void setIOSerializations(Configuration conf, boolean reuseObjects)
> > > >> >> >
> > > >> >> > However, reuseObjects is never used in that method. I've removed
> > > >> >> > it in my Spark implementation. Is it needed for future use?
> > > >> >> >
> > > >> >> > Kind Regards,
> > > >> >> > Furkan KAMACI
> > > >> >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Unnecessary Variable

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Furkan,

Thanks for taking the time and looking into this.
I agree with you, there is something fuzzy going on there. Our persistent
classes extend org.apache.avro.specific.SpecificRecord and use their specific
DatumReader/Writer, which in turn use the methods that you are pointing out.
And yes, we should find a way to pass this into the datastores (which
shouldn't be too hard), but we should also double-check whether it makes
sense to let the user set it at all. I mean, is it always better to have it
set to <true> (smaller memory footprint)? Or is there any case where reusing
the object between deserializations could cause wrong behaviour? I don't know
the actual effect of setting this to true or false. Could you please create a
simple test that shows the different behaviour of setting it to true or
false? That would be great, as it would help everybody better understand the
reasoning for always setting this parameter to true.
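A simple test along those lines could look like the following Java sketch. It uses neither Gora nor Avro: `HypotheticalRecord`, `ToyDeserializer` and `ReuseObjectsDemo` are hypothetical stand-ins that only mimic the reuse contract of Avro's `DatumReader.read(D reuse, Decoder in)`, where a non-null reuse instance may be overwritten instead of a new object being allocated. It compares the number of allocations and what a caller sees when it keeps references to deserialized records, as a reducer buffering its values would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable record standing in for a Gora Persistent object.
class HypotheticalRecord {
    String name;
}

// Hypothetical deserializer mimicking Avro's read(D reuse, Decoder in) contract:
// with reuse enabled, the previously returned instance is filled in place.
class ToyDeserializer {
    private final boolean reuseObjects;
    private HypotheticalRecord previous;
    private int allocations;

    ToyDeserializer(boolean reuseObjects) {
        this.reuseObjects = reuseObjects;
    }

    HypotheticalRecord deserialize(String encoded) {
        HypotheticalRecord target = reuseObjects ? previous : null;
        if (target == null) {
            target = new HypotheticalRecord(); // allocate only when not reusing
            allocations++;
        }
        target.name = encoded; // "decoding" is just copying the field here
        previous = target;
        return target;
    }

    int allocations() {
        return allocations;
    }
}

class ReuseObjectsDemo {
    static final class Result {
        final List<String> namesReadBack;
        final int allocations;

        Result(List<String> namesReadBack, int allocations) {
            this.namesReadBack = namesReadBack;
            this.allocations = allocations;
        }
    }

    // Deserializes every input and keeps the returned references without
    // copying them, then reads the names back once all inputs are processed.
    static Result run(boolean reuseObjects, List<String> inputs) {
        ToyDeserializer deserializer = new ToyDeserializer(reuseObjects);
        List<HypotheticalRecord> kept = new ArrayList<>();
        for (String input : inputs) {
            kept.add(deserializer.deserialize(input));
        }
        List<String> names = new ArrayList<>();
        for (HypotheticalRecord r : kept) {
            names.add(r.name);
        }
        return new Result(names, deserializer.allocations());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("alice", "bob", "carol");
        for (boolean reuse : new boolean[] {true, false}) {
            Result r = run(reuse, inputs);
            System.out.println("reuseObjects=" + reuse
                + " allocations=" + r.allocations
                + " names=" + r.namesReadBack);
        }
    }
}
```

With reuse enabled only one object is allocated, but every buffered reference ends up showing the last record ([carol, carol, carol]); with reuse disabled, three objects are allocated and the values stay distinct. In this toy model, reuse saves allocations and is only safe when callers copy any value they hold on to past the next read.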


Renato M.
