
Posted to dev@asterixdb.apache.org by Riyafa Abdul Hameed <ri...@apache.org> on 2017/07/11 17:03:26 UTC

Simplifying the creation of functions

Dear all,

I have been creating a few functions that act on the geometry datatype. In each
of these functions I have been deserializing and/or serializing geometry or
other datatypes. Some of these functions are STAreaDescriptor,
STIntersectsDescriptor and STMakePointDescriptor. As can be seen in these
implementations[1], I am repeating code. (Of course this is not the most
efficient way to implement them, because we are using the Esri API library, but
we have given precedence to convenience over efficiency at the moment.)
The number of functions to be implemented amounts to about 80[2]. This
means the same code might get repeated over and over again. The implementations
of other functions also seem to do the same thing (i.e., deserialize and
then serialize).

The problem is that the arguments passed to the function via
"createEvaluatorFactory(final IScalarEvaluatorFactory[] args)" are not in
the deserialized format, and the return value should be in the serialized
format. My question is whether there's a simpler way to implement a
function where the arguments would be passed in the deserialized format and
then in the function implementation we can simply return the result rather than
serializing it before returning. This would simplify the function
implementations and improve code reuse.


[1] https://asterix-gerrit.ics.uci.edu/1838
[2] https://postgis.net/docs/reference.html

Thank you.
Yours sincerely,
Riyafa

Re: Simplifying the creation of functions

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi,

Thank you. I shall try following a similar approach in implementing the
GeoJSON functions as well.

Yours sincerely,
Riyafa

On 11 July 2017 at 23:52, Ahmed Eldawy <el...@cs.ucr.edu> wrote:

> Hi Yingyi,
>
> That's a good idea.
> @Riyafa: I think we can follow a similar approach by creating one base
> class for each function type we want to provide based on the function
> signature.
>
> Thanks
> Ahmed



-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: Simplifying the creation of functions

Posted by Ahmed Eldawy <el...@cs.ucr.edu>.

Hi Yingyi,

That's a good idea.
@Riyafa: I think we can follow a similar approach by creating one base
class for each function type we want to provide based on the function
signature.
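
A rough sketch of what "one base class per signature" might look like is
below. Everything in it is an assumption made for illustration: Point2D,
PointCodec, the two base classes and the 16-byte point encoding are
hypothetical names and are not AsterixDB or Esri APIs. A real version would
plug into the evaluator interfaces instead of passing byte arrays around.

```java
import java.nio.ByteBuffer;

// Hypothetical geometry value: a 2-D point. Real code would use a geometry library type.
final class Point2D {
    final double x;
    final double y;

    Point2D(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical codec: a point is stored as two big-endian doubles (16 bytes).
final class PointCodec {
    private PointCodec() {
    }

    static Point2D decode(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, 2 * Double.BYTES);
        return new Point2D(buf.getDouble(), buf.getDouble());
    }

    static byte[] encode(Point2D p) {
        return ByteBuffer.allocate(2 * Double.BYTES).putDouble(p.x).putDouble(p.y).array();
    }
}

// Base class for the signature (geometry) -> double: decodes the argument, encodes the result.
abstract class AbstractPointToDouble {
    protected abstract double compute(Point2D p);

    final byte[] evaluate(byte[] arg) {
        double r = compute(PointCodec.decode(arg, 0));
        return ByteBuffer.allocate(Double.BYTES).putDouble(r).array();
    }
}

// Base class for the signature (geometry, geometry) -> boolean.
abstract class AbstractPointPointToBoolean {
    protected abstract boolean compute(Point2D a, Point2D b);

    final byte[] evaluate(byte[] arg0, byte[] arg1) {
        boolean r = compute(PointCodec.decode(arg0, 0), PointCodec.decode(arg1, 0));
        return new byte[] { (byte) (r ? 1 : 0) };
    }
}

// Concrete functions then contain only their own logic.
final class DistanceFromOrigin extends AbstractPointToDouble {
    @Override
    protected double compute(Point2D p) {
        return Math.hypot(p.x, p.y);
    }
}

final class SamePoint extends AbstractPointPointToBoolean {
    @Override
    protected boolean compute(Point2D a, Point2D b) {
        return a.x == b.x && a.y == b.y;
    }
}
```

With this shape, each of the remaining functions would be a few lines long,
and the encoding could later be changed in one place.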

Thanks
Ahmed

On Tue, Jul 11, 2017 at 11:04 AM, Yingyi Bu <bu...@gmail.com> wrote:

> Hi Ahmed,
>
>     You can build a class hierarchy or util classes that hide the details
> of serde particularly for your functions.  Numeric functions are such
> examples:
>
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/NumericAddDescriptor.java
>
> Best,
> Yingyi



-- 

Ahmed Eldawy
Assistant Professor
Computer Science and Engineering
http://www.cs.ucr.edu/~eldawy
Tel: +1 (951) 827-5654

Re: Simplifying the creation of functions

Posted by Yingyi Bu <bu...@gmail.com>.

Hi Ahmed,

    You can build a class hierarchy or utility classes that hide the details
of SerDe specifically for your functions. The numeric functions are one such
example:

https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/NumericAddDescriptor.java
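
To illustrate the utility-class option only, here is a minimal sketch. It
is not the code in NumericAddDescriptor; DoubleSerDeUtil, AddFunction and
the 8-byte big-endian double encoding are hypothetical names and
assumptions chosen to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper: owns the byte-level encoding so function code never touches it.
final class DoubleSerDeUtil {
    private DoubleSerDeUtil() {
    }

    // Read one big-endian double starting at the given offset.
    static double read(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, Double.BYTES).getDouble();
    }

    // Encode one double as 8 big-endian bytes.
    static byte[] write(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Decode both arguments, apply the function logic, encode the result.
    static byte[] applyBinary(byte[] left, byte[] right, DoubleBinaryOperator logic) {
        return write(logic.applyAsDouble(read(left, 0), read(right, 0)));
    }
}

// A function body is then just the arithmetic.
final class AddFunction {
    private AddFunction() {
    }

    static byte[] evaluate(byte[] left, byte[] right) {
        return DoubleSerDeUtil.applyBinary(left, right, (a, b) -> a + b);
    }
}
```

The trade-off is the one already discussed in this thread: the helper
allocates and copies on every call, so it favours convenience over speed.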

Best,
Yingyi


On Tue, Jul 11, 2017 at 10:50 AM, Ahmed Eldawy <el...@cs.ucr.edu> wrote:

> Hi Yingyi,
>
> Thanks for your response. I understand that serialization/deserialization
> consumes some CPU cycles. Given the large number of functions we're
> planning to implement, we prefer a more productive method with a minimal
> number of code lines. If such an example exists, can you point us to an
> example of a UDF where AsterixDB automatically handles the
> serialization/deserialization while the function only handles the function
> logic?
>
> Thanks
> Ahmed

Re: Simplifying the creation of functions

Posted by Ahmed Eldawy <el...@cs.ucr.edu>.

Hi Yingyi,

Thanks for your response. I understand that serialization/deserialization
consumes some CPU cycles. Given the large number of functions we're
planning to implement, we prefer a more productive method with a minimal
number of code lines. If such an example exists, can you point us to an
example of a UDF where AsterixDB automatically handles the
serialization/deserialization while the function only handles the function
logic?

Thanks
Ahmed

On Tue, Jul 11, 2017 at 10:21 AM, Yingyi Bu <bu...@gmail.com> wrote:

> Hi Riyafa,
>
> -- My question is whether there's a simpler way to implement a
> -- function where the arguments would be passed in the deserialized format
> -- and then in function implementation we can simply return the result
> -- rather than serializing it before returning.
>
>    The evaluator interface itself doesn't force an implementation to
> deserialize the input and then serialize the output --- "input" is a region
> of bytes and the "result" pointer can bind to the byte region for the
> output:
>    public void evaluate(IFrameTupleReference input, IPointable result)
>
>    SerDe consumes CPU time and thus for performance reasons, we don't pass
> in deserialized Java objects as function parameters. Therefore, SerDe is
> not mandatory for a function implementation and some existing functions do
> not do SerDe.
>
> Best,
> Yingyi



-- 

Ahmed Eldawy
Assistant Professor
Computer Science and Engineering
http://www.cs.ucr.edu/~eldawy
Tel: +1 (951) 827-5654

Re: Simplifying the creation of functions

Posted by Yingyi Bu <bu...@gmail.com>.

Hi Riyafa,

-- My question is whether there's a simpler way to implement a
-- function where the arguments would be passed in the deserialized format
-- and then in function implementation we can simply return the result
-- rather than serializing it before returning.

   The evaluator interface itself doesn't force an implementation to
deserialize the input and then serialize the output --- "input" is a region
of bytes and the "result" pointer can bind to the byte region for the
output:
   public void evaluate(IFrameTupleReference input, IPointable result)

   SerDe consumes CPU time and thus for performance reasons, we don't pass
in deserialized Java objects as function parameters. Therefore, SerDe is
not mandatory for a function implementation and some existing functions do
not do SerDe.
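
   As a simplified sketch of that idea (not the real interfaces): the
ByteRegion class below is a hypothetical stand-in for a pointable, and
DoubleTimesTwoEvaluator is an invented example that assumes its argument is
a single 8-byte big-endian double. It reads the value straight from the
input bytes and binds the result to a reusable output buffer, without
creating any intermediate Java object.

```java
// Hypothetical stand-in for a pointable: a view over a region of a byte array.
final class ByteRegion {
    byte[] bytes;
    int start;
    int length;

    void set(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
    }
}

// Hypothetical evaluator: doubles the value of an 8-byte big-endian double argument.
final class DoubleTimesTwoEvaluator {
    // Output storage is allocated once and reused on every call.
    private final byte[] out = new byte[Double.BYTES];

    void evaluate(ByteRegion input, ByteRegion result) {
        double v = Double.longBitsToDouble(readLong(input.bytes, input.start));
        writeLong(Double.doubleToLongBits(v * 2.0), out, 0);
        // Bind the result pointer to the output byte region.
        result.set(out, 0, out.length);
    }

    // Assemble 8 big-endian bytes into a long without allocating.
    private static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }

    // Store a long as 8 big-endian bytes, least significant byte last.
    private static void writeLong(long v, byte[] b, int off) {
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            b[off + i] = (byte) v;
            v >>>= 8;
        }
    }
}
```

   Because the output buffer is reused, a caller in this sketch has to
consume or copy the result before calling evaluate again.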

Best,
Yingyi


On Tue, Jul 11, 2017 at 10:03 AM, Riyafa Abdul Hameed <ri...@apache.org>
wrote:

> Dear all,
>
> I have been creating a few functions that act on the geometry datatype. In each
> of these functions I have been deserializing and/or serializing geometry or
> other datatypes. Some of these functions are STAreaDescriptor,
> STIntersectsDescriptor and STMakePointDescriptor. As can be seen in these
> implementations[1], I am repeating code. (Of course this is not the most
> efficient way to implement them, because we are using the Esri API library, but
> we have given precedence to convenience over efficiency at the moment.)
> The number of functions to be implemented amounts to about 80[2]. This
> means the same code might get repeated over and over again. The implementations
> of other functions also seem to do the same thing (i.e., deserialize and
> then serialize).
>
> The problem is that the arguments passed to the function via
> "createEvaluatorFactory(final IScalarEvaluatorFactory[] args)" are not in
> the deserialized format, and the return value should be in the serialized
> format. My question is whether there's a simpler way to implement a
> function where the arguments would be passed in the deserialized format and
> then in the function implementation we can simply return the result rather than
> serializing it before returning. This would simplify the function
> implementations and improve code reuse.
>
>
> [1] https://asterix-gerrit.ics.uci.edu/1838
> [2] https://postgis.net/docs/reference.html
>
> Thank you.
> Yours sincerely,
> Riyafa
>