Posted to dev@asterixdb.apache.org by Torsten Bergh Moss <to...@ig.ntnu.no> on 2019/11/16 13:16:16 UTC

Large UDFs

Greetings devs,


Hope you are all enjoying your weekends.


I am trying to build a GPU-based UDF, and this UDF relies on a bunch of dependencies (one of them being the GPU framework). To "bake" these dependencies into the UDF, I am packaging it as a jar-with-dependencies. However, this jar ends up being too big to deploy as a UDF, and the Hyracks HTTP server cries out:


[nioEventLoopGroup-5-7] WARN org.apache.hyracks.http.server.HttpRequestAggregator - A large request encountered. Closing the channel.


Is there any way to adjust these file size limits, or should UDFs with dependencies be handled some other way? I looked into the HttpRequestAggregator.java file and tried following some trails, but I can't seem to discover where the limit is actually set.


Best wishes,

Torsten

Re: Large UDFs

Posted by Torsten Bergh Moss <to...@ig.ntnu.no>.
Everything you said was correct; the server accepts my large UDF now. Thank you!

Best wishes,
Torsten Bergh Moss

Re: Large UDFs

Posted by Murtadha Hubail <hu...@gmail.com>.
Yes, and I believe it should go under the [common] config section. You will need to restart the AsterixDB instance afterwards for the change to take effect. This property is configured in bytes. For example, if you want to set it to 100MB, it would be something like this:

[common]
max.web.request.size=104857600

Cheers,
Murtadha
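
For anyone working out the byte value for a different limit, here is a minimal sketch of the arithmetic. It assumes 1 MB means 1024 * 1024 bytes, which is what the 100MB example above implies (104857600 = 100 * 1024 * 1024). Python and the helper names mb_to_bytes and common_section are illustrative choices, not anything from AsterixDB itself.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes, as in the 100MB example


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in MB to the byte count the property expects."""
    return megabytes * BYTES_PER_MB


def common_section(megabytes: int) -> str:
    """Render the [common] fragment for the requested request-size limit."""
    return f"[common]\nmax.web.request.size={mb_to_bytes(megabytes)}\n"


if __name__ == "__main__":
    # Prints the same two lines shown in the message above.
    print(common_section(100), end="")
```

The rendered fragment still has to be merged into the instance's config file by hand, followed by the restart mentioned above.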

Re: Large UDFs

Posted by Torsten Bergh Moss <to...@ig.ntnu.no>.
Thanks Murtadha,

Do I configure this property under [cc] inside cc.conf?

Best wishes,
Torsten

Re: Large UDFs

Posted by Murtadha Hubail <hu...@gmail.com>.
Torsten,

The maximum HTTP request size is configurable through the max.web.request.size property, which defaults to 50MB.

Cheers,
Murtadha

Re: Large UDFs

Posted by Torsten Bergh Moss <to...@ig.ntnu.no>.
I must say that I feel really confident that the problem has to do with the size of the UDF. 

I realized a lot of the dependencies actually were related to Asterix, thus redundant, so I solved the dependency problem by unapologetically cloning the repos for the external libraries my UDF is explicitly using and adding the code to the repo. It worked.

However, my UDF is based on machine learning (Naive Bayes for sentiment analysis of Tweets), and is trained on about 900 000 tweets. The trained model manifests as large dictionaries containing term frequencies for the different classes/sentiments. So in order to use my UDF I either have to upload it with the training data or serialized versions of these dictionaries. 

I can see that if I mvn package my UDF without these large files (.csv or .ser), it is "accepted" by the server when I send it via POST, but if I add these large files to the repo and then mvn package the UDF, the server rejects it because of file size. In other words, it seems to depend solely on the presence of these big files. That kind of makes sense, as it is exactly what the cc.log file is saying: "A large request encountered. Closing the channel."

Best wishes,
Torsten
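
A quick way to confirm the size theory before sending the POST is to compare the packaged jar against the request limit. The sketch below is only an illustration: the function name fits_in_request and the example path are hypothetical, and the limit must be supplied by the caller (for instance 50 * 1024 * 1024 for the 50MB default mentioned elsewhere in this thread, assuming binary megabytes). The actual HTTP request is somewhat larger than the file because of headers and any form encoding, so a file just under the limit may still be rejected.

```python
import os


def fits_in_request(jar_path: str, limit_bytes: int) -> bool:
    """Return True if the file on disk is strictly smaller than limit_bytes.

    This is a lower-bound check only: the request carrying the file
    adds its own overhead on top of the file size.
    """
    return os.path.getsize(jar_path) < limit_bytes


if __name__ == "__main__":
    jar = "target/my-udf.jar"  # hypothetical path to the packaged UDF
    if os.path.exists(jar):
        size_mb = os.path.getsize(jar) / (1024 * 1024)
        print(f"{jar}: {size_mb:.1f} MB, fits: {fits_in_request(jar, 50 * 1024 * 1024)}")
```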

Re: Large UDFs

Posted by Xikui Wang <xi...@uci.edu>.
I think the warning message that you see is probably orthogonal to the
dependencies that you are trying to add, since the installation of a UDF
merely copies the jar files to a designated location for AsterixDB to
discover. It shouldn't touch the code that raises the warning message.
Maybe that's related to how you interacted with the system? Not sure...

As for handling large dependency libraries, besides making a fat jar, you
can also copy the dependency jar files into the
"apache-asterixdb-0.9.5-SNAPSHOT/repo" folder, so these jars can be
deployed to the cluster together with AsterixDB and then be used by UDFs
directly.

Best,
Xikui
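
The copy step described above could be scripted along these lines. This is a hedged sketch rather than an official procedure: the function name copy_dependency_jars and the source directory are hypothetical, and the only detail taken from the message is the "apache-asterixdb-0.9.5-SNAPSHOT/repo" target folder.

```python
import shutil
from pathlib import Path
from typing import List


def copy_dependency_jars(source_dir: str, repo_dir: str) -> List[str]:
    """Copy every .jar directly inside source_dir into repo_dir.

    Returns the copied file names in sorted order. Fails early if the
    target folder does not exist, rather than creating it silently.
    """
    target = Path(repo_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"repo folder not found: {repo_dir}")
    copied = []
    for jar in sorted(Path(source_dir).glob("*.jar")):
        shutil.copy2(jar, target / jar.name)  # copy2 also preserves timestamps
        copied.append(jar.name)
    return copied


if __name__ == "__main__":
    # "deps" is a hypothetical folder holding the dependency jars.
    if Path("deps").is_dir() and Path("apache-asterixdb-0.9.5-SNAPSHOT/repo").is_dir():
        print(copy_dependency_jars("deps", "apache-asterixdb-0.9.5-SNAPSHOT/repo"))
```

Keeping the dependencies out of the UDF jar this way also keeps the upload itself small.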

Re: Large UDFs

Posted by Ian Maxon <im...@uci.edu>.
Sounds like a bug. Can you share the UDF in question so I can debug it?
