Posted to users@jena.apache.org by "Chaudhuri, Rajiv" <ra...@pearson.com> on 2013/12/11 20:47:20 UTC

How to set a query timeout and kill threads implicitly

Hi,

Our application, which runs on a Tomcat server, queries a Fuseki TDB store
over HTTP.

We have observed that the thread count of the Fuseki process increases
over time.

I know there might be an issue in our code and that resources should be
closed properly.

*But we would like to achieve the same through configuration on the Fuseki
server, i.e. if a query takes too long (which means that the resource has
not been closed by the application server), there should be a timeout and
the Fuseki server should implicitly kill the thread.*


The impact of the growing thread count is that the Fuseki server stops
responding and, after some time, the process gets killed automatically, so
we have to restart the Fuseki server.

-- 
Regards,
*Rajiv Chaudhuri*
*Cell(O)# 2013642598*
*Cell(P)# 6093563706*

Re: How to set a query timeout and kill threads implicitly

Posted by Andy Seaborne <an...@apache.org>.
Rajiv,

I have failed to reproduce this.  I ran a Fuseki server and sent thousands
of ASK queries to it.  No extra threads were created.

I used quite simple ASK queries - maybe there is complexity in queryStr?

The results of an ASK query are about 400 bytes on the wire, including
the HTTP header - that fits in an IP packet and so in TCP buffering.  The
server will respond to a request, finish the HTTP response and send it
without any way for the client to leave the connection hanging around.

But your server is under a different load, so I would speculate that the
change you made perturbed the situation but is not the root cause.  If
so, problems may arise later.

Could you please point a profiler at the server and report what the
additional threads are doing?  jvisualvm will do this to a running
server (thread dump under the threads section), as will YourKit, which is
easier to read afterwards.

	Andy


Re: How to set a query timeout and kill threads implicitly

Posted by Andy Seaborne <an...@apache.org>.
On 12/12/13 18:41, Chaudhuri, Rajiv wrote:
> QueryExecution qexec =
> QueryExecutionFactory.sparqlService(queryServiceIRI, queryStr);
> try {
> boolean result = qexec.execAsk();
> if (! result) {
> }
> } finally {
> qexec.close();
> }
>
>

Thanks.

Recorded as JENA-610
https://issues.apache.org/jira/browse/JENA-610

	Andy



Re: How to set a query timeout and kill threads implicitly

Posted by "Chaudhuri, Rajiv" <ra...@pearson.com>.
QueryExecution qexec =
    QueryExecutionFactory.sparqlService(queryServiceIRI, queryStr);
try {
    boolean result = qexec.execAsk();
    if (!result) {
    }
} finally {
    qexec.close();
}






Re: How to set a query timeout and kill threads implicitly

Posted by Andy Seaborne <an...@apache.org>.
On 12/12/13 18:15, Chaudhuri, Rajiv wrote:
> The issue got resolved. execAsk does not release the thread even if we
> close the query execution; as an alternative we have used execSelect and
> check whether data exist or not.

Then that's a bug we need to fix.

Which format are you using in the request to the server?

	Andy



Re: How to set a query timeout and kill threads implicitly

Posted by "Chaudhuri, Rajiv" <ra...@pearson.com>.
The issue is resolved. execAsk does not release the thread even when we
close the query execution; as a workaround we now use execSelect and
check whether any data exists.





Re: How to set a query timeout and kill threads implicitly

Posted by Andy Seaborne <an...@apache.org>.
Setting the timeout is a good idea but there are a couple of things in 
Rajiv's message that suggest that is not all that is going on.

> We have observed that thread count for the Fuseki process is increasing
> with time.

 > I know there might be code issue and resources should be
 > closed properly

Are we talking about SELECT queries?
How many concurrent users are there on the application side?
And are the results quite large (larger than TCP buffering)?

The only way I can think of, at the moment, is if the client is keeping
the TCP connection open and the TCP connection is filling up, causing a
backup at the server.  Given that you are calling it from Tomcat, is the
Tomcat application code long-lived?

If you can attach a Java debugger to the server, could you tell me what
the threads are doing (or not doing)?

If the client is consuming the results very slowly, or jumping out of 
the ResultSet loop prematurely, then the server has no way of knowing 
whether the client application code has really finished or is going to 
come back later.

The pattern:
QueryExecution exec = QueryExecutionFactory.create(q, ds);
try {
    ResultSet rs = exec.execSelect();
    while (rs.hasNext()) {
        return ...
    }
} finally {
    exec.close();
}

catches that.
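
To make the point about leaving the loop early concrete, here is a minimal sketch in plain Java with no Jena dependency. FakeExecution is a hypothetical stand-in for a query execution that holds a connection until it is closed; the class and method names are invented for illustration only. It shows that the finally block still runs when the method returns from inside the loop.

```java
import java.util.Arrays;
import java.util.Iterator;

final class CloseOnEarlyReturn {
    // Hypothetical stand-in for a query execution holding a connection.
    static final class FakeExecution implements AutoCloseable {
        private boolean closed = false;

        Iterator<String> execSelect() {
            return Arrays.asList("row1", "row2", "row3").iterator();
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Returns the first row and leaves the loop early; the remaining rows
    // are never read, yet finally still closes the execution.
    static String firstRow(FakeExecution exec) {
        try {
            Iterator<String> rs = exec.execSelect();
            while (rs.hasNext()) {
                return rs.next();
            }
            return null; // no rows at all
        } finally {
            exec.close();
        }
    }

    public static void main(String[] args) {
        FakeExecution exec = new FakeExecution();
        System.out.println(firstRow(exec) + " closed=" + exec.isClosed());
    }
}
```

Without the finally block, the early return would leave the stand-in open, which is the situation where the server cannot tell whether the client is coming back.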

Another approach is

     ResultSet rs = exec.execSelect() ;
     rs = ResultSetFactory.copyResults(rs) ;

which forces the ARQ code to pull in all the results, which then
releases the connection back to the local connection pool.
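
The idea behind copying the results can be sketched without Jena as follows. StreamingRows is a hypothetical source whose rows can only be read while it is open, and copyResults here is an illustrative helper, not the ResultSetFactory method: it drains everything into memory and releases the source immediately, so the caller can take as long as it likes over the copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class CopyThenRelease {
    // Hypothetical streaming source: rows are readable only while open.
    static final class StreamingRows implements Iterator<String>, AutoCloseable {
        private final Iterator<String> rows = Arrays.asList("a", "b", "c").iterator();
        private boolean open = true;

        @Override
        public boolean hasNext() {
            return open && rows.hasNext();
        }

        @Override
        public String next() {
            if (!open) {
                throw new IllegalStateException("source closed");
            }
            return rows.next();
        }

        @Override
        public void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    // Drain all rows into memory, then release the source straight away.
    static List<String> copyResults(StreamingRows source) {
        List<String> copy = new ArrayList<>();
        try {
            while (source.hasNext()) {
                copy.add(source.next());
            }
        } finally {
            source.close();
        }
        return copy;
    }
}
```

The trade-off is memory: the whole result has to fit on the client side, so this suits small and medium results better than very large ones.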

It does not matter whether we're talking about streaming or about Fuseki
gathering all the query results and then sending them (which it doesn't
do - it streams when possible), but it does make this vector more likely.

To defend the server, Brian's suggestion of a reverse proxy gives you a
control point.  Rather than putting every mechanism needed into Fuseki,
this keeps it focused on being a SPARQL engine and uses other well-known,
well-engineered components.

The default configuration of Fuseki is to use Jetty's
BlockingChannelConnector but it is possible to completely take control of
the Jetty configuration with --jetty-config.

(SelectChannelConnector might have been better but it has been reported
to be unstable on OS X for Fuseki's usage patterns.)


It looks like an (unintentional) "Slow HTTP" DOS attack but on the 
response consumption side, not the request sending.

https://community.qualys.com/blogs/securitylabs/2011/11/02/how-to-protect-against-slow-http-attacks

The client OS will still be ack'ing the TCP connection and may do so
forever for a server-style app such as one running in Tomcat.

Setting the query timeout may reduce the general effect but it will not
free up threads.  A query timeout is a graceful abort of the query - the
query gets a chance to clean up, and that requires the query execution
code in the server to get called.

Actually killing threads in Java without the thread's involvement is bad.

http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
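
As an illustration of what "graceful abort" means in Java terms, here is a minimal sketch using only java.util.concurrent. It is not Fuseki's or ARQ's implementation; the class name and the sleep loop standing in for query work are assumptions. The timeout only requests cancellation by interrupting the worker, and the worker itself has to notice the request and run its cleanup.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class CooperativeTimeout {
    // Runs a hypothetical never-ending task and cancels it after the timeout.
    // Returns true if the task noticed the request and ran its own cleanup.
    static boolean runWithTimeout(long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean cleanedUp = new AtomicBoolean(false);
        Future<?> task = pool.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10); // stand-in for one step of query work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag
            } finally {
                cleanedUp.set(true); // the task cleans up after itself
            }
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            task.cancel(true); // an interrupt is a request, not a kill
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cleanedUp.get();
    }
}
```

If the worker were blocked somewhere that ignores interrupts, the cancel request would have no effect, which matches the point that a timeout alone cannot free a thread that never gets to run its abort logic.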

	Andy




Re: How to set a query timeout and kill threads implicitly

Posted by Brian McBride <br...@epimorphics.com>.
Hi Rajiv,

You can find how to set query timeouts in the Fuseki documentation at [1]

To set a server-wide timeout:

[] rdf:type fuseki:Server ;
    # Server-wide context parameters can be given here.
    # For example, to set query timeouts on a server-wide basis:
    # Format 1: "1000" -- 1 second timeout
    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
    # See javadoc for ARQ.queryTimeout
    # ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;

Uncomment the last line and set the value you require per the comments 
above.
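
To illustrate how the two value formats in those comments read, here is a small sketch in plain Java. It is a hypothetical helper, not Jena's own parsing code: the class name, field names and the use of -1 for "not set" are invented for illustration, and treating a single value as one overall timeout is an assumption based on the "Format 1" comment.

```java
final class TimeoutSetting {
    final long firstResultMillis; // -1 when no separate first-result timeout is given
    final long restMillis;        // overall timeout, or timeout for the rest of the query

    private TimeoutSetting(long firstResultMillis, long restMillis) {
        this.firstResultMillis = firstResultMillis;
        this.restMillis = restMillis;
    }

    // Accepts "1000" (single timeout) or "10000,60000" (first result, then rest).
    static TimeoutSetting parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length == 1) {
            return new TimeoutSetting(-1, Long.parseLong(parts[0].trim()));
        }
        if (parts.length == 2) {
            return new TimeoutSetting(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()));
        }
        throw new IllegalArgumentException("Expected one or two values: " + value);
    }

    public static void main(String[] args) {
        TimeoutSetting t = parse("10000,60000");
        System.out.println(t.firstResultMillis + " ms to first result, "
            + t.restMillis + " ms for the rest");
    }
}
```

All values are in milliseconds, as in the comments: "10000,60000" means 10 seconds to the first result and 60 seconds for the remainder.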

Do you have a reverse proxy configured to limit the rate at which 
requests are going to Fuseki?  Can you confirm that all your requests 
are going through that proxy?

Brian

[1] 
http://jena.apache.org/documentation/serving_data/#running-a-fuseki-server




-- 
Epimorphics Ltd (http://www.epimorphics.com) Epimorphics Ltd. is a limited company registered in England (number 7016688)
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK