Posted to issues@lucene.apache.org by "Kevin Risden (Jira)" <ji...@apache.org> on 2020/02/07 22:54:00 UTC

[jira] [Comment Edited] (SOLR-14249) Krb5HttpClientBuilder should not buffer requests

    [ https://issues.apache.org/jira/browse/SOLR-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032723#comment-17032723 ] 

Kevin Risden edited comment on SOLR-14249 at 2/7/20 10:53 PM:
--------------------------------------------------------------

So I haven't personally looked at Krb5HttpClientBuilder recently, other than the completely unrelated SOLR-13726. Part of the reason a lot of clients buffer is how Kerberos SPNEGO authentication works.

There are typically 2 parts:
* a request without authentication, to which the server responds with a 401 and a negotiate challenge
* a request with authentication in response to the negotiate challenge, which the server can verify

If you don't put any optimizations in place, every request becomes two. A lot of the time a cookie is used here to limit the number of HTTP requests.
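
Roughly, the exchange looks like this (an illustrative trace, not captured from a real Solr instance - the URL, token, and cookie name are made up):

{code}
POST /solr/collection1/update HTTP/1.1     <- first attempt, no credentials
...request body...

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Negotiate

POST /solr/collection1/update HTTP/1.1     <- retry with the SPNEGO token
Authorization: Negotiate YIIC...
...request body sent again...

HTTP/1.1 200 OK
Set-Cookie: hadoop.auth=...                <- later requests present this and skip the dance
{code}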

The reason the 401 and the second request are an issue is when the request is non-repeatable - like a streamed POST body. The client sends the body, gets the 401, and then realizes it needs to send the body again but can't - because it's non-repeatable.

So a lot of the time the super simple workaround is to buffer the request, do the 401 dance, and then proceed. This is a way to make a non-repeatable request semi-repeatable.
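
In Apache HttpClient terms the workaround boils down to a request interceptor like the sketch below (a minimal illustration of the technique, not the actual Krb5HttpClientBuilder code - the class name is made up):

{code}
import java.io.IOException;

import org.apache.http.HttpEntityEnclosingRequest;
import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.entity.BufferedHttpEntity;
import org.apache.http.protocol.HttpContext;

public class BufferingInterceptorSketch implements HttpRequestInterceptor {
  @Override
  public void process(HttpRequest request, HttpContext context) throws HttpException, IOException {
    if (request instanceof HttpEntityEnclosingRequest) {
      HttpEntityEnclosingRequest enclosing = (HttpEntityEnclosingRequest) request;
      if (enclosing.getEntity() != null && !enclosing.getEntity().isRepeatable()) {
        // BufferedHttpEntity copies the whole body into a byte[] so it can be replayed
        // after the 401/negotiate round trip - which is exactly what blows up the heap
        // for large streaming requests.
        enclosing.setEntity(new BufferedHttpEntity(enclosing.getEntity()));
      }
    }
  }
}
{code}

An interceptor like this would typically be registered via HttpClientBuilder#addInterceptorLast, so every entity-enclosing request pays the buffering cost.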

This buffering has issues though, as you found - the buffer really needs to be limited in size, which then limits the usefulness of the technique.

There are a few alternatives to buffering:
* Authenticate upfront with, say, an OPTIONS request, which will get the cookie. The next request, say a POST, won't have any issue and won't do the 401 dance (see the sketch after this list).
* Use the "Expect: 100-continue" header, which asks the server whether it will accept the request before the body is sent; the client only sends the body once the server answers with 100 Continue. This keeps the body from being sent on a doomed unauthenticated attempt in the first place.
** curl automatically activates "Expect: 100-continue" under a few conditions - https://gms.tf/when-curl-sends-100-continue.html
** Apache HttpClient does NOT send "Expect: 100-continue" automatically - the expect-continue handshake has to be explicitly enabled in the request config
** Not sure if Jetty HttpClient does anything with "Expect: 100-continue"
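
A minimal sketch of the first alternative (authenticate upfront), assuming an Apache HttpClient that is already configured for SPNEGO and shares a cookie store across requests - the method and URL names are made up:

{code}
import java.io.IOException;
import java.io.InputStream;

import org.apache.http.client.methods.HttpOptions;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.InputStreamEntity;
import org.apache.http.impl.client.CloseableHttpClient;

public class UpfrontAuthSketch {

  static void sendLargeUpdate(CloseableHttpClient client, String baseUrl, InputStream docs)
      throws IOException {
    // 1) Cheap, repeatable request: absorbs the 401/negotiate round trip and caches the auth cookie.
    client.execute(new HttpOptions(baseUrl)).close();

    // 2) Large, non-repeatable streaming POST: the cookie avoids a second 401,
    //    so the body never has to be replayed and never has to be buffered.
    HttpPost post = new HttpPost(baseUrl + "/update");
    post.setEntity(new InputStreamEntity(docs, -1)); // length unknown, non-repeatable
    client.execute(post).close();
  }
}
{code}

The point is just that the negotiate round trip happens on a small, repeatable request, so the big streaming request never sees a 401.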

So long story short - yes buffering is a problem.


> Krb5HttpClientBuilder should not buffer requests 
> -------------------------------------------------
>
>                 Key: SOLR-14249
>                 URL: https://issues.apache.org/jira/browse/SOLR-14249
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Authentication, SolrJ
>    Affects Versions: 7.4, master (9.0), 8.4.1
>            Reporter: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-14249-reproduction.patch
>
>
> When SolrJ clients enable Kerberos authentication, a request interceptor is set up which wraps the actual HttpEntity in a BufferedHttpEntity.  This BufferedHttpEntity, well, buffers the request body in a {{byte[]}} so it can be repeated if needed.  This works fine for small requests, but when requests get large storing the entire request in memory causes contention or OutOfMemoryErrors.
> The easiest way for this to manifest is to use ConcurrentUpdateSolrClient, which opens a connection to Solr and streams documents out in an ever increasing request entity until the doc queue held by the client is emptied.
> I ran into this while troubleshooting a DIH run that would reproducibly load a few hundred thousand documents before progress stalled out.  Solr never crashed and the DIH thread was still alive, but the ConcurrentUpdateSolrClient used by DIH had its "Runner" thread disappear around the time of the stall and an OOM like the one below could be seen in solr-8983-console.log:
> {code}
> WARNING: Uncaught exception in thread: Thread[concurrentUpdateScheduler-28-thread-1,5,TGRP-TestKerberosClientBuffering]
> java.lang.OutOfMemoryError: Java heap space
>   at __randomizedtesting.SeedInfo.seed([371A00FBA76D31DF]:0)
>   at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
>   at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
>   at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
>   at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
>   at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:213)
>   at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:94)
>   at org.apache.solr.common.util.ByteUtils.writeUTF16toUTF8(ByteUtils.java:145)
>   at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:848)
>   at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:932)
>   at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:328)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:616)
>   at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:764)
>   at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:705)
>   at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:367)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:223)
>   at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:330)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:155)
>   at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:91)
>   at org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner$1.writeTo(ConcurrentUpdateSolrClient.java:264)
>   at org.apache.http.entity.EntityTemplate.writeTo(EntityTemplate.java:73)
>   at org.apache.http.entity.BufferedHttpEntity.<init>(BufferedHttpEntity.java:62)
>   at org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder.lambda$new$3(Krb5HttpClientBuilder.java:155)
>   at org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder$$Lambda$459/0x0000000800623840.process(Unknown Source)
>   at org.apache.solr.client.solrj.impl.HttpClientUtil$DynamicInterceptor$1.accept(HttpClientUtil.java:177)
> {code}
> We took heap dumps and were able to confirm that the entire 8gb heap was taken up with a single massive CUSC request body that was being buffered!
> (As an aside, I had no idea that OutOfMemoryErrors could happen without killing the entire JVM.  But apparently they can.  CUSC.Runner propagates the OOM as it should and the OOM kills the Runner thread.  Since that thread is the gc-root for the massive BufferedHttpEntity though, a garbage collection frees up most of the heap space and the JVM survives its memory trouble.  Solr's oom script never triggers.)
> I've attached a JUnit test which reproduces the OOM issue by using a "fake" Kerberos config.


