Posted to solr-user@lucene.apache.org by "Rajendran, Prabaharan" <Ra...@DNB.com> on 2016/06/27 13:24:36 UTC

SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Hi,

I am trying to index a text file about 4.2 GB in size. This is a kind of POC to understand Solr's capacity for indexing and searching.

Here is my Solr JVM configuration:
-Xms1024m        -Xmx1024m        -Xss256k

java -Dtype=text/csv -Dparams="separator=%09" -Durl=http://localhost:8983/solr/mycollection/update -jar ..\example\exampledocs\post.jar ..\example\exampledocs\largefile.txt

While indexing, I got the error below:
SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Kindly let me know if I need to change any Solr configuration (e.g., increase memory) to handle this.

Here is my log file entry:

ERROR (qtp297811323-14) [   x:collection2] o.a.s.c.SolrCore org.apache.solr.common.SolrException: CSVLoader: input=null, line=2815040,can't read line: 2815040
                values={NO LINES AVAILABLE}
                at org.apache.solr.handler.loader.CSVLoaderBase.input_err(CSVLoaderBase.java:317)
                at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:356)
                at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
                at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
                at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
                at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
                at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
                at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
                at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
                at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
                at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
                at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
                at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
                at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
                at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
                at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
                at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
                at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
                at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
                at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
                at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
                at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
                at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
                at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
                at org.eclipse.jetty.server.Server.handle(Server.java:499)
                at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
                at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
                at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
                at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
                at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
                at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException: Early EOF
                at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:506)
                at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
                at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
                at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
                at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
                at java.io.InputStreamReader.read(InputStreamReader.java:184)
                at java.io.BufferedReader.fill(BufferedReader.java:154)
                at java.io.BufferedReader.read(BufferedReader.java:175)
                at org.apache.solr.internal.csv.ExtendedBufferedReader.read(ExtendedBufferedReader.java:82)
                at org.apache.solr.internal.csv.CSVParser.simpleTokenLexer(CSVParser.java:421)
                at org.apache.solr.internal.csv.CSVParser.nextToken(CSVParser.java:371)
                at org.apache.solr.internal.csv.CSVParser.getLine(CSVParser.java:231)
                at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:353)
                ... 29 more

Thanks,
Prabaharan

RE: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by "Rajendran, Prabaharan" <Ra...@DNB.com>.
Thanks, Erick, for your response. I am now splitting the file before indexing.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 28 June 2016 11:01
To: solr-user
Subject: Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

[quoted text trimmed]

Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by Erick Erickson <er...@gmail.com>.
You're most likely not getting _near_ 4.2G written to Solr; the
transport protocol is probably cutting it off, as indicated by
the "early EOF" exception.

It's really hard to justify trying to index 4.2G as a _single_ file.
First of all, you won't even be able to receive it in Solr after
giving it only 1G of memory, even if you get the transport
issue worked out. Second, searching it is totally useless in
most cases, as it will probably match _everything_. Third, even
if it does match something, how are you going to return it
to a user?

If it's multiple documents in one huge uber-doc, you can
break it up at ingestion and send only the individual docs to
Solr rather than the whole thing.
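A rough Java sketch of that break-it-up approach, splitting a large delimited file into smaller chunk files on complete line boundaries before posting each one. The class name, output file names, and chunk size here are my own illustration, not part of any Solr tooling:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Splits a large delimited text file into smaller chunk files so each
// chunk can be posted to Solr separately. Note: if the file's first
// line is a CSV header, it would need to be repeated in each chunk
// (or the field names passed via the fieldnames request parameter).
public class CsvSplitter {
    public static List<Path> split(Path input, Path outDir, int linesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            BufferedWriter out = null;
            String line;
            int count = 0, chunkNo = 0;
            while ((line = in.readLine()) != null) {
                if (count % linesPerChunk == 0) {      // start a new chunk file
                    if (out != null) out.close();
                    Path p = outDir.resolve("chunk-" + (chunkNo++) + ".txt");
                    chunks.add(p);
                    out = Files.newBufferedWriter(p);
                }
                out.write(line);
                out.newLine();
                count++;
            }
            if (out != null) out.close();
        }
        return chunks;
    }
}
```

Each resulting chunk file could then be posted with the same post.jar command the original message used, one file at a time.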

IOW, I think this is a waste of your time. I understand that
you're trying to see the limits, but this limit is not a reasonable
one to hope to cross.

Best,
Erick

On Mon, Jun 27, 2016 at 6:24 AM, Rajendran, Prabaharan
<Ra...@dnb.com> wrote:
> [quoted text trimmed]

Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2016-06-28 at 16:42 +0000, Rajendran, Prabaharan wrote:
> Please suggest me which is best way to index(multithreaded) if your
> input format is text/csv (file).

Last I tried, it was pretty straightforward: split your CSV into chunks
and start about as many separate uploads as you have (real) CPU cores.
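That suggestion can be sketched in Java as below. `postChunk` is a placeholder I made up for whatever actually sends each chunk (e.g. a separate post.jar invocation or an HTTP POST); it is not a real Solr API:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// One upload task per chunk, with the pool sized to the number of CPU
// cores, so roughly that many uploads run concurrently.
public class ParallelUploader {
    public static int uploadAll(List<String> chunkFiles) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger done = new AtomicInteger();
        for (String chunk : chunkFiles) {
            pool.submit(() -> {
                postChunk(chunk);          // placeholder: the real upload goes here
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();                 // number of chunks uploaded
    }

    private static void postChunk(String chunk) {
        // Placeholder: e.g. POST the chunk to /solr/<collection>/update.
    }
}
```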

- Toke Eskildsen, State and University Library, Denmark



RE: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by "Rajendran, Prabaharan" <Ra...@DNB.com>.
Thanks, Toke, I am now splitting the file before indexing.

Shalin, thanks for the details. Even though this is fixed in 5.5 and 6.0, is there any threshold value?
Please suggest the best way to index (multithreaded) when the input format is a text/csv file.

Thanks,
Prabaharan

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com] 
Sent: 28 June 2016 16:06
To: solr-user@lucene.apache.org; Toke Eskildsen
Subject: Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

[quoted text trimmed]

Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
This was fixed in 5.5 and 6.0. You can upload files larger than 2GB with
the simple post tool; however, I don't recommend it because it uses a
single indexing thread.

On Tue, Jun 28, 2016 at 3:55 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> [quoted text trimmed]


-- 
Regards,
Shalin Shekhar Mangar.

Re: SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2016-06-27 at 13:24 +0000, Rajendran, Prabaharan wrote:
> I am trying to index a text file about 4.2 GB in size. [...]
> 
> SimplePostTool: FATAL: IOException while posting data: java.io.IOException: too many bytes written

SimplePostTool uses 
HttpUrlConnection.setFixedLengthStreamingMode(file_size)
where file_size is an integer.

Unfortunately, there is no check for overflow (which happens with files >
2GB), so there is no sane error message up front and you only get the
error you pasted after some bytes have been sent. With a 4.2GB input
file, I would guess after about 200MB (4.2GB % 2GB).
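A minimal illustration of that truncation, assuming the long file size was narrowed to int before being handed to setFixedLengthStreamingMode (the exact byte counts below are my own estimate of a "4.2 GB" file, not taken from the tool's source):

```java
// If a long file size is narrowed to int, any size above
// Integer.MAX_VALUE wraps around silently, so the connection declares a
// far smaller Content-Length than the client will actually write.
public class OverflowDemo {
    // Mimics the narrowing cast: ~4.2 GiB comes out as ~205 MiB.
    public static int declaredLength(long fileSizeBytes) {
        return (int) fileSizeBytes;   // silent truncation
    }
}
```

For a 4,510,000,000-byte file (about 4.2 GiB), the cast yields 215,032,704 (about 205 MiB), so the server stops reading long before the client finishes writing — hence "too many bytes written" on the client side and "Early EOF" in the Solr log.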


Long story short: Keep your posts below 2GB.

- Toke Eskildsen, State and University Library, Denmark