Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2013/11/01 12:40:37 UTC

Re: Indexing logs files of thousands of GBs

Throwing a multi-gigabyte file at Solr and expecting it
to index the whole thing in one go is asking a bit too much.
You either have to stream it up and break it apart along
the way, or something similar.

And consider what happens if you do index the log as
a single document. How do you search it? Do you return
several GB as the result? Most applications break
the log file up into individual documents, indexing each event
separately to enable searches like
"all OOM errors between 12:00 and 13:00 yesterday" or
similar. How would you do such a thing if it's one
big document?

I may be completely off base here, but I think you need to
define the problem you're solving more clearly. I can flat-out
guarantee that trying to index a large log file as a single document
will be unsatisfactory to search, even if you can get it into
the index.
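The "break it apart" approach can be sketched roughly as follows. This is a
minimal illustration, not code from this thread: the timestamp pattern, class
name, and sample lines are all assumptions you would adapt to your log format.
Each resulting event string would then become its own SolrInputDocument (with
the timestamp and level parsed into fields), added in batches of a few hundred
via SolrServer.add(Collection) rather than as one huge request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: split a log into one string per event, so each event can be
// indexed as its own document. Assumes events start with a line like
// "2013-10-30 12:47:01 ..."; adjust EVENT_START for your format.
public class LogEventSplitter {
    private static final Pattern EVENT_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    // Groups raw lines into events; continuation lines (stack traces,
    // wrapped messages) attach to the preceding event.
    public static List<String> splitEvents(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (EVENT_START.matcher(line).find()) {
                if (current != null) events.add(current.toString());
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
        }
        if (current != null) events.add(current.toString());
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2013-10-30 12:47:01 INFO starting up",
                "2013-10-30 12:47:05 ERROR java.lang.OutOfMemoryError",
                "    at com.example.Foo.bar(Foo.java:42)",
                "2013-10-30 12:47:09 INFO recovered");
        List<String> events = splitEvents(lines);
        System.out.println(events.size()); // prints 3: the OOM and its
                                           // stack line form one event
    }
}
```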

Best,
Erick


On Wed, Oct 30, 2013 at 12:47 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> Hm, sorry for not helping with this particular issue directly, but it
> looks like you are *uploading* your logs and indexing that way?
> Wouldn't pushing them be a better fit when it comes to log indexing?
> We recently contributed a Logstash output that can index logs to Solr,
> which may be of interest - have a look at
> https://twitter.com/otisg/status/395563043045638144 -- includes a
> little diagram that shows how this fits into the picture.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Wed, Oct 30, 2013 at 9:55 AM, keshari.prerna
> <ke...@gmail.com> wrote:
> > Hello,
> >
> > As suggested by Chris, I am now reading the files with a Java program
> > and creating a SolrInputDocument, but I ran into this exception while
> > calling server.add(document). When I tried to increase "ramBufferSizeMB",
> > it wouldn't let me set it above 2 GB.
> >
> > org.apache.solr.client.solrj.SolrServerException: Server at
> > http://localhost:8983/solr/logsIndexing returned non ok status:500,
> > message:the request was rejected because its size (2097454) exceeds the
> > configured maximum (2097152)
> > org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException:
> > the request was rejected because its size (2097454) exceeds the
> > configured maximum (2097152)
> >         at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl$1.raiseError(FileUploadBase.java:902)
> >         at org.apache.commons.fileupload.util.LimitedInputStream.checkLimit(LimitedInputStream.java:71)
> >         at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:128)
> >         at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
> >         at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
> >         at java.io.InputStream.read(Unknown Source)
> >         at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
> >         at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
> >         at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
> >         at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
> >         at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
> >         at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
> >         at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >         at org.mortbay.jetty.handler.ContextHand
> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> >         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
> >         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
> >         at Filewalker.walk(LogsIndexer.java:48)
> >         at Filewalker.main(LogsIndexer.java:69)
> >
> > How do I get rid of this?
> >
> > Thanks,
> > Prerna
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-logs-files-of-thousands-of-GBs-tp4097073p4098438.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
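The "configured maximum (2097152)" in the stack trace above is 2048 KB, which
matches Solr's default multipart upload limit (multipartUploadLimitInKB in
solrconfig.xml), not ramBufferSizeMB. A sketch of raising that cap, with an
illustrative value (though, as Erick says above, splitting the log into
per-event documents is a better fix than raising the limit):

```xml
<!-- solrconfig.xml: the multipart/POST upload cap is set in KB.
     2048 is the shipped default, i.e. the 2097152-byte limit in the
     exception above; 10240 here is only an example value. -->
<requestDispatcher>
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="10240" />
</requestDispatcher>
```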