Posted to solr-user@lucene.apache.org by Sanjoy Ganguly <ga...@gmail.com> on 2019/08/22 13:00:40 UTC

Facing time out issue in Solr

Hello,

Good evening!

I am facing an issue while trying to index 4 files: I get a "time out error"
in the log.

I am using Solr 7.5, installed on a Linux server. We have a lot of
business documents that we are able to index, except for the files listed below.

1. File 1
    Size- approx 340 MB
    Page count- approx 5800

The remaining files have similar figures.

Just to clarify, these files open in Adobe Reader and they contain text.

All files are in PDF format.

Question: Is there any file size or page count restriction in Solr?

*As per business protocol, I will not be able to attach the files.

Thanks.

Awaiting your response.

Regards,
Sanjoy Ganguly

Re: Facing time out issue in Solr

Posted by Erick Erickson <er...@gmail.com>.
No, there’s no a priori file size limit in Solr. But ingesting a 340MB file will take a long time. A very long time. The timeout is probably just the client timeout; I’ve seen a situation where the doc does get indexed even though there’s a timeout.

However:

1> There are several timeouts to be aware of that you can lengthen, all in solr.xml:
	• socketTimeout
	• connTimeout
	• distribUpdateConnTimeout
	• distribUpdateSoTimeout

distribUpdateConnTimeout is important. If you have leaders and replicas (SolrCloud), the leader forwards the doc to the follower. If this timeout is exceeded, the leader may put the follower into “Leader Initiated Recovery”. You really need to ensure that this parameter is long enough to cover the longest indexing request you anticipate.
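
To see where these four settings live, here is a minimal Python sketch that reads them out of a solr.xml. The embedded XML is an assumption modeled on the stock solr.xml that ships with Solr 7.x, and the values in it are illustrative; check your own file rather than trusting these numbers. The helper name read_timeouts is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: excerpt modeled on the stock solr.xml of Solr 7.x; values are illustrative.
# The ${name:default} form lets a Java system property override the default (milliseconds).
SAMPLE_SOLR_XML = """
<solr>
  <solrcloud>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>
</solr>
"""

TIMEOUT_NAMES = {
    "socketTimeout",
    "connTimeout",
    "distribUpdateConnTimeout",
    "distribUpdateSoTimeout",
}


def read_timeouts(solr_xml_text):
    """Return {setting name: value in milliseconds} for the four timeouts found."""
    root = ET.fromstring(solr_xml_text)
    found = {}
    for element in root.iter("int"):
        name = element.get("name")
        if name not in TIMEOUT_NAMES:
            continue
        raw = (element.text or "").strip()
        # Accept either a plain number or the ${property:default} form.
        match = re.fullmatch(r"\$\{[^:}]+:(\d+)\}|(\d+)", raw)
        if match:
            found[name] = int(match.group(1) or match.group(2))
    return found


if __name__ == "__main__":
    for name, millis in sorted(read_timeouts(SAMPLE_SOLR_XML).items()):
        print(f"{name}: {millis} ms")
```

To lengthen a timeout, raise the number after the colon in solr.xml (or set the matching system property) and restart the node.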

2> If you’re just throwing a 340MB “semi-structured” document at Solr (e.g. Word, PDF, whatever), you’re putting an awful lot of work on the node doing the indexing. You probably want to move the parsing off Solr; see https://lucidworks.com/post/indexing-with-solrj/ or use one of the services.
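
The linked article does this in Java with SolrJ. As a rough illustration of the same idea in Python, the sketch below assumes the text has already been extracted from the PDF on the client (the extraction step is not shown) and sends only that text to Solr's JSON update handler, with an explicit client-side timeout. The collection name, the content_txt field, the commitWithin value and both function names are assumptions for illustration, not anything from your setup.

```python
import json
import urllib.request


def build_update_request(solr_base_url, collection, doc_id, text,
                         text_field="content_txt"):
    """Build a POST to Solr's /update handler carrying one pre-parsed document.

    ASSUMPTIONS: text_field is a hypothetical field name, and commitWithin=60000
    (milliseconds) is an arbitrary illustrative value.
    """
    url = f"{solr_base_url.rstrip('/')}/{collection}/update?commitWithin=60000"
    payload = json.dumps([{"id": doc_id, text_field: text}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout_seconds=600):
    """Send the request; the timeout here is the client-side one, in seconds."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status


if __name__ == "__main__":
    # The text would come from a parser running outside Solr.
    request = build_update_request(
        "http://localhost:8983/solr", "docs", "file-1", "text extracted on the client"
    )
    print(request.full_url)
    # send(request)  # uncomment once a Solr node is reachable at that URL
```

This way the expensive PDF parsing runs in a process you control, and Solr only has to analyze and index plain text.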

3> I always question the utility of indexing such a large document. Assuming that’s mostly textual data, what are you going to do with it? It’ll have so many words in it that it’ll be found by many, many, many searches. It’ll also have so many words in it that it’ll tend to be far down in the results list. Assuming you’re OK with those issues, what will the user do with it if they click on it? Wait until the entire file is returned to the laptop, then have the browser blow up trying to load it? My point is that it may be better to first ask what use case indexing this document serves. It may be that you have a perfectly valid reason; I just want to be sure you’ve thought through the implications.

Best,
Erick
