Posted to solr-user@lucene.apache.org by mitra <mi...@ornext.com> on 2012/11/13 09:00:00 UTC

Solr Indexing MAX FILE LIMIT

 Hello Guys

I'm using Apache Solr 3.6.1 on Tomcat 7 to index CSV files with curl on a
Windows machine.

** My question is: what is the maximum CSV file size when doing an HTTP
POST, or when using the following curl command?

curl http://localhost:8080/solr/update/csv -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"

** My requirement is quite large: we have to index CSV files ranging from
8 to 10 GB.

** What would be the optimal commit and other indexing settings for good
performance on a machine with 8 GB of RAM?

Please guide me on this.

Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Indexing MAX FILE LIMIT

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Maybe you can start by testing this with split -l and xargs :-) These are
standard Unix toolkit approaches, and since you already use one command-line
tool (curl), you may be happy to use the others too.
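A concrete sketch of that approach (the file name and two-line chunks are
just for illustration; the real 8-10 GB file would be split on something
like 100,000 lines). Note that split does not repeat the CSV header row, so
it has to be stripped off first and re-attached to each piece by hand:

```shell
# Tiny stand-in for the real CSV; swap in the big file and a much
# larger chunk size in practice.
printf 'id,name\n1,a\n2,b\n3,c\n' > data.csv

# Keep the header aside, then split only the data rows.
head -n 1 data.csv > header.csv
tail -n +2 data.csv | split -l 2 - part_

# Re-attach the header so every piece is a self-contained CSV.
for p in part_??; do cat header.csv "$p" > "$p.csv"; done

# Each piece can then be posted with the same curl command as before, e.g.:
#   ls part_*.csv | xargs -I{} curl http://localhost:8080/solr/update/csv \
#       -F "stream.file=$PWD/{}" -F "keepEmpty=true"
```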

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Nov 14, 2012 at 11:33 PM, mitra <mi...@ornext.com> wrote:

> Thank you, Erick.
>
> I didn't know that we could write a Java class for it; can you provide me
> with some info on how to do that?
>
> Thanks
>
>
>
>

Re: Solr Indexing MAX FILE LIMIT

Posted by mitra <mi...@ornext.com>.
Thank you, Erick.

I didn't know that we could write a Java class for it; can you provide me
with some info on how to do that?

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952p4020407.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Indexing MAX FILE LIMIT

Posted by Erick Erickson <er...@gmail.com>.
Have you considered writing a small SolrJ (or other client) program that
processes the rows in your huge file and sends them to Solr in sensible
chunks? That would give you much finer control over how the file is
processed, how many docs are sent to Solr at a time, and what to do with
errors. You could even run N simultaneous programs to increase throughput...

FWIW,
Erick
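For what it's worth, the idea can be sketched with nothing but the JDK (a
SolrJ version would replace the raw HTTP call with a SolrServer client).
The class name, batch size, and file path below are illustrative, not from
this thread; each chunk re-attaches the CSV header line so Solr's CSV
handler can parse it as a stand-alone document:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CsvChunkPoster {

    /** Read up to batchSize data rows and prepend the header, so each
     *  chunk is a self-contained CSV document. Returns null when done. */
    static String nextChunk(BufferedReader in, String header, int batchSize)
            throws IOException {
        StringBuilder rows = new StringBuilder();
        String line;
        int count = 0;
        while (count < batchSize && (line = in.readLine()) != null) {
            rows.append(line).append('\n');
            count++;
        }
        return count == 0 ? null : header + "\n" + rows;
    }

    /** POST one chunk to Solr's CSV update handler. */
    static void post(String url, String chunk) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        OutputStream out = conn.getOutputStream();
        out.write(chunk.getBytes("UTF-8"));
        out.close();
        if (conn.getResponseCode() != 200) {
            throw new IOException("Solr returned HTTP " + conn.getResponseCode());
        }
    }

    public static void main(String[] args) throws IOException {
        String solr = "http://localhost:8080/solr/update/csv";
        BufferedReader in = new BufferedReader(new FileReader("D:\\eighth.csv"));
        String header = in.readLine();        // first line = column names
        String chunk;
        while ((chunk = nextChunk(in, header, 100000)) != null) {
            post(solr, chunk);                // retry/error handling goes here
        }
        in.close();
        post(solr + "?commit=true", header + "\n");  // single commit at the end
    }
}
```

Error handling, retries, and running several of these in parallel (as Erick
suggests) are left as an exercise; the point is that the chunking loop, not
the container, decides how much data goes over the wire per request.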


On Tue, Nov 13, 2012 at 3:42 AM, mitra <mi...@ornext.com> wrote:

> Thank you.
>
>
> *** I understand that the default size for an HTTP POST in Tomcat is 2 MB.
> Can we change that somehow, so that I don't need to split the 10 GB CSV
> into 2 MB chunks?
>
> curl http://localhost:8080/solr/update/csv -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"
>
> *** As I mentioned, I'm using the above command to post, rather than the
> format below:
>
> curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H 'Content-type:text/plain; charset=utf-8'
>
> *** My question: is the limit still applicable even when not using the
> data-binary format above?
>
>
>
>
>

RE: Solr Indexing MAX FILE LIMIT

Posted by mitra <mi...@ornext.com>.
Thank you.


*** I understand that the default size for an HTTP POST in Tomcat is 2 MB.
Can we change that somehow, so that I don't need to split the 10 GB CSV
into 2 MB chunks?

curl http://localhost:8080/solr/update/csv -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"

*** As I mentioned, I'm using the above command to post, rather than the
format below:

curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H 'Content-type:text/plain; charset=utf-8'

*** My question: is the limit still applicable even when not using the
data-binary format above?
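For what it's worth, two different 2 MB defaults can bite here, and which
one matters depends on how the file reaches Solr. Tomcat's maxPostSize
(server.xml, in bytes) limits form-parameter parsing, while Solr's
multipartUploadLimitInKB (solrconfig.xml, in KB) caps -F style multipart
uploads. With stream.file, Solr reads the file from its own local disk, so
the POST body stays tiny and neither limit should apply to the CSV itself.
Both attributes are real; the surrounding values are only illustrative:

```xml
<!-- solrconfig.xml: raise the multipart upload cap (value in KB) and keep
     remote streaming enabled for stream.file -->
<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="10485760" />
</requestDispatcher>

<!-- server.xml (Tomcat 7): raise or disable the POST limit (value in bytes;
     a non-positive value disables the check, though the exact semantics
     vary by Tomcat version) -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxPostSize="-1"
           redirectPort="8443" />
```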




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Indexing-MAX-FILE-LIMIT-tp4019952p4019965.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr Indexing MAX FILE LIMIT

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - instead of trying to make the system ingest such large files, perhaps you can split them into many small pieces.
 
-----Original message-----
> From:mitra <mi...@ornext.com>
> Sent: Tue 13-Nov-2012 09:05
> To: solr-user@lucene.apache.org
> Subject: Solr Indexing MAX FILE LIMIT
> 
>  Hello Guys
> 
> I'm using Apache Solr 3.6.1 on Tomcat 7 to index CSV files with curl on a
> Windows machine.
> 
> ** My question is: what is the maximum CSV file size when doing an HTTP
> POST, or when using the following curl command?
> 
> curl http://localhost:8080/solr/update/csv -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"
> 
> ** My requirement is quite large: we have to index CSV files ranging from
> 8 to 10 GB.
> 
> ** What would be the optimal commit and other indexing settings for good
> performance on a machine with 8 GB of RAM?
> 
> Please guide me on this.
> 
> Thanks in advance
>