Posted to solr-user@lucene.apache.org by horot <ro...@gmail.com> on 2013/05/15 12:06:43 UTC

how to increase upload into Solr 4.x ???

Hi,

I use Pentaho Kettle to upload data into Solr. The average speed is
3500-5000 records per second.
This is very slow. Is there a faster tool that would give higher
throughput, or does the speed depend on Solr itself?



--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-increase-upload-into-Solr-4-x-tp4063451.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to increase upload into Solr 4.x ???

Posted by Gora Mohanty <go...@mimirtech.com>.
On 15 May 2013 21:44, horot <ro...@gmail.com> wrote:
> Hi, Gora!
>
> The data is pulled from the MSSQL database.
> I think the bottleneck for indexing is in Solr.

Why do you think so? Have you checked the CPU/memory
usage on the Solr server? Likewise for the database
server?

Also, I had somehow glossed over your numbers. 3500-5000
records per second actually sounds pretty decent, especially
if you are using an ETL tool. That would be some 12.6 to 18 million
records/hour.
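
The hourly figure is simple arithmetic on the quoted rate. A minimal
sketch to check it, where the function name is only illustrative:

```python
# Convert a per-second indexing rate into an hourly figure.
SECONDS_PER_HOUR = 60 * 60


def records_per_hour(records_per_second: int) -> int:
    """Return how many records are indexed in one hour at a steady rate."""
    return records_per_second * SECONDS_PER_HOUR


low = records_per_hour(3500)   # lower end of the reported range
high = records_per_hour(5000)  # upper end of the reported range
print(f"{low:,} to {high:,} records/hour")
```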

> Is it possible to speed this up further with Kettle?

You would really be best off asking on a Kettle-
related list. As I said, we had little experience
with Kettle, and gave up on it after the ETL
transformations proved too slow.

Have you tried indexing directly from the database
using Solr's DataImportHandler, or something like
SolrJ?

Regards,
Gora

Re: how to increase upload into Solr 4.x ???

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Can't you just hit the same handler twice? Or what about defining two
different handlers and passing the same config file via a URL parameter?

Where is it made single-threaded?

Regards,

   Alex.
On 15 May 2013 19:18, "Shawn Heisey" <so...@elyograg.org> wrote:

> On 5/15/2013 2:52 PM, Furkan KAMACI wrote:
>
>> You said "If I were doing this with the dataimport handler, I would define
>> more than one handler in solrconfig.xml, each with its own config file."
>> What is the benefit of using more than one handler?
>>
>
> DIH is single-threaded.  By using more than one handler at the same time,
> I would have multiple threads sending documents into Solr.
>
>
>

Re: how to increase upload into Solr 4.x ???

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/15/2013 2:52 PM, Furkan KAMACI wrote:
> You said "If I were doing this with the dataimport handler, I would define
> more than one handler in solrconfig.xml, each with its own config file."
> What is the benefit of using more than one handler?

DIH is single-threaded.  By using more than one handler at the same
time, I would have multiple threads sending documents into Solr.
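
As a rough sketch of what starting several handlers could look like from
a client, the snippet below only builds the request URLs. The host, the
core name (collection1) and the handler paths (/dataimport1 and so on)
are assumptions for illustration and would have to match whatever is
defined in solrconfig.xml. It passes clean=false because a full-import
cleans the index by default, which would make one import wipe out the
documents added by another.

```python
from urllib.parse import urlencode

# Assumptions, not taken from the thread: host, core name, handler paths.
SOLR_BASE = "http://localhost:8983/solr/collection1"
HANDLERS = ["/dataimport1", "/dataimport2", "/dataimport3"]


def full_import_url(base: str, handler: str) -> str:
    """Build the URL that starts a full import on one DIH handler."""
    params = {
        "command": "full-import",
        # Keep this import from deleting documents indexed by the others.
        "clean": "false",
    }
    return f"{base}{handler}?{urlencode(params)}"


urls = [full_import_url(SOLR_BASE, handler) for handler in HANDLERS]
for url in urls:
    print(url)
```

Requesting each of these URLs in turn should leave several imports
running side by side, provided each handler's config file selects a
different slice of the source data.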



Re: how to increase upload into Solr 4.x ???

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Shawn;

You said "If I were doing this with the dataimport handler, I would define
more than one handler in solrconfig.xml, each with its own config file."
What is the benefit of using more than one handler?

2013/5/15 Shawn Heisey <so...@elyograg.org>

> > The data is pulled from the MSSQL database.
> > I think the bottleneck for indexing is in Solr.
> > Is it possible to speed this up further with Kettle?
>
> I don't know what Kettle is or what its capabilities are.
>
> Can you run more than one instance of Kettle at the same time, each one
> retrieving part of the database? You could divide the DB by WHERE clause,
> row limit, mod value on a hash, etc. Running updates at the same time is
> generally the way to get good indexing performance out of Solr.
>
> If I were doing this with the dataimport handler, I would define more than
> one handler in solrconfig.xml, each with its own config file.
>
> Thanks,
> Shawn
>
>
>

Re: how to increase upload into Solr 4.x ???

Posted by Shawn Heisey <so...@elyograg.org>.
> The data is pulled from the MSSQL database.
> I think the bottleneck for indexing is in Solr.
> Is it possible to speed this up further with Kettle?

I don't know what Kettle is or what its capabilities are.

Can you run more than one instance of Kettle at the same time, each one
retrieving part of the database? You could divide the DB by WHERE clause,
row limit, mod value on a hash, etc. Running updates at the same time is
generally the way to get good indexing performance out of Solr.
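
One way to picture the "mod value" split is sketched below. It assumes
a numeric key column; the table name (records) and column name (id) are
made up for illustration. Each worker gets its own WHERE clause, and the
worker function is only a placeholder for whatever actually runs the
extract and indexing job for that slice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition_clauses(key_column: str, workers: int) -> List[str]:
    """Return one WHERE clause per worker, splitting rows by key modulo."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [f"{key_column} % {workers} = {i}" for i in range(workers)]


def run_partition(where_clause: str) -> str:
    # Placeholder: a real worker would start its own extract/index job here.
    # The table name "records" is hypothetical.
    return f"SELECT * FROM records WHERE {where_clause}"


clauses = partition_clauses("id", 4)
# Run the four slices at the same time, one thread per slice.
with ThreadPoolExecutor(max_workers=len(clauses)) as pool:
    queries = list(pool.map(run_partition, clauses))

for query in queries:
    print(query)
```

Every row matches exactly one of the clauses, so the slices do not
overlap and nothing is indexed twice.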

If I were doing this with the dataimport handler, I would define more than
one handler in solrconfig.xml, each with its own config file.

Thanks,
Shawn



Re: how to increase upload into Solr 4.x ???

Posted by horot <ro...@gmail.com>.
Hi, Gora!

The data is pulled from the MSSQL database.
I think the bottleneck for indexing is in Solr.
Is it possible to speed this up further with Kettle?



--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-increase-upload-into-Solr-4-x-tp4063451p4063540.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to increase upload into Solr 4.x ???

Posted by Gora Mohanty <go...@mimirtech.com>.
On 15 May 2013 15:36, horot <ro...@gmail.com> wrote:
> Hi,
>
> I use Pentaho Kettle to upload data into Solr. The average speed is
> 3500-5000 records per second.
> This is very slow. Is there a faster tool that would give higher
> throughput, or does the speed depend on Solr itself?

First, you would need to figure out where the bottleneck
is. Where are the data being pulled from? A database?
The limit could be because of the database server, network,
CPU, etc.

The one time that we tried using Kettle, the bottleneck
was in the ETL transformations being applied, and we
finally had to find another method as this was just too slow
for our needs. Your mileage may vary.

Regards,
Gora

Re: how to increase upload into Solr 4.x ???

Posted by Walter Underwood <wu...@wunderwood.org>.
Seems fast to me, too. We get about 600/second pulling data from MySQL with a pretty complicated query.

Check the CPU usage on the Solr machine. If that is not reaching 100% for periods of time, then Solr is not the bottleneck. Indexing is very CPU-intensive.

On a multi-CPU Solr machine, you will need to use several indexing clients to use all CPUs.
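
A minimal sketch of the "several indexing clients" idea, assuming the
documents are already available as Python dicts. The send function is a
stand-in: a real client would POST each batch to Solr's update handler,
while fake_send here only counts documents so the example runs without
a server. The client count and batch size are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_in_parallel(
    docs: Iterable[dict],
    send: Callable[[List[dict]], int],
    clients: int = 4,
    batch_size: int = 1000,
) -> int:
    """Send batches through `clients` concurrent workers; return docs sent."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))


def fake_send(batch: List[dict]) -> int:
    # Stand-in for an HTTP POST to Solr; returns how many docs it "sent".
    return len(batch)


docs = ({"id": str(i)} for i in range(2500))
total = index_in_parallel(docs, fake_send, clients=4, batch_size=1000)
print(total)
```

Watching CPU usage on the Solr machine while raising the number of
clients is a reasonable way to find the point where more clients stop
helping.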

What version of Solr? There have been some important indexing speedups in the latest versions.

wunder

On May 15, 2013, at 9:29 AM, Jack Krupansky wrote:

> "3500-5000 records per second. This is very slow."
> 
> That's hardly a slow rate for ingestion of data!
> 
> Who is telling you that it is?
> 
> That is not to say that the speed can't be improved, but let's keep things in perspective.
> 
> And of course the speed does depend on your schema and actual data.
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: horot
> Sent: Wednesday, May 15, 2013 6:06 AM
> To: solr-user@lucene.apache.org
> Subject: how to increase upload into Solr 4.x ???
> 
> Hi,
> 
> I use Pentaho Kettle to upload data into Solr. The average speed is
> 3500-5000 records per second.
> This is very slow. Is there a faster tool that would give higher
> throughput, or does the speed depend on Solr itself?
> 





Re: how to increase upload into Solr 4.x ???

Posted by Jack Krupansky <ja...@basetechnology.com>.
"3500-5000 records per second. This is very slow."

That's hardly a slow rate for ingestion of data!

Who is telling you that it is?

That is not to say that the speed can't be improved, but let's keep things 
in perspective.

And of course the speed does depend on your schema and actual data.

-- Jack Krupansky

-----Original Message----- 
From: horot
Sent: Wednesday, May 15, 2013 6:06 AM
To: solr-user@lucene.apache.org
Subject: how to increase upload into Solr 4.x ???

Hi,

I use Pentaho Kettle to upload data into Solr. The average speed is
3500-5000 records per second.
This is very slow. Is there a faster tool that would give higher
throughput, or does the speed depend on Solr itself?


