Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2017/02/14 04:12:12 UTC

Indexing slower on a better system

Hi,

Indexing is slower on a server with a much better specification, where Solr
runs on an SSD, than on a laptop with a normal hard disk.

Both systems have exactly the same configuration. The configuration was
first set up on the laptop and then replicated to the server.

The setup is Solr 6.4.1 with 1 shard and 2 replicas, using an external
ZooKeeper 3.4.8. The only difference is that on my laptop, both replicas
and ZooKeeper are on the same hard disk, while on the server, ZooKeeper
runs on its own hard disk and each replica also runs on a separate hard
disk. As far as I know, this layout should improve performance rather
than make it worse, shouldn't it?

What other reasons could explain this?

I'm running on Solr 6.4.1

Regards,
Edwin

Re: Indexing slower on a better system

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
OK, no problem.

So you are saying that in your case, indexing is also faster on your
MacBook Pro than on your Amazon EC2 servers, which have better
specifications?

Regards,
Edwin


In reply to Walter Underwood, 14 February 2017 at 14:17.

Re: Indexing slower on a better system

Posted by Walter Underwood <wu...@wunderwood.org>.
Sorry, I haven’t used Windows in seven years, and I haven’t run Windows as a server for more than a decade.

I would not recommend using Windows as your Solr OS. Windows is just not designed for that.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


In reply to Zheng Lin Edwin Yeo, Feb 13, 2017, at 10:12 PM.


Re: Indexing slower on a better system

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Walter,

Regarding your suggestion to run

    time gunzip < solr-6.4.1.tgz > /dev/null

does it work on Windows? I tried it on Windows, and it gave me the error
"The syntax of the command is incorrect".

In my current setup, indexing in a single run, I can index about 16,000
lines of a CSV file per minute on my laptop, but fewer than 1,600 lines
per minute on the server, which is more than 10 times slower.
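To compare the two machines on the same terms, one option is to time a single CSV post on each host and convert the result to rows per minute. This is a minimal sketch, assuming a Unix-like shell (bash), curl, and a Solr collection reachable at an update URL you supply; the function names are hypothetical. Because date +%s has one-second resolution, it only suits runs that last at least tens of seconds.

```shell
#!/usr/bin/env bash
# rate_per_minute ROWS SECONDS -> whole rows per minute
rate_per_minute() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", n * 60 / s }'
}

# csv_rows_per_minute UPDATE_URL FILE
# Hypothetical helper: posts FILE (header line plus data rows) in one request
# and prints the indexing rate. Wall-clock time includes network and parsing.
csv_rows_per_minute() {
  local url="$1" file="$2" rows start elapsed
  rows=$(( $(wc -l < "$file") - 1 ))   # assumes a newline after the last row
  start=$(date +%s)
  curl -sS -f -o /dev/null -H 'Content-Type: application/csv' \
    --data-binary @"$file" "$url" || return 1
  elapsed=$(( $(date +%s) - start ))
  [ "$elapsed" -gt 0 ] || elapsed=1     # avoid dividing by zero on sub-second runs
  rate_per_minute "$rows" "$elapsed"
}
```

Running it on both hosts against the same file, for example csv_rows_per_minute 'http://localhost:8983/solr/mycollection/update' sample.csv (collection and file names are placeholders), gives directly comparable numbers.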

Regards,
Edwin



In reply to Zheng Lin Edwin Yeo, 14 February 2017 at 13:45.

Re: Indexing slower on a better system

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thanks for the info.

Yes, I'm running Solr 6.4.1 on both hosts.

Regards,
Edwin


In reply to Walter Underwood, 14 February 2017 at 13:21.

Re: Indexing slower on a better system

Posted by Walter Underwood <wu...@wunderwood.org>.
It is worth doing a basic CPU speed test. Once you have enough RAM, indexing is mostly CPU-bound.

Try something like this. Run it once to get the tgz file cached in the OS file buffers, then run it again to time it.

time gunzip < solr-6.4.1.tgz > /dev/null

I get 1.3 seconds on an Amazon c4.8xlarge and 0.8 seconds on my MacBook. A bigger file would be a better test, but that is the general idea.
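A minimal sketch of the same check with a larger file follows. It assumes a Unix-like shell (Linux or macOS, not the Windows command prompt) with head, base64, gzip and /dev/urandom; the function names make_test_file and cpu_check and the file name cpu-test.gz are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing single-threaded CPU speed between hosts,
# extending the "time gunzip" test. Needs head, base64, gzip and /dev/urandom.

# make_test_file FILE [SIZE_MB]
# Encodes SIZE_MB of random bytes as base64 text and gzips it. base64 text
# still compresses, so gunzip has real decoding work to do on every run.
make_test_file() {
  local file="$1" size_mb="${2:-200}"
  head -c "$((size_mb * 1024 * 1024))" /dev/urandom | base64 | gzip -c > "$file"
}

# cpu_check FILE [RUNS]
# An untimed first pass pulls FILE into the OS file cache, so the timed
# passes measure decompression (CPU) rather than disk reads.
cpu_check() {
  local file="$1" runs="${2:-3}" i=1
  gunzip < "$file" > /dev/null
  while [ "$i" -le "$runs" ]; do
    echo "run $i:"
    time gunzip < "$file" > /dev/null
    i=$((i + 1))
  done
}
```

Run make_test_file cpu-test.gz 200 and then cpu_check cpu-test.gz on each machine and compare the "real" times, which bash's time keyword prints on standard error.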

Also, are you running 6.4.1 on both hosts? The new metrics code caused some slowdowns from 6.3.0 to 6.4.0.

On the other hand, I’m indexing about a million documents per minute into a 16-node cluster (4 shards, 4-way replication) built with c4.8xlarge instances. I’m running 64 indexing threads with 1000-document batches. It might go a bit faster after we switch the cloud driver in SolrJ.
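The numbers above come from SolrJ clients. As a rough shell analogue for a CSV workload (a swapped-in technique, not that SolrJ setup), a large CSV can be split into 1000-row batches that each keep the header line, and the batches posted concurrently to Solr's update handler with curl. The function names split_csv and post_batches are hypothetical, and the line-based split assumes no quoted field contains a newline.

```shell
#!/usr/bin/env bash
# split_csv INPUT OUTDIR [BATCH_ROWS]
# Writes OUTDIR/batch_XXXX.csv files holding at most BATCH_ROWS data rows,
# each starting with INPUT's header line. Use an empty OUTDIR.
split_csv() {
  local input="$1" outdir="$2" batch="${3:-1000}" header f
  mkdir -p "$outdir" || return 1
  header=$(head -n 1 "$input")
  # Split only the data rows; -a 4 leaves room for many batch files.
  tail -n +2 "$input" | split -l "$batch" -a 4 - "$outdir/batch_"
  for f in "$outdir"/batch_????; do
    [ -e "$f" ] || continue   # no data rows, so the glob matched nothing
    { printf '%s\n' "$header"; cat "$f"; } > "$f.csv"
    rm -f "$f"
  done
}

# post_batches UPDATE_URL OUTDIR [PARALLEL]
# Sends every batch as one CSV update request, PARALLEL requests at a time.
# No commit is requested here; commit once after all batches are in.
post_batches() {
  local url="$1" outdir="$2" parallel="${3:-4}"
  find "$outdir" -name 'batch_*.csv' -print0 |
    xargs -0 -P "$parallel" -I{} \
      curl -sS -f -o /dev/null -H 'Content-Type: application/csv' \
        --data-binary @{} "$url"
}
```

For example: split_csv data.csv batches 1000, then post_batches 'http://localhost:8983/solr/mycollection/update' batches 8, then a single commit with curl 'http://localhost:8983/solr/mycollection/update?commit=true'. The file, directory and collection names and the parallelism of 8 are placeholders to tune per host.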

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


In reply to Zheng Lin Edwin Yeo, Feb 13, 2017, at 9:10 PM.


Re: Indexing slower on a better system

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
No, currently the server is slower, and my laptop is faster.

But shouldn't the server be faster, since it has a much better
specification: more RAM, a better processor and an SSD drive?

Regards,
Edwin


In reply to Walter Underwood, 14 February 2017 at 12:26.

Re: Indexing slower on a better system

Posted by Walter Underwood <wu...@wunderwood.org>.
Are you sure the server is faster? My MacBook Pro is a lot faster than many of our Amazon EC2 servers.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


In reply to Zheng Lin Edwin Yeo, Feb 13, 2017, at 8:12 PM.