Posted to dev@lucene.apache.org by Josh Clum <jo...@gmail.com> on 2013/10/31 22:13:46 UTC

HdfsDirectory Implementation

Hello,

I refactored the HDFS directory implementation out of Solr to use in my
own project and was surprised to see how it performed. I'm using both the
HdfsDirectory class and the HdfsDirectoryFactory class.

On my local machine, using the cache gave a significant speedup. The index
was small enough (12 docs) that each file making up the Lucene index fit
into one block inside the cache.

When running it on a multi-node cluster on AWS, pulling back 1031 docs
with the cache was not much faster than without it. According to my log
statements, the cache was being hit every time, but the difference between
this and my local setup was that there were several blocks per file.
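
To make "several blocks per file" concrete, here is a minimal sketch of
the block arithmetic. It assumes a fixed 8 KiB cache block (a shift of
13); that size is an assumption, not something confirmed above, and the
class name and file sizes are hypothetical.

```java
// Sketch only: BLOCK_SHIFT = 13 (8 KiB blocks) is an assumed value,
// and BlockMath is a hypothetical helper, not part of Solr or Lucene.
class BlockMath {
    static final int BLOCK_SHIFT = 13;
    static final long BLOCK_SIZE = 1L << BLOCK_SHIFT; // 8192 bytes

    // Number of cache blocks needed to hold a file of the given length.
    static long blocksForFile(long fileLength) {
        return (fileLength + BLOCK_SIZE - 1) >>> BLOCK_SHIFT;
    }

    // Index of the block that contains the given file position.
    static long blockId(long position) {
        return position >>> BLOCK_SHIFT;
    }

    // Offset of the given file position inside its block.
    static int offsetInBlock(long position) {
        return (int) (position & (BLOCK_SIZE - 1));
    }

    public static void main(String[] args) {
        long smallFile = 4_000L;     // hypothetical size, fits in one block
        long largeFile = 1_000_000L; // hypothetical size, spans many blocks
        System.out.println("small file blocks: " + blocksForFile(smallFile));
        System.out.println("large file blocks: " + blocksForFile(largeFile));
    }
}
```

Under that assumption, a sequential read of a file touches one cache
entry per block, so a file spanning many blocks costs many lookups even
when every lookup is a hit.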

When setting up the cache I used the default settings as specified in
HdfsDirectoryFactory.

Any ideas on how to speed up searches? Should I change the block size? Is
there something that Blur does to put a wrapper around the cache?

ON A MULTI NODE CLUSTER
Number of documents in directory[1031]
Try #1 -> Total execution time: 3776
Try #2 -> Total execution time: 2995
Try #3 -> Total execution time: 2683
Try #4 -> Total execution time: 2301
Try #5 -> Total execution time: 2174
Try #6 -> Total execution time: 2253
Try #7 -> Total execution time: 2184
Try #8 -> Total execution time: 2087
Try #9 -> Total execution time: 2157
Try #10 -> Total execution time: 2089
Cached try #1 -> Total execution time: 2065
Cached try #2 -> Total execution time: 2298
Cached try #3 -> Total execution time: 2398
Cached try #4 -> Total execution time: 2421
Cached try #5 -> Total execution time: 2080
Cached try #6 -> Total execution time: 2060
Cached try #7 -> Total execution time: 2285
Cached try #8 -> Total execution time: 2048
Cached try #9 -> Total execution time: 2087
Cached try #10 -> Total execution time: 2106

ON MY LOCAL
Number of documents in directory[12]
Try #1 -> Total execution time: 627
Try #2 -> Total execution time: 620
Try #3 -> Total execution time: 637
Try #4 -> Total execution time: 535
Try #5 -> Total execution time: 486
Try #6 -> Total execution time: 527
Try #7 -> Total execution time: 363
Try #8 -> Total execution time: 430
Try #9 -> Total execution time: 431
Try #10 -> Total execution time: 337
Cached try #1 -> Total execution time: 38
Cached try #2 -> Total execution time: 38
Cached try #3 -> Total execution time: 36
Cached try #4 -> Total execution time: 35
Cached try #5 -> Total execution time: 135
Cached try #6 -> Total execution time: 31
Cached try #7 -> Total execution time: 36
Cached try #8 -> Total execution time: 30
Cached try #9 -> Total execution time: 29
Cached try #10 -> Total execution time: 28
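
To put a number on the difference, the runs above can be averaged. The
sketch below is only a summary of the figures already posted; the class
and field names are made up for illustration, and the times are in
whatever unit the log reports.

```java
import java.util.Arrays;

// Sketch only: TimingSummary is a hypothetical helper that averages the
// execution times listed above and compares uncached vs. cached runs.
class TimingSummary {
    static final int[] CLUSTER_UNCACHED =
        {3776, 2995, 2683, 2301, 2174, 2253, 2184, 2087, 2157, 2089};
    static final int[] CLUSTER_CACHED =
        {2065, 2298, 2398, 2421, 2080, 2060, 2285, 2048, 2087, 2106};
    static final int[] LOCAL_UNCACHED =
        {627, 620, 637, 535, 486, 527, 363, 430, 431, 337};
    static final int[] LOCAL_CACHED =
        {38, 38, 36, 35, 135, 31, 36, 30, 29, 28};

    // Arithmetic mean of the reported times.
    static double mean(int[] times) {
        return Arrays.stream(times).average().orElse(Double.NaN);
    }

    // Ratio of mean uncached time to mean cached time.
    static double speedup(int[] uncached, int[] cached) {
        return mean(uncached) / mean(cached);
    }

    public static void main(String[] args) {
        System.out.printf("cluster: %.1f -> %.1f (%.2fx)%n",
            mean(CLUSTER_UNCACHED), mean(CLUSTER_CACHED),
            speedup(CLUSTER_UNCACHED, CLUSTER_CACHED));
        System.out.printf("local:   %.1f -> %.1f (%.2fx)%n",
            mean(LOCAL_UNCACHED), mean(LOCAL_CACHED),
            speedup(LOCAL_UNCACHED, LOCAL_CACHED));
    }
}
```

On these numbers the cluster mean goes from about 2470 to about 2185
(roughly 1.1x), while the local mean goes from about 499 to about 44
(roughly 11x).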

Thanks,
Josh