Have you considered using bigrams and trigrams? It might be useful to index with NgramFilter and then search the text for those n-grams. You could also count how many times a particular document contains "Car Insurance Rate" to get a term frequency, etc. -Hemant
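
To make the counting idea concrete, here is a minimal sketch in plain Python. It is not NgramFilter itself, only an illustration of building word bigrams and trigrams and counting how often a phrase appears in one document. The function names `word_ngrams` and `ngram_counts`, the simple alphanumeric tokenizer, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    # Hypothetical tokenizer: lowercase, keep only letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, sizes=(2, 3)):
    """Count bigrams and trigrams (by default) in a single document."""
    counts = Counter()
    for n in sizes:
        counts.update(word_ngrams(text, n))
    return counts


# Made-up sample document, for illustration only.
doc = "Compare car insurance rate quotes. A lower car insurance rate saves money."
counts = ngram_counts(doc)

# Raw within-document frequency of the phrase (matching is case-insensitive).
phrase_tf = counts["Car Insurance Rate".lower()]
print(phrase_tf)
```

Note that this sketch ignores punctuation, so n-grams can span sentence boundaries; a real analyzer chain would likely handle that differently.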