Posted to solr-user@lucene.apache.org by John kim <mo...@gmail.com> on 2012/02/01 03:12:03 UTC

not displaying html code in the results

Hello,

What options do I have to hide "ugly" data in the search results? For
example, I am crawling HTML pages, and some documents have loose tags
or a long string such as "32lkj31U682860678Stock".

I could scrub the data before it is ingested into the index (HTML
parsing, removing strings longer than x characters).
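
A rough sketch of what that scrubbing step could look like, using only the
Python standard library. The function name scrub and the cutoff of 20
characters (standing in for "x") are placeholders I made up for
illustration, not anything Solr provides:

```python
from html.parser import HTMLParser

MAX_TOKEN_LEN = 20  # hypothetical stand-in for "x"; tune for your data


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards every tag, matched or loose."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Note: text inside <script>/<style> also arrives here; this
        # sketch does not filter it out.
        self.parts.append(data)


def scrub(raw_html, max_token_len=MAX_TOKEN_LEN):
    """Strip tags, then drop whitespace-separated tokens that are too long."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join(parser.parts)
    kept = [tok for tok in text.split() if len(tok) <= max_token_len]
    return " ".join(kept)


if __name__ == "__main__":
    print(scrub("<p>Hello <b>world</b></div> 32lkj31U682860678Stock today"))
```

The cleaned string would then be sent to the index in place of the raw
page content.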

Once the data is in the index, is there anything I can do to the index
so that it does not display ugly data?

Once the data is returned, I could create some rules to hide certain text...
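
For this display-side idea, a minimal sketch of such rules as regular
expressions applied to each returned snippet. The rule list, the 21-character
threshold, and the name tidy_for_display are all assumptions of mine, only
meant to show the shape of the approach:

```python
import re

# Hypothetical rules; each match is replaced by a space.
RULES = [
    re.compile(r"</?[A-Za-z][^>]*>"),  # leftover tags such as </td> or <b>
    re.compile(r"\S{21,}"),            # runs of 21 or more non-space characters
]


def tidy_for_display(text):
    """Apply every rule in order, then collapse the leftover whitespace."""
    for rule in RULES:
        text = rule.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(tidy_for_display("Price </td> 32lkj31U682860678Stock up"))
```

This leaves the stored data untouched and only changes what the user sees.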

What's the best way to go about this problem?