Posted to solr-user@lucene.apache.org by John kim <mo...@gmail.com> on 2012/02/01 03:12:03 UTC
not displaying html code in the results
Hello,
What options do I have for hiding "ugly" data in the search results? For
example, I am crawling HTML pages, and some documents contain loose tags
or long strings such as "32lkj31U682860678Stock ".
I could scrub the data before it is ingested into the index (HTML
parsing, removing strings longer than x characters).
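
A minimal sketch of what that scrubbing step might look like, in Python using only the standard library. The names `scrub` and `max_token_len` and the 20-character threshold are made up for illustration, and this is not tied to any particular crawler or to Solr itself:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are never stored.
        self.parts.append(data)


def scrub(raw_html, max_token_len=20):
    """Strip tags, then drop whitespace-separated tokens longer than max_token_len.

    max_token_len is an assumed threshold (the "x characters" above).
    """
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    # Joining with a space keeps words on either side of a tag apart.
    text = " ".join(parser.parts)
    tokens = [t for t in text.split() if len(t) <= max_token_len]
    return " ".join(tokens)
```

Something like `scrub(page_html)` would then run on each crawled page before the document is sent to the index. Note that this simple version also keeps the contents of script and style elements, so those would need extra handling.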
Once the data is in the index, is there anything I can do to the index
so that ugly data is not displayed?
Once the data is returned, I could create some rules to hide certain text.
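
As a rough illustration of such display-time rules, here is a small Python sketch. The rule itself (hide anything that looks like a tag, plus long tokens that mix letters and digits) and the names `clean_snippet`, `looks_ugly`, and `max_len` are assumptions for the sake of example:

```python
import re

# Matches anything that looks like a complete tag, e.g. "</span>" or "<br/>".
TAG_RE = re.compile(r"<[^<>]+>")


def looks_ugly(token, max_len=15):
    """Assumed rule: a long token mixing letters and digits is junk."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return len(token) > max_len and has_digit and has_alpha


def clean_snippet(snippet, max_len=15):
    """Remove tag-like text and ugly tokens from a returned result snippet."""
    no_tags = TAG_RE.sub(" ", snippet)
    return " ".join(t for t in no_tags.split() if not looks_ugly(t, max_len))
```

Requiring both letters and digits means an ordinary long word is left alone, at the cost of missing junk that is purely alphabetic.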
What's the best way to go about this problem?