Posted to solr-user@lucene.apache.org by anass talby <an...@gmail.com> on 2011/05/27 12:20:49 UTC
DIH render html entities
Is there any way to render html entities in DIH for a specific field?
Thanks
--
Anass
Re: DIH render html entities
Posted by Alexey Serba <as...@gmail.com>.
Maybe HTMLStripTransformer is what you are looking for.
* http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
On Tue, May 31, 2011 at 5:35 PM, Erick Erickson <er...@gmail.com> wrote:
> Convert them to what? Individual fields in your docs? Text?
>
> If the former, you might get some joy from the XPathEntityProcessor.
> If you want to just strip the markup and index all the content you
> might get some joy from the various *html* analyzers listed here:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Best
> Erick
>
Re: DIH render html entities
Posted by Erick Erickson <er...@gmail.com>.
Convert them to what? Individual fields in your docs? Text?
If the former, you might get some joy from the XPathEntityProcessor.
If you want to just strip the markup and index all the content you
might get some joy from the various *html* analyzers listed here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Best
Erick
On Fri, May 27, 2011 at 5:19 AM, anass talby <an...@gmail.com> wrote:
> Sorry, my question was not clear.
> When I get data from the database, some fields contain HTML special characters,
> and what I want to do is just convert them automatically.
Re: DIH render html entities
Posted by anass talby <an...@gmail.com>.
Sorry, my question was not clear.
When I get data from the database, some fields contain HTML special characters,
and what I want to do is just convert them automatically.
On Fri, May 27, 2011 at 1:00 PM, Gora Mohanty <go...@mimirtech.com> wrote:
> On Fri, May 27, 2011 at 3:50 PM, anass talby <an...@gmail.com>
> wrote:
> > Is there any way to render html entities in DIH for a specific field?
> [...]
>
> This does not make too much sense: What do you mean by
> "rendering HTML entities"? DIH just indexes, so where would
> it render HTML to, even if it could?
>
> Please take a look at http://wiki.apache.org/solr/UsingMailingLists
>
> Regards,
> Gora
>
--
Anass
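To make the clarified request concrete, here is a minimal sketch of the two conversions the replies in this thread distinguish: decoding HTML entities into plain characters, and stripping markup so that only text remains. It uses only the Python standard library and is not part of DIH or Solr. The helper names (decode_entities, strip_markup) and the sample row are hypothetical, and the sketch only approximates what a Solr-side component such as HTMLStripTransformer might do.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (a rough stand-in for markup stripping)."""

    def __init__(self):
        # convert_charrefs=True lets the parser decode entities inside text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def decode_entities(value: str) -> str:
    """Turn HTML character references into the characters they stand for."""
    return html.unescape(value)


def strip_markup(value: str) -> str:
    """Drop tags and decode entities, keeping only the text content."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flushes any text still buffered by the parser
    return "".join(parser.parts)


# Hypothetical database row with entity-encoded and marked-up fields.
row = {"title": "Caf&eacute; &amp; Bar", "body": "<p>Fish &amp; chips</p>"}
print(decode_entities(row["title"]))  # entities decoded, nothing else touched
print(strip_markup(row["body"]))      # tags removed, entities decoded
```

If the field holds only entity-encoded text, decoding alone is probably enough. If it holds full HTML fragments, stripping is likely closer to what is wanted before indexing.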
Re: DIH render html entities
Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, May 27, 2011 at 3:50 PM, anass talby <an...@gmail.com> wrote:
> Is there any way to render html entities in DIH for a specific field?
[...]
This does not make too much sense: What do you mean by
"rendering HTML entities"? DIH just indexes, so where would
it render HTML to, even if it could?
Please take a look at http://wiki.apache.org/solr/UsingMailingLists
Regards,
Gora