Posted to solr-user@lucene.apache.org by tfaltinat <tf...@iet-solutions.de> on 2018/08/10 14:02:56 UTC

Rich Text Format - Clob

Hi,

we have an Oracle database where we store RTF content in a CLOB column.
We are now trying to index those records, but we only want the plain text,
as Tika would extract it. I tried the TikaEntityProcessor, but I get
the following error message:

ClassCastException: java.io.StringReader cannot be cast to
java.io.InputStream

The configuration looks like this:

<dataSource name="f1" type="FieldReaderDataSource"/>

<entity name="SV_SOLVE_TXT"
        onError="continue"
        transformer="ClobTransformer"
        query="select SOLUTION_ID, SOLUTION_TXT SOLUTION_TXT from IT_SOLUTION where SOLUTION_ID = '${ts3_it_solution_text_search.SOLUTION_ID}'">
    <field name="text_4" column="SOLUTION_TXT" clob="true" />
    <entity name="tika_SOLUTION_TXT"
            onError="continue"
            processor="TikaEntityProcessor"
            url="${SV_SOLVE_TXT.text_4}"
            dataField="SV_SOLVE_TXT.text_4"
            dataSource="f1">
        <field name="text_1" column="text"/>
    </entity>
</entity>

Thx & Regards,
Torsten




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Rich Text Format - Clob

Posted by tfaltinat <tf...@iet-solutions.de>.
Hi Alex,

I am using the ClobTransformer, but I am getting a ClassCastException:
java.io.StringReader cannot be cast to java.io.InputStream

Maybe my configuration is simply wrong.
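
One plausible reading of the exception is that the data source hands back
character data (a Reader) while the Tika step expects bytes (an InputStream).
The sketch below reproduces that mismatch with plain JDK classes only. It does
not use any Solr or Tika API; the class and method names (ReaderVsStream,
readerSource, asStream, toStream) are hypothetical stand-ins, and the choice
of UTF-8 is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ReaderVsStream {

    // Hypothetical stand-in for a data source that returns character data
    // for a String field value (assumption: this mirrors a reader-style source).
    public static Object readerSource(String fieldValue) {
        return new StringReader(fieldValue);
    }

    // Hypothetical stand-in for a consumer that expects bytes.
    // The cast compiles, but fails at runtime when handed a Reader.
    public static InputStream asStream(Object data) {
        return (InputStream) data;
    }

    // One way to bridge the gap: encode the text and wrap the bytes.
    // UTF-8 is an assumption; the right charset depends on the stored data.
    public static InputStream toStream(String fieldValue) {
        return new ByteArrayInputStream(fieldValue.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String rtf = "{\\rtf1\\ansi Hello}";
        try {
            asStream(readerSource(rtf));
        } catch (ClassCastException e) {
            // Same kind of failure as in the error message above.
            System.out.println("cast failed: " + e.getMessage());
        }
        InputStream in = toStream(rtf);
        System.out.println("bytes available: " + in.available());
    }
}
```

If that reading is right, the fix would be on the configuration side: the
Tika entity needs a data source that yields a stream rather than a reader.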

Regards,
Torsten




Re: Rich Text Format - Clob

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I think you need ClobTransformer at some point in the processing:
https://lucene.apache.org/solr/guide/7_4/uploading-structured-data-store-data-with-the-data-import-handler.html#clobtransformer

Regards,
   Alex.
