Posted to solr-user@lucene.apache.org by anarchos78 <ri...@hotmail.com> on 2014/05/23 12:18:28 UTC

Import data from Mysql concat issues

Hi,

I'm trying to index data from MySQL, and the indexing itself succeeds. I then
tried to use the MySQL CONCAT function (in data-config.xml) to concatenate
a custom string with a field, like this: *CONCAT('(',
CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')')*. The
custom string ('ΤΜΗΜΑ') is in Greek. When I query the field, Solr
returns "?????" instead of "ΤΜΗΜΑ". I have also tried *CONCAT('(',
'ΤΜΗΜΑ', ' ', apofasi_tmima, ')')*, with no success.
The "data-config.xml" file is UTF-8 encoded and begins with the
"<?xml version="1.0" encoding="UTF-8"?>" XML declaration. I have also
tried adding "characterEncoding=utf8" to the dataSource url, but then
indexing fails.
What am I missing here? Is there a workaround for this?
Below is a snippet from data-config.xml:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
   <dataSource type="JdbcDataSource" autoCommit="true" batchSize="-1"
       convertType="false" driver="com.mysql.jdbc.Driver"
       url="jdbc:mysql://127.0.0.1:3306/apofaseis?zeroDateTimeBehavior=convertToNull"
       user="root" password="xxxx" name="db" />
   <dataSource name="fieldReader" type="FieldStreamDataSource" />
   <document>

   <entity name=" apofaseis _2000 "
       dataSource="db"
       transformer="HTMLStripTransformer"
       query="select id,
           CONCAT_WS('', CONCAT(apofasi_number, '/', apofasi_date, ' ',
               (CASE apofasi_tmima WHEN NULL THEN '' WHEN '' THEN ''
                ELSE CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')') END))) AS grid_title,
           CAST(CONCAT_WS('_',id,model) AS CHAR) AS solr_id,
           apofasi_number, apofasi_date, apofasi_tmima,
           CONCAT(IFNULL(apofasi_thema, ''), ' ', IFNULL(apofasi_description, ''), ' ', apofasi_body) AS content,
           type, model, url, search_tag, last_modified,
           CONCAT_WS('', CONCAT(apofasi_number, '/', apofasi_date, ' ',
               (CASE apofasi_tmima WHEN NULL THEN '' WHEN '' THEN ''
                ELSE CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')') END))) AS title
           from apofaseis _2000
           where type = 'text'"
...
...


Regards,
anarchos78
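
A note on the characterEncoding attempt described above: the dataSource url already carries one parameter after "?", so a further parameter has to be joined with "&", and a bare "&" inside an XML attribute is not well-formed; it has to be written as "&amp;". Whether that is what made indexing fail here is only an assumption, since neither the modified url nor the error is shown. The sketch below uses Python's standard XML parser purely to illustrate the well-formedness rule. The added parameters (useUnicode and characterEncoding, both MySQL Connector/J connection properties) are illustrative and not taken from the post.

```python
import xml.etree.ElementTree as ET

# URL as it appears in the post; everything appended to it below is hypothetical.
BASE_URL = "jdbc:mysql://127.0.0.1:3306/apofaseis?zeroDateTimeBehavior=convertToNull"

# Variant 1: parameters joined with a bare "&" (not well-formed XML).
raw_ampersand = (
    '<dataSource type="JdbcDataSource" '
    f'url="{BASE_URL}&useUnicode=true&characterEncoding=UTF-8" />'
)

# Variant 2: the same parameters with "&" escaped as "&amp;".
escaped_ampersand = (
    '<dataSource type="JdbcDataSource" '
    f'url="{BASE_URL}&amp;useUnicode=true&amp;characterEncoding=UTF-8" />'
)


def parse_url(xml_text):
    """Return the url attribute, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("url")
    except ET.ParseError:
        return None


raw_result = parse_url(raw_ampersand)          # parser rejects the element
escaped_result = parse_url(escaped_ampersand)  # parser hands back plain "&"

print(raw_result)
print(escaped_result)
```

With the escaped form, the parser delivers the url with ordinary "&" separators, which is the form a JDBC driver expects to receive.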




--
View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-Mysql-concat-issues-tp4137814.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Import data from Mysql concat issues

Posted by anarchos78 <ri...@hotmail.com>.
The Tomcat setup is fine. I insist that it's Solr's issue. The whole index
consists of Greek text (funny characters), and Solr returns it normally. The
problem here is that I cannot concatenate Greek characters that are
hard-coded in "data-config.xml".
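
One possible reading of this symptom, offered as an assumption rather than a diagnosis: column values come back from MySQL as result data, while a hard-coded literal travels the other way, inside the SQL text sent to the server. If any hop on that outbound path encodes the statement with a character set that has no Greek letters, each letter is replaced by "?". The Python sketch below only demonstrates that replacement effect; latin-1 is chosen for illustration and is not known to be the charset in use here.

```python
# The hard-coded Greek literal from the query ("ΤΜΗΜΑ"), written with escapes
# so the example does not depend on the encoding of this file.
literal = "\u03a4\u039c\u0397\u039c\u0391"

# Assumption: some hop encodes the SQL text with a charset lacking Greek letters.
# errors="replace" substitutes "?" for every character that cannot be encoded.
lossy = literal.encode("latin-1", errors="replace")

# With UTF-8 on the same hop, the literal round-trips unchanged.
lossless = literal.encode("utf-8")

print(lossy)
print(lossless.decode("utf-8") == literal)
```

Five Greek letters in, five question marks out, which matches the "?????" reported for the five-letter literal.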




Re: Import data from Mysql concat issues

Posted by Erick Erickson <er...@gmail.com>.
bq: I think that it happens at index time

How do you know that? If you're looking at the results in a browser,
you do _not_ know that. If you're looking at the raw values in, say,
SolrJ, then you _might_ know that, but there's still the question of
whether your servlet container is munging the docs as you send them to
Solr. In that latter case you're right: the info is indexed that way.

Either way, it's not Solr's problem. Solr works just fine with utf-8.
So if the data is getting into the index with weird characters, it's
your setup. If it's just a browser problem, it's _still_ a problem
with your setup.

FWIW,
Erick
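
A minimal sketch of checking raw values without a browser, written in Python rather than SolrJ and assuming the response is requested as JSON (wt=json). It lists the code points of one field, so a stored "?" (U+003F) can be told apart from Greek letters that merely render badly. The sample bodies are hypothetical stand-ins for bytes fetched from a /select request; the field name "title" is taken from the query in the original post.

```python
import json


def code_points(raw_body, field):
    """Decode a Solr JSON response body and list the code points of one field."""
    doc = json.loads(raw_body.decode("utf-8"))["response"]["docs"][0]
    return [f"U+{ord(ch):04X}" for ch in doc[field]]


def sample_body(title):
    """Build a hypothetical response body; real bytes would come from Solr."""
    payload = {"response": {"numFound": 1, "docs": [{"title": title}]}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


# Greek letters stored intact versus literal question marks stored in the index.
intact = code_points(sample_body("(\u03a4\u039c\u0397\u039c\u0391 1)"), "title")
damaged = code_points(sample_body("(????? 1)"), "title")

print(intact)
print(damaged)
```

If the field shows U+003F where Greek code points (U+0370 to U+03FF) were expected, the characters were already lost before or during indexing; if it shows Greek code points, the index is fine and the display layer is at fault.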

On Fri, May 23, 2014 at 10:36 AM, anarchos78
<ri...@hotmail.com> wrote:
> I think that it happens at index time. The reason is that when I query for
> the specific field, Solr returns the "?????" string!

Re: Import data from Mysql concat issues

Posted by anarchos78 <ri...@hotmail.com>.
I think that it happens at index time. The reason is that when I query for
the specific field, Solr returns the "?????" string!




Re: Import data from Mysql concat issues

Posted by Erick Erickson <er...@gmail.com>.
A couple of possibilities:

1> The data in Solr is fine. Your browser is getting the proper
characters back but is not set up to handle that character set, so it
displays ????.

2> Your servlet container is not set up (either inbound or outbound)
to handle the character set you're sending it.


Best,
Erick
