
Posted to users@jena.apache.org by "Paton, Diego" <di...@teamaol.com> on 2014/02/19 12:46:02 UTC

Re: encoding problem iterating result set using a fuseki endpoint

Re-sending so that I receive all replies sent to the list (I wasn't subscribed to it).


On 19 Feb 2014, at 11:26, Diego Paton <di...@teamaol.com> wrote:

Hi,


In my department we have stored one of the latest dumps of the Freebase ontology using TDB and Fuseki.

TDB: 2.11
Fuseki: 1.0

We have a dataset defined (/freebase/data), and executing SPARQL queries through the Fuseki server works fine.


  *   We execute a query, for example:

prefix fb: <http://rdf.freebase.com/ns/>
select ?mid ?e ?nf ?desc
where {
    ?mid fb:type.object.name ?e .
    ?mid fb:common.topic.notable_for ?notab_for .
    ?mid fb:common.topic.description ?desc .
    ?notab_for fb:common.notable_for.display_name ?nf .
    FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en") && langMatches(lang(?desc), "en"))
}

We are able to output the results as a TSV file containing 6158452 lines without problems.

But when we execute the same query from a Java application using ARQ (2.8.7), we run into problems.


  *   This is the code that I execute:


import com.hp.hpl.jena.query.*;   // ARQ 2.8.x: QueryExecution(Factory), ResultSet, ResultSetFormatter, QuerySolution

String ontology_service = "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
String query =
    "prefix fb: <http://rdf.freebase.com/ns/>\n" +
    "select ?mid ?e ?nf ?desc\n" +
    "where {\n" +
    "    ?mid fb:type.object.name ?e .\n" +
    "    ?mid fb:common.topic.notable_for ?notab_for .\n" +
    "    ?mid fb:common.topic.description ?desc .\n" +
    "    ?notab_for fb:common.notable_for.display_name ?nf .\n" +
    "    FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\"))\n" +
    "}\n";

// Remote query; the results come back from Fuseki as SPARQL XML
QueryExecution queryExecution = QueryExecutionFactory.sparqlService(ontology_service, query);
try {
    ResultSet resultSet = queryExecution.execSelect();

    // Either print the whole result set...
    // (outputAsTSV consumes the ResultSet, so the loop below would then see no rows)
    ResultSetFormatter.outputAsTSV(System.out, resultSet);

    // ...or iterate over it
    while (resultSet.hasNext()) {
        QuerySolution querySolution = resultSet.next();
        querySolution.get("mid");
        querySolution.get("e");
        querySolution.get("nf");
        querySolution.get("desc");
    }
} finally {
    queryExecution.close();   // release the HTTP connection
}

  *   After processing around 400k results, we get this exception:

com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
    ... 1 more


  *   I have also tried printing the results with ResultSetFormatter, but I get the same exception:

ResultSetFormatter.outputAsTSV(System.out, resultSet);


I would say that the ontology is well formed and correctly encoded in TDB, because through Fuseki we are able to obtain correct results.

I have read the related documentation, but I can't find a way to set the encoding, if that is required. Or perhaps I am not executing the query in the correct way.

I would be grateful if you could help me.

Thanks in advance.


Diego Paton.

Re: encoding problem iterating result set using a fuseki endpoint

Posted by Andy Seaborne <an...@apache.org>.

On 20/02/14 10:48, Paton, Diego wrote:
> Hi,
>
> I isolated the project and now it is running with the new version.
>
> I executed the query through a http request using XML output format to see the line that causes the error:
>
> <literal xml:lang="en">Next 2008 by COMPATIBLES2&#x0A;Release Date: 19-Oct-2008&#x0A;UPC: 859701078081&#x0A;Genre: Electronic (primary), New Age (secondary)&#x0A;&#x0A;Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.&#x0A;&#x0AAll music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.</literal>

Bad escape sequence: &#x0AAll music

No ';'

XML parsers are often very, very picky and insist on perfectly correct 
input (think business process documents exchanging money - they must 
conform to strict checking).
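The strictness described above is easy to reproduce locally. This is a minimal sketch, not code from the thread: `StrictXmlCheck` is a hypothetical class, and it uses the JDK's built-in StAX parser rather than Woodstox (the parser in the ARQ stack trace), on the assumption that both reject the same malformed input. It parses a correctly terminated `&#x0A;`, the same text with the `;` missing, and text containing a raw U+0002 control character like the one in the original exception.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StrictXmlCheck {

    /** Returns true if the whole document parses with the JDK's default StAX parser. */
    static boolean isWellFormed(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // Pull every event: well-formedness errors surface while advancing.
                while (reader.hasNext()) {
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<literal xml:lang=\"en\">Slow.&#x0A;&#x0A;All music</literal>";
        String missingSemicolon = "<literal xml:lang=\"en\">Slow.&#x0A;&#x0AAll music</literal>";
        String rawControlChar = "<literal xml:lang=\"en\">Slow.\u0002All music</literal>";
        System.out.println(isWellFormed(good));             // true
        System.out.println(isWellFormed(missingSemicolon)); // false
        System.out.println(isWellFormed(rawControlChar));   // false
    }
}
```

Both broken variants are rejected outright: a strict parser reports the error instead of guessing what was meant.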

	Andy

Re: encoding problem iterating result set using a fuseki endpoint

Posted by "Paton, Diego" <di...@teamaol.com>.

Hi,

I isolated the project and now it is running with the new version.

I executed the query through an HTTP request, using the XML output format, to find the line that causes the error:

<literal xml:lang="en">Next 2008 by COMPATIBLES2&#x0A;Release Date: 19-Oct-2008&#x0A;UPC: 859701078081&#x0A;Genre: Electronic (primary), New Age (secondary)&#x0A;&#x0A;Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.&#x0A;&#x0AAll music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.</literal>


After that, I tried to execute the query using a QueryEngineHTTP object instead of a QueryExecution:

import com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP;

QueryEngineHTTP queryEngine = QueryExecutionFactory.createServiceRequest(ontology_service, query);
// Ask Fuseki for tab-separated results instead of the default SPARQL XML
queryEngine.setSelectContentType("text/tab-separated-values");
ResultSet resultSet = queryEngine.execSelect();



With the select content type set to text/tab-separated-values I can iterate over the whole result set without problems; without it, I get the same error.

This is the value of the QuerySolution for the result that generates the error:

Next 2008 by COMPATIBLES2
Release Date: 19-Oct-2008
UPC: 859701078081
Genre: Electronic (primary), New Age (secondary)

Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.

All music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.@en

The content looks fine, so I don't understand why it fails with QueryExecution (or with QueryEngineHTTP using the default format) but works with QueryEngineHTTP when text/tab-separated-values is selected.

Thanks for your help,

Diego.


On 19 Feb 2014, at 13:56, Rob Vesse <rv...@dotnetrdf.org> wrote:

This now looks like an Apache HttpClient version clash on your class path

Jena is using HttpClient 4.2.3 and presumably something in your class path
uses a different version of HttpClient because it's picking up the wrong
classes leading to the NoSuchFieldError

Rob

Re: encoding problem iterating result set using a fuseki endpoint

Posted by Rob Vesse <rv...@dotnetrdf.org>.

This now looks like an Apache HttpClient version clash on your class path.

Jena uses HttpClient 4.2.3, and presumably something else on your class path
brings in a different version of HttpClient, so the wrong classes are being
picked up, leading to the NoSuchFieldError.
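One way to confirm a class path clash like this is to ask the class loader which jar actually supplies each class at run time. This is a minimal sketch, not something posted in the thread: `WhichJar` is a hypothetical helper, and the classes it probes are an assumption based on the NoSuchFieldError stack trace (`DefaultHttpClient` from HttpClient, `org.apache.http.protocol.HTTP` from HttpCore, assumed to be where `DEF_CONTENT_CHARSET` is declared, and ARQ's `QueryExecutionFactory`).

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns where a class was loaded from, without initialising it. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return source == null ? "JDK / bootstrap" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on class path";
        }
    }

    public static void main(String[] args) {
        String[] classes = {
            "org.apache.http.impl.client.DefaultHttpClient", // HttpClient
            "org.apache.http.protocol.HTTP",                 // HttpCore (assumed home of DEF_CONTENT_CHARSET)
            "com.hp.hpl.jena.query.QueryExecutionFactory"    // ARQ
        };
        for (String name : classes) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the HttpClient and HttpCore locations point at mismatched releases, or at a copy bundled inside some other library, that would be consistent with the NoSuchFieldError; the fix is to align them with the versions ARQ 2.11.1 depends on.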

Rob

On 19/02/2014 12:45, "Paton, Diego" <di...@teamaol.com>
wrote:

>Hi,
>
>After updating the dependency to 2.11.1, I get this error when executes
>this line:
>
>Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
>at 
>org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(Default
>HttpClient.java:175)
>at 
>org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttp
>Client.java:158)
>at 
>org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClien
>t.java:448)
>at 
>com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
>at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
>at 
>com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineH
>TTP.java:345)
>at 
>com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAM
>anager.java:36)
>
>Executing this line:
>
>ResultSet resultSet = queryExecution.execSelect();
>
>It seems I have to set the charset somewhere.
>
>Thanks,
>
>Diego.

Re: encoding problem iterating result set using a fuseki endpoint

Posted by Andy Seaborne <an...@apache.org>.

On 19/02/14 12:45, Paton, Diego wrote:
> Hi,
>
> After updating the dependency to 2.11.1, I get this error when executes this line:
>
> Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET

Java error. You need to update the dependencies as well.

> at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)

Why not issue the request using curl or wget to grab the raw HTTP response,
and look at the reported position [row,col {unknown-source}]: [5859511,312]?
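The same raw fetch can be done from Java instead of curl. This is a minimal sketch, assuming a plain SPARQL-protocol GET with a `query=` parameter against the Fuseki endpoint; `FetchRawResults` and the `results.raw` output file are hypothetical names. It saves the response body untouched, and the optional Accept argument makes it easy to request JSON or TSV instead of XML.

```java
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FetchRawResults {

    /** Builds a SPARQL protocol GET URL of the form endpoint?query=<URL-encoded query>. */
    static String buildUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Saves the raw response body byte for byte, with no parsing or decoding. */
    static void fetch(String endpoint, String query, String accept, Path target) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(buildUrl(endpoint, query)).openConnection();
        conn.setRequestProperty("Accept", accept);
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: FetchRawResults <endpoint> <query> [accept]");
            return;
        }
        // Default to SPARQL XML, the format ARQ was parsing when it failed.
        String accept = args.length > 2 ? args[2] : "application/sparql-results+xml";
        fetch(args[0], args[1], accept, Paths.get("results.raw"));
    }
}
```

For example, `java FetchRawResults http://freebase-m01.ihost.aol.com:3030/freebase/data/query "$(cat query.rq)" application/sparql-results+json`, where `query.rq` is a hypothetical file holding the query from the original post.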

My guess is that the data has a charset error. Nothing to do with the results
themselves.
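Once the response is saved, the reported position can be checked directly. This is a minimal sketch under stated assumptions: `FindIllegalXmlChars` is hypothetical, the file is assumed to be UTF-8, and its 1-based row/column counting is assumed to line up roughly with the [row,col] in the Woodstox message (column counting may differ slightly). It only flags raw characters that XML 1.0 forbids; it does not decode character references such as `&#x2;`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FindIllegalXmlChars {

    /** XML 1.0 forbids control characters other than tab, LF and CR, and also U+FFFE and U+FFFF. */
    static boolean isIllegalXml10(char c) {
        return (c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0xFFFE || c == 0xFFFF;
    }

    /** Returns one "row,col: U+XXXX" entry (1-based positions) per illegal character. */
    static List<String> scan(Reader input) {
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            int row = 0;
            while ((line = reader.readLine()) != null) {
                row++;
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (isIllegalXml10(c)) {
                        hits.add(row + "," + (i + 1) + ": U+" + String.format("%04X", (int) c));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Decoded strictly as UTF-8: malformed bytes make scan() throw, itself a sign of a charset problem.
        Reader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new StringReader("<literal>ok</literal>\n<literal>bad\u0002char</literal>\n");
        for (String hit : scan(in)) {
            System.out.println(hit);
        }
    }
}
```

Run without arguments, it scans the built-in sample and prints `2,13: U+0002`; pass the saved response file to scan the real output.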

Also, try a different format: ask for TSV or JSON (--header "Accept:
application/sparql-results+json", etc.).

> *   After processing around 400k of results, we have this exception:
>
>             com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal
> character ((CTRL-CHAR, code 2))
> at [row,col {unknown-source}]: [5859511,312]
> at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
> at
> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java
> :4668)

>
>
> *   I have tried with showing the results using a ResultSetFormatter
> but I obtain the same exception:

That still fetches the results as XML and then writes them locally as TSV.

>
> ResultSetFormatter.outputAsTSV(System.out, resultSet);
>
>
> I would say that the ontology is well formed with the correct encoding in
> TBD, because through Fuseki we are able to obtain correct results.

Example?

>
> I have read documentation related to this, but I can't find a way to set
> the correct encoding if it is required. Or It is possible that I am not
> using the correct way to execute it.
>
> I would be grateful if you could help me.
>
> Thanks in advance.
>
>
> Diego Paton.

Re: encoding problem iterating result set using a fuseki endpoint

Posted by "Paton, Diego" <di...@teamaol.com>.

Hi,

After updating the dependency to 2.11.1, I get this error:

Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:36)

Executing this line:

ResultSet resultSet = queryExecution.execSelect();

It seems I have to set the charset somewhere.

Thanks,

Diego.


On 19 Feb 2014, at 12:25, Rob Vesse <rv...@dotnetrdf.org> wrote:

Why are you using such an outdated version of ARQ?

2.8.7 is from December 2010 so it is 3 years out of date, please upgrade
to the latest version which is 2.11.1 and let us know if you still have
problems

Rob

Re: encoding problem iterating result set using a fuseki endpoint

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Why are you using such an outdated version of ARQ?

2.8.7 is from December 2010, so it is 3 years out of date. Please upgrade
to the latest version, which is 2.11.1, and let us know if you still have
problems.

Rob

On 19/02/2014 11:46, "Paton, Diego" <di...@teamaol.com>
wrote:
