Posted to users@jena.apache.org by "Paton, Diego" <di...@teamaol.com> on 2014/02/19 12:46:02 UTC
Re: encoding problem iterating result set using a fuseki endpoint
Re-sending just to receive all replies that are sent to the list (I wasn't subscribed to the list).
On 19 Feb 2014, at 11:26, Diego Paton <di...@teamaol.com> wrote:
Hi,
In my department we have stored one of the latest dumps of the Freebase ontology using TDB and Fuseki.
TDB: 2.11
Fuseki: 1.0
We have a dataset defined (/freebase/data) and we execute SPARQL queries through the Fuseki server, and it works fine.
* We execute a query, for example:
prefix fb: <http://rdf.freebase.com/ns/>
select ?mid ?e ?nf ?desc
where {
?mid fb:type.object.name ?e .
?mid fb:common.topic.notable_for ?notab_for .
?mid fb:common.topic.description ?desc .
?notab_for fb:common.notable_for.display_name ?nf .
FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en") && langMatches(lang(?desc), "en"))
}
We are able to output the results as a TSV file that contains 6158452 lines without problems.
But when we try to execute the same query from a Java application using ARQ (2.8.7), we have problems.
* This is the code that I execute:
String ontology_service ="http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
String query =
"prefix fb: <http://rdf.freebase.com/ns/>\n"+
" select ?mid ?e ?nf ?desc\n"+
" where {\n"+
" ?mid fb:type.object.name ?e .\n"+
" ?mid fb:common.topic.notable_for ?notab_for .\n"+
" ?mid fb:common.topic.description ?desc .\n"+
" ?notab_for fb:common.notable_for.display_name ?nf .\n"+
" FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\"))\n"+
" }\n";
QueryExecution queryExecution = QueryExecutionFactory.sparqlService(ontology_service, query);
ResultSet resultSet = queryExecution.execSelect();
// Iterate over the results once. Printing them first with
// ResultSetFormatter would consume the ResultSet and leave
// nothing for this loop.
while (resultSet.hasNext()) {
    QuerySolution querySolution = resultSet.next();
    querySolution.get("mid");
    querySolution.get("e");
    querySolution.get("nf");
    querySolution.get("desc");
}
* After processing around 400k results, we get this exception:
com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
at [row,col {unknown-source}]: [5859511,312]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
at [row,col {unknown-source}]: [5859511,312]
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
at [row,col {unknown-source}]: [5859511,312]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
... 1 more
* I have also tried showing the results using a ResultSetFormatter, but I get the same exception:
ResultSetFormatter.outputAsTSV(System.out, resultSet);
I would say that the ontology is well formed with the correct encoding in TDB, because through Fuseki we are able to obtain correct results.
I have read the documentation related to this, but I can't find a way to set the correct encoding, if that is required. It is also possible that I am not executing the query the correct way.
I would be grateful if you could help me.
Thanks in advance.
Diego Paton.
Re: encoding problem iterating result set using a fuseki endpoint
Posted by Andy Seaborne <an...@apache.org>.
On 20/02/14 10:48, Paton, Diego wrote:
> Hi,
>
> I isolated the project and now it is running with the new version.
>
> I executed the query through a http request using XML output format to see the line that causes the error:
>
> <literal xml:lang="en">Next 2008 by COMPATIBLES2
Release Date: 19-Oct-2008
UPC: 859701078081
Genre: Electronic (primary), New Age (secondary)

Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.
ªll music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.</literal>
Bad escape sequence: ªll music
No ';'
XML parsers are often very, very picky and insist on perfectly correct
input (think business process documents exchanging money - they must
conform to strict checking).
Andy
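The strictness Andy describes can be checked directly. A minimal sketch, assuming the offending character is the U+0002 control character reported in the stack trace ("CTRL-CHAR, code 2"): XML 1.0 permits only tab, line feed and carriage return below U+0020, so a conforming parser has to reject it. The class and method names are made up for illustration and are not part of Jena or Woodstox, and the sample string is invented rather than taken from the Freebase data.

```java
/** Checks text against the XML 1.0 "Char" production (hypothetical helper). */
final class XmlCharCheck {

    /** Returns true if the code point may appear in an XML 1.0 document. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first illegal code point, or -1 if there is none. */
    static int firstIllegalIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Invented sample: a control character hidden at the start of a line.
        String sample = "Slow.\n\u0002ll music";
        int at = firstIllegalIndex(sample);
        if (at < 0) {
            System.out.println("No illegal XML 1.0 characters");
        } else {
            System.out.println(String.format(
                    "Illegal XML 1.0 character U+%04X at index %d",
                    sample.codePointAt(at), at));
        }
    }
}
```

Running a check like this over literals before they are loaded into the store would show whether the data itself carries characters that the XML result format cannot represent.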
Re: encoding problem iterating result set using a fuseki endpoint
Posted by "Paton, Diego" <di...@teamaol.com>.
Hi,
I isolated the project and now it is running with the new version.
I executed the query through an HTTP request using the XML output format to see the line that causes the error:
<literal xml:lang="en">Next 2008 by COMPATIBLES2
Release Date: 19-Oct-2008
UPC: 859701078081
Genre: Electronic (primary), New Age (secondary)

Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.
ªll music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.</literal>
After that, I tried to execute the query using a QueryEngineHTTP object instead of QueryExecution:
QueryEngineHTTP queryEngine = QueryExecutionFactory.createServiceRequest(ontology_service, query);
queryEngine.setSelectContentType("text/tab-separated-values");
ResultSet resultSet = queryEngine.execSelect();
queryExecution.execSelect();
I can iterate over the whole result set without problems if I set the select content type to text/tab-separated-values, but if I don't, I get the same problem.
This is the value of the QuerySolution that corresponds to the line that generates the error.
Next 2008 by COMPATIBLES2
Release Date: 19-Oct-2008
UPC: 859701078081
Genre: Electronic (primary), New Age (secondary)
Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; Slow.
All music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital production.@en
The content looks fine, so I don't understand why the problem appears with QueryExecution but not with QueryEngineHTTP when text/tab-separated-values is selected.
Thanks for your help,
Diego.
On 19 Feb 2014, at 13:56, Rob Vesse <rv...@dotnetrdf.org>> wrote:
This now looks like an Apache HttpClient version clash on your class path
Jena is using HttpClient 4.2.3 and presumably something in your class path
uses a different version of HttpClient because it's picking up the wrong
classes leading to the NoSuchFieldError
Rob
On 19/02/2014 12:45, "Paton, Diego" <di...@teamaol.com>>
wrote:
Hi,
After updating the dependency to 2.11.1, I get this error when executes
this line:
Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:36)
Executing this line:
ResultSet resultSet = queryExecution.execSelect();
It seems I have to set the charset somewhere.
Thanks,
Diego.
On 19 Feb 2014, at 12:25, Rob Vesse
<rv...@dotnetrdf.org>>
wrote:
Why are you using such an outdated version of ARQ?
2.8.7 is from December 2010 so it is 3 years out of date, please upgrade
to the latest version which is 2.11.1 and let us know if you still have
problems
Rob
Re: encoding problem iterating result set using a fuseki endpoint
Posted by Rob Vesse <rv...@dotnetrdf.org>.
This now looks like an Apache HttpClient version clash on your class path.
Jena is using HttpClient 4.2.3, and presumably something on your class path
uses a different version of HttpClient, because it's picking up the wrong
classes, leading to the NoSuchFieldError.
Rob
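One way to confirm a clash like this is to ask the JVM which jar a class was actually loaded from. A minimal sketch using only standard Java APIs: the helper name `originOf` is made up for illustration, `org.apache.http.impl.client.DefaultHttpClient` is taken from the stack trace, and `org.apache.http.protocol.HTTP` is my assumption about where the missing `DEF_CONTENT_CHARSET` field is expected to live.

```java
import java.net.URL;
import java.security.CodeSource;

/** Reports where a class was loaded from (hypothetical diagnostic helper). */
final class ClassOrigin {

    /** Returns the jar or directory URL a class came from, or a short note if that is unknown. */
    static String originOf(String className) {
        try {
            // Load without initialising, using the same loader as the application code.
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "(bootstrap or unknown)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on class path)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.http.impl.client.DefaultHttpClient", // from the stack trace
            "org.apache.http.protocol.HTTP"                  // assumed home of DEF_CONTENT_CHARSET
        };
        for (String name : names) {
            System.out.println(name + " -> " + originOf(name));
        }
    }
}
```

If the two classes resolve to jars of mismatched versions, or the same library appears twice in a dependency listing, that would be consistent with the NoSuchFieldError.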
On 19/02/2014 12:45, "Paton, Diego" <di...@teamaol.com> wrote:
>Hi,
>
>After updating the dependency to 2.11.1, I get this error when executes
>this line:
>
>Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
>at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
>at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
>at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
>at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
>at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
>at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
>at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:36)
>
>Executing this line:
>
>ResultSet resultSet = queryExecution.execSelect();
>
>It seems I have to set the charset somewhere.
>
>Thanks,
>
>Diego.
Re: encoding problem iterating result set using a fuseki endpoint
Posted by Andy Seaborne <an...@apache.org>.
On 19/02/14 12:45, Paton, Diego wrote:
> Hi,
>
> After updating the dependency to 2.11.1, I get this error when executes this line:
>
> Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
Java error. You need to update the dependencies as well.
> at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
Why not issue the request using curl or wget to grab the HTTP response
and look at [row,col {unknown-source}]: [5859511,312]?
My guess is that the data has a charset error; it has nothing to do with
the results themselves.
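Once the response has been saved to a file, the reported row can be pulled out and its control characters made visible. A minimal sketch, assuming the response was saved as UTF-8 and that the parser's row number corresponds to a 1-based line count; `ShowLine`, `lineAt` and `escapeControls` are made-up names for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Prints one line of a saved response with control characters escaped (hypothetical helper). */
final class ShowLine {

    /** Returns the 1-based line number {@code row}, or null if the input is shorter. */
    static String lineAt(Reader in, long row) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            long current = 0;
            while ((line = reader.readLine()) != null) {
                if (++current == row) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Replaces control characters other than tab with a visible backslash-u escape. */
    static String escapeControls(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x20 && ch != '\t') {
                out.append(String.format("\\u%04X", (int) ch));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ShowLine <file> <row>");
            return;
        }
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line = lineAt(in, Long.parseLong(args[1]));
            System.out.println(line == null ? "(no such row)" : escapeControls(line));
        }
    }
}
```

For the error in this thread the row argument would be 5859511; the escaped output should then show what sits near column 312.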
Also, try a different format. Ask for TSV or JSON (--header "Accept:
application/sparql-results+json" etc)
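The same content negotiation can be done from plain Java without Jena. A minimal sketch, assuming the endpoint accepts SPARQL protocol GET requests with the query in a `query` parameter; `SparqlGet`, `queryUrl` and `open` are made-up names, and the media types are the ones mentioned in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

/** Fetches SPARQL results in a chosen format via the Accept header (hypothetical helper). */
final class SparqlGet {

    /** Builds the GET URL with the query form-encoded as UTF-8. */
    static String queryUrl(String endpoint, String sparql) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Opens the response stream, asking the server for the given media type. */
    static InputStream open(String endpoint, String sparql, String accept) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(queryUrl(endpoint, sparql)).toURL().openConnection();
        // For example "application/sparql-results+json" or "text/tab-separated-values".
        connection.setRequestProperty("Accept", accept);
        return connection.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: SparqlGet <endpoint> <query> <accept-type>");
            return;
        }
        try (InputStream in = open(args[0], args[1], args[2])) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, n);
            }
        }
        System.out.flush();
    }
}
```

Because the bytes are copied through untouched, no XML parser is involved on the client side, which makes it easier to see whether the server output or the client parsing is at fault.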
> * After processing around 400k of results, we have this exception:
>
> com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal
> character ((CTRL-CHAR, code 2))
> at [row,col {unknown-source}]: [5859511,312]
> at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
> at
> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java
> :4668)
>
>
> * I have tried with showing the results using a ResultSetFormatter
> but I obtain the same exception:
That is still getting it as XML and then writing it locally as TSV.
>
> ResultSetFormatter.outputAsTSV(System.out, resultSet);
>
>
> I would say that the ontology is well formed with the correct encoding in
> TBD, because through Fuseki we are able to obtain correct results.
Example?
>
> I have read documentation related to this, but I can't find a way to set
> the correct encoding if it is required. Or It is possible that I am not
> using the correct way to execute it.
>
> I would be grateful if you could help me.
>
> Thanks in advance.
>
>
> Diego Paton.
Re: encoding problem iterating result set using a fuseki endpoint
Posted by "Paton, Diego" <di...@teamaol.com>.
Hi,
After updating the dependency to 2.11.1, I get this error:
Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:36)
Executing this line:
ResultSet resultSet = queryExecution.execSelect();
It seems I have to set the charset somewhere.
Thanks,
Diego.
On 19 Feb 2014, at 12:25, Rob Vesse <rv...@dotnetrdf.org>>
wrote:
Why are you using such an outdated version of ARQ?
2.8.7 is from December 2010 so it is 3 years out of date, please upgrade
to the latest version which is 2.11.1 and let us know if you still have
problems
Rob
Re: encoding problem iterating result set using a fuseki endpoint
Posted by Rob Vesse <rv...@dotnetrdf.org>.
Why are you using such an outdated version of ARQ?
2.8.7 is from December 2010, so it is 3 years out of date. Please upgrade
to the latest version, which is 2.11.1, and let us know if you still have
problems.
Rob
On 19/02/2014 11:46, "Paton, Diego" <di...@teamaol.com> wrote:
>re-sending just to receive all replies that are sent to the list ( I
>wasn't subscribed to the list )