Posted to commits@jena.apache.org by rv...@apache.org on 2012/04/10 02:15:25 UTC

svn commit: r1311528 - in /incubator/jena/Jena2/ARQ/trunk: ./ .settings/ src/main/java/com/hp/hpl/jena/query/ src/main/java/org/openjena/riot/ src/test/java/org/openjena/riot/lang/

Author: rvesse
Date: Tue Apr 10 00:15:25 2012
New Revision: 1311528

URL: http://svn.apache.org/viewvc?rev=1311528&view=rev
Log:
Adds the ability to preserve the raw query string in a Query object, which is useful for applications that need to inspect the original user input (e.g. for comments).  Also fixes a bug in SysRIOT.resetJenaReaders() caused by setting the RDF/JSON reader and writer class names to null (it now uses empty strings instead).

Modified:
    incubator/jena/Jena2/ARQ/trunk/.classpath
    incubator/jena/Jena2/ARQ/trunk/.project
    incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.core.resources.prefs
    incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.jdt.core.prefs
    incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/Query.java
    incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/QueryFactory.java
    incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java
    incubator/jena/Jena2/ARQ/trunk/src/test/java/org/openjena/riot/lang/TestLangRdfJson.java

Modified: incubator/jena/Jena2/ARQ/trunk/.classpath
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/.classpath?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/.classpath (original)
+++ incubator/jena/Jena2/ARQ/trunk/.classpath Tue Apr 10 00:15:25 2012
@@ -1,27 +1,11 @@
 <?xml version="1.0" encoding="UTF-8"?>
-
 <classpath>
-  <classpathentry excluding="**/.svn/" kind="src" path="src/main/java"/>
-  <classpathentry excluding="**/.svn/" kind="src" path="src/test/java"/>
-  <classpathentry excluding="**/.svn/" kind="src" path="src/main/resources"/>
-  <classpathentry excluding="**/.svn/" kind="src" path="src/test/resources"/>
-  <classpathentry excluding="**/.svn/" kind="src" path="src-examples"/>
-
-  <classpathentry kind="var" path="M2_REPO/commons-codec/commons-codec/1.4/commons-codec-1.4.jar" sourcepath="M2_REPO/commons-codec/commons-codec/1.4/commons-codec-1.4-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar" sourcepath="M2_REPO/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/apache/httpcomponents/httpcore/4.1.3/httpcore-4.1.3.jar" sourcepath="M2_REPO/org/apache/httpcomponents/httpcore/4.1.3/httpcore-4.1.3-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/com/ibm/icu/icu4j/3.4.4/icu4j-3.4.4.jar" sourcepath="M2_REPO/com/ibm/icu/icu4j/3.4.4/icu4j-3.4.4-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/apache/jena/jena-iri/0.9.1-incubating-SNAPSHOT/jena-iri-0.9.1-incubating-SNAPSHOT.jar" sourcepath="M2_REPO/org/apache/jena/jena-iri/0.9.1-incubating-SNAPSHOT/jena-iri-0.9.1-incubating-SNAPSHOT-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/apache/jena/jena-core/2.7.1-incubating-SNAPSHOT/jena-core-2.7.1-incubating-SNAPSHOT-tests.jar" sourcepath="M2_REPO/org/apache/jena/jena-core/2.7.1-incubating-SNAPSHOT/jena-core-2.7.1-incubating-SNAPSHOT-test-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/apache/jena/jena-core/2.7.1-incubating-SNAPSHOT/jena-core-2.7.1-incubating-SNAPSHOT.jar" sourcepath="M2_REPO/org/apache/jena/jena-core/2.7.1-incubating-SNAPSHOT/jena-core-2.7.1-incubating-SNAPSHOT-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/junit/junit/4.8.2/junit-4.8.2.jar" sourcepath="M2_REPO/junit/junit/4.8.2/junit-4.8.2-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/log4j/log4j/1.2.16/log4j-1.2.16.jar" sourcepath="M2_REPO/log4j/log4j/1.2.16/log4j-1.2.16-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/slf4j/slf4j-api/1.6.4/slf4j-api-1.6.4.jar" sourcepath="M2_REPO/org/slf4j/slf4j-api/1.6.4/slf4j-api-1.6.4-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/slf4j/slf4j-log4j12/1.6.4/slf4j-log4j12-1.6.4.jar" sourcepath="M2_REPO/org/slf4j/slf4j-log4j12/1.6.4/slf4j-log4j12-1.6.4-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/org/slf4j/jcl-over-slf4j/1.6.4/jcl-over-slf4j-1.6.4.jar" sourcepath="M2_REPO/org/slf4j/jcl-over-slf4j/1.6.4/jcl-over-slf4j-1.6.4-sources.jar"/>
-  <classpathentry kind="var" path="M2_REPO/xerces/xercesImpl/2.10.0/xercesImpl-2.10.0.jar"/>
-  <classpathentry kind="var" path="M2_REPO/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar"/>
-
-  <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
-  <classpathentry kind="output" path="classes"/>
+	<classpathentry kind="src" output="target/classes" path="src/main/java"/>
+	<classpathentry excluding="**" kind="src" output="target/classes" path="src/main/resources"/>
+	<classpathentry excluding="**" kind="src" output="target/classes" path="resources"/>
+	<classpathentry kind="src" output="target/test-classes" path="src/test/java"/>
+	<classpathentry excluding="**" kind="src" output="target/test-classes" path="src/test/resources"/>
+	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
+	<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER"/>
+	<classpathentry kind="output" path="target/classes"/>
 </classpath>

Modified: incubator/jena/Jena2/ARQ/trunk/.project
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/.project?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/.project (original)
+++ incubator/jena/Jena2/ARQ/trunk/.project Tue Apr 10 00:15:25 2012
@@ -1,13 +1,23 @@
+<?xml version="1.0" encoding="UTF-8"?>
 <projectDescription>
-  <name>ARQ</name>
-  <comment>ARQ is a SPARQL 1.1 query engine for Jena</comment>
-  <projects/>
-  <buildSpec>
-    <buildCommand>
-      <name>org.eclipse.jdt.core.javabuilder</name>
-    </buildCommand>
-  </buildSpec>
-  <natures>
-    <nature>org.eclipse.jdt.core.javanature</nature>
-  </natures>
+	<name>ARQ</name>
+	<comment>ARQ is a SPARQL 1.1 query engine for Jena</comment>
+	<projects>
+	</projects>
+	<buildSpec>
+		<buildCommand>
+			<name>org.eclipse.jdt.core.javabuilder</name>
+			<arguments>
+			</arguments>
+		</buildCommand>
+		<buildCommand>
+			<name>org.eclipse.m2e.core.maven2Builder</name>
+			<arguments>
+			</arguments>
+		</buildCommand>
+	</buildSpec>
+	<natures>
+		<nature>org.eclipse.m2e.core.maven2Nature</nature>
+		<nature>org.eclipse.jdt.core.javanature</nature>
+	</natures>
 </projectDescription>

Modified: incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.core.resources.prefs
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.core.resources.prefs?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.core.resources.prefs (original)
+++ incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.core.resources.prefs Tue Apr 10 00:15:25 2012
@@ -1,2 +1,7 @@
 eclipse.preferences.version=1
+encoding//src/main/java=UTF-8
+encoding//src/main/resources=UTF-8
+encoding//src/test/java=UTF-8
+encoding//src/test/resources=UTF-8
 encoding/<project>=UTF-8
+encoding/resources=UTF-8

Modified: incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.jdt.core.prefs
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.jdt.core.prefs?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.jdt.core.prefs (original)
+++ incubator/jena/Jena2/ARQ/trunk/.settings/org.eclipse.jdt.core.prefs Tue Apr 10 00:15:25 2012
@@ -1,4 +1,3 @@
-#Mon Mar 19 09:15:55 PDT 2012
 eclipse.preferences.version=1
 encoding//src/test/resources=UTF-8
 encoding/resources=UTF-8
@@ -25,7 +24,7 @@ org.eclipse.jdt.core.compiler.problem.fa
 org.eclipse.jdt.core.compiler.problem.fieldHiding=warning
 org.eclipse.jdt.core.compiler.problem.finalParameterBound=warning
 org.eclipse.jdt.core.compiler.problem.finallyBlockNotCompletingNormally=warning
-org.eclipse.jdt.core.compiler.problem.forbiddenReference=error
+org.eclipse.jdt.core.compiler.problem.forbiddenReference=warning
 org.eclipse.jdt.core.compiler.problem.hiddenCatchBlock=warning
 org.eclipse.jdt.core.compiler.problem.incompatibleNonInheritedInterfaceMethod=warning
 org.eclipse.jdt.core.compiler.problem.incompleteEnumSwitch=warning

Modified: incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/Query.java
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/Query.java?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/Query.java (original)
+++ incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/Query.java Tue Apr 10 00:15:25 2012
@@ -75,6 +75,9 @@ public class Query extends Prologue impl
     private List<String> graphURIs = new ArrayList<String>() ;
     private List<String> namedGraphURIs = new ArrayList<String>() ;
     
+    // The Original Raw Query as provided to the API
+    private String rawQuery = null;
+    
     // The WHERE clause
     private Element queryPattern = null ;
     
@@ -122,17 +125,47 @@ public class Query extends Prologue impl
     // Also uses resultVars
     protected List<Node> resultNodes               = new ArrayList<Node>() ;     // Type in list: Node
     
+    /**
+     * Creates a new empty query
+     */
     public Query()
     {
         syntax = Syntax.syntaxSPARQL ;
     }
     
+    /**
+     * Creates a new empty query with the given prologue
+     */
     public Query(Prologue prologue)
     {
         this() ;
         usePrologueFrom(prologue) ;
     }
     
+    /**
+     * Creates a new empty query with the given raw query string
+     * <p>
+     * <strong>Important:</strong> This constructor does not cause the query to be parsed, this only stores a reference to the original query string in the query which may be useful if you want to see the original unaltered syntax (including comments) at some later point.
+     * </p>
+     */
+    public Query(String queryString)
+    {
+    	this();
+    	rawQuery = queryString;
+    }
+    
+    /**
+     * Creates a new empty query with the given raw query string and prologue
+     * <p>
+     * <strong>Important:</strong> This constructor does not cause the query to be parsed, this only stores a reference to the original query string in the query which may be useful if you want to see the original unaltered syntax (including comments) at some later point.
+     * </p>
+     */
+    public Query(String queryString, Prologue prologue)
+    {
+    	this(prologue);
+    	rawQuery = queryString;
+    }
+    
     // Allocate variables that are unique to this query.
     private VarAlloc varAlloc = new VarAlloc(ARQConstants.allocVarMarker) ;
     private Var allocInternVar() { return varAlloc.allocVar() ; }
@@ -189,6 +222,22 @@ public class Query extends Prologue impl
     
     public void setReduced(boolean b) { reduced = b ; }
     public boolean isReduced()        { return reduced ; }
+    
+    /**
+     * Sets the raw query string
+     */
+    protected void setRawQuery(String queryString)
+    {
+    	rawQuery = queryString;
+    }
+    
+    /**
+     * Gets the original raw query string from which this instance was populated, may be null depending on how the query was created
+     */
+    public String getRawQuery()
+    {
+    	return rawQuery;
+    }
 
     /** @return Returns the syntax. */
     public Syntax getSyntax()         { return syntax ; }
@@ -782,12 +831,36 @@ public class Query extends Prologue impl
     @Override
     public Object clone() { return cloneQuery() ; }
     
+    /**
+     * Makes a copy of this query.  Copies by parsing a query from the serialized form of this query
+     * @return Copy of this query
+     */
     public Query cloneQuery()
     {
-        // A little crude.
-        IndentedLineBuffer buff = new IndentedLineBuffer() ;
-        serialize(buff, getSyntax()) ;
-        String qs = buff.toString() ;
+    	//By default clone from serialized form of this query
+    	return cloneQuery(false);
+    }
+    
+    /**
+     * Makes a copy of this query.  May specify whether is cloned by parsing from original raw query or by parsing from serialized form of this query
+     * @param useRawQuery Copy from raw query if present
+     * @return Copy of this query
+     */
+    public Query cloneQuery(boolean useRawQuery)
+    {
+    	String qs;
+    	if (useRawQuery && this.rawQuery != null && !this.rawQuery.equals(""))
+    	{
+    		//If specified (and is present) clone from raw query rather than the serialized query
+    		qs = this.rawQuery;
+    	}
+    	else
+    	{
+    		// A little crude.
+    		IndentedLineBuffer buff = new IndentedLineBuffer() ;
+    		serialize(buff, getSyntax()) ;
+    		qs = buff.toString() ;
+    	}
         return QueryFactory.create(qs, getSyntax()) ;
     }
     

Modified: incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/QueryFactory.java
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/QueryFactory.java?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/QueryFactory.java (original)
+++ incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/query/QueryFactory.java Tue Apr 10 00:15:25 2012
@@ -61,7 +61,7 @@ public class QueryFactory
     
     static public Query create(String queryString, String baseURI)
     {
-        Query query = new Query() ;
+        Query query = new Query(queryString) ;
         parse(query, queryString, baseURI, Syntax.defaultQuerySyntax) ;
         return query ;
         
@@ -77,7 +77,7 @@ public class QueryFactory
    
    static public Query create(String queryString, String baseURI, Syntax syntax)
    {
-       Query query = new Query() ;
+       Query query = new Query(queryString) ;
        parse(query, queryString, baseURI, syntax) ;
        return query ;
        
@@ -109,6 +109,22 @@ public class QueryFactory
         return originalQuery.cloneQuery() ;
     }
     
+    /**
+     * Make a query from another one by deep copy (a clone).
+     * The returned query will be .equals to the original.
+     * The returned query can be mutated without changing the
+     * original (at which point it will stop being .equals)
+     * 
+     * @param originalQuery  The query to clone.
+     * @param useRawQuery Whether to clone from the raw query string the original query was created from (if it is available)
+     *   
+     */
+
+    static public Query create(Query originalQuery, boolean useRawQuery)
+    {
+        return originalQuery.cloneQuery(useRawQuery) ;
+    }
+    
 
     /** Parse a query from the given string by calling the parser.
      *
@@ -126,6 +142,7 @@ public class QueryFactory
         else
             query.setSyntax(syntaxURI) ;
 
+        query.setRawQuery(queryString);
         Parser parser = Parser.createParser(syntaxURI) ;
         
         if ( parser == null )

Modified: incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java (original)
+++ incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java Tue Apr 10 00:15:25 2012
@@ -103,8 +103,8 @@ public class SysRIOT
         RDFReaderFImpl.setBaseReaderClassName("Turtle", jenaTurtleReader) ;
         RDFReaderFImpl.setBaseReaderClassName("TTL",    jenaTurtleReader) ;
 
-        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", null) ;
-        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", null) ;
+        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", "") ;
+        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", "") ;
     }
 
 }

Modified: incubator/jena/Jena2/ARQ/trunk/src/test/java/org/openjena/riot/lang/TestLangRdfJson.java
URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/test/java/org/openjena/riot/lang/TestLangRdfJson.java?rev=1311528&r1=1311527&r2=1311528&view=diff
==============================================================================
--- incubator/jena/Jena2/ARQ/trunk/src/test/java/org/openjena/riot/lang/TestLangRdfJson.java (original)
+++ incubator/jena/Jena2/ARQ/trunk/src/test/java/org/openjena/riot/lang/TestLangRdfJson.java Tue Apr 10 00:15:25 2012
@@ -21,6 +21,8 @@ package org.openjena.riot.lang;
 import java.io.ByteArrayInputStream ;
 import java.io.StringReader ;
 
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
 import org.junit.Test ;
 import org.openjena.atlas.io.PeekReader ;
 import org.openjena.atlas.json.io.parser.TokenizerJSON ;
@@ -30,6 +32,7 @@ import org.openjena.atlas.lib.StrUtils ;
 import org.openjena.riot.ErrorHandlerTestLib.ExFatal ;
 import org.openjena.riot.RiotReader ;
 import org.openjena.riot.ErrorHandlerTestLib.ErrorHandlerEx ;
+import org.openjena.riot.SysRIOT;
 import org.openjena.riot.system.JenaReaderNTriples2 ;
 import org.openjena.riot.system.JenaReaderRdfJson ;
 import org.openjena.riot.tokens.Tokenizer ;
@@ -42,6 +45,32 @@ import com.hp.hpl.jena.rdf.model.RDFRead
 
 public class TestLangRdfJson extends BaseTest
 {
+	@BeforeClass
+	public static void setup()
+	{
+		SysRIOT.wireIntoJena();
+	}
+	
+	@AfterClass
+	public static void teardown()
+	{
+		SysRIOT.resetJenaReaders();
+	}
+	
+	@Test
+	public void rdfjson_get_jena_reader()
+	{
+		Model m = ModelFactory.createDefaultModel();
+		m.getReader("RDF/JSON");
+	}
+	
+	@Test
+	public void rdfjson_get_jena_writer()
+	{
+		Model m = ModelFactory.createDefaultModel();
+		m.getWriter("RDF/JSON");
+	}
+	
 	@Test
 	public void rdfjson_read_empty_graph()
 	{



Re: Reader/Writer registries

Posted by Robert Vesse <rv...@yarcdata.com>.
Setting to "" appears to be fine, but setting to an explicit null caused an NPE for some reason.

I only noticed this because I was playing with something vaguely related and needed to actually turn this on in the @AfterClass method and then discovered the bug
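
One possible explanation for the NPE, offered only as a guess: if the class-name registry is backed by java.util.Properties (an assumption, not something confirmed in this thread), then storing a null value fails immediately, while an empty string is accepted. The sketch below uses a plain Properties object as a hypothetical stand-in for the registry; `NullRegistryValueDemo` and `throwsNpe` are illustrative names.

```java
import java.util.Properties;

class NullRegistryValueDemo {
    // Returns true if storing the given class name throws a NullPointerException.
    static boolean throwsNpe(Properties registry, String lang, String className) {
        try {
            registry.setProperty(lang, className);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the reader/writer class-name table.
        Properties langToClassName = new Properties();
        System.out.println("null value rejected:  " + throwsNpe(langToClassName, "RDF/JSON", null));
        System.out.println("empty value rejected: " + throwsNpe(langToClassName, "RDF/JSON", ""));
    }
}
```

Properties does not permit null values, which would be consistent with "" working where null did not.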

A better registry of readers and writers for SPARQL results and for RDF triple and quad formats would be nice, since right now we have several different places doing this (the Jena reader/writer registry, RIOT's language registry, and some MIME type related stuff in Fuseki).

I don't know if this is helpful but dotNetRDF has a single central registry which maps MIME types and file extensions to readers/writers for various formats (and vice versa).  This gets used for everything from just auto-detecting what the most likely parser is when a user asks to load a file through to content negotiation in HTTP endpoints.

Implementing such a strategy for Jena would be harder because of its modular nature and I don't know whether adding yet another registry would just complicate things further (even if all the old registries just became facades to the new registry)
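
To make the single-registry idea concrete, here is a minimal sketch of a lookup table keyed by MIME type and by file extension. It is neither dotNetRDF's nor Jena's actual design: `FormatRegistry` and its methods are hypothetical, the registered values are illustrative, and only the lookup direction (MIME type or extension to format name) is covered.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class FormatRegistry {
    private final Map<String, String> byMimeType = new HashMap<>();
    private final Map<String, String> byExtension = new HashMap<>();

    // Registers one format under both a MIME type and a file extension.
    void register(String format, String mimeType, String extension) {
        byMimeType.put(mimeType.toLowerCase(Locale.ROOT), format);
        byExtension.put(extension.toLowerCase(Locale.ROOT), format);
    }

    // Looks up a format by MIME type, ignoring case.
    Optional<String> forMimeType(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType.toLowerCase(Locale.ROOT)));
    }

    // Looks up a format by the extension after the last dot of a file name.
    Optional<String> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(extension));
    }

    public static void main(String[] args) {
        FormatRegistry registry = new FormatRegistry();
        registry.register("Turtle", "text/turtle", "ttl");
        registry.register("RDF/XML", "application/rdf+xml", "rdf");
        System.out.println(registry.forMimeType("text/turtle"));
        System.out.println(registry.forFileName("data.rdf"));
    }
}
```

The same table could serve both file loading (by extension) and content negotiation (by MIME type), which is the appeal of having one registry.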

Rob

On Apr 10, 2012, at 1:18 AM, Paolo Castagna wrote:

> rvesse@apache.org wrote:
>> Also fixed a bug with SysRiot.resetJenaReaders() related to setting the RDF/JSON readers and writer class names to null (now uses empty strings instead)
> 
> [...]
> 
>> Modified: incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java
>> URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java?rev=1311528&r1=1311527&r2=1311528&view=diff
>> ==============================================================================
>> --- incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java (original)
>> +++ incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java Tue Apr 10 00:15:25 2012
>> @@ -103,8 +103,8 @@ public class SysRIOT
>>         RDFReaderFImpl.setBaseReaderClassName("Turtle", jenaTurtleReader) ;
>>         RDFReaderFImpl.setBaseReaderClassName("TTL",    jenaTurtleReader) ;
>> 
>> -        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", null) ;
>> -        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", null) ;
>> +        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", "") ;
>> +        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", "") ;
>>     }
>> 
>> }
> 
> Yep, I remember that.
> 
> The thing is that there is no explicit way to unregister a reader or a writer.
> What's the advantage of using "" instead of null? (I don't like either... ;-))
> Was null causing any problem?
> 
> Perhaps, we should add an explicit unsetBase*ClassName(...) call.
> 
> In future, it would be nice to give users the ability to easily register/unregister their RDF readers and writers, making it easy for third parties to add their own serialization formats and/or parsers
> from native formats.
> 
> Paolo


Re: I/O and extensibility.

Posted by Robert Vesse <rv...@yarcdata.com>.
Yes that looks a lot more like what I was thinking of in my reply to Paolo's email

Looks good

Rob

On Apr 10, 2012, at 1:37 AM, Andy Seaborne wrote:

> 
>> Perhaps, we should add an explicit unsetBase*ClassName(...) call.
>> 
>> In future, it would be nice to give users the ability to easily register/unregister their RDF readers and writers, making it easy for third parties to add their own serialization formats and/or parsers
>> from native formats.
>> 
>> Paolo
> 
> See
> 
> https://svn.apache.org/repos/asf/incubator/jena/Scratch/AFS/Dev/trunk/src/main/java/projects/riot_reader/
> 
> and tell me what you think.
> 
> It's a complete replacement for reading models (and more) in Jena and integrates filemanager-isms and Model.read.  It does proper HTTP conneg.
> 
> (anything {X}2.java is a replacement for current {X}.java)
> 
> The current public API works -- Model.read, FileManager.get().read/load.  FileManager-like functionality is built into all read operations.
> 
> There is one reader implementation (RDFReaderRIOT) - it uses file extension (and URL extensions), Accept: media type and the app-supplied language name to decide what the syntax is.
> 
> WebReader2 is a collection of functions to read triples/quads.  It's the new front-door.
> 
> See WebReader2.addTripleSyntax/addQuadSyntax for extensibility.
> 
> WebReader2.wireIntoJena/resetJenaReaders are just for running with existing Jena.
> 
> Langs.java is a class of many constants for mapping media types to handlers.  None of this fixed RDFReaderFImpl stuff.
> 
> There are some misnamings - things have grown over time (e.g. addTripleSyntax is a "add or replace").
> 
> 
> 	Andy


I/O and extensibility.

Posted by Andy Seaborne <an...@apache.org>.
> Perhaps, we should add an explicit unsetBase*ClassName(...) call.
>
> In future, it would be nice to give users the ability to easily register/unregister their RDF readers and writers, making it easy for third parties to add their own serialization formats and/or parsers
> from native formats.
>
> Paolo

See

https://svn.apache.org/repos/asf/incubator/jena/Scratch/AFS/Dev/trunk/src/main/java/projects/riot_reader/

and tell me what you think.

It's a complete replacement for reading models (and more) in Jena and 
integrates filemanager-isms and Model.read.  It does proper HTTP conneg.

(anything {X}2.java is a replacement for current {X}.java)

The current public API works -- Model.read, FileManager.get().read/load.
FileManager-like functionality is built into all read operations.

There is one reader implementation (RDFReaderRIOT) - it uses file 
extension (and URL extensions), Accept: media type and the app-supplied 
language name to decide what the syntax is.
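
As an illustration of that kind of decision, the sketch below picks a syntax from the three inputs just listed. It is not the RDFReaderRIOT code: `SyntaxChooser`, its lookup tables, and in particular the precedence (explicit language name, then media type, then extension) are assumptions made for the example.

```java
import java.util.Locale;
import java.util.Map;

class SyntaxChooser {
    // Illustrative tables only; not the actual registrations.
    static final Map<String, String> BY_MEDIA_TYPE = Map.of(
            "text/turtle", "Turtle",
            "application/rdf+xml", "RDF/XML");
    static final Map<String, String> BY_EXTENSION = Map.of(
            "ttl", "Turtle",
            "rdf", "RDF/XML",
            "nt", "N-Triples");

    // Assumed precedence: explicit language name, then media type, then extension.
    // Returns null when nothing matches.
    static String choose(String langName, String contentType, String uri) {
        if (langName != null && !langName.isEmpty()) {
            return langName;
        }
        if (contentType != null) {
            // Drop parameters such as "; charset=utf-8" before the lookup.
            String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String byType = BY_MEDIA_TYPE.get(mediaType);
            if (byType != null) {
                return byType;
            }
        }
        if (uri != null) {
            int dot = uri.lastIndexOf('.');
            if (dot >= 0) {
                String extension = uri.substring(dot + 1).toLowerCase(Locale.ROOT);
                return BY_EXTENSION.get(extension);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(choose(null, "text/turtle; charset=utf-8", "http://example.org/data"));
        System.out.println(choose(null, null, "http://example.org/data.nt"));
    }
}
```
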

WebReader2 is a collection of functions to read triples/quads.  It's the 
new front-door.

See WebReader2.addTripleSyntax/addQuadSyntax for extensibility.

WebReader2.wireIntoJena/resetJenaReaders are just for running with 
existing Jena.

Langs.java is a class of many constants for mapping media types to 
handlers.  None of this fixed RDFReaderFImpl stuff.

There are some misnamings - things have grown over time (e.g. 
addTripleSyntax is a "add or replace").


	Andy

Re: svn commit: r1311528 - in /incubator/jena/Jena2/ARQ/trunk: ./ .settings/ src/main/java/com/hp/hpl/jena/query/ src/main/java/org/openjena/riot/ src/test/java/org/openjena/riot/lang/

Posted by Paolo Castagna <ca...@googlemail.com>.
rvesse@apache.org wrote:
> Also fixed a bug with SysRiot.resetJenaReaders() related to setting the RDF/JSON readers and writer class names to null (now uses empty strings instead)

[...]

> Modified: incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java
> URL: http://svn.apache.org/viewvc/incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java?rev=1311528&r1=1311527&r2=1311528&view=diff
> ==============================================================================
> --- incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java (original)
> +++ incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/riot/SysRIOT.java Tue Apr 10 00:15:25 2012
> @@ -103,8 +103,8 @@ public class SysRIOT
>          RDFReaderFImpl.setBaseReaderClassName("Turtle", jenaTurtleReader) ;
>          RDFReaderFImpl.setBaseReaderClassName("TTL",    jenaTurtleReader) ;
>  
> -        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", null) ;
> -        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", null) ;
> +        RDFReaderFImpl.setBaseReaderClassName("RDF/JSON", "") ;
> +        RDFWriterFImpl.setBaseWriterClassName("RDF/JSON", "") ;
>      }
>  
>  }

Yep, I remember that.

The thing is that there is no explicit way to unregister a reader or a writer.
What's the advantage of using "" instead of null? (I don't like either... ;-))
Was null causing any problem?

Perhaps, we should add an explicit unsetBase*ClassName(...) call.

In future, it would be nice to give users the ability to easily register/unregister their RDF readers and writers, making it easy for third parties to add their own serialization formats and/or parsers
from native formats.
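
An explicit unregister call along those lines could look roughly like the sketch below. It is a sketch only: `ReaderRegistry` and its methods are hypothetical and are not part of RDFReaderFImpl. The point is that removal becomes its own operation, so neither null nor "" has to double as a "not registered" marker.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReaderRegistry {
    private final Map<String, String> langToClassName = new ConcurrentHashMap<>();

    // Registers or replaces the reader class for a language.
    // Returns the previously registered class name, or null if there was none.
    String setReaderClassName(String lang, String className) {
        if (className == null) {
            throw new IllegalArgumentException("use unsetReaderClassName to remove " + lang);
        }
        return langToClassName.put(lang, className);
    }

    // Removes the registration; returns the removed class name, or null if absent.
    String unsetReaderClassName(String lang) {
        return langToClassName.remove(lang);
    }

    boolean isRegistered(String lang) {
        return langToClassName.containsKey(lang);
    }

    public static void main(String[] args) {
        ReaderRegistry registry = new ReaderRegistry();
        registry.setReaderClassName("RDF/JSON", "org.openjena.riot.system.JenaReaderRdfJson");
        System.out.println(registry.isRegistered("RDF/JSON"));
        registry.unsetReaderClassName("RDF/JSON");
        System.out.println(registry.isRegistered("RDF/JSON"));
    }
}
```
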

Paolo

Re: Inline data in queries

Posted by Andy Seaborne <an...@apache.org>.
On 27/04/12 20:12, Stephen Allen wrote:
> Great!  Thanks for pushing this.
>
> Although we may be limited in reordering, if we make the assumption
> that the data table will be small enough to comfortably fit into
> memory [1], then at least we can do a hash-join.
>
> -Stephen
>
> [1]  We currently parse the query into an in-memory data structure, so
> we can't support arbitrarily large queries anyway.  A user could work
> around that with a temporary graph (although he'd have to normalize
> the table into a graph in order to do so).

Agreed - and also it would be nice to:

1/ name a table in the database for the data
2/ name results of an earlier query to combine with this query.
3/ Have tables in the database that are cached patterns that are 
frequently queried
...

	Andy


Re: Inline data in queries

Posted by Stephen Allen <sa...@apache.org>.
Great!  Thanks for pushing this.

Although we may be limited in reordering, if we make the assumption
that the data table will be small enough to comfortably fit into
memory [1], then at least we can do a hash-join.

-Stephen

[1]  We currently parse the query into an in-memory data structure, so
we can't support arbitrarily large queries anyway.  A user could work
around that with a temporary graph (although he'd have to normalize
the table into a graph in order to do so).
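
The hash-join mentioned above can be sketched as follows, assuming the inline data table fits in memory. This is not ARQ code: `InlineDataHashJoin` is a hypothetical name, solutions are modelled as plain maps from variable name to value, and the sketch assumes a single join variable that is bound on both sides and is the only variable the two sides share.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InlineDataHashJoin {
    static List<Map<String, String>> join(List<Map<String, String>> table,
                                          List<Map<String, String>> solutions,
                                          String joinVar) {
        // Build phase: index the small in-memory table by the join variable's value.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> row : table) {
            index.computeIfAbsent(row.get(joinVar), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each solution from the rest of the pattern.
        List<Map<String, String>> results = new ArrayList<>();
        for (Map<String, String> solution : solutions) {
            List<Map<String, String>> matches = index.get(solution.get(joinVar));
            if (matches == null) {
                continue; // no table row is compatible with this solution
            }
            for (Map<String, String> row : matches) {
                Map<String, String> merged = new HashMap<>(row);
                merged.putAll(solution);
                results.add(merged);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(Map.of("x", "<a>"), Map.of("x", "<b>"));
        List<Map<String, String>> solutions = List.of(
                Map.of("x", "<a>", "value", "1"),
                Map.of("x", "<c>", "value", "2"));
        System.out.println(join(table, solutions, "x"));
    }
}
```

Building the index costs one pass over the table and each probe is a constant-time lookup on average, so the join does not depend on reordering the rest of the pattern.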


On Fri, Apr 27, 2012 at 9:40 AM, Andy Seaborne <an...@apache.org> wrote:
> On 26/04/12 18:44, Andy Seaborne wrote:
>>
>> On 26/04/12 17:58, Robert Vesse wrote:
>>>
>>> Ok
>>>
>>>
>>> Not sure I entirely understand what the proposal on BINDINGS is so please
>>> correct me if I'm wrong. It sounds like it will be possible to do the
>>> following:
>>>
>>> SELECT *
>>> {
>>> BINDINGS ?x { (<a> ) (<b> ) (<c> ) }
>>> ?x<property> ?value .
>>> }
>>>
>>> Or have I misunderstood?
>>
>>
>> No - spot on although the word may not be BINDINGS. BIND and BINDINGS -
>> similar but different.
>>
>> It's a data table - it's logically joined to the rest of the group.
>>
>> The eval order is
>>
>> patterns - BIND - Inline DATA - FILTER
>>
>> i.e. between BIND and FILTER.
>>
>> And, ideally, I want to suggest two syntax forms:
>>
>> # One variable short form.
>> DATA ?var1 { 1 2 <a> <b> }
>>
>> # Multi variable
>> DATA (?var1 ?var2) {
>> (1 <b>)
>> (3 <a>) }
>>
>> because one variable is a common usage.
>>
>> But we'll have to see about the details - I am just experimenting with
>> the design at the moment in preparation for a complete proposal to
>> SPARQL-WG.
>>
>> Comments welcome - nothing too radical is going to be accepted though.
>
>
> Implementation in ARQ dev build.
>
> I did find that you can't do the eval reordering game - the DATA block has
> to execute where it's defined so that OPTIONALs work as (I) expected.
>
>        Andy

Re: Inline data in queries

Posted by Andy Seaborne <an...@apache.org>.
On 26/04/12 18:44, Andy Seaborne wrote:
> On 26/04/12 17:58, Robert Vesse wrote:
>> Ok
>>
>>
>> Not sure I entirely understand what the proposal on BINDINGS is so please
>> correct me if I'm wrong. It sounds like it will be possible to do the
>> following:
>>
>> SELECT *
>> {
>> BINDINGS ?x { (<a> ) (<b> ) (<c> ) }
>> ?x<property> ?value .
>> }
>>
>> Or have I misunderstood?
>
> No - spot on although the word may not be BINDINGS. BIND and BINDINGS -
> similar but different.
>
> It's a data table - it's logically joined to the rest of the group.
>
> The eval order is
>
> patterns - BIND - Inline DATA - FILTER
>
> i.e. between BIND and FILTER.
>
> And, ideally, I want to suggest two syntax forms:
>
> # One variable short form.
> DATA ?var1 { 1 2 <a> <b> }
>
> # Multi variable
> DATA (?var1 ?var2) {
> (1 <b>)
> (3 <a>) }
>
> because one variable is a common usage.
>
> But we'll have to see about the details - I am just experimenting with
> the design at the moment in preparation for a complete proposal to
> SPARQL-WG.
>
> Comments welcome - nothing too radical is going to be accepted though.

Implementation in ARQ dev build.

I did find that you can't do the eval reordering game - the DATA block 
has to execute where it's defined so that OPTIONALs work as (I) expected.

	Andy

Re: Inline data in queries

Posted by Andy Seaborne <an...@apache.org>.
On 26/04/12 17:58, Robert Vesse wrote:
> Ok
>
>
> Not sure I entirely understand what the proposal on BINDINGS is so please
> correct me if I'm wrong.  It sounds like it will be possible to do the
> following:
>
> SELECT *
> {
>    BINDINGS ?x { (<a>  ) (<b>  ) (<c>  ) }
>    ?x<property>  ?value .
> }
>
> Or have I misunderstood?

No - spot on although the word may not be BINDINGS. BIND and BINDINGS - 
similar but different.

It's a data table - it's logically joined to the rest of the group.

The eval order is

patterns - BIND - Inline DATA - FILTER

i.e. between BIND and FILTER.

And, ideally, I want to suggest two syntax forms:

# One variable short form.
DATA ?var1 { 1 2 <a> <b> }

# Multi variable
DATA (?var1 ?var2) {
    (1 <b>)
    (3 <a>) }

because one variable is a common usage.

But we'll have to see about the details - I am just experimenting with 
the design at the moment in preparation for a complete proposal to 
SPARQL-WG.

Comments welcome - nothing too radical is going to be accepted though.

	Andy

>
> Rob
>
> On 4/26/12 9:30 AM, "Andy Seaborne"<an...@apache.org>  wrote:
>
>> Re: Carrying raw query strings (public API change).
>> On 13/04/12 17:00, Robert Vesse wrote:
>>> The more I think about this the less I think it actually solves our
>>> problem (it being carrying raw query strings) because we are still
>>> left with the issue that a query may turn into multiple queries
>>> internally and our developers wanted the query string to associate
>>> with each of those internal queries but that isn't a 1:1
>>> relationship
>>>
>>> Maybe it is for the best if I just go ahead and revert those
>>> changes?
>>
>> I was removing some unfinished stuff in Query so I took the liberty of
>> reverting the changes in passing.  Hope that's OK.
>>
>> The "unfinished stuff" was the syntax part of having inline data (AKA
>> BINDINGS anywhere in a query).  It wasn't going anywhere and there is a
>> better way.
>>
>> Stephen's comment to SPARQL-WG is one of several:
>>
>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0010.html
>>
>> The SPARQL-WG is considering (but has not decided whether to do or not
>> do yet) allowing BINDINGS anywhere in a {}-block, which generalises the
>> subquery, or removes the need for placeholder { SELECT * WHERE { } }
>> just to attach the BINDINGS.
>>
>> Going as far as referencing external data (the results of a previous
>> query) is too radical at this stage for standardisation but it's a
>> logical step for an ARQ extension.
>>
>> 	Andy
>


Re: Inline data in queries

Posted by Robert Vesse <rv...@yarcdata.com>.
Ok


Not sure I entirely understand what the proposal on BINDINGS is so please
correct me if I'm wrong.  It sounds like it will be possible to do the
following:

SELECT *
{
  BINDINGS ?x { ( <a> ) ( <b> ) ( <c> ) }
  ?x <property> ?value .
}

Or have I misunderstood?

Rob

On 4/26/12 9:30 AM, "Andy Seaborne" <an...@apache.org> wrote:

>Re: Carrying raw query strings (public API change).
>On 13/04/12 17:00, Robert Vesse wrote:
>> The more I think about this the less I think it actually solves our
>> problem (it being carrying raw query strings) because we are still
>> left with the issue that a query may turn into multiple queries
>> internally and our developers wanted the query string to associate
>> with each of those internal queries but that isn't a 1:1
>> relationship
>>
>> Maybe it is for the best if I just go ahead and revert those
>> changes?
>
>I was removing some unfinished stuff in Query so I took the liberty of
>reverting the changes in passing.  Hope that's OK.
>
>The "unfinished stuff" was the syntax part of having inline data (AKA
>BINDINGS anywhere in a query).  It wasn't going anywhere and there is a
>better way.
>
>Stephen's comment to SPARQL-WG is one of several:
>
>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0010.html
>
>The SPARQL-WG is considering (but has not decided whether to do or not
>do yet) allowing BINDINGS anywhere in a {}-block, which generalises the
>subquery, or removes the need for placeholder { SELECT * WHERE { } }
>just to attach the BINDINGS.
>
>Going as far as referencing external data (the results of a previous
>query) is too radical at this stage for standardisation but it's a
>logical step for an ARQ extension.
>
>	Andy


Inline data in queries

Posted by Andy Seaborne <an...@apache.org>.
Re: Carrying raw query strings (public API change).
On 13/04/12 17:00, Robert Vesse wrote:
> The more I think about this the less I think it actually solves our
> problem (it being carrying raw query strings) because we are still
> left with the issue that a query may turn into multiple queries
> internally and our developers wanted the query string to associate
> with each of those internal queries but that isn't a 1:1
> relationship
>
> Maybe it is for the best if I just go ahead and revert those
> changes?

I was removing some unfinished stuff in Query so I took the liberty of 
reverting the changes in passing.  Hope that's OK.

The "unfinished stuff" was the syntax part of having inline data (AKA 
BINDINGS anywhere in a query).  It wasn't going anywhere and there is a 
better way.

Stephen's comment to SPARQL-WG is one of several:

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0010.html

The SPARQL-WG is considering (but has not decided whether to do or not 
do yet) allowing BINDINGS anywhere in a {}-block, which generalises the 
subquery, or removes the need for placeholder { SELECT * WHERE { } } 
just to attach the BINDINGS.

Going as far as referencing external data (the results of a previous 
query) is too radical at this stage for standardisation but it's a 
logical step for an ARQ extension.

	Andy

Re: Carrying raw query strings (public API change).

Posted by Andy Seaborne <an...@apache.org>.
On 13/04/12 17:00, Robert Vesse wrote:
> We work at the level of QueryEngine but we have multiple
> implementations as depending on the query either the entire thing can
> be handled by the backend or only parts of it can so we override
> eval() in our query engine implementations.
>
> Then we either use an OpExecutor or just parcel the query off
> wholesale to our backend.  So I was slightly inaccurate in that we no
> longer use StageGenerator (though we did at one point)
>
> Regardless we are still at a level of the API where we don't see the
> QueryExecution so we couldn't utilize the context even if we wanted
> to

The context gets everywhere.  QueryEngineBase has it as does 
ExecutionContext.

The context is the merge of the global and the dataset specific context 
then add in user settings.  It will be available from ExecutionContext 
when it gets to the eval code.

ExecutionContext.getContext()

or I hope so - it's how the dataset and active graph get passed around 
to actually deliver data!

It's even available in custom functions - FunctionEnv is the interface 
it exposes but it is really the ExecutionContext object.

And it does the iterator tracking (have you met tracking yet? :-)

> The more I think about this the less I think it actually solves our
> problem (it being carrying raw query strings) because we are still
> left with the issue that a query may turn into multiple queries
> internally and our developers wanted the query string to associate
> with each of those internal queries but that isn't a 1:1
> relationship
>
> Maybe it is for the best if I just go ahead and revert those
> changes?

OK, no rush.  ... and maybe open a JIRA if you think there is an 
architectural point here.  Given your last comment (splitting queries) 
maybe there isn't, or isn't at the moment.

Stephen's API experiment may have something to say here.

	Andy

>
> Rob
>
> On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:
>
>> Rob -
>>
>> On 12/04/12 17:43, Robert Vesse wrote:
>>> The notion of jobs makes sense to me but it implies some
>>> refactoring of our APIs that is simply not feasible in our current
>>> setup.  Where we use Fuseki this is not doable because we are
>>> extending Fuseki indirectly by hooking into ARQ's
>>> QueryExecutionFactory mechanism and so don't have any means to
>>> create this Job thing prior to starting to see the actual query
>>> in our ARQ integration layer.
>>>
>>> Even in a hypothetical situation where we did have such
>>> capability we still run into the issue that at some point the
>>> query has to drop into the ARQ machinery to be processed at which
>>> point it has to be a query and we'd lose any visibility back to
>>> our Job notion anyway. This is especially true since the point at
>>> which we actually send work off to our backend for processing is
>>> potentially very low level in the ARQ API (as far down as the
>>> Stage Generator layer)
>>
>> This makes me a bit nervous; the needs of Cray to tunnel info from
>> one place to another because of current code structure balanced
>> against a long term change to the public API.
>>
>> The good news is there is a better way in Fuseki.
>>
>> The QueryExecution object is a one-time-use object and it has
>> somewhere to put such additional information - getContext().  This
>> is where the current time for the query goes for example.  It even
>> gets to the StageGenerator.  It's already got the query as an
>> object.
>>
>> The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the
>> nearest I can see to the "Job" from the point of view of the web
>> request.
>>
>> So we can have the QueryExecution carry a per-operation label.
>>
>> Change:
>>
>> 1/ Add a new symbol: ARQ.queryLabel
>>
>> 2/ SPARQL_Query.executeQuery creates the QueryExecution and can set
>> the context with a key/value that is the query string as
>> ARQ.queryLabel.  It knows the queryStringLog -- it can take the
>> original query string as well, or we can put it in the
>> HttpActionQuery and put it in the execution context.
>>
>> (Aside: I thought you'd be using OpExecutor so as to access the
>> filters and LeftJoins as -- different discussion though ... though
>> I'd like to remove StageGenerator because there are too many ways
>> to do very similar things, making it messier to add new storage
>> layers .., so compatibility issue noted!)
>>
>>> I don't think having the raw query string breaks the Java
>>> equality/hash code contract since the Query class is a
>>> structural representation of a query, preserving the original
>>> query string is just a convenience to users and doesn't change
>>> the fact that the class is a structural representation of a query
>>> and by definition different query strings can resolve to the same
>>> definition (white space, comments, prefix ordering etc.)
>>
>> In your use case, sure, the string is not particularly
>> significant.
>>
>> The contract for .equals in Java is that for two objects to be equal
>> they must be substitutable for one another.  Jena is a general
>> library - some app may rely on the query label for display
>> purposes, or as a key into another data structure.  That's the
>> long-term promise being made and it's hard to predict what some
>> app may do - hence my desire for a strict adherence to the .equals
>> contract.
>>
>> Also, by preserving the query string and comments, there is a
>> slippery slope to putting stuff in comments and relying on it.
>>
>> Preserving the query string is convenience in your use case but if
>> some other use case is relying on the label for something, it is no
>> longer ancillary.
>>
>> This shows a difference - a bit artificial but it's also supposed
>> to be a small example -- imagine the two "put" operations being in
>> very different parts of the code:
>>
>> try changing the order of the two .put -- I expected the different
>> output when it was put(q1,..), put(q2,...) but it is the other way
>> round, as below.  We live and learn about the runtime library
>> implementation -- HashMap.put leaves the entry holding the last
>> value put.  Other JREs may differ.
>>
>> public class QueryLabels
>> {
>>     public static void main(String ... argv)
>>     {
>>         Map<Query, String> x = new HashMap<>() ;
>>         Query q1 = QueryFactory.create("ASK{}") ;
>>         Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;
>>
>>         x.put(q2, q2.getRawQuery()) ;
>>         x.put(q1, q1.getRawQuery()) ;
>>
>>         if ( x.containsKey(q2) )
>>         {
>>             System.out.println(x.get(q2)) ;
>>             System.out.println("---") ;
>>             System.out.println(q2.getRawQuery()) ;
>>         }
>>         else
>>         {
>>             System.out.println("Not found") ;
>>         }
>>     }
>> }
>>
>


Re: Carrying raw query strings (public API change).

Posted by Robert Vesse <rv...@yarcdata.com>.
We work at the level of QueryEngine but we have multiple implementations as depending on the query either the entire thing can be handled by the backend or only parts of it can so we override eval() in our query engine implementations.

Then we either use an OpExecutor or just parcel the query off wholesale to our backend.  So I was slightly inaccurate in that we no longer use StageGenerator (though we did at one point).

Regardless we are still at a level of the API where we don't see the QueryExecution so we couldn't utilize the context even if we wanted to

The more I think about this the less I think it actually solves our problem (it being carrying raw query strings) because we are still left with the issue that a query may turn into multiple queries internally and our developers wanted the query string to associate with each of those internal queries but that isn't a 1:1 relationship

Maybe it is for the best if I just go ahead and revert those changes?

Rob

On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:

> Rob -
> 
> On 12/04/12 17:43, Robert Vesse wrote:
>> The notion of jobs makes sense to me but it implies some refactoring
>> of our APIs that is simply not feasible in our current setup.  Where we
>> use Fuseki this is not doable because we are extending Fuseki
>> indirectly by hooking into ARQ's QueryExecutionFactory mechanism and
>> so don't have any means to create this Job thing prior to starting to
>> see the actual query in our ARQ integration layer.
>>
>> Even in a hypothetical situation where we did have such capability we
>> still run into the issue that at some point the query has to drop
>> into the ARQ machinery to be processed at which point it has to be a
>> query and we'd lose any visibility back to our Job notion anyway.
>> This is especially true since the point at which we actually send
>> work off to our backend for processing is potentially very low level
>> in the ARQ API (as far down as the Stage Generator layer)
> 
> This makes me a bit nervous; the needs of Cray to tunnel info from one place to another because of current code structure balanced against a long term change to the public API.
> 
> The good news is there is a better way in Fuseki.
> 
> The QueryExecution object is a one-time-use object and it has somewhere to put such additional information - getContext().  This is where the current time for the query goes for example.  It even gets to the StageGenerator.  It's already got the query as an object.
> 
> The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the nearest I can see to the "Job" from the point of view of the web request.
> 
> So we can have the QueryExecution carry a per-operation label.
> 
> Change:
> 
> 1/ Add a new symbol: ARQ.queryLabel
> 
> 2/ SPARQL_Query.executeQuery creates the QueryExecution and can set the context with a key/value that is the query string as ARQ.queryLabel.  It knows the queryStringLog -- it can take the original query string as well, or we can put it in the HttpActionQuery and put it in the execution context.
> 
> (Aside: I thought you'd be using OpExecutor so as to access the filters and LeftJoins as -- different discussion though ... though I'd like to remove StageGenerator because there are too many ways to do very similar things, making it messier to add new storage layers ... so compatibility issue noted!)
> 
>> I don't think having the raw query string breaks the Java
>> equality/hash code contract since the Query class is a structural
>> representation of a query, preserving the original query string is
>> just a convenience to users and doesn't change the fact that the
>> class is a structural representation of a query and by definition
>> different query strings can resolve to the same definition (white
>> space, comments, prefix ordering etc.)
> 
> In your use case, sure, the string is not particularly significant.
> 
> The contract for .equals in Java is that for two objects to be equal they must be substitutable for one another.  Jena is a general library - some app may rely on the query label for display purposes, or as a key into another data structure.  That's the long-term promise being made and it's hard to predict what some app may do - hence my desire for a strict adherence to the .equals contract.
> 
> Also, by preserving the query string and comments, there is a slippery slope to putting stuff in comments and relying on it.
> 
> Preserving the query string is convenience in your use case but if some other use case is relying on the label for something, it is no longer ancillary.
> 
> This shows a difference - a bit artificial but it's also supposed to be a small example -- imagine the two "put" operations being in very different parts of the code:
> 
> try changing the order of the two .put -- I expected the different output when it was put(q1,..), put(q2,...) but it is the other way round, as below.  We live and learn about the runtime library implementation -- HashMap.put leaves the entry holding the last value put.  Other JREs may differ.
> 
> public class QueryLabels
> {
>    public static void main(String ... argv)
>    {
>        Map<Query, String> x = new HashMap<>() ;
>        Query q1 = QueryFactory.create("ASK{}") ;
>        Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;
> 
>        x.put(q2, q2.getRawQuery()) ;
>        x.put(q1, q1.getRawQuery()) ;
> 
>        if ( x.containsKey(q2) )
>        {
>            System.out.println(x.get(q2)) ;
>            System.out.println("---") ;
>            System.out.println(q2.getRawQuery()) ;
>        }
>        else
>        {
>            System.out.println("Not found") ;
>        }
>    }
> }
> 


Re: Carrying raw query strings (public API change).

Posted by Andy Seaborne <an...@apache.org>.
Rob -

On 12/04/12 17:43, Robert Vesse wrote:
> The notion of jobs makes sense to me but it implies some refactoring
> of our APIs that is simply not feasible in our current setup.  Where we
> use Fuseki this is not doable because we are extending Fuseki
> indirectly by hooking into ARQ's QueryExecutionFactory mechanism and
> so don't have any means to create this Job thing prior to starting to
> see the actual query in our ARQ integration layer.
>
> Even in a hypothetical situation where we did have such capability we
> still run into the issue that at some point the query has to drop
> into the ARQ machinery to be processed at which point it has to be a
> query and we'd lose any visibility back to our Job notion anyway.
> This is especially true since the point at which we actually send
> work off to our backend for processing is potentially very low level
> in the ARQ API (as far down as the Stage Generator layer)

This makes me a bit nervous; the needs of Cray to tunnel info from one 
place to another because of current code structure balanced against a 
long term change to the public API.

The good news is there is a better way in Fuseki.

The QueryExecution object is a one-time-use object and it has somewhere 
to put such additional information - getContext().  This is where the 
current time for the query goes for example.  It even gets to the 
StageGenerator.  It's already got the query as an object.

The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the 
nearest I can see to the "Job" from the point of view of the web request.

So we can have the QueryExecution carry a per-operation label.

Change:

1/ Add a new symbol: ARQ.queryLabel

2/ SPARQL_Query.executeQuery creates the QueryExecution and can set the 
context with a key/value that is the query string as ARQ.queryLabel.  It 
knows the queryStringLog -- it can take the original query string as 
well, or we can put it in the HttpActionQuery and put it in the execution 
context.
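
A rough sketch of the idea in 1/ and 2/, using plain maps rather than the real ARQ Context and Symbol classes: the per-execution context starts from the global settings, is overlaid with dataset and user settings, and carries the original query string under a label key that low-level code can read back. The key name and the helper methods below are assumptions for illustration only; ARQ.queryLabel is just a proposal at this point.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Map stands in for the ARQ context, a String for a Symbol.
final class ContextLabelSketch {

    // Hypothetical key, modelled on the proposed ARQ.queryLabel symbol.
    static final String QUERY_LABEL = "queryLabel";

    // Merge order: global settings, then dataset-specific, then user settings.
    // Later maps override earlier ones for the same key.
    static Map<String, Object> buildContext(Map<String, Object> global,
                                            Map<String, Object> dataset,
                                            Map<String, Object> user) {
        Map<String, Object> cxt = new HashMap<>(global);
        cxt.putAll(dataset);
        cxt.putAll(user);
        return cxt;
    }

    // Stand-in for low-level evaluation code that sees only the context.
    static String describeWork(Map<String, Object> cxt) {
        Object label = cxt.get(QUERY_LABEL);
        return label == null ? "(unlabelled query)" : label.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> global = new HashMap<>();
        Map<String, Object> dataset = new HashMap<>();
        Map<String, Object> user = new HashMap<>();

        // The code that creates the execution is the one place that still
        // has the original query string, so it attaches it here.
        user.put(QUERY_LABEL, "ASK{} # Andy's query ");

        Map<String, Object> cxt = buildContext(global, dataset, user);
        System.out.println(describeWork(cxt));
    }
}
```

The label travels with one execution, not with the Query object, so the query's equals/hashCode are not involved at all.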

(Aside: I thought you'd be using OpExecutor so as to access the filters 
and LeftJoins as -- different discussion though ... though I'd like to 
remove StageGenerator because there are too many ways to do very similar 
things, making it messier to add new storage layers ... so compatibility 
issue noted!)

> I don't think having the raw query string breaks the Java
> equality/hash code contract since the Query class is a structural
> representation of a query, preserving the original query string is
> just a convenience to users and doesn't change the fact that the
> class is a structural representation of a query and by definition
> different query strings can resolve to the same definition (white
> space, comments, prefix ordering etc.)

In your use case, sure, the string is not particularly significant.

The contract for .equals in Java is that for two objects to be equal they 
must be substitutable for one another.  Jena is a general library - some 
app may rely on the query label for display purposes, or as a key into 
another data structure.  That's the long-term promise being made and 
it's hard to predict what some app may do - hence my desire for a 
strict adherence to the .equals contract.

Also, by preserving the query string and comments, there is a slippery 
slope to putting stuff in comments and relying on it.

Preserving the query string is convenience in your use case but if some 
other use case is relying on the label for something, it is no longer 
ancillary.

This shows a difference - a bit artificial but it's also supposed to be 
a small example -- imagine the two "put" operations being in very different 
parts of the code:

try changing the order of the two .put -- I expected the different 
output when it was put(q1,..), put(q2,...) but it is the other way round, 
as below.  We live and learn about the runtime library implementation -- 
HashMap.put leaves the entry holding the last value put.  Other JREs 
may differ.

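import java.util.HashMap ;
import java.util.Map ;

import com.hp.hpl.jena.query.Query ;
import com.hp.hpl.jena.query.QueryFactory ;

public class QueryLabels
{
     public static void main(String ... argv)
     {
         Map<Query, String> x = new HashMap<>() ;
         // Same structure, different text: q1 and q2 are expected to be .equals
         Query q1 = QueryFactory.create("ASK{}") ;
         Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;

         // The second put finds an equal key and overwrites its value
         x.put(q2, q2.getRawQuery()) ;
         x.put(q1, q1.getRawQuery()) ;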

         if ( x.containsKey(q2) )
         {
             System.out.println(x.get(q2)) ;
             System.out.println("---") ;
             System.out.println(q2.getRawQuery()) ;
         }
         else
         {
             System.out.println("Not found") ;
         }
     }
}
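
For anyone without the raw-query patch applied, the same effect can be reproduced with the JDK alone. This sketch uses a made-up LabelledQuery class whose equals/hashCode look only at the structure and ignore the raw text, which is the situation being discussed; it is not the real Query class, and lookupAfterPuts is just a helper for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Query: equality is structural and ignores the raw text.
final class LabelledQuery {
    private final String structure; // stands in for the parsed query
    private final String rawText;   // original text, comments included

    LabelledQuery(String structure, String rawText) {
        this.structure = structure;
        this.rawText = rawText;
    }

    String getRawText() {
        return rawText;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LabelledQuery)) return false;
        return structure.equals(((LabelledQuery) o).structure);
    }

    @Override
    public int hashCode() {
        return structure.hashCode();
    }
}

final class QueryLabelsSketch {

    // Put both queries, keyed by themselves, then look one of them up again.
    static String lookupAfterPuts(LabelledQuery first, LabelledQuery second,
                                  LabelledQuery probe) {
        Map<LabelledQuery, String> x = new HashMap<>();
        x.put(first, first.getRawText());
        x.put(second, second.getRawText()); // equal key: the value is replaced
        return x.get(probe);
    }

    public static void main(String[] args) {
        LabelledQuery q1 = new LabelledQuery("ASK{}", "ASK{}");
        LabelledQuery q2 = new LabelledQuery("ASK{}", "ASK{} # Andy's query ");

        System.out.println(lookupAfterPuts(q2, q1, q2)); // text found under q2
        System.out.println("---");
        System.out.println(q2.getRawText());             // text q2 actually carries
    }
}
```

With the puts in the order (q2, q1) the lookup by q2 gives back q1's text; swapping them to (q1, q2) gives q2's own text, because Map.put replaces the value stored under an equal key.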


Re: Carrying raw query strings (public API change).

Posted by Robert Vesse <rv...@yarcdata.com>.
The notion of jobs makes sense to me but it implies some refactoring of our APIs that is simply not feasible in our current setup.  Where we use Fuseki this is not doable because we are extending Fuseki indirectly by hooking into ARQ's QueryExecutionFactory mechanism and so don't have any means to create this Job thing prior to starting to see the actual query in our ARQ integration layer.

Even in a hypothetical situation where we did have such capability we still run into the issue that at some point the query has to drop into the ARQ machinery to be processed at which point it has to be a query and we'd lose any visibility back to our Job notion anyway.  This is especially true since the point at which we actually send work off to our backend for processing is potentially very low level in the ARQ API (as far down as the Stage Generator layer)

I don't think having the raw query string breaks the Java equality/hash code contract since the Query class is a structural representation of a query.  Preserving the original query string is just a convenience to users and doesn't change the fact that the class is a structural representation of a query, and by definition different query strings can resolve to the same definition (white space, comments, prefix ordering etc.)

Rob

On Apr 10, 2012, at 12:21 PM, Andy Seaborne wrote:

> On 10/04/12 17:08, Robert Vesse wrote:
> 
>> The primary motivation of this is that if like us you are
>> intercepting queries and providing your own processing you have no
>> visibility back to the original query string since at the level of
>> QueryExecutionFactory and query execution you have only a Query
>> object and an algebra object
>> 
>> In our architecture queries may be very long running so we have a
>> queue into which we give users visibility but right now we can only
>> show them the serialized form of their parsed query.  Due to the nice
>> syntax printing and possible optimization ARQ does on the query that
>> serialized form may look very different and users are confused by
>> this.
>> 
>> The ability to preserve comments is of particular interest because we
>> may want to use comments as a means to tag queries to indicate where
>> they originated from.  Right now the only other mechanism that would
>> let us do this would be to define a fake prefix which encodes this
>> (perhaps with tag URLs) but that only covers one use case and still
>> doesn't allow us to preserve more free form description of the
>> queries in the form of comments.
> 
> Rob,
> 
> That use case makes sense to me and I was just about to reply ... but I went for a run and something occurred to me.
> 
> Query objects provide structure equality.  They override .equals(Object) and .hashCode().
> 
> This allows query objects to be used in hash tables for example.  You might have a cache of results by query to avoid re-execution of a query (picked from a library by two different people?)
> 
> I have used this for a query to results cache (see my github and project LD-Access for example).
> 
> Whether the label is part of the equality contract or not is tricky - whether it is or isn't seems to get into a bit of trouble either way round.  If it isn't (aside from violating the general Java contract), then the object in the cache/map/set/whatever may not be recognized by the user as the one put in - the label may change or disappear.  If it is, then it is a more specific instance.
> 
> In your system, what I read as happening is that there is a "Job" - at the moment the Query is the Job but a job may have other characteristics like submission time, who submitted it, priority etc etc.  Putting a label on the job seems the right thing because it can carry a lot of other stuff like the submitter and also return the execution time, the cost, etc.  The Job then has Job.getQuery()
> 
> To put it another way, a query is a class - the job is the instance.
> 
> The tagging is a good example - a query may come from a library of queries so is it labelled as from the library or the person submitting it?
> 
> There has, in the past, been META on queries for stashing away labels etc.  but it gets confusing.  Better to put it external to the query e.g. Job.
> 
> 	Andy


Re: Carrying raw query strings (public API change).

Posted by Andy Seaborne <an...@apache.org>.
On 10/04/12 17:08, Robert Vesse wrote:

> The primary motivation of this is that if like us you are
> intercepting queries and providing your own processing you have no
> visibility back to the original query string since at the level of
> QueryExecutionFactory and query execution you have only a Query
> object and an algebra object
>
> In our architecture queries may be very long running so we have a
> queue into which we give users visibility but right now we can only
> show them the serialized form of their parsed query.  Due to the nice
> syntax printing and possible optimization ARQ does on the query that
> serialized form may look very different and users are confused by
> this.
>
> The ability to preserve comments is of particular interest because we
> may want to use comments as a means to tag queries to indicate where
> they originated from.  Right now the only other mechanism that would
> let us do this would be to define a fake prefix which encodes this
> (perhaps with tag URLs) but that only covers one use case and still
> doesn't allow us to preserve more free form description of the
> queries in the form of comments.

Rob,

That use case makes sense to me and I was just about to reply ... but I 
went for a run and something occurred to me.

Query objects provide structure equality.  They override .equals(Object) 
and .hashCode().

This allows query objects to be used in hash tables for example.  You 
might have a cache of results by query to avoid re-execution of a query 
(picked from a library by two different people?)

I have used this for a query to results cache (see my github and project 
LD-Access for example).

Whether the label is part of the equality contract or not is tricky - 
whether it is or isn't seems to get into a bit of trouble either way 
round.  If it isn't (aside from violating the general Java contract), 
then the object in the cache/map/set/whatever may not be recognized by 
the user as the one put in - the label may change or disappear.  If it 
is, then it is a more specific instance.

In your system, what I read as happening is that there is a "Job" - at 
the moment the Query is the Job but a job may have other characteristics 
like submission time, who submitted it, priority etc etc.  Putting a 
label on the job seems the right thing because it can carry a lot of 
other stuff like the submitter and also return the execution time, the 
cost, etc.  The Job then has Job.getQuery()

To put it another way, a query is a class - the job is the instance.
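
A sketch of what such a Job might look like, assuming nothing about the real system: the class name, fields and accessors here are hypothetical, and Q stands in for whatever query type is in use (Query, in ARQ terms). The point is only that the label and the submission details live on the job, so the query's equality is left alone.

```java
import java.time.Instant;

// Hypothetical wrapper: per-submission details live here, not on the query.
final class Job<Q> {
    private final Q query;             // shared, structurally comparable query
    private final String rawText;      // original text as submitted, comments included
    private final String submitter;    // who submitted this particular job
    private final Instant submittedAt; // when it was submitted

    Job(Q query, String rawText, String submitter, Instant submittedAt) {
        this.query = query;
        this.rawText = rawText;
        this.submitter = submitter;
        this.submittedAt = submittedAt;
    }

    Q getQuery() { return query; }
    String getRawText() { return rawText; }
    String getSubmitter() { return submitter; }
    Instant getSubmittedAt() { return submittedAt; }
}

final class JobSketch {
    public static void main(String[] args) {
        // A String stands in for a parsed Query object in this sketch.
        String parsed = "ASK{}";

        // Two people submit the same query with different text and tags.
        Job<String> j1 = new Job<>(parsed, "ASK{} # from the library", "alice", Instant.now());
        Job<String> j2 = new Job<>(parsed, "ASK {}", "bob", Instant.now());

        // The query is the same for caching purposes; the jobs keep their own text.
        System.out.println(j1.getQuery().equals(j2.getQuery()));
        System.out.println(j1.getSubmitter() + ": " + j1.getRawText());
        System.out.println(j2.getSubmitter() + ": " + j2.getRawText());
    }
}
```

A results cache can then stay keyed on getQuery(), while a queue display shows getRawText() for each job.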

The tagging is a good example - a query may come from a library of 
queries so is it labelled as from the library or the person submitting it?

There has, in the past, been META on queries for stashing away labels 
etc.  but it gets confusing.  Better to put it external to the query e.g. Job.

	Andy

Re: Carrying raw query strings (public API change).

Posted by Robert Vesse <rv...@yarcdata.com>.
The primary motivation of this is that if like us you are intercepting queries and providing your own processing you have no visibility back to the original query string since at the level of QueryExecutionFactory and query execution you have only a Query object and an algebra object

In our architecture queries may be very long running so we have a queue into which we give users visibility but right now we can only show them the serialized form of their parsed query.  Due to the nice syntax printing and possible optimization ARQ does on the query that serialized form may look very different and users are confused by this.

The ability to preserve comments is of particular interest because we may want to use comments as a means to tag queries to indicate where they originated from.  Right now the only other mechanism that would let us do this would be to define a fake prefix which encodes this (perhaps with tag URLs) but that only covers one use case and still doesn't allow us to preserve more free form description of the queries in the form of comments.

Rob


On Apr 10, 2012, at 1:06 AM, Andy Seaborne wrote:

> On 10/04/12 01:15, rvesse@apache.org wrote:
>> Author: rvesse
>> Date: Tue Apr 10 00:15:25 2012
>> New Revision: 1311528
>> 
>> URL:http://svn.apache.org/viewvc?rev=1311528&view=rev
>> Log:
>> Adding ability to preserve the raw query string in a Query object which is useful for applications which need to inspect the original user input (e.g. for comments etc).
> 
> Rob,
> 
> Could you provide some context for this?
> 
> (I wonder if this is the best way of handling comments, especially ones at the start of the query.)
> 
> 	Andy


Carrying raw query strings (public API change).

Posted by Andy Seaborne <an...@apache.org>.
On 10/04/12 01:15, rvesse@apache.org wrote:
> Author: rvesse
> Date: Tue Apr 10 00:15:25 2012
> New Revision: 1311528
>
> URL:http://svn.apache.org/viewvc?rev=1311528&view=rev
> Log:
> Adding ability to preserve the raw query string in a Query object which is useful for applications which need to inspect the original user input (e.g. for comments etc).

Rob,

Could you provide some context for this?

(I wonder if this is the best way of handling comments, especially ones 
at the start of the query.)

	Andy