You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by ehedgehog <gi...@git.apache.org> on 2016/07/18 14:43:32 UTC

[GitHub] jena pull request #157: Fixes for JENA-1212

GitHub user ehedgehog opened a pull request:

    https://github.com/apache/jena/pull/157

    Fixes for JENA-1212

    Comparator used by sorting blocks of bindings now
    checks for cancellation every 10000 compares and
    aborts the sort if cancelled=true.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/epimorphics/jena master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #157
    
----
commit d32cf8f45eab662618920731c065ade53dd951c9
Author: Chris Dollin <eh...@googlemail.com>
Date:   2016-07-18T14:40:00Z

    Fixes for JENA-1212

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71161082
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
         
         public SortedDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<? super E> comparator)
         {
             this.policy = policy;
             this.serializationFactory = serializerFactory;
    -        this.comparator = comparator;
    +        this.comparator = new CanAbortComparator(comparator);
         }
         
    +    private final class CanAbortComparator implements Comparator<E> 
    --- End diff --
    
    It might be worth breaking this out into its own compilation unit. In the current arrangement, this is not a `static` class, and I don't think it could be because it is picking up the type parameter from the enclosing class. That doesn't seem ideal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Various dev@ email.
    
    It's the java code conventions with spaces not tabs (so code displays in github!), indent of 4, opening brace on same line (these are the eclipse defaults or very close to them IIRC). No trailing whitespace if possible.  Use common sense to make it readable for the next person.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    The most cost a `volatile` usually incurs is a few times the time to retrieve the value (and I shouldn't think that happens normally on x86 archs, see [here](http://gee.cs.oswego.edu/dl/jmm/cookbook.html)). You are balancing that that against an integer increment, a modulo operation, and an integer comparison. Some of that may get optimized, but then the cost of the `volatile` may be much lower. Even more, to do this with precision, you'd have to make `count` into a `volatile`, too, which makes the whole thing work out again the extra machinery. Otherwise, threads might be using cached values of `count`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Is there a summary of the approved formatting somewhere?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I'm attempting some timing tests to estimate how much overhead
    making sorts cancellable costs. Would it be better to (a) merge and
    then follow-up with performance measurement/improvement as a
    separate task or (b) defer merging until timing results available and
    acceptable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    We've run a Fuseki with the modified sort in our staging setup and
    verified (by comparision with the timings using an unmodified
    Fuseki) that timeout cancellation finishes faster.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Correction: test is sort with existing BindingComparator contrasted with 
    BindingComparator wrapped in AbortableComparator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    We could have a note of that on the contributions page, if that seems useful. It seems a little oblique to me to tell people to grep the mailing lists for information like that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I have spent a lot of time thinking hard about this kind of stuff, only to find I haven't thought as long or as hard as the folks who work on the JVMs and compilers. {grin}


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/157


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I don't see that an AtomicBoolean has any advantage over a volatile variable in this code. The only changes made are setting `cancelled` (the volatile boolean) from `false` (initial value) to `true` when `cancel()` is called. Once it's true it stays true. And the only place it is written to and read from from is within AbortableComparator itself.
    
    Is volatile perhaps not noisy enough to trigger a reader's notice?
    
    Chris



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    It's sorting time inside of Fuseki that will really count. A few % is OK for me - optimizing other things, e.g. the comparison process of {{BindingComparator}} itself, is better to recover any loss.
    
    I am a bit surprised the code makes any difference because the comparison of two items is rather flexible ... which means not cheap.
    
    I agree with @ajs6f - let's merge it and the see what's what such as folding the cancellability into {{BindingComparator}} is one option.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    +1 to merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71170120
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
         
         public SortedDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<? super E> comparator)
         {
             this.policy = policy;
             this.serializationFactory = serializerFactory;
    -        this.comparator = comparator;
    +        this.comparator = new CanAbortComparator(comparator);
         }
         
    +    private final class CanAbortComparator implements Comparator<E> 
    +    	{
    +    	/**
    +    	    The test for whether the sort has been cancelled is
    +    	    performed every <code>cancelTestFrequency</code> comparisons.
    +    	    This reduces the (presumed) overhead of access to a
    +    	    volatile boolean.    	    
    +    	*/
    +    	static final int cancelTestFrequency = 10000;
    +    	
    +    	/**
    +    	    Count of the number of times this comparator has been called.
    +    	*/
    +		int count = 0;
    +		
    +		final Comparator<? super E> baseComparator;
    +		
    +		public CanAbortComparator(Comparator<? super E> comparator) 
    +			{
    +			this.baseComparator = comparator;
    +			}
    +
    +		@Override public int compare(E o1, E o2) 
    +		{	
    +			count += 1;
    +			if (count % cancelTestFrequency == 0) 
    +			{
    +				if (cancelled) throw new AbandonSort();
    +			}
    +			return baseComparator.compare(o1, o2);
    +		}
    +		
    +		/**
    +		    Sort the array <code>e</code> using this comparator
    +		 	with the additional ability to abort the sort.
    +		*/
    +		public boolean abortableSort(E[] e) {
    +			try { Arrays.sort(e, this); }
    --- End diff --
    
    If the concern of this PR is being able to deal with really big piles of data, maybe it's worth adding a switch to use `Arrays.parallelSort(e, this)` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Okay, but just to be clear, that is faster by the "right" amount? In other words, if you trigger a really really long sort and immediately abort, it more-or-less immediately aborts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71347034
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/AbortableComparator.java ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.atlas.data;
    +
    +import java.util.Arrays;
    +import java.util.Comparator;
    +
    +public final class AbortableComparator<E> implements Comparator<E> 
    +{
    +	public AbortableComparator(Comparator<? super E> comparator) 
    +	{
    +		this.baseComparator = comparator;
    +	}
    +	
    +	/**
    +        	<code>AbandonSort</code> is the exception thrown from
    +        	<code>AbortableComparator</code> to abandon a sort.
    +	 */
    +	public static class AbandonSort extends RuntimeException 
    +	{
    +		private static final long serialVersionUID = 1L;
    +	}
    +	
    +	public static enum Finish {COMPLETED, ABORTED}
    +	
    +	/**
    +	    The test for whether the sort has been cancelled is
    +	    performed every <code>cancelTestFrequency</code> comparisons.
    +	    This reduces the (presumed) overhead of access to a
    +	    volatile boolean.    	    
    +	*/
    +	static final int cancelTestFrequency = 10000;
    +	
    +	/**
    +	    Count of the number of times this comparator has been called.
    +	*/
    +	int count = 0;
    +	
    +	protected volatile boolean cancelled;
    +	
    +	final Comparator<? super E> baseComparator;
    +
    +	@Override public int compare(E o1, E o2) 
    +	{	
    +		count += 1;
    --- End diff --
    
    I think this might read a bit better in the next line as:
    ```
    if (++count % cancelTestFrequency == 0)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    This looks a lot better to me. Do you want to do anything with the idea of `Arrays.parallelSort`, or leave that for work that is future, if ever?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I just wanted to be sure that my ituition ("this extra wrapper and volatile access
    won't impose a significant penalty on normal use") was sound. Because it would be
    unfortunate if false ...
    
    Merge +1



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    This http://stackoverflow.com/questions/4633866/is-volatile-expensive
    suggests that volatile reads are cheap enough that worrying about 
    "optimising" by only checking occasionally is bootless. I'll simplify the
    code in the next round.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Yeah, I can see how some kind of "moving to phase X" signaling could be real handy. I'm not familiar enough with the query machinery to understand where that should live, though. I think that logging the cancellation is a very good idea, perhaps at `INFO` level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] jena pull request #157: Fixes for JENA-1212

Posted by "A. Soroka" <aj...@virginia.edu>.
Sorry, that was meant to be on the line above the line with the variable `count`. See my edit.

---
A. Soroka
The University of Virginia Library

> On Jul 19, 2016, at 10:25 AM, Chris Dollin <ch...@epimorphics.com> wrote:
> 
>> Github user ajs6f commented on a diff in the pull request:
> ...
>>     +	/**
>>     +	    Count of the number of times this comparator has been called.
>>     +	*/
>>     +	int count = 0;
>>     +	
>>     +	protected volatile boolean cancelled;
>>     --- End diff --
>> 
>>     Shouldn't this also be `volatile` for thread-safety?
> 
> Looks volatile to me?
> 
> Chris
> 
> -- 
> "You've spotted a flaw in my thinking, Trev" Big Al,/The Beiderbeck Connection/
> 
> Epimorphics Ltd, http://www.epimorphics.com
> Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
> Epimorphics Ltd. is a limited company registered in England (number 7016688)


Re: [GitHub] jena pull request #157: Fixes for JENA-1212

Posted by Chris Dollin <ch...@epimorphics.com>.
> Github user ajs6f commented on a diff in the pull request:
...
>      +	/**
>      +	    Count of the number of times this comparator has been called.
>      +	*/
>      +	int count = 0;
>      +	
>      +	protected volatile boolean cancelled;
>      --- End diff --
>
>      Shouldn't this also be `volatile` for thread-safety?

Looks volatile to me?

Chris

-- 
"You've spotted a flaw in my thinking, Trev" Big Al,/The Beiderbeck Connection/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71346429
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/AbortableComparator.java ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.atlas.data;
    +
    +import java.util.Arrays;
    +import java.util.Comparator;
    +
    +public final class AbortableComparator<E> implements Comparator<E> 
    +{
    +	public AbortableComparator(Comparator<? super E> comparator) 
    +	{
    +		this.baseComparator = comparator;
    +	}
    +	
    +	/**
    +        	<code>AbandonSort</code> is the exception thrown from
    +        	<code>AbortableComparator</code> to abandon a sort.
    +	 */
    +	public static class AbandonSort extends RuntimeException 
    +	{
    +		private static final long serialVersionUID = 1L;
    +	}
    +	
    +	public static enum Finish {COMPLETED, ABORTED}
    +	
    +	/**
    +	    The test for whether the sort has been cancelled is
    +	    performed every <code>cancelTestFrequency</code> comparisons.
    +	    This reduces the (presumed) overhead of access to a
    +	    volatile boolean.    	    
    +	*/
    +	static final int cancelTestFrequency = 10000;
    +	
    +	/**
    +	    Count of the number of times this comparator has been called.
    +	*/
    +	int count = 0;
    +	
    +	protected volatile boolean cancelled;
    --- End diff --
    
    Shouldn't this also be `volatile` for thread-safety?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    OK. Timing: I had writen tests that did not allow for garbage collection times and had also not allocated
    enough heap to the JVM running the timing test.
    
    Here's what I did. Create an array of 8M Bindings. Create a base BindingComparator with a 
    SortCondition of ?v = "x/" + ((i * 223 * 359) & 0xffffff) where i is the index of the  
    comparator -- this is so that sorting is predictable but telements are not already sorted.
    Run the GC and then sort the array using either the base comparator (A) or the
    comparator  wrapped as an abortableComparator (B). Report the time taken.
    Run comparison 20 times AB AB AB AB ... Discard first 10 measurments as
    warmup. The remaining sorts take 59-61 seconds with no visible bias to either A or B.
    
    Similar results (that is, little difference between A and B) obtain using 2M bindings 
    rather than 8M, and using a different generated values for the SortCondition.
    
    We also compared timings of a typical query from our initial problem app; swichting
    from the Fuseki currently in use for this app to the code in this pull request,
    ie with cancellable sorts, replaced runs of approximately 26.5 seconds with
    runs of approximately 27.7 seconds.
    
    I think this is enough to say that making sorts cancellable has an overhead of
    2-4%. Do we think that's OK ?
    
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71167237
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
    --- End diff --
    
    By making this a field inside `CanAbortComparator` and offering a method `cancel()` on `CanAbortComparator` you could break that type out and keep it static, instead of an inner type here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71160809
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
         
         public SortedDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<? super E> comparator)
         {
             this.policy = policy;
             this.serializationFactory = serializerFactory;
    -        this.comparator = comparator;
    +        this.comparator = new CanAbortComparator(comparator);
         }
         
    +    private final class CanAbortComparator implements Comparator<E> 
    --- End diff --
    
    A minor point, Jena seems usually to use a "adjectival" naming, i.e. `AbortableComparator`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    It does not need logging, and definitely not at level `INFO` - cancellation is a normal feature (and should cause an exception) and Jena is a library.  Libraries should not log at `INFO` in normal use.  Consider Fuseki - it logs a cancelled request anyway, catching the query cancellation exception.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71170935
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -200,9 +263,13 @@ public void flush()
             if (!finishedAdding && memSize > 1)
             {
                 // Again, some ugliness for speed
    -            Object[] array = memory.toArray();
    -            Arrays.sort(array, (Comparator)comparator);
    -            memory = Arrays.asList((E[])array);
    +            E[] array = (E[]) memory.toArray();
    +            if (comparator.abortableSort(array)) 
    +            {
    +            	// if we comment this back in, we lose the timeout message!
    +            	// return Iter.nullIterator();
    +            }
    --- End diff --
    
    What's this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Yeah, the fact that multiple threads are not just reading but might be _writing_ the value really does make an `AtomicBoolean` make sense here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I wondered whether logging the cancellation of the sort would be a good idea.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I thin it is better to consider them separately - first, get cancellable sorts into the code base, then look at `Arrays.parallelSort`.
    
    `Arrays.parallelSort` is going to need some care to make sure it does not degrade concurrent requests and that is it properly behaves when cancelled.
    
    What I'd like to know is how the current PR behaves in a live system. As it builds on an existing facility, we can't take a "experimental feature" approach.
    
    1. Does it completely address the motivating use case in production usage? (May be it is only partial - we want to make one set of changes on master, not a series changes.) 
    2. Does it lead to any advere/different effects? (including performance comparisons though I doubt it will have an impact, because it replaces, it would be good to know before it goes into becomes the master branch.)
    
    I hope @ehedgehog can run tests in the with a modified Fuseki in a staging setup for the system(s) that illustrated the problem in the first place.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    No, I think you are right. It's actually pretty much a classic use case for volatile, and the semantics of the class are not dependent on volatility, more the correctness. I'd use it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I would not experiment with parallelSort at this time.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] jena issue #157: Fixes for JENA-1212

Posted by Chris Dollin <ch...@epimorphics.com>.
On 19/07/16 15:23, ajs6f wrote:
> Github user ajs6f commented on the issue:
>
>      https://github.com/apache/jena/pull/157
>
>      I'm not sure what the point of the periodic cancellation checking machinery is.
>      It seems like it is left over from your diagnostics.

(no)

>      It doesn't seem to save any time, because the test `count % cancelTestFrequency == 0`
 >      is more expensive than the test it is guarding `(cancelled)`.

`cancelled` is volatile so reading from it involves some synchronisation
cost, or so I've been lead to believe. (The magic number is just A
RandomValue To Use.)

Chris

-- 
"He could not weigh up which was worse and so tried not to think about either."
                                                 /The Spellgrinder's Apprentice/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I'm not sure what the point of the periodic cancellation checking machinery is. It seems like it is left over from your diagnostics. It doesn't seem to save any time, because the test `count % cancelTestFrequency == 0` is more expensive than the test it is guarding `(cancelled)`. How about getting rid of `cancelTestFrequency` and `compare` just being:
    ```
    @Override public int compare(E o1, E o2) {	
        if (cancelled) throw new AbandonSort();
        return baseComparator.compare(o1, o2);
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Yes.
    
    With a proviso. In the unmodified Fuseki, if the cancellation happens while the
    bindings are being loaded into the array that will be sort()ed, then the load
    finishes "immediately". That phase, the load phase, already terminates
    quickly.
    
    The modification applies once the sort() has started running, so in the
    live test we need to be able to estimate when the sort()ing will start.
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Looks OK.
    
    `TestSortedDataBagCancellation` has a warning of missing `@Override`.
    
    Some formatting/whitespace issue issues:
    
    1. Tabs used - not spaces.
    2. `AbortableComparator` - old style layout.
    3. `QueryIterSort` is reformatted to current style but tabs used and assumed to be 8.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    PS I can type up the details of the numbers if it would be useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71348900
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/AbortableComparator.java ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.jena.atlas.data;
    +
    +import java.util.Arrays;
    +import java.util.Comparator;
    +
    +public final class AbortableComparator<E> implements Comparator<E> 
    +{
    +	public AbortableComparator(Comparator<? super E> comparator) 
    +	{
    +		this.baseComparator = comparator;
    +	}
    +	
    +	/**
    +        	<code>AbandonSort</code> is the exception thrown from
    +        	<code>AbortableComparator</code> to abandon a sort.
    +	 */
    +	public static class AbandonSort extends RuntimeException 
    +	{
    +		private static final long serialVersionUID = 1L;
    +	}
    +	
    +	public static enum Finish {COMPLETED, ABORTED}
    +	
    +	/**
    +	    The test for whether the sort has been cancelled is
    +	    performed every <code>cancelTestFrequency</code> comparisons.
    +	    This reduces the (presumed) overhead of access to a
    +	    volatile boolean.    	    
    +	*/
    +	static final int cancelTestFrequency = 10000;
    +	
    +	/**
    +	    Count of the number of times this comparator has been called.
    +	*/
    +	int count = 0;
    --- End diff --
    
    Shouldn't this also be `volatile` for thread-safety?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    I think we are getting into the realm of premature optimization at this point. My inclination is to merge this as-is, given that we have no evidence of real performance problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    http://jena.apache.org/getting_involved/reviewing_contributions.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ehedgehog <gi...@git.apache.org>.
Github user ehedgehog commented on the issue:

    https://github.com/apache/jena/pull/157
  
    [Current numbers suggest 10% overhead for the volatile access but I
    don't believe the tests yet -- for example there can be 10% variablility
    within a test.]


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71167748
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
         
         public SortedDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<? super E> comparator)
         {
             this.policy = policy;
             this.serializationFactory = serializerFactory;
    -        this.comparator = comparator;
    +        this.comparator = new CanAbortComparator(comparator);
         }
         
    +    private final class CanAbortComparator implements Comparator<E> 
    +    	{
    +    	/**
    +    	    The test for whether the sort has been cancelled is
    +    	    performed every <code>cancelTestFrequency</code> comparisons.
    +    	    This reduces the (presumed) overhead of access to a
    +    	    volatile boolean.    	    
    +    	*/
    +    	static final int cancelTestFrequency = 10000;
    +    	
    +    	/**
    +    	    Count of the number of times this comparator has been called.
    +    	*/
    +		int count = 0;
    +		
    +		final Comparator<? super E> baseComparator;
    +		
    +		public CanAbortComparator(Comparator<? super E> comparator) 
    +			{
    +			this.baseComparator = comparator;
    +			}
    +
    +		@Override public int compare(E o1, E o2) 
    +		{	
    +			count += 1;
    +			if (count % cancelTestFrequency == 0) 
    +			{
    +				if (cancelled) throw new AbandonSort();
    +			}
    +			return baseComparator.compare(o1, o2);
    +		}
    +		
    +		/**
    +		    Sort the array <code>e</code> using this comparator
    +		 	with the additional ability to abort the sort.
    +		*/
    +		public boolean abortableSort(E[] e) {
    +			try { Arrays.sort(e, this); }
    +			catch (AbandonSort s) { return true; }
    --- End diff --
    
    Maybe it's just me, but this seems a little backwardsish. I would expect `true` to indicate successful completion. Maybe it's better to introduce a simple `enum Finish { COMPLETED, ABORTED }` here, for clarity?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/157
  
    Yeah, the fact that multiple threads are not just reading but might be _writing_ the value really does make an `AtomicBoolean` make sense here. On the other hand, there are no compound atomic operations against the variable, just simple sets and gets. I dunno.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/157#discussion_r71314599
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/atlas/data/SortedDataBag.java ---
    @@ -69,19 +69,81 @@
         
         protected final ThresholdPolicy<E> policy;
         protected final SerializationFactory<E> serializationFactory;
    -    protected final Comparator<? super E> comparator;
    +    protected final CanAbortComparator comparator;
         
         protected boolean finishedAdding = false;
         protected boolean spilled = false;
         protected boolean closed = false;
    +    protected volatile boolean cancelled;
         
         public SortedDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<? super E> comparator)
         {
             this.policy = policy;
             this.serializationFactory = serializerFactory;
    -        this.comparator = comparator;
    +        this.comparator = new CanAbortComparator(comparator);
         }
         
    +    private final class CanAbortComparator implements Comparator<E> 
    +    	{
    +    	/**
    +    	    The test for whether the sort has been cancelled is
    +    	    performed every <code>cancelTestFrequency</code> comparisons.
    +    	    This reduces the (presumed) overhead of access to a
    +    	    volatile boolean.    	    
    +    	*/
    +    	static final int cancelTestFrequency = 10000;
    +    	
    +    	/**
    +    	    Count of the number of times this comparator has been called.
    +    	*/
    +		int count = 0;
    +		
    +		final Comparator<? super E> baseComparator;
    +		
    +		public CanAbortComparator(Comparator<? super E> comparator) 
    +			{
    +			this.baseComparator = comparator;
    +			}
    +
    +		@Override public int compare(E o1, E o2) 
    +		{	
    +			count += 1;
    +			if (count % cancelTestFrequency == 0) 
    +			{
    +				if (cancelled) throw new AbandonSort();
    +			}
    +			return baseComparator.compare(o1, o2);
    +		}
    +		
    +		/**
    +		    Sort the array <code>e</code> using this comparator
    +		 	with the additional ability to abort the sort.
    +		*/
    +		public boolean abortableSort(E[] e) {
    +			try { Arrays.sort(e, this); }
    --- End diff --
    
    Yes - with two notes: both are easily testable:
    
    What happens if the comparator throws an exception on a separate thread (I expect the std library handles this somehow)?
    
    Is there a risk that one sort impacts the whole server?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena issue #157: Fixes for JENA-1212

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/157
  
    or be fancy and use "AtomicBoolean"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---