Posted to dev@river.apache.org by jg...@simulexinc.com on 2010/12/22 10:19:09 UTC

Space/outrigger suggestions (remote iterator vs. collection)

My current email client is not advanced enough to do inline, but I think I'm following your explanation.

Successive calls of contents may retrieve the same objects, so merely calling contents multiple times wouldn't provide the functionality of running through a space.   Thus, the remote iterator was introduced in order to provide the ability to exhaustively read the space iteratively.

Meanwhile, the takeMultipleLimit in Outrigger that limits the returned collection size isn't a practical hindrance, because successive takeMultiples will eventually grab everything from the space, whether it happens all at once or not.   The same could be said of a client "sipping" from the space a couple of entries at a time via maxEntries.

The case for the remote iterator stands reasonably well-made, then: it keeps memory overhead fairly low (beholden to the size of actual entries), and at minimal network cost.   It could only be reasonably replaced with a collection of all matching entries, which would not be satisfactory for underpowered clients.

So my next question would be: why not use a remote iterator for the takeMultiple?

Using a remote iterator would presumably eliminate things like takeMultipleLimit, removing the case where the client receives fewer than the maxEntries requested when they are available.   Indeed, takeMultipleLimit would effectively be replaced with "takeMultipleBatchSize", largely transparent to the end user.   We'd gain a uniform return type for multiple entry fetches.

Remote iterator usage with takeMultiple would require more network use, but perhaps (wild speculation) not much more than a call to contents with a transaction.   (Would also need to compare remote iterator to successive calls to "take" in evaluating network cost.)   Any pitfalls I'm missing?
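
For reference, here is a rough client-side sketch (not Outrigger code; the helper name, the batch size, and the assumption that space/template/txn are available are all mine) of draining a space today with successive JavaSpace05 batch takes, which is roughly the loop a "takeMultipleBatchSize"-driven remote iterator would hide from the caller:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import net.jini.core.entry.Entry;
import net.jini.core.transaction.Transaction;
import net.jini.space.JavaSpace05;

// Sketch only: drains a space by calling the existing batch take repeatedly,
// a few entries at a time, until nothing more matches.
class BatchDrain {
    static List<Entry> drain(JavaSpace05 space, Entry template,
                             Transaction txn, long batchSize) throws Exception {
        List<Entry> all = new ArrayList<Entry>();
        Collection<Entry> templates = Collections.singletonList(template);
        while (true) {
            // timeout 0: return immediately with whatever is currently available
            Collection taken = space.take(templates, txn, 0, batchSize);
            if (taken.isEmpty()) break; // nothing left that matches right now
            for (Object e : taken) {
                all.add((Entry) e);
            }
        }
        return all;
    }
}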

jamesG

PS: Apparently I need to study up on read lock semantics; please excuse the confusion.

-----Original Message-----
From: "Dan Creswell" <da...@gmail.com>
Sent: Monday, December 20, 2010 12:15pm
To: river-dev@incubator.apache.org
Subject: Re: Space/outrigger suggestions

K, so inline.....

On 20 December 2010 16:54, <jg...@simulexinc.com> wrote:

> Glad to explain.
>
> My argument is a bit simplistic; as a matter of API design, it's preferable
> to have a single return mechanism for multiple returns.
>
> I realize there were likely technical reasons for the decision, but it
> makes for a less uniform API and in particular becomes a greater concern if
> we elect to add new method signatures returning multiple items.
>
> I'm not clear on what you mean by the "non-destructive" nature of
> contents() requiring a remote iterator to be useful.   At my company, we
> actually wrapped the method so that we'd ultimately get a collection (by
> exhausting the iterator).
>
>
Non-destructive:

If I have one hundred entries in a space and I do a batch take of 10 at a
time then, assuming there are no other operations, I will empty the space
after 10 batch takes.

The same scenario for a batch read does not work. You will never (as the
spec is now) exhaustively search the entries. It's entirely acceptable for
the space to return the same 10 entries each time you call batch read. Hence
the need for contents, which does some continuous book-keeping that ensures
you can exhaust the space contents.
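
To put that in code (an illustration only, using the single-entry operations of the classic JavaSpace interface; space and template are assumed to exist):

import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;

// Ten takes with matches remove ten distinct entries, while ten reads are
// allowed, per the spec, to hand back the very same entry every time.
class ReadVsTake {
    static void demo(JavaSpace space, Entry template) throws Exception {
        for (int i = 0; i < 10; i++) {
            Entry removed = space.take(template, null, JavaSpace.NO_WAIT);
            // each successful take permanently removes an entry from the space
        }
        for (int i = 0; i < 10; i++) {
            Entry seen = space.read(template, null, JavaSpace.NO_WAIT);
            // nothing stops 'seen' from being the same entry on every pass
        }
    }
}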


Also, contents() presumably sets 'read' locks if a transaction is used,
> creating reservations for future takes, so doesn't the level of
> 'destructiveness' depend on usage?
>
>
If a transaction is used, locks are set. However, it's possible to not pass a
transaction, in which case read locks are not asserted. Note also that a read
lock doesn't prevent other read locks, so reservation for a take doesn't
simply follow.


> Now, that's not to say I'm deadset against the remote iterator approach.
> Remote iterators might save some memory/cpu overhead for truly massive
> requests, particularly if the user does not necessarily want every entry
> (though were that the case, maxEntries should have been used).
>
>
How many entries can you knowingly take/read as a batch without exhausting
client memory? Difficult to say, given one doesn't know how big marshalled
entries will be, or indeed the amount of free space on the client or indeed
the server. The result is that large batch takes, or indeed reads, are
somewhat undesirable.

Decent remote iterator implementations, incidentally, don't transfer all
matches in one go - they parcel them out in batches. Large batches obviously
take a long time to transfer and are problematic for clients that want to be
somewhat responsive to their users. Imagine asking for contents of a large
number of entries and waiting whilst all of them are transferred (e.g.
because you want to browse a space).
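
The browsing pattern looks something like this (a sketch only; the lease duration and maxEntries values are arbitrary, and error handling is omitted):

import java.util.Collections;

import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace05;
import net.jini.space.MatchSet;

// The MatchSet proxy is free to fetch entries from the space in modest
// batches behind next(), so the client only ever holds one entry at a time.
class SpaceBrowser {
    static void browse(JavaSpace05 space, Entry template) throws Exception {
        MatchSet matches = space.contents(
                Collections.singletonList(template), null,
                60 * 1000,        // lease duration requested for the match set
                Long.MAX_VALUE);  // maxEntries: "everything you have"
        for (Entry e = matches.next(); e != null; e = matches.next()) {
            // display or otherwise process one entry at a time
        }
    }
}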


> On the other hand, returning a collection would spare network costs of
> sustained remote iterator interactions and the mild timing uncertainties its
> usage entails.   And the remote iterator is more complex by its nature.
>
>
Can you explain more about the network costs you envision?

Most remote iterator impls leave the connection open so the window and
handshake issues suffered by e.g. TCP are eliminated. The same number of
packets will be transferred give or take the odd frame that is only
half-full due to the end of a batch being reached.


> In any case, I think it would be best to standardize on one or the other.
>
> Perhaps as someone involved with Javaspace05, you can illuminate some of
> the decision making surrounding the current usage of both?
>
>
Some of that is above so I'll stop for now and see what else you ask for
details of, okay?

Thanks for the explanation, definitely helps....



> jamesG
>
> -----Original Message-----
> From: "Dan Creswell" <da...@gmail.com>
> Sent: Monday, December 20, 2010 4:19am
> To: river-dev@incubator.apache.org
> Subject: Re: Space/outrigger suggestions
>
> James G,
>
> Can you explain some more about this statement please?
>
> "3) Collections or remote iterators, not both.
>
> "contents" returns a remote iterator named "MatchSet", while "take (with
> collection)" returns a collection.   I can understand the argument
> behind both use cases, but not necessarily the argument for using both
> simultaneously.
>
> "
>
> This has been heavily discussed in the past and contents(), by virtue of
> its non-destructive nature (unlike take), needs something akin to a remote
> iterator to be practical/useful. Multiple takes allow you to eventually
> exhaust a space's contents; multiple reads won't do similarly.
>
> So, given I'm scarred with the previous efforts of space implementation
> including JavaSpace05 I fear my past is colouring my thinking so I'd like
> to
> understand more.
>
> Cheers,
>
> Dan.
>
>
>




Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Then you end up with something like MatchSet, so it would seem someone 
already solved the problem.  Perhaps the name "Set" might have been a 
confusing choice.

Just think of MatchSet like an Iterator, without the hasNext() call.

The caller can also terminate early once the desired result is found.
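
In code the pattern is just this (a sketch; the 'wanted' predicate is a placeholder, and cancelling the lease to end the match set only applies when a lease was actually granted):

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.MatchSet;

// No hasNext(): call next() until it returns null, and stop as soon as the
// entry you wanted turns up.
class FirstMatch {
    static Entry findFirst(MatchSet matches) throws Exception {
        for (Entry e = matches.next(); e != null; e = matches.next()) {
            if (wanted(e)) {
                Lease lease = matches.getLease();
                if (lease != null) lease.cancel(); // tell the space we're done early
                return e;
            }
        }
        return null; // exhausted the match set without finding a match
    }

    private static boolean wanted(Entry e) {
        return true; // placeholder for the caller's real test
    }
}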

Cheers,

Peter.

Peter Firmstone wrote:
> Dan Creswell wrote:
>>> When access large disk records and pulling these into memory, you 
>>> typically
>>> only take a small chunk, process it and move on, so the behaviour is 
>>> more
>>> stream like, rather than iterator, I created an interface called
>>> ResultStream (yes it supports Generics, but beware the compilation
>>> boundaries that haven't been checked by the compiler), result stream is
>>> terminated by a null value.
>>>
>>>
>>>     
>> That's generally similar to how MatchSet works under the covers....
>>
>>  <snip>
>>
>>
>>  
>>> package org.apache.river.api.util;
>>>
>>> /**
>>> * This interface is similar to an Enumerator, it is designed to return
>>> * results incrementally in loops, however unlike an Enumerator, 
>>> there is no
>>> * check first operation as implementors must return a null value after
>>> * the backing data source has been exhausted. So this terminates like a
>>> stream
>>> * by returning a null value.
>>> *
>>> * @author Peter Firmstone
>>> */
>>> public interface ResultStream<T> {
>>>   /**
>>>    * Get next T, call from a loop until T is null;
>>>    * @return T unless end of stream in which case null is returned.
>>>    */
>>>   public T get();
>>>
>>>     
>>
>> So this overall, isn't far from where MatchSet goes. For those following
>> along, remoteness issues are such that we can't reliably return null to
>> indicate "end of stream". One may never reach the end of the stream if
>> there's a failure at the back-end. One could counter that with a  
>> "wait for
>> the failure to go away" scenario but that assumes the failure does go 
>> away
>> and it might not (and whilst you're figuring all that out your thread is
>> blocked and can't say much to your upstream users).
>>
>> In essence, then returning null means "we successfully reached the 
>> end of
>> the stream". And we need some other mechanism to say "we didn't reach 
>> the
>> end of the stream for some reason". Typically the mechanism is an
>> exception....
>>
>>   
>
> Yeah, you'd need to throw an IOException, I've pondered adding it, so 
> it can be used over remote connections, that way another interface can 
> implement both Remote and ResultStream.
>
> Thanks, I think that was all the convincing I needed.
>
> ResultStream was originally created for Parallel iteration, multiple 
> threads calling get, you can't check first, you have to check after ;)
>
> Cheers,
>
> Peter.
>
>


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Dan Creswell wrote:
>> When access large disk records and pulling these into memory, you typically
>> only take a small chunk, process it and move on, so the behaviour is more
>> stream like, rather than iterator, I created an interface called
>> ResultStream (yes it supports Generics, but beware the compilation
>> boundaries that haven't been checked by the compiler), result stream is
>> terminated by a null value.
>>
>>
>>     
> That's generally similar to how MatchSet works under the covers....
>
>  <snip>
>
>
>   
>> package org.apache.river.api.util;
>>
>> /**
>> * This interface is similar to an Enumerator, it is designed to return
>> * results incrementally in loops, however unlike an Enumerator, there is no
>> * check first operation as implementors must return a null value after
>> * the backing data source has been exhausted. So this terminates like a
>> stream
>> * by returning a null value.
>> *
>> * @author Peter Firmstone
>> */
>> public interface ResultStream<T> {
>>   /**
>>    * Get next T, call from a loop until T is null;
>>    * @return T unless end of stream in which case null is returned.
>>    */
>>   public T get();
>>
>>     
>
> So this overall, isn't far from where MatchSet goes. For those following
> along, remoteness issues are such that we can't reliably return null to
> indicate "end of stream". One may never reach the end of the stream if
> there's a failure at the back-end. One could counter that with a  "wait for
> the failure to go away" scenario but that assumes the failure does go away
> and it might not (and whilst you're figuring all that out your thread is
> blocked and can't say much to your upstream users).
>
> In essence, then returning null means "we successfully reached the end of
> the stream". And we need some other mechanism to say "we didn't reach the
> end of the stream for some reason". Typically the mechanism is an
> exception....
>
>   

Yeah, you'd need to throw an IOException. I've pondered adding it so it 
can be used over remote connections; that way another interface can 
implement both Remote and ResultStream.
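
Something along those lines might look like this (a hypothetical sketch, not an actual River interface; the name RemoteResultStream is made up):

package org.apache.river.api.util;

import java.io.IOException;
import java.rmi.Remote;

/**
 * Hypothetical sketch of the remote-friendly variant being pondered: get()
 * may fail with an IOException, and the interface can be combined with
 * Remote so a proxy can implement both.
 */
public interface RemoteResultStream<T> extends Remote {
    /** @return the next T, or null once the backing source is exhausted. */
    T get() throws IOException;

    /** Release any resources held on the remote end. */
    void close() throws IOException;
}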

Thanks, I think that was all the convincing I needed.

ResultStream was originally created for parallel iteration, with multiple 
threads calling get(): you can't check first, you have to check after ;)

Cheers,

Peter.


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
On 19 January 2011 10:01, Peter Firmstone <ji...@zeus.net.au> wrote:

> Dan Creswell wrote:
>
>> Note too that iterator does not support remote semantics whereas MatchSet
>> does (all those explicit RemoteExceptions etc).
>>
>> An iterator as defined for the Java platform might fail for concurrency
>> reasons (although notably it doesn't for many of the modern concurrent
>> collections) or because an operation (typically remove) is not supported.
>>
>>
> Yes I bumped into this recently when creating a concurrent policy
> implementation, although it was with Enumeration, the backing set cannot be
> modified while the Enumeration is being read from a loop, the same with the
> iterator.


:)

<snip>


> When accessing large disk records and pulling them into memory, you typically
> only take a small chunk, process it and move on, so the behaviour is more
> stream-like than iterator-like. I created an interface called
> ResultStream (yes, it supports generics, but beware the compilation
> boundaries that haven't been checked by the compiler); the result stream is
> terminated by a null value.
>
>
That's generally similar to how MatchSet works under the covers....

 <snip>


> package org.apache.river.api.util;
>
> /**
> * This interface is similar to an Enumeration; it is designed to return
> * results incrementally in loops. However, unlike an Enumeration, there is no
> * check-first operation, as implementors must return a null value after
> * the backing data source has been exhausted. So it terminates like a stream,
> * by returning a null value.
> *
> * @author Peter Firmstone
> */
> public interface ResultStream<T> {
>   /**
>    * Get next T, call from a loop until T is null;
>    * @return T unless end of stream in which case null is returned.
>    */
>   public T get();
>

So this overall, isn't far from where MatchSet goes. For those following
along, remoteness issues are such that we can't reliably return null to
indicate "end of stream". One may never reach the end of the stream if
there's a failure at the back-end. One could counter that with a  "wait for
the failure to go away" scenario but that assumes the failure does go away
and it might not (and whilst you're figuring all that out your thread is
blocked and can't say much to your upstream users).

In essence, then returning null means "we successfully reached the end of
the stream". And we need some other mechanism to say "we didn't reach the
end of the stream for some reason". Typically the mechanism is an
exception....


>   /**
>    * Close the result stream, this allows the implementer to close any
>    * resources prior to deleting reference.
>    */
>   public void close();
>
> }
>
>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Patricia Shanahan wrote:
> On 1/19/2011 12:58 PM, Peter Firmstone wrote:
>> Patricia Shanahan wrote:
>>> Peter Firmstone wrote:
>>> ...
>>>> Yes I bumped into this recently when creating a concurrent policy
>>>> implementation, although it was with Enumeration, the backing set
>>>> cannot be modified while the Enumeration is being read from a loop,
>>>> the same with the iterator.
>>> ...
>>>
>>> That depends on the implementation of the Iterator, and the Iterable's
>>> related contract. There is nothing prohibiting concurrency-supporting
>>> contracts. See, for example,
>>> http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#iterator%28%29 
>>>
>>>
>>
>> Which is essentially the same way I've handled Iterators and a
>> concurrent underlying collection, by creating a copy of that collection,
>> it was current at the time of access, no Guarantees over time, the
>> copied collection is shared among all iterator readers, its reference is
>> nullified as soon as the underlying collection is written, then the next
>> iterator copy's the collection again. The currently executing iterators,
>> still hold a reference to the previous copy of the underlying 
>> collection.
>
> Note that there are multiple ways of achieving the specified semantics.
> For example, ConcurrentLinkedQueue does not use a copy of the
> collection. Instead, it depends on a combination of invariants,
> including non-reuse of nodes, and volatile fields.
>

Sounds interesting; I guess anything's possible, given enough thought.

Cheers,

Peter.


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
On 1/19/2011 12:58 PM, Peter Firmstone wrote:
> Patricia Shanahan wrote:
>> Peter Firmstone wrote:
>> ...
>>> Yes I bumped into this recently when creating a concurrent policy
>>> implementation, although it was with Enumeration, the backing set
>>> cannot be modified while the Enumeration is being read from a loop,
>>> the same with the iterator.
>> ...
>>
>> That depends on the implementation of the Iterator, and the Iterable's
>> related contract. There is nothing prohibiting concurrency-supporting
>> contracts. See, for example,
>> http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#iterator%28%29
>>
>
> Which is essentially the same way I've handled Iterators and a
> concurrent underlying collection, by creating a copy of that collection,
> it was current at the time of access, no Guarantees over time, the
> copied collection is shared among all iterator readers, its reference is
> nullified as soon as the underlying collection is written, then the next
> iterator copy's the collection again. The currently executing iterators,
> still hold a reference to the previous copy of the underlying collection.

Note that there are multiple ways of achieving the specified semantics.
For example, ConcurrentLinkedQueue does not use a copy of the
collection. Instead, it depends on a combination of invariants,
including non-reuse of nodes, and volatile fields.

Patricia

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Patricia Shanahan wrote:
> Peter Firmstone wrote:
> ...
>> Yes I bumped into this recently when creating a concurrent policy 
>> implementation, although it was with Enumeration, the backing set 
>> cannot be modified while the Enumeration is being read from a loop, 
>> the same with the iterator.
> ...
>
> That depends on the implementation of the Iterator, and the Iterable's 
> related contract. There is nothing prohibiting concurrency-supporting 
> contracts. See, for example, 
> http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#iterator%28%29 
>

Which is essentially the same way I've handled Iterators over a 
concurrent underlying collection: by creating a copy of that collection. 
It was current at the time of access, with no guarantees over time. The 
copied collection is shared among all iterator readers; its reference is 
nullified as soon as the underlying collection is written, and the next 
iterator copies the collection again.  The currently executing 
iterators still hold a reference to the previous copy of the underlying 
collection.
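
For anyone following along, the copy-on-access idea reduces to something like this (a rough sketch, not the skunk/pepe code, and it omits the write-lock/remove redirection described above):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Readers iterate a cached snapshot; writers invalidate the snapshot so the
// next iterator re-copies the underlying collection.
class SnapshotList<T> implements Iterable<T> {
    private final List<T> backing = new ArrayList<T>();
    private volatile List<T> snapshot; // null means "stale, rebuild on demand"

    public synchronized void add(T item) {
        backing.add(item);
        snapshot = null; // next iterator copies the collection again
    }

    public Iterator<T> iterator() {
        List<T> copy = snapshot;
        if (copy == null) {
            synchronized (this) {
                copy = snapshot = new ArrayList<T>(backing);
            }
        }
        return copy.iterator(); // current at creation time, stale thereafter
    }
}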

The foreach-style loop is much nicer; the for loop example I provided 
earlier isn't as intuitive.

>
> I'm planning to make the FastList in outrigger Iterable, and follow 
> the ConcurrentLinkedQueue model. As the results looked in preliminary 
> benchmarking before I went on vacation, the best implementation is 
> based on ConcurrentLinkedQueue.
>
> The issue of remoteness, and allowing remote-related exceptions, is 
> another issue. However, rather than giving up the nice loop syntax we 
> get with Iterable, we could consider wrapping in an unchecked exception.

I've pondered something like that too. It is important to remember that 
the Iterator is still restricted to single-thread access by the caller, 
because check-then-get is not atomic.

The caller might have to be happy to live with duplicates in the 
iterator if it is updated during iteration; remembering every value 
iterated could create memory problems. ;)

Cheers,

Peter.

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
Peter Firmstone wrote:
...
> Yes I bumped into this recently when creating a concurrent policy 
> implementation, although it was with Enumeration, the backing set cannot 
> be modified while the Enumeration is being read from a loop, the same 
> with the iterator.
...

That depends on the implementation of the Iterator, and the Iterable's 
related contract. There is nothing prohibiting concurrency-supporting 
contracts. See, for example, 
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#iterator%28%29

I'm planning to make the FastList in outrigger Iterable, and follow the 
ConcurrentLinkedQueue model. As the results looked in preliminary 
benchmarking before I went on vacation, the best implementation is based 
on ConcurrentLinkedQueue.
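
A tiny illustration of the weakly consistent contract (runnable as-is):

import java.util.concurrent.ConcurrentLinkedQueue;

// Modifying the queue mid-iteration is legal and never throws
// ConcurrentModificationException; the iterator may or may not see the change.
class WeaklyConsistentDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<String>();
        q.add("a");
        q.add("b");
        for (String s : q) {
            if ("a".equals(s)) {
                q.remove("b"); // concurrent-style modification during iteration
            }
        }
        System.out.println(q); // prints [a]; no exception was thrown
    }
}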

The issue of remoteness, and allowing remote-related exceptions, is 
another issue. However, rather than giving up the nice loop syntax we 
get with Iterable, we could consider wrapping in an unchecked exception.

Patricia

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Dan Creswell wrote:
> Note too that iterator does not support remote semantics whereas MatchSet
> does (all those explicit RemoteExceptions etc).
>
> An iterator as defined for the Java platform might fail for concurrency
> reasons (although notably it doesn't for many of the modern concurrent
> collections) or because an operation (typically remove) is not supported.
>   
Yes, I bumped into this recently when creating a concurrent policy 
implementation, although it was with Enumeration: the backing set cannot 
be modified while the Enumeration is being read from a loop, and the same 
goes for the Iterator.

Although this is for local code only, I've got a Multi Read / Single 
Write Collections utility, similar to the Synchronized Collections 
utility in skunk/pepe.  Iterators are generated from a copy of the 
encapsulated collection, but when remove is called, the call is 
redirected to the underlying collection, and requires the write lock.  
So Enumerators and Iterators are up to date at their creation time, but 
become stale quickly.

When accessing large disk records and pulling them into memory, you 
typically only take a small chunk, process it and move on, so the 
behaviour is more stream-like than iterator-like. I created an 
interface called ResultStream (yes, it supports generics, but beware the 
compilation boundaries that haven't been checked by the compiler); the 
result stream is terminated by a null value.

So you can process very large amounts of data, in small doses, in a 
loop, like this one, which is actually also an implementation of 
ResultStream.get() and performs filtering operations:

    public ServiceItem get() {
        for (Object item = inputResultStream.get(); item != null;
                item = inputResultStream.get()) {
            if (item instanceof ServiceItem) {
                ServiceItem it = (ServiceItem) item;
                int l = filters.size();
                for (int i = 0; i < l; i++) {
                    ServiceItemFilter filter = filters.get(i);
                    if (filter == null) continue;
                    if (filter.check(it)) return it;
                } // end filter loop
            } // If it isn't a ServiceItem it is ignored.
        } // end item loop
        return null; // Our stream-terminating item is null.
    }

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.river.api.util;

/**
 * This interface is similar to an Enumeration; it is designed to return
 * results incrementally in loops. However, unlike an Enumeration, there is no
 * check-first operation, as implementors must return a null value after
 * the backing data source has been exhausted. So it terminates like a stream,
 * by returning a null value.
 *
 * @author Peter Firmstone
 */
public interface ResultStream<T> {
    /**
     * Get next T, call from a loop until T is null;
     * @return T unless end of stream in which case null is returned.
     */
    public T get();
    /**
     * Close the result stream, this allows the implementer to close any
     * resources prior to deleting reference.
     */
    public void close();
}
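
For illustration, a throwaway in-memory implementation showing the intended call pattern (loop until get() returns null); the class name is made up and it is not part of the proposal itself:

class ArrayResultStream<T> implements ResultStream<T> {
    private final T[] items;
    private int next = 0;

    ArrayResultStream(T[] items) { this.items = items; }

    public T get() { return next < items.length ? items[next++] : null; }

    public void close() { /* nothing to release for an in-memory source */ }
}

// Consuming it:
//   ResultStream<String> rs =
//       new ArrayResultStream<String>(new String[] {"a", "b"});
//   for (String s = rs.get(); s != null; s = rs.get()) {
//       // process s
//   }
//   rs.close();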

> MatchSet semantics are substantially different. I'm sure it's still possible
> to do a wrapper around MatchSet that looks like an iterator but there will
> be some implementation cracks to pain over in respect of hiding away
> exceptions and so on.
>
> Dan.
>
> On 19 January 2011 07:54, Patricia Shanahan <pa...@acm.org> wrote:
>
>   
>> I don't think we should commit to a single class doing both Iterable and
>> Iterator. An Iterable is already committed to being able to supply an
>> Iterator on demand, but often the Iterator implementation is better done as
>> a private class member of the Iterable. Note that an Iterable needs to be
>> able to supply a new Iterator each time its iterator() method is called.
>>
>> I'm not sure what you are saying about combining Iterator and the MatchSet
>> features. My inclination would be to keep each interface simple and clean.
>> Many classes will implement Iterable and appropriate interfaces representing
>> the snapshot and lease capabilities.
>>
>> As you can probably guess from the length of this reply, I'm back from
>> Egypt and have a full keyboard, not just an iPhone.
>>
>> Patricia
>>
>>
>>
>> James Grahn wrote:
>>
>>     
>>> I should also add, we'd likely need to derive our own class extending
>>> Iterable & Iterator to avoid losing existing MatchSet methods of getSnapshot
>>> and getLease.
>>>
>>> I don't see an immediate problem with this; a collection-backed
>>> Iterable/Iterator would always have a null Lease, correct?
>>>
>>> jamesG
>>>
>>> On 1/18/2011 5:38 PM, James Grahn wrote:
>>>
>>>       
>>>> It (finally) occurred to me that we can have our cake and eat it too in
>>>> this case.
>>>>
>>>> We can have the sweet deliciousness of API symmetry and retain the
>>>> implementation advantages of remote iterator & collection by having both
>>>> take-multiple and contents return:
>>>> Iterable.
>>>>
>>>>
>>>>         
>
>   


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
Note too that iterator does not support remote semantics whereas MatchSet
does (all those explicit RemoteExceptions etc).

An iterator as defined for the Java platform might fail for concurrency
reasons (although notably it doesn't for many of the modern concurrent
collections) or because an operation (typically remove) is not supported.

MatchSet semantics are substantially different. I'm sure it's still possible
to do a wrapper around MatchSet that looks like an iterator, but there will
be some implementation cracks to paper over in respect of hiding away
exceptions and so on.
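
One way those cracks show up in practice, sketched (not a proposed API; the adapter pre-fetches in hasNext() and smuggles the checked remote exceptions out as unchecked ones):

import java.rmi.RemoteException;
import java.util.Iterator;
import java.util.NoSuchElementException;

import net.jini.core.entry.Entry;
import net.jini.core.entry.UnusableEntryException;
import net.jini.space.MatchSet;

// Wraps a MatchSet so it looks like a java.util.Iterator.
class MatchSetIterator implements Iterator<Entry> {
    private final MatchSet matches;
    private Entry pending; // pre-fetched by hasNext(), handed out by next()

    MatchSetIterator(MatchSet matches) { this.matches = matches; }

    public boolean hasNext() {
        if (pending != null) return true;
        try {
            pending = matches.next();
        } catch (RemoteException e) {
            throw new RuntimeException(e); // checked exception hidden away
        } catch (UnusableEntryException e) {
            throw new RuntimeException(e);
        }
        return pending != null;
    }

    public Entry next() {
        if (!hasNext()) throw new NoSuchElementException();
        Entry result = pending;
        pending = null;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}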

Dan.

On 19 January 2011 07:54, Patricia Shanahan <pa...@acm.org> wrote:

> I don't think we should commit to a single class doing both Iterable and
> Iterator. An Iterable is already committed to being able to supply an
> Iterator on demand, but often the Iterator implementation is better done as
> a private class member of the Iterable. Note that an Iterable needs to be
> able to supply a new Iterator each time its iterator() method is called.
>
> I'm not sure what you are saying about combining Iterator and the MatchSet
> features. My inclination would be to keep each interface simple and clean.
> Many classes will implement Iterable and appropriate interfaces representing
> the snapshot and lease capabilities.
>
> As you can probably guess from the length of this reply, I'm back from
> Egypt and have a full keyboard, not just an iPhone.
>
> Patricia
>
>
>
> James Grahn wrote:
>
>> I should also add, we'd likely need to derive our own class extending
>> Iterable & Iterator to avoid losing existing MatchSet methods of getSnapshot
>> and getLease.
>>
>> I don't see an immediate problem with this; a collection-backed
>> Iterable/Iterator would always have a null Lease, correct?
>>
>> jamesG
>>
>> On 1/18/2011 5:38 PM, James Grahn wrote:
>>
>>> It (finally) occurred to me that we can have our cake and eat it too in
>>> this case.
>>>
>>> We can have the sweet deliciousness of API symmetry and retain the
>>> implementation advantages of remote iterator & collection by having both
>>> take-multiple and contents return:
>>> Iterable.
>>>
>>>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by James Grahn <jg...@simulexinc.com>.
On 1/19/2011 2:54 AM, Patricia Shanahan wrote:
> I don't think we should commit to a single class doing both Iterable and
> Iterator. An Iterable is already committed to being able to supply an
> Iterator on demand, but often the Iterator implementation is better done
> as a private class member of the Iterable. Note that an Iterable needs
> to be able to supply a new Iterator each time its iterator() method is
> called.

Excellent point.

And quite worthy of further thought, considering each iterator 
involves remote interaction.   If Iterable would lull the user into more 
expensive usage patterns, it's worth considering keeping it to an 
Iterator, though it would be nice to have an Iterable with its language 
support.

Even MatchSet could have an implementation that merely iterates over a 
collection, if we're desirous of its checked exceptions.   So a unified 
return type and broader spec would still be possible in that case.

Something to mull, at least.

> I'm not sure what you are saying about combining Iterator and the
> MatchSet features. My inclination would be to keep each interface simple
> and clean. Many classes will implement Iterable and appropriate
> interfaces representing the snapshot and lease capabilities.

Agreed.

I was shooting from the hip a bit yesterday, having just had the 
realization that Collection vs. Remote Iterator is best addressed with a 
higher abstraction.   Thanks for the refinements.

> As you can probably guess from the length of this reply, I'm back from
> Egypt and have a full keyboard, not just an iPhone.

Welcome back.

> Patricia
jamesG

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
I don't think we should commit to a single class doing both Iterable and 
Iterator. An Iterable is already committed to being able to supply an 
Iterator on demand, but often the Iterator implementation is better done 
as a private class member of the Iterable. Note that an Iterable needs 
to be able to supply a new Iterator each time its iterator() method is 
called.

I'm not sure what you are saying about combining Iterator and the 
MatchSet features. My inclination would be to keep each interface simple 
and clean. Many classes will implement Iterable and appropriate 
interfaces representing the snapshot and lease capabilities.

As you can probably guess from the length of this reply, I'm back from 
Egypt and have a full keyboard, not just an iPhone.

Patricia


James Grahn wrote:
> I should also add, we'd likely need to derive our own class extending 
> Iterable & Iterator to avoid losing existing MatchSet methods of 
> getSnapshot and getLease.
> 
> I don't see an immediate problem with this; a collection-backed 
> Iterable/Iterator would always have a null Lease, correct?
> 
> jamesG
> 
> On 1/18/2011 5:38 PM, James Grahn wrote:
>> It (finally) occurred to me that we can have our cake and eat it too in
>> this case.
>>
>> We can have the sweet deliciousness of API symmetry and retain the
>> implementation advantages of remote iterator & collection by having both
>> take-multiple and contents return:
>> Iterable.
>>
>> This would introduce more flexibility in the spec, allowing more design
>> decisions to be made by those implementing, while presenting a uniform
>> external return type to the users. (A relatively standard one at that.)
>>
>> When I was looking at the remote iterator earlier, I was thinking it
>> would probably be best to have it implement Iterator and probably
>> Iterable anyway.
>>
>> The only potential sour note is that Iterable is a 1.5 interface. So
>> that's come up again.
>>
>> jamesG
>>
> 


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by James Grahn <jg...@simulexinc.com>.
I should also add, we'd likely need to derive our own class extending 
Iterable & Iterator to avoid losing existing MatchSet methods of 
getSnapshot and getLease.

I don't see an immediate problem with this; a collection-backed 
Iterable/Iterator would always have a null Lease, correct?
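
The combined type might have roughly this shape (names are hypothetical; nothing here is spec'd):

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;

// An Iterable that still exposes the MatchSet extras; a purely
// collection-backed implementation would return null from getLease().
interface EntryResult extends Iterable<Entry> {
    Entry getSnapshot();   // as on MatchSet
    Lease getLease();      // null for a local, collection-backed result
}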

jamesG

On 1/18/2011 5:38 PM, James Grahn wrote:
> It (finally) occurred to me that we can have our cake and eat it too in
> this case.
>
> We can have the sweet deliciousness of API symmetry and retain the
> implementation advantages of remote iterator & collection by having both
> take-multiple and contents return:
> Iterable.
>
> This would introduce more flexibility in the spec, allowing more design
> decisions to be made by those implementing, while presenting a uniform
> external return type to the users. (A relatively standard one at that.)
>
> When I was looking at the remote iterator earlier, I was thinking it
> would probably be best to have it implement Iterator and probably
> Iterable anyway.
>
> The only potential sour note is that Iterable is a 1.5 interface. So
> that's come up again.
>
> jamesG
>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by MICHAEL MCGRADY <mm...@topiatechnology.com>.
I might be daft but I did not hear an objection to Java 1.5.  Did anyone else?  I had a question about Java 1.6.

MG


On Jan 18, 2011, at 4:16 PM, Patricia Shanahan wrote:

> James Grahn wrote:
>> It (finally) occurred to me that we can have our cake and eat it too in this case.
>> We can have the sweet deliciousness of API symmetry and retain the implementation advantages of remote iterator & collection by having both take-multiple and contents return:
>> Iterable.
>> This would introduce more flexibility in the spec, allowing more design decisions to be made by those implementing, while presenting a uniform external return type to the users.   (A relatively standard one at that.)
>> When I was looking at the remote iterator earlier, I was thinking it would probably be best to have it implement Iterator and probably Iterable anyway.
>> The only potential sour note is that Iterable is a 1.5 interface.   So that's come up again.
> 
> We could define our own Iterable, in one of our packages, with the same definition as java.util.Iterable.
> 
> However, it would not have the nice for-loop syntax, and we could spend a lot of time building up workarounds for not being 1.5.
> 
> Patricia

Michael McGrady
Chief Architect
Topia Technology, Inc.
Cel 1.253.720.3365
Work 1.253.572.9712 extension 2037
mmcgrady@topiatechnology.com




Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Peter Firmstone <ji...@zeus.net.au>.
Patricia Shanahan wrote:
> James Grahn wrote:
>> It (finally) occurred to me that we can have our cake and eat it too 
>> in this case.
>>
>> We can have the sweet deliciousness of API symmetry and retain the 
>> implementation advantages of remote iterator & collection by having 
>> both take-multiple and contents return:
>> Iterable.
>>
>> This would introduce more flexibility in the spec, allowing more 
>> design decisions to be made by those implementing, while presenting a 
>> uniform external return type to the users.   (A relatively standard 
>> one at that.)
>>
>> When I was looking at the remote iterator earlier, I was thinking it 
>> would probably be best to have it implement Iterator and probably 
>> Iterable anyway.
>>
>> The only potential sour note is that Iterable is a 1.5 interface.   
>> So that's come up again.
>
> We could define our own Iterable, in one of our packages, with the 
> same definition as java.util.Iterable.
>
> However, it would not have the nice for-loop syntax, and we could 
> spend a lot of time building up workarounds for not being 1.5.
>
> Patricia
>
The interesting part about a mixed distributed environment is that it is 
possible to have a Service API that requires a particular Java platform; 
clients that cannot support that Service API can't look it up.  Also, when 
earlier Java/Jini platforms look up a Service that provides a 
proxy with later-version bytecode, the proxy will not be unmarshalled; this is 
actually acceptable as part of the spec.

Because changing a Service API creates a new API, it is actually 
distinct: clients using the earlier Service will get service proxies of 
that type, while clients that look up the later Service API will get 
different service instances that have a different type.

It is best to use a different class name when we change a Service API 
Interface.

Part of my interest in a modular build relates to the possibility of 
handling platform dependencies.

Cheers,

Peter.

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
James Grahn wrote:
> It (finally) occurred to me that we can have our cake and eat it too in 
> this case.
> 
> We can have the sweet deliciousness of API symmetry and retain the 
> implementation advantages of remote iterator & collection by having both 
> take-multiple and contents return:
> Iterable.
> 
> This would introduce more flexibility in the spec, allowing more design 
> decisions to be made by those implementing, while presenting a uniform 
> external return type to the users.   (A relatively standard one at that.)
> 
> When I was looking at the remote iterator earlier, I was thinking it 
> would probably be best to have it implement Iterator and probably 
> Iterable anyway.
> 
> The only potential sour note is that Iterable is a 1.5 interface.   So 
> that's come up again.

We could define our own Iterable, in one of our packages, with the same 
definition as java.lang.Iterable.

However, it would not have the nice for-loop syntax, and we could spend 
a lot of time building up workarounds for not being 1.5.

Patricia

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by James Grahn <jg...@simulexinc.com>.
It (finally) occurred to me that we can have our cake and eat it too in 
this case.

We can have the sweet deliciousness of API symmetry and retain the 
implementation advantages of remote iterator & collection by having both 
take-multiple and contents return:
Iterable.

This would introduce more flexibility in the spec, allowing more design 
decisions to be made by those implementing, while presenting a uniform 
external return type to the users.   (A relatively standard one at that.)

When I was looking at the remote iterator earlier, I was thinking it 
would probably be best to have it implement Iterator and probably 
Iterable anyway.

The only potential sour note is that Iterable is a 1.5 interface.   So 
that's come up again.
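
Purely for illustration (the interface name and signatures below are invented, not spec'd anywhere), the symmetric shape might look like:

import java.util.Collection;

import net.jini.core.entry.Entry;
import net.jini.core.transaction.Transaction;

// Both bulk operations hand back Iterable, leaving "collection vs. remote
// iterator" as an implementation choice hidden behind the interface.
interface IterableSpace {
    Iterable<Entry> take(Collection<Entry> templates, Transaction txn,
                         long timeout, long maxEntries) throws Exception;

    Iterable<Entry> contents(Collection<Entry> templates, Transaction txn,
                             long leaseDuration, long maxEntries) throws Exception;
}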

jamesG

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by James Grahn <jg...@simulexinc.com>.
On 1/19/2011 3:44 AM, Dan Creswell wrote:
> Ah, so my measure of the merit of your suggestions includes "how many other
> users want this/like it etc".

Unquestionably, some of the suggestions fall more into the realm of 
preference than anything else; where that's true, I was hoping for some 
initial feedback from the devs.   Others potentially involve questions 
of soundness/methodology, and I believe the discussion here has been 
helpful in shedding light on those areas.   Thanks again for your 
contributions.

> I certainly think this work would be best done in a separate interface as
> I'm reasonably sure at this stage that there will be some implementation
> surprises along the way that probably shouldn't be exposed on the "stable"
> interfaces we already have.
>
> In fact, I'm wondering why you wouldn't just code it up in a branch/scratch
> for yourself and see what happens.....

Oh, that may happen.

I wanted a discussion first because:
1) The discussion would help expose gaps in my thinking.   (And it has 
done that: for instance, low-memory clients weren't on my radar because 
they were never in the usage pattern at my company.)
2) Many of the suggestions can be implemented separately.   If something 
was overwhelmingly popular, there would be sense in doing that first. 
If something was drastically wrongheaded, it would be best to not allow 
it to sink a monolithic set of changes (especially where it involves 
creating a parallel implementation of space).
3) If others were enthused about a particular suggestion, there might be 
more people willing to contribute and/or check over the code.

The last question, of course, is my availability and how exactly to 
split my time between such a branch and whatever other contributions I 
may be able to make.   I'll figure this out myself (naturally), but it 
means I won't be contributing a branch of all my proposed changes tomorrow.

jamesG

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
Ah, so my measure of the merit of your suggestions includes "how many other
users want this/like it etc.". I'm happy for you to ignore that and proceed
to debate your views, but at this point I am not significantly motivated to
continue the discussion on that basis. Or, put another way, I've probably made
as many contributions as I'm going to whilst we haven't talked more widely
with users.

I certainly think this work would be best done in a separate interface as
I'm reasonably sure at this stage that there will be some implementation
surprises along the way that probably shouldn't be exposed on the "stable"
interfaces we already have.

In fact, I'm wondering why you wouldn't just code it up in a branch/scratch
for yourself and see what happens.....

On 19 January 2011 01:40, James Grahn <jg...@simulexinc.com> wrote:

> One last thing from the original discussion...
>
>
> On 12/22/2010 3:27 PM, Dan Creswell wrote:
>
>> Maybe the test we should do first is to ask our users what they think
>> about
>> the APIs, naming and such....maybe you guys already did that and I haven't
>> read enough of the archives to know in which case, my bad.
>>
>
> We haven't gotten to the point of surveying users yet; the start of this
> discussion was merely my list of suggestions flowing from being a user of
> spaces for years.   So the first thing to decide is whether or not any of
> the suggestions have merit.   (Thanks for your contributions toward that
> discussion, by the way.)
>
> It was also immediately suggested that it might be favorable to preserve
> the original Javaspace interface and implement whatever changes we decide to
> adopt within a new interface (Riverspace?).
>
> The implementation cost of such a new service would be relatively low, as
> it would share most behavior with Javaspace; the primary cost of that
> approach would be potential user confusion over similar-but-distinct
> services.
>
> It may be the best way to introduce changes, however: it would be similar
> to introducing Javaspace05 alongside Javaspace, though we'd probably not be
> extending the interface this time.
>
> jamesG
>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by James Grahn <jg...@simulexinc.com>.
One last thing from the original discussion...

On 12/22/2010 3:27 PM, Dan Creswell wrote:
> Maybe the test we should do first is to ask our users what they think about
> the APIs, naming and such....maybe you guys already did that and I haven't
> read enough of the archives to know in which case, my bad.

We haven't gotten to the point of surveying users yet; the start of this 
discussion was merely my list of suggestions flowing from being a user 
of spaces for years.   So the first thing to decide is whether or not 
any of the suggestions have merit.   (Thanks for your contributions 
toward that discussion, by the way.)

It was also immediately suggested that it might be favorable to preserve 
the original Javaspace interface and implement whatever changes we 
decide to adopt within a new interface (Riverspace?).

The implementation cost of such a new service would be relatively low, 
as it would share most behavior with Javaspace; the primary cost of that 
approach would be potential user confusion over similar-but-distinct 
services.

It may be the best way to introduce changes, however: it would be 
similar to introducing Javaspace05 alongside Javaspace, though we'd 
probably not be extending the interface this time.

jamesG

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
On 22 December 2010 18:57, <jg...@simulexinc.com> wrote:

> Regarding the complications pointed out for returning a remote iterator
> from takeMultiple:
> --
> (1) One can size the batch to make best balance network bandwidth and
> latency.
> --
>
> That's currently done by combination of server-side takeMultipleLimit and
> client-side maxEntries.   If we use a remote iterator, I assume


I'm not sure what you're referring to when you say "server-side
takeMultipleLimit". To the best of my knowledge there's no such thing
currently spec'd/required.



> we'd include takeMultipleBatchSize and retain the client-side maxEntries.
> So I don't see this as being substantially different.
>
> --
> (2) One can limit the time a collection of exclusive locks are held under a
> transaction by virtue of the timeout.
> --
>
> Hmm, why would this not be the case under a remote iterator?   I would
> think that the correct behavior would be to release locks after a timeout
> expires regardless of whether the return type was an iterator or collection.
>

Mmmm, it's not really about correct behaviour, it's about spec'd behaviour.
When you say correct I'm guessing you mean expected or maybe ideal?

As spec'd a remote iterator doesn't have a timeout. It has a lease,
sometimes but not always. You may of course extend the lease (more
roundtrips, more performance cost). However, leases can be bounded down by
the space implementation so you don't actually get the timeout you'd like
and would have a number of round-trips to extend the lease to get the
timeout you want. So you can use the lease as a "poor man's timeout" but
it's not the sharp tool you'd perhaps prefer.

All of that makes for some ugly code just to try and do a "simple" batch
take.
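
For the record, the sort of lease juggling being described looks roughly like this (a sketch around the existing contents/MatchSet API; it only reads, it is not a real batch take, and the renewal threshold is arbitrary):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.core.transaction.Transaction;
import net.jini.space.JavaSpace05;
import net.jini.space.MatchSet;

// Uses the match-set lease as a stand-in for a timeout, renewing it when the
// space has bounded the lease down below what the caller wanted.
class LeaseJuggling {
    static List<Entry> readAll(JavaSpace05 space, Entry template, Transaction txn,
                               long maxEntries, long desiredDuration) throws Exception {
        MatchSet matches = space.contents(
                Collections.singletonList(template), txn, desiredDuration, maxEntries);
        Lease lease = matches.getLease(); // may be null if no lease was granted
        List<Entry> result = new ArrayList<Entry>();
        for (Entry e = matches.next(); e != null; e = matches.next()) {
            if (lease != null
                    && lease.getExpiration() - System.currentTimeMillis() < 1000) {
                lease.renew(desiredDuration); // yet another round-trip
            }
            result.add(e);
        }
        return result;
    }
}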

Batch take's timeout as spec'd doesn't actually apply unless there are no
entries initially present to take. Thus any sane implementation does as many
takes as it can as soon as it's invoked and returns them, thus freeing the
locks fast. If there are no matches, it'll wait but probably only for a
single entry, again to avoid the lock/search-queue problem contents gives
you.


> --
> (3) Batching in this way allows multiple clients to remove and process
> entrys in a more scalable fashion than with a (unbounded or no entry limit)
> remote iterator.
> --
>
> Users would still be free to make multiple calls with small values for
> maxEntries if they so chose.   They would also gain the ability to make an
> unbounded request, which is currently lacking, outside of repeated calls.
>

Unbounded requests build up big long lists of taken entries, all locked, which
hurts search times for other clients. This is because they will find
matches, then test the locks and find they're potentially taken and must
continue the search. Unless of course each client is using a unique
template. Further, you've got something of a fairness problem. Should an
active matchset be filled as a priority over a single take with an identical
template or not? In the interests of liveness you might say no; however, the
penalty for that is your matchset holds locks on entries for longer and
continues to slow searching.

Note that passing a null transaction as you would for a read contents to
reduce the cost of all this locking doesn't work because one must grab and
maintain a take lock for all entries until the matchset is closed down
regardless of transaction type.

And finally, because of the leasing bits, book-keeping and such (and no
means to control the size of batches as spec'd) using a contents type
construct for small batch takes is very costly.


> --
> (4) [gleaned from text] More bookkeeping is necessary.
> --
>
> Certainly.   Also, we'd have to work out the precise semantics that the
> iterator operates under and make them clear in the documentation.
>

I wholeheartedly agree with a qualification:

One of the long debates we had when spec'ing contents in particular was how
tightly to constrain things. One big fear was that, as a result of a
very explicit spec, we might rule out a reasonably useful implementation (in
particular clustered spaces). To some extent the ifExists methods, when done
properly, make a performant cluster difficult; tightening some of these other
specs will potentially make things even harder.


> --
> (5) [gleaned from text] A remote iterator would certainly be less
> performant than a straight batch take.
> --
>
> This is the biggest concern, I think.   As such, I'd be interested in
> seeing performance runs, to back up the intuition.   Then, at least, we'd
> know precisely what trade-off we're talking about.
>

See above for my commentary on the effects of long running matchsets
asserting take locks. The current batch take is designed so as to eradicate
that as an issue. Interestingly, spec'ing batch take more tightly than it is
now could drive an implementation to a contents-style approach with all its
costs.

To some extent, I'd claim there's no need for the tests or intuition:

(1) Batch take is a single round-trip. Contents, maybe not, unless you have
means to control the batch size and make sure that fits the maxMatchSize.
Note that the space implementation may choose to control batch sizes so as
to optimise/balance resource usage. Putting control of that in the hands of
the user denies the implementation such an option. In any case, larger
limits equals more batches equals more round-trips so at best you'll get
linear degradation.

(2) Batch take doesn't hold huge chains of locks for extended periods of
time as currently spec'd. Contents does/would. That affects search times
when clients are concurrently "challenging" for similar entries as is often
found in compute farms and such. Again, as above, that'll be at best linear
degradation based on the length of the lock chains.

What the tests might tell you is how badly some particular implementation
approach degrades, but that won't be uniform across all implementations, which
limits its value somewhat IMHO.

I think my deepest reservation is that most users I deal with want all
operations to be as quick as possible and have no problems with the
resultant API asymmetry. They also prefer to not worry about managing batch
sizes in contents calls because it's a detail they don't want to have to
account for. Lastly, bulk take is a popular method and adding additional
boilerplate like leasing to get something that looks like contents probably
won't appeal. Contents tends to be used for admin and debugging (some do use
it to get database-style cursors and such, yikes) which are less common than
the need for batch takes and thus the lease paraphernalia is tolerable.

Maybe the test we should do first is to ask our users what they think about
the APIs, naming and such....maybe you guys already did that and I haven't
read enough of the archives to know in which case, my bad.


>
> The test would need to cover both small batches and large, both in
> multiples of the batch-size/takeMultipleLimit and for numbers off of those
> multiples, with transactions and without.
>
> jamesG
>
> -----Original Message-----
> From: "Dan Creswell" <da...@gmail.com>
> Sent: Wednesday, December 22, 2010 5:23am
> To: river-dev@incubator.apache.org
> Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)
>
> Hey,
>
> So the below means you are indeed following my explanation so to your
> question:
>
> Yes, you could use a remote iterator style of thing but for take it's quite
> a heavyweight construct especially once you have transactions in the way.
> The core implementation itself is very similar to contents and would have
> for the most part similar performance. However, it'd certainly be less
> performant than a straight batch take.
>
> More of a concern though is the impact on other clients of the space
> implementation: by virtue of lots of book-keeping, the most exclusive locks
> on entry's and long running transactions that inflict delays on other
> clients leading to poor scaling. Contents by virtue of it's read nature is
> a
> little less painful performance wise and for a lot of applications you'd
> pass no transaction which reduces performance pain further.
>
> So I'd say that batch take is probably a better tradeoff than a take/remote
> iterator combo because:
>
> (1) One can size the batch to make best balance network bandwidth and
> latency.
> (2) One can limit the time a collection of exclusive locks are held under a
> transaction by virtue of the timeout.
> (3) Batching in this way allows multiple clients to remove and process
> entrys in a more scalable fashion than with a (unbounded or no entry limit)
> remote iterator.
>
> In essence one puts the control squarely with the user so's they can get
> what they want albeit at the price of some API asymmetry as you correctly
> point out.
>
> As an implementer, I could reduce my codebase a little if we did takes with
> a remote iterator but being completely honest, not by enough that I'd
> support a spec change for that reason alone.
>
> HTH,
>
> Dan.
>
>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by MICHAEL MCGRADY <mm...@topiatechnology.com>.
One sort of test would be to see where the weaknesses are and try to break it at the weakest point.  

MG


On Dec 22, 2010, at 11:08 AM, Patricia Shanahan wrote:

> On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
> ...
>> This is the biggest concern, I think.   As such, I'd be interested in
>> seeing performance runs, to back up the intuition.   Then, at least,
>> we'd know precisely what trade-off we're talking about.
>> 
>> The test would need to cover both small batches and large, both in
>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>> those multiples, with transactions and without.
> 
> I think we need a lot of performance tests, some way to organize them, and some way to retain their results.
> 
> I propose adding a "performance" folder to the River trunk, with subdirectories "src" and "results". src would contain benchmark source code. result would contain benchmark output.
> 
> System level tests could have their own package hierarchy, under org.apache.impl, but reflecting what is being measured. Unit level tests would need to follow the package hierarchy for the code being tested, to get package access. The results hierarchy would mirror that src hierarchy for the tests.
> 
> Any ideas, alternatives, changes, improvements?
> 
> Patricia

Michael McGrady
Chief Architect
Topia Technology, Inc.
Cel 1.253.720.3365
Work 1.253.572.9712 extension 2037
mmcgrady@topiatechnology.com




Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Mike McGrady <mm...@topiatechnology.com>.
The dependency issue is important to my mind.  This solution seems like a gray compromise

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Dec 26, 2010, at 4:40 AM, Niclas Hedhman <ni...@hedhman.org> wrote:

> On Thu, Dec 23, 2010 at 6:11 AM, Sim IJskes - QCG <si...@qcg.nl> wrote:
> 
>> Could we use xml? XML based processing is already in the java rt and it
>> saves us another dependency. If things need to be tooled in a later stage i
>> would prefer xml. To be honest, coding in DOM will cause me to tear at least
>> a few hairs out, and JDOM is my favorite (which will cause another
>> dependency).
>> 
>> To be honest, my preference for XML is a purely emotional one.
> 
> Short list;
> JSON pros over XML;
> * Most of the time less verbose
> * Parsing much faster and less resource intensive
> 
> XML pros over JSON
> * Namespaces
> 
> In my current main project (Qi4j) we rely quite heavily on JSON, and
> at the same time we want as few dependencies as possible. Solution;
> Copy-Paste an implementation into our sources. It isn't rocket science
> to get JSON right.
> 
> 
> That said, I leave it to you guys to work out what you want.
> 
> 
> Cheers
> -- 
> Niclas Hedhman, Software Developer
> http://www.qi4j.org - New Energy for Java
> 
> I live here; http://tinyurl.com/3xugrbk
> I work here; http://tinyurl.com/24svnvk
> I relax here; http://tinyurl.com/2cgsug

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Thu, Dec 23, 2010 at 6:11 AM, Sim IJskes - QCG <si...@qcg.nl> wrote:

> Could we use xml? XML based processing is already in the java rt and it
> saves us another dependency. If things need to be tooled in a later stage i
> would prefer xml. To be honest, coding in DOM will cause me to tear at least
> a few hairs out, and JDOM is my favorite (which will cause another
> dependency).
>
> To be honest, my preference for XML is a purely emotional one.

Short list;
JSON pros over XML;
 * Most of the time less verbose
 * Parsing much faster and less resource intensive

XML pros over JSON
 * Namespaces

In my current main project (Qi4j) we rely quite heavily on JSON, and
at the same time we want as few dependencies as possible. Solution;
Copy-Paste an implementation into our sources. It isn't rocket science
to get JSON right.


That said, I leave it to you guys to work out what you want.


Cheers
-- 
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java

I live here; http://tinyurl.com/3xugrbk
I work here; http://tinyurl.com/24svnvk
I relax here; http://tinyurl.com/2cgsug

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Sim IJskes - QCG <si...@qcg.nl>.
On 12/22/2010 11:38 PM, Patricia Shanahan wrote:
> I also prefer JDOM for my Java XML processing. That is why I was not
> taking into account the dependency issue.
>
> Do you think performance tests should be part of the distribution? I was
> thinking of them as an internal tool, but there is some value in getting
> them into the hands of users.

I'm a bit of a code hoarder. Tooling that is used during development, put
it in. Just document that it's a tool and not part of the runtime. Maybe
things like performance change, and somebody else would like to retest
the assumptions made before. Or somebody has a totally different setup
than you, or just wants to repeat the tests.

>> To be honest, my preference for XML is a purely emotional one.
>
> There is one idea I've found very effective that I don't think would
> work so well with JSON. I used XML for both input and output language
> for my simulations. That way, I could do things like including the input
> parameters in the output, and trivially regenerate the input file if I
> wanted to re-run a test.

Or just code some poor man's persistence with JAXB. I've used it to
inspect large intermediate data structures with grep/awk/wc etc.

And with XSLT plus FOP or DocBook you have a nice PDF.

Gr. Sim
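
For what it's worth, the JAXB route really is only a few lines -- a sketch
only, with an invented result bean and file name, which also shows the point
quoted above about echoing the input parameters into the output so a run can
be re-created:

    import java.io.File;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class BenchmarkResult {
        // Input parameters, echoed into the output file.
        public String benchmark = "takeMultiple";
        public long batchSize = 50;
        public boolean transactional = false;
        // Measured output.
        public double opsPerSecond;

        public static void main(String[] args) throws Exception {
            BenchmarkResult r = new BenchmarkResult();
            r.opsPerSecond = 1234.5;
            Marshaller m = JAXBContext.newInstance(BenchmarkResult.class)
                                      .createMarshaller();
            m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            m.marshal(r, new File("takeMultiple-result.xml"));
        }
    }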


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
Sim IJskes - QCG wrote:
> On 12/22/2010 10:02 PM, Dan Creswell wrote:
>> What I would say is that if I were to do such a thing I'd be tempted 
>> more by
>> JSON which is often a little less verbose than XML for many things and 
>> has
>> plenty of libraries for parsing etc in almost any language. And the other
>> thing I'd say is that if you're doing the work and prefer XML, you should
>> get ultimate choice.
> 
> Could we use xml? XML based processing is already in the java rt and it 
> saves us another dependency. If things need to be tooled in a later 
> stage i would prefer xml. To be honest, coding in DOM will cause me to 
> tear at least a few hairs out, and JDOM is my favorite (which will cause 
> another dependency).

I also prefer JDOM for my Java XML processing. That is why I was not
taking into account the dependency issue.

Do you think performance tests should be part of the distribution? I was
thinking of them as an internal tool, but there is some value in getting
them into the hands of users.

> To be honest, my preference for XML is a purely emotional one.

There is one idea I've found very effective that I don't think would
work so well with JSON. I used XML for both input and output language
for my simulations. That way, I could do things like including the input
parameters in the output, and trivially regenerate the input file if I
wanted to re-run a test.

I don't think JSON would be a good performance test input file format,
because of the lack of comments.

I really don't know. I'm still thinking about all this.

Patricia

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Sim IJskes - QCG <si...@qcg.nl>.
On 12/22/2010 10:02 PM, Dan Creswell wrote:
> What I would say is that if I were to do such a thing I'd be tempted more by
> JSON which is often a little less verbose than XML for many things and has
> plenty of libraries for parsing etc in almost any language. And the other
> thing I'd say is that if you're doing the work and prefer XML, you should
> get ultimate choice.

Could we use xml? XML based processing is already in the java rt and it 
saves us another dependency. If things need to be tooled in a later 
stage i would prefer xml. To be honest, coding in DOM will cause me to 
tear at least a few hairs out, and JDOM is my favorite (which will cause 
another dependency).

To be honest, my preference for XML is a purely emotional one.

Gr. Sim

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
Two favourite libraries:

http://code.google.com/p/google-gson/

http://jackson.codehaus.org/


On 22 December 2010 21:28, Patricia Shanahan <pa...@acm.org> wrote:

> Patricia Shanahan wrote:
>
>> On 12/22/2010 1:02 PM, Dan Creswell wrote:
>>
> ...
>
>  What I would say is that if I were to do such a thing I'd be tempted more
>>> by
>>> JSON which is often a little less verbose than XML for many things and
>>> has
>>> plenty of libraries for parsing etc in almost any language. And the other
>>> thing I'd say is that if you're doing the work and prefer XML, you should
>>> get ultimate choice.
>>>
>>
>> I know nothing about JSON, so I can't say I have an informed preference.
>> I'll take a look at it. I'm always willing to learn.
>>
>
> I've taken a quick look, and it does seem to have the things I like about
> XML, such as nested objects, in a simpler, less verbose, form.
>
> Can you recommend a specific Java library for generating and parsing JSON?
>
> Thanks,
>
> Patricia
>
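
To make the comparison concrete, the same sort of result record comes out as
JSON in a couple of lines with google-gson (the first library above) -- a
sketch only; the Result fields are invented, not an agreed format:

    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;

    public class JsonResultWriter {
        // Simple value object for one benchmark run.
        static class Result {
            String benchmark = "takeMultiple";
            long batchSize = 50;
            boolean transactional = false;
            double opsPerSecond = 1234.5;
        }

        public static void main(String[] args) {
            Gson gson = new GsonBuilder().setPrettyPrinting().create();
            String json = gson.toJson(new Result());
            System.out.println(json);
            // Parsing is just as direct:
            Result back = gson.fromJson(json, Result.class);
            System.out.println(back.opsPerSecond);
        }
    }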

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
Patricia Shanahan wrote:
> On 12/22/2010 1:02 PM, Dan Creswell wrote:
...
>> What I would say is that if I were to do such a thing I'd be tempted 
>> more by
>> JSON which is often a little less verbose than XML for many things and 
>> has
>> plenty of libraries for parsing etc in almost any language. And the other
>> thing I'd say is that if you're doing the work and prefer XML, you should
>> get ultimate choice.
> 
> I know nothing about JSON, so I can't say I have an informed preference.
> I'll take a look at it. I'm always willing to learn.

I've taken a quick look, and it does seem to have the things I like 
about XML, such as nested objects, in a simpler, less verbose, form.

Can you recommend a specific Java library for generating and parsing JSON?

Thanks,

Patricia

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
On 12/22/2010 1:02 PM, Dan Creswell wrote:
> So I agree with the common structure for analysis requirement - it's
> essential for any form of performance and capacity stuff. However I'm also
> lazy so would quite happily leave someone else to spec/build that :)
>
> What I would say is that if I were to do such a thing I'd be tempted more by
> JSON which is often a little less verbose than XML for many things and has
> plenty of libraries for parsing etc in almost any language. And the other
> thing I'd say is that if you're doing the work and prefer XML, you should
> get ultimate choice.

I know nothing about JSON, so I can't say I have an informed preference.
I'll take a look at it. I'm always willing to learn.

> I've built up a variety of benchmarks for Blitz; some of them are basic
> operation exercisers (how many takes or writes can I do, etc.) and I use those
> for simple tuning exercises. The ones I deem more important, though, are a
> collection based on real user application behaviour. Micro-benchmarks are
> fine and all, but don't mean so much in the real world.

I think we need a range of benchmarks. Micro-benchmarks are good for
focusing in on one class or feature, and can be helpful in finding the
reasons for changes. Benchmarks that reflect real user behavior are key
for knowing where we are, and may tell us that something has changed,
without being very helpful in telling us what has changed.

In many ways, I look at benchmarking the way I look at functional
testing. We need both the benchmarking equivalent of unit tests, and the
benchmarking equivalent of system tests.

>
> I don't believe there's a standard set of benchmarks that everyone does but
> I'm happy to summarise what I've personally done if that's useful. I
> wouldn't offer up the code though as it's been organically built up over
> time and is great for me and my way of working but not so amenable to what
> you want or indeed particularly readable. In my judgement it'd be better to
> re-code from scratch...

Maybe I should add a docs sub-folder to my performance folder idea. Your
summary could go in there, and also be a starting point for a
benchmarking requirements document.

>
>
> On 22 December 2010 20:51, Patricia Shanahan<pa...@acm.org>  wrote:
>
>> Ideally, I'd like to get enough common structure in the output file formats
>> that I can automate comparing a new bulk run to a previous bulk run, and
>> highlighting significant changes.
>>
>> For my recent dissertation research, I needed to compare results of large
>> numbers of simulation runs. I found XML a reasonable compromise between
>> human readability and machine processing.
>>
>> Not an immediate concern - I'd like to have the problem of having too many
>> performance tests for manual handling.
>>
>> Do you know of any existing benchmarks we could use?
>>
>> Patricia
>>
>>
>>
>> On 12/22/2010 12:30 PM, Dan Creswell wrote:
>>
>>> I agree with the need for performance tests.
>>>
>>> From my own experience I'd say you'd want to be able to run those tests in
>>> isolation but also together to get a big picture view of a change because
>>> spaces being what they are, it's incredibly easy for an optimisation that
>>> improves one test to cripple another.
>>>
>>> On 22 December 2010 19:08, Patricia Shanahan<pa...@acm.org>   wrote:
>>>
>>>   On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
>>>> ...
>>>>
>>>>   This is the biggest concern, I think.   As such, I'd be interested in
>>>>
>>>>> seeing performance runs, to back up the intuition.   Then, at least,
>>>>> we'd know precisely what trade-off we're talking about.
>>>>>
>>>>> The test would need to cover both small batches and large, both in
>>>>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>>>>> those multiples, with transactions and without.
>>>>>
>>>>>
>>>> I think we need a lot of performance tests, some way to organize them,
>>>> and
>>>> some way to retain their results.
>>>>
>>>> I propose adding a "performance" folder to the River trunk, with
>>>> subdirectories "src" and "results". src would contain benchmark source
>>>> code.
>>>> result would contain benchmark output.
>>>>
>>>> System level tests could have their own package hierarchy, under
>>>> org.apache.impl, but reflecting what is being measured. Unit level tests
>>>> would need to follow the package hierarchy for the code being tested, to
>>>> get
>>>> package access. The results hierarchy would mirror that src hierarchy for
>>>> the tests.
>>>>
>>>> Any ideas, alternatives, changes, improvements?
>>>>
>>>> Patricia
>>>>
>>>>
>>>
>>
>


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
So I agree with the common structure for analysis requirement - it's
essential for any form of performance and capacity stuff. However I'm also
lazy so would quite happily leave someone else to spec/build that :)

What I would say is that if I were to do such a thing I'd be tempted more by
JSON which is often a little less verbose than XML for many things and has
plenty of libraries for parsing etc in almost any language. And the other
thing I'd say is that if you're doing the work and prefer XML, you should
get ultimate choice.

I've built up a variety of benchmarks for Blitz; some of them are basic
operation exercisers (how many takes or writes can I do, etc.) and I use those
for simple tuning exercises. The ones I deem more important, though, are a
collection based on real user application behaviour. Micro-benchmarks are
fine and all, but don't mean so much in the real world.

I don't believe there's a standard set of benchmarks that everyone does but
I'm happy to summarise what I've personally done if that's useful. I
wouldn't offer up the code though as it's been organically built up over
time and is great for me and my way of working but not so amenable to what
you want or indeed particularly readable. In my judgement it'd be better to
re-code from scratch...


On 22 December 2010 20:51, Patricia Shanahan <pa...@acm.org> wrote:

> Ideally, I'd like to get enough common structure in the output file formats
> that I can automate comparing a new bulk run to a previous bulk run, and
> highlighting significant changes.
>
> For my recent dissertation research, I needed to compare results of large
> numbers of simulation runs. I found XML a reasonable compromise between
> human readability and machine processing.
>
> Not an immediate concern - I'd like to have the problem of having too many
> performance tests for manual handling.
>
> Do you know of any existing benchmarks we could use?
>
> Patricia
>
>
>
> On 12/22/2010 12:30 PM, Dan Creswell wrote:
>
>> I agree with the need for performance tests.
>>
>> From my own experience I'd say you'd want to be able to run those tests in
>> isolation but also together to get a big picture view of a change because
>> spaces being what they are, it's incredibly easy for an optimisation that
>> improves one test to cripple another.
>>
>> On 22 December 2010 19:08, Patricia Shanahan<pa...@acm.org>  wrote:
>>
>>  On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
>>> ...
>>>
>>>  This is the biggest concern, I think.   As such, I'd be interested in
>>>
>>>> seeing performance runs, to back up the intuition.   Then, at least,
>>>> we'd know precisely what trade-off we're talking about.
>>>>
>>>> The test would need to cover both small batches and large, both in
>>>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>>>> those multiples, with transactions and without.
>>>>
>>>>
>>> I think we need a lot of performance tests, some way to organize them,
>>> and
>>> some way to retain their results.
>>>
>>> I propose adding a "performance" folder to the River trunk, with
>>> subdirectories "src" and "results". src would contain benchmark source
>>> code.
>>> result would contain benchmark output.
>>>
>>> System level tests could have their own package hierarchy, under
>>> org.apache.impl, but reflecting what is being measured. Unit level tests
>>> would need to follow the package hierarchy for the code being tested, to
>>> get
>>> package access. The results hierarchy would mirror that src hierarchy for
>>> the tests.
>>>
>>> Any ideas, alternatives, changes, improvements?
>>>
>>> Patricia
>>>
>>>
>>
>
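
A basic operation exerciser of the kind described above can be very small --
a sketch only; the class and method names are invented, and a real harness
would add warm-up, repeated runs and multiple client threads:

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    public class TakeWriteExerciser {
        // Writes the sample entries, then takes them back, reporting the
        // throughput of each phase.
        public static void run(JavaSpace space, Entry template,
                               Entry[] samples) throws Exception {
            long start = System.nanoTime();
            for (Entry e : samples) {
                space.write(e, null, Lease.FOREVER);
            }
            report("write", samples.length, start);

            start = System.nanoTime();
            for (int i = 0; i < samples.length; i++) {
                space.take(template, null, Long.MAX_VALUE);
            }
            report("take", samples.length, start);
        }

        private static void report(String op, int count, long startNanos) {
            double secs = (System.nanoTime() - startNanos) / 1e9;
            System.out.printf("%s: %.1f ops/sec%n", op, count / secs);
        }
    }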

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
Ideally, I'd like to get enough common structure in the output file 
formats that I can automate comparing a new bulk run to a previous bulk 
run, and highlighting significant changes.

For my recent dissertation research, I needed to compare results of 
large numbers of simulation runs. I found XML a reasonable compromise 
between human readability and machine processing.

Not an immediate concern - I'd like to have the problem of having too 
many performance tests for manual handling.

Do you know of any existing benchmarks we could use?

Patricia


On 12/22/2010 12:30 PM, Dan Creswell wrote:
> I agree with the need for performance tests.
>
> From my own experience I'd say you'd want to be able to run those tests in
> isolation but also together to get a big picture view of a change because
> spaces being what they are, it's incredibly easy for an optimisation that
> improves one test to cripple another.
>
> On 22 December 2010 19:08, Patricia Shanahan<pa...@acm.org>  wrote:
>
>> On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
>> ...
>>
>>   This is the biggest concern, I think.   As such, I'd be interested in
>>> seeing performance runs, to back up the intuition.   Then, at least,
>>> we'd know precisely what trade-off we're talking about.
>>>
>>> The test would need to cover both small batches and large, both in
>>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>>> those multiples, with transactions and without.
>>>
>>
>> I think we need a lot of performance tests, some way to organize them, and
>> some way to retain their results.
>>
>> I propose adding a "performance" folder to the River trunk, with
>> subdirectories "src" and "results". src would contain benchmark source code.
>> result would contain benchmark output.
>>
>> System level tests could have their own package hierarchy, under
>> org.apache.impl, but reflecting what is being measured. Unit level tests
>> would need to follow the package hierarchy for the code being tested, to get
>> package access. The results hierarchy would mirror that src hierarchy for
>> the tests.
>>
>> Any ideas, alternatives, changes, improvements?
>>
>> Patricia
>>
>


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
I agree with the need for performance tests.

From my own experience I'd say you'd want to be able to run those tests in
isolation but also together to get a big picture view of a change because
spaces being what they are, it's incredibly easy for an optimisation that
improves one test to cripple another.

On 22 December 2010 19:08, Patricia Shanahan <pa...@acm.org> wrote:

> On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
> ...
>
>  This is the biggest concern, I think.   As such, I'd be interested in
>> seeing performance runs, to back up the intuition.   Then, at least,
>> we'd know precisely what trade-off we're talking about.
>>
>> The test would need to cover both small batches and large, both in
>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>> those multiples, with transactions and without.
>>
>
> I think we need a lot of performance tests, some way to organize them, and
> some way to retain their results.
>
> I propose adding a "performance" folder to the River trunk, with
> subdirectories "src" and "results". src would contain benchmark source code.
> result would contain benchmark output.
>
> System level tests could have their own package hierarchy, under
> org.apache.impl, but reflecting what is being measured. Unit level tests
> would need to follow the package hierarchy for the code being tested, to get
> package access. The results hierarchy would mirror that src hierarchy for
> the tests.
>
> Any ideas, alternatives, changes, improvements?
>
> Patricia
>

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Patricia Shanahan <pa...@acm.org>.
On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
...
> This is the biggest concern, I think.   As such, I'd be interested in
> seeing performance runs, to back up the intuition.   Then, at least,
> we'd know precisely what trade-off we're talking about.
>
> The test would need to cover both small batches and large, both in
> multiples of the batch-size/takeMultipleLimit and for numbers off of
> those multiples, with transactions and without.

I think we need a lot of performance tests, some way to organize them, 
and some way to retain their results.

I propose adding a "performance" folder to the River trunk, with 
subdirectories "src" and "results". src would contain benchmark source 
code. result would contain benchmark output.

System level tests could have their own package hierarchy, under 
org.apache.impl, but reflecting what is being measured. Unit level tests 
would need to follow the package hierarchy for the code being tested, to 
get package access. The results hierarchy would mirror that src 
hierarchy for the tests.

Any ideas, alternatives, changes, improvements?

Patricia
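
One possible reading of that layout, with purely illustrative paths (the
package names below are placeholders, not a proposal):

    trunk/
      performance/
        src/
          <package of the code under test>/...   unit-level benchmarks
                                                  (placed here for package access)
          org/apache/impl/<area measured>/...    system-level benchmarks
        results/
          ...                                     mirrors the src hierarchy,
                                                  one output file per run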

Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by jg...@simulexinc.com.
Regarding the complications pointed out for returning a remote iterator from takeMultiple:
--
(1) One can size the batch to best balance network bandwidth and latency.
--

That's currently done by a combination of the server-side takeMultipleLimit and the client-side maxEntries.   If we use a remote iterator, I assume we'd include takeMultipleBatchSize and retain the client-side maxEntries.   So I don't see this as being substantially different.

--
(2) One can limit the time a collection of exclusive locks are held under a
transaction by virtue of the timeout.
--

Hmm, why would this not be the case under a remote iterator?   I would think that the correct behavior would be to release locks after a timeout expires regardless of whether the return type was an iterator or collection.

--
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an unbounded (no entry limit)
remote iterator.
--

Users would still be free to make multiple calls with small values for maxEntries if they so chose.   They would also gain the ability to make an unbounded request, which is currently lacking, outside of repeated calls.

--
(4) [gleaned from text] More bookkeeping is necessary.
--

Certainly.   Also, we'd have to work out the precise semantics that the iterator operates under and make them clear in the documentation.

--
(5) [gleaned from text] A remote iterator would certainly be less performant than a straight batch take.
--

This is the biggest concern, I think.   As such, I'd be interested in seeing performance runs, to back up the intuition.   Then, at least, we'd know precisely what trade-off we're talking about.

The test would need to cover both small batches and large, both in multiples of the batch-size/takeMultipleLimit and for numbers off of those multiples, with transactions and without.

jamesG

-----Original Message-----
From: "Dan Creswell" <da...@gmail.com>
Sent: Wednesday, December 22, 2010 5:23am
To: river-dev@incubator.apache.org
Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)

Hey,

So the below means you are indeed following my explanation so to your
question:

Yes, you could use a remote iterator style of thing but for take it's quite
a heavyweight construct especially once you have transactions in the way.
The core implementation itself is very similar to contents and would have
for the most part similar performance. However, it'd certainly be less
performant than a straight batch take.

More of a concern, though, is the impact on other clients of the space
implementation: lots of book-keeping, exclusive locks on entries, and
long-running transactions all inflict delays on other clients, leading to
poor scaling. Contents, by virtue of its read nature, is a little less
painful performance-wise, and for a lot of applications you'd pass no
transaction, which reduces the performance pain further.

So I'd say that batch take is probably a better tradeoff than a take/remote
iterator combo because:

(1) One can size the batch to best balance network bandwidth and latency.
(2) One can limit the time a collection of exclusive locks is held under a
transaction by virtue of the timeout.
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an unbounded (no entry limit)
remote iterator.

In essence one puts the control squarely with the user so they can get
what they want, albeit at the price of some API asymmetry, as you correctly
point out.

As an implementer, I could reduce my codebase a little if we did takes with
a remote iterator but being completely honest, not by enough that I'd
support a spec change for that reason alone.

HTH,

Dan.
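
To make the "takeMultipleBatchSize" idea above concrete, here is a purely
hypothetical sketch -- none of these names exist in JavaSpace05 or any spec
draft; they are invented only to frame the discussion:

    import java.rmi.RemoteException;
    import java.util.Collection;
    import net.jini.core.entry.Entry;
    import net.jini.core.transaction.Transaction;

    // Hypothetical only: a destructive counterpart to MatchSet.
    public interface TakeSet {
        // Returns the next taken entry, fetching a fresh server-side batch
        // (of roughly takeMultipleBatchSize entries) when the current one is
        // exhausted; null means no further matches within the timeout.
        Entry next() throws RemoteException;
    }

    // Hypothetical only: how a take returning an iterator might be declared.
    interface IteratingSpace {
        TakeSet takeMultiple(Collection tmpls, Transaction txn,
                             long timeout, long maxEntries)
            throws RemoteException;
    }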


Re: Space/outrigger suggestions (remote iterator vs. collection)

Posted by Dan Creswell <da...@gmail.com>.
Hey,

So the below means you are indeed following my explanation so to your
question:

Yes, you could use a remote iterator style of thing but for take it's quite
a heavyweight construct especially once you have transactions in the way.
The core implementation itself is very similar to contents and would have
for the most part similar performance. However, it'd certainly be less
performant than a straight batch take.

More of a concern, though, is the impact on other clients of the space
implementation: lots of book-keeping, exclusive locks on entries, and
long-running transactions all inflict delays on other clients, leading to
poor scaling. Contents, by virtue of its read nature, is a little less
painful performance-wise, and for a lot of applications you'd pass no
transaction, which reduces the performance pain further.

So I'd say that batch take is probably a better tradeoff than a take/remote
iterator combo because:

(1) One can size the batch to best balance network bandwidth and latency.
(2) One can limit the time a collection of exclusive locks is held under a
transaction by virtue of the timeout.
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an unbounded (no entry limit)
remote iterator.

In essence one puts the control squarely with the user so they can get
what they want, albeit at the price of some API asymmetry, as you correctly
point out.

As an implementer, I could reduce my codebase a little if we did takes with
a remote iterator but being completely honest, not by enough that I'd
support a spec change for that reason alone.

HTH,

Dan.

On 22 December 2010 09:19, <jg...@simulexinc.com> wrote:

> My current email client is not advanced enough to do inline, but I think
> I'm following your explanation.
>
> Successive calls of contents may retrieve the same objects, so merely
> calling contents multiple times wouldn't provide the functionality of
> running through a space.   Thus, the remote iterator was introduced in order
> to provide the ability to exhaustively read space on an iterative-type
> basis.
>
> Meanwhile, the takeMultipleLimit in Outrigger that limits the returned
> collection size isn't a practical hindrance because successive takeMultiples
> will eventually grab everything from space, whether it happens all at once
> or not.   The same could be said of a client "sipping" from the space a
> couple entries at a time via maxEntries.
>
> The case for the remote iterator stands reasonably well-made, then: it
> keeps memory overhead fairly low (beholden to the size of actual entries),
> and at minimal network cost.   It could only be reasonably replaced with a
> collection of all matching entries, which would not be satisfactory for
> underpowered clients.
>

So my next question would be: why not use a remote iterator for the
> takeMultiple?
>
> Using a remote iterator would presumably eliminate things like
> takeMultipleLimit, removing the case where the client receives fewer than
> the maxEntries requested when they are available.   Indeed,
> takeMultipleLimit would effectively be replaced with
> "takeMultipleBatchSize", largely transparent to the end user.   We'd gain a
> uniform return type for multiple entry fetches.
>
> Remote iterator usage with takeMultiple would require more network use, but
> perhaps (wild speculation) not much more than a call to contents with a
> transaction.   (Would also need to compare remote iterator to successive
> calls to "take" in evaluating network cost.)   Any pitfalls I'm missing?
>
> jamesG
>
> PS: Apparently I need to study up on read lock semantics; please excuse the
> confusion.
>
> -----Original Message-----
> From: "Dan Creswell" <da...@gmail.com>
> Sent: Monday, December 20, 2010 12:15pm
> To: river-dev@incubator.apache.org
> Subject: Re: Space/outrigger suggestions
>
> K, so inline.....
>
> On 20 December 2010 16:54, <jg...@simulexinc.com> wrote:
>
> > Glad to explain.
> >
> > My argument is a bit simplistic; as a matter of API design, it's
> preferable
> > to have a single return mechanism for multiple returns.
> >
> > I realize there were likely technical reasons for the decision, but it
> > makes for a less uniform API and in particular becomes a greater concern
> if
> > we elect to add new method signatures returning multiple items.
> >
> > I'm not clear on what you mean by the "non-destructive" nature of
> > contents() requiring a remote iterator to be useful.   At my company, we
> > actually wrapped the method to so that we'd ultimately get a collection
> (by
> > exhausting the iterator).
> >
> >
> Non-destructive:
>
> If I have one hundred entry's in a space and I do a batch take of 10 at a
> time assuming there are no other operations I will empty the space after 10
> batch takes.
>
> The same scenario for a batch read does not work. You will never (as the
> spec is now) exhaustively search the entrys. It's entirely acceptable for
> the space to return the same 10 entrys each time you call batch read. Hence
> the need for contents which does some continuous book-keeping that ensures
> you can exhaust the space contents.
>
>
> Also, contents() presumably sets 'read' locks if a transaction is used,
> > creating reservations for future takes, so doesn't the level of
> > 'destructiveness' depend on usage?
> >
> >
> If a transaction is used, locks are set. However, it's possible not to pass
> a transaction, in which case read locks are not asserted. Note also that a
> read lock doesn't prevent other read locks, so a reservation for a take
> doesn't simply follow.
>
>
> > Now, that's not to say I'm dead set against the remote iterator approach.
> > Remote iterators might save some memory/cpu overhead for truly massive
> > requests, particularly if the user does not necessarily want every entry
> > (though were that the case, maxEntries should have been used).
> >
> >
> How many entries can you knowingly take/read as a batch without exhausting
> client memory? Difficult to say given one doesn't know how big marshalled
> entries will be or indeed the amount of free space on the client or indeed
> the server. The result is that large batch takes or indeed reads are
> somewhat undesirable.
>
> Decent remote iterator implementations, incidentally, don't transfer all
> matches in one go - they parcel them out in batches. Large batches
> obviously
> take a long time to transfer and are problematic for clients that want to
> be
> somewhat responsive to their users. Imagine asking for contents of a large
> number of entries and waiting whilst all of them are transferred (e.g.
> because you want to browse a space).
>
>
> > On the other hand, returning a collection would spare network costs of
> > sustained remote iterator interactions and the mild timing uncertainties
> its
> > usage entails.   And the remote iterator is more complex by its nature.
> >
> >
> Can you explain more about the network costs you envision?
>
> Most remote iterator impls leave the connection open so the window and
> handshake issues suffered by e.g. TCP are eliminated. The same number of
> packets will be transferred give or take the odd frame that is only
> half-full due to the end of a batch being reached.
>
>
> > In any case, I think it would be best to standardize on one or the other.
> >
> > Perhaps as someone involved with Javaspace05, you can illuminate some of
> > the decision making surrounding the current usage of both?
> >
> >
> Some of that is above so I'll stop for now and see what else you ask for
> details of, okay?
>
> Thanks for the explanation, definitely helps....
>
>
>
> > jamesG
> >
> > -----Original Message-----
> > From: "Dan Creswell" <da...@gmail.com>
> > Sent: Monday, December 20, 2010 4:19am
> > To: river-dev@incubator.apache.org
> > Subject: Re: Space/outrigger suggestions
> >
> > James G,
> >
> > Can you explain some more about this statement please?
> >
> > "3) Collections or remote iterators, not both.
> >
> > "contents" returns a remote iterator named "MatchSet", while "take (with
> > collection)" returns a collection.   I can understand the argument
> > behind both use cases, but not necessarily the argument for using both
> > simultaneously.
> >
> > "
> >
> > This has been heavily discussed in the past and contents(), by virtue of
> > its non-destructive nature (unlike take), needs something akin to a remote
> > iterator to be practical/useful. Multiple takes allow you to eventually
> > exhaust a space's contents; multiple reads won't do similarly.
> >
> > So, given I'm scarred by previous efforts at space implementation,
> > including JavaSpace05, I fear my past is colouring my thinking, so I'd
> > like to understand more.
> >
> > Cheers,
> >
> > Dan.
> >
> >
> >
>
>
>
>
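
For completeness, browsing a space non-destructively through contents() looks
roughly like this -- a sketch assuming the JavaSpace05.contents()/MatchSet
signatures; collecting everything into a list, as below, is only sensible when
the client can hold all matches in memory:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace05;
    import net.jini.space.MatchSet;

    public class SpaceBrowser {
        // Reads (never removes) every entry matching 'template'.
        public static List browse(JavaSpace05 space, Entry template)
                throws Exception {
            MatchSet matches = space.contents(
                    Collections.singletonList(template),
                    null,             // no transaction, so no read locks asserted
                    Lease.FOREVER,    // requested lease; the space may grant less
                    Long.MAX_VALUE);  // effectively no limit on matches
            List result = new ArrayList();
            for (Entry e = matches.next(); e != null; e = matches.next()) {
                result.add(e);        // entries arrive in server-chosen batches
            }
            // The match set is leased; matches.getLease() could be used to
            // cancel it early if the client stops browsing part-way through.
            return result;
        }
    }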