You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2013/01/29 13:50:10 UTC

Prefixes

There aren't many compatibility issues at the moment and preserving the 
name as the interface would be better compatibility anyway.  IMO The 
interface name is more important then implementation name so I think 
that it gets priority.

1/ shall we make the interface PrefixMap, and call current PrefixFix 
PrefixFixStd, and have a factory?

Other future implementations include one for XML-valid prefixing for 
example as Turtle and XML are diverging.

The cost of finding a perfect Turtle one vs fast may become significant 
as well.  There are escapes in the local par tof  a prefix name in 
Turtle-1.1.


The abbrevKey used in FastPrefixMap is the URI upto the last '#' or 
failing that '/'.

This is a prefix look up only when two prefix URI abbrevKeys share a prefix.

This is rare (I claim!). I added some tests to 
AbstractTestLightweightPrefixMap to investigate ...
and the two implementations differ here.  The FastPrefixMap does not 
abbreviate where the old code does.

2/ In the general purpose PrefixMapStd class, we could keep a Map from
registered URI string to prefix to speed up that common case.

A Map is smaller than the Trie (which has a map at each level, so 
http:// is 7 maps).  It does not optimize cases where one prefix 
possibility is a prefix of another, and it falls back to brute force 
here.  Does this case matter?  (Using the abbrevKey and stripping back 
the possibilities would be a possibility.)


This all gets a lot more complicated for RDF 1.1 where the local part 
can have escaped / and # in it.

Over-engineering 1:

Delay the Tries creation until the first call of abbrev or at the end of 
parsing (the case of many small files concatenated might have had a lot 
of prefixes)

Or build the Trie during output only?  if there is an abbrevKey, find 
the abbreviate old style, and install it.  This is using in a cache fashion.

Otherwise the suggestion of OutputPrefixMap looks good.

Over-engineering 2:

I thought a bit about two interfaces - one for parsers, one for writers 
but it seem to be quite complicated for little or no value.  What's more 
a prefix map may start by being used for parsing and then used for 
writing (if and when old PrefixMapping gets sorted which would be good 
at Jena3 - it's XML centric).


I'm happy to put time in to help with changes.

	Andy

Re: Prefixes

Posted by Andy Seaborne <an...@apache.org>.

On 29/01/13 21:14, Rob Vesse wrote:
> Comments inline:
>
>
> On 1/29/13 6:42 PM, "Andy Seaborne" <an...@apache.org> wrote:

>> The performance test isn't stable - I added a static BeforeClass to warm
>> the JIT which partially helped.  It gave stability but some tests are
>> failing with the PrefixMapStd now of roughly comparable speed :-|
>>
>> The performance test is commented out ATM because of the residual
>> instability and failures.  This may be machine dependent.
>
> Yes I had a lot of issues with instability of the performance tests, I
> kept adding and removing them and trying to tweak them but yes they are
> unstable and somewhat machine dependent.
>
> If the two implementations are comparable then that is a good thing.  To
> be honest now it looks like PrefixMapStd will likely outperform
> FastAbbreviatingPrefixMap for the common use case since the single map
> lookup in the extra map is likely faster than the Trie lookup in most
> cases :-S
>
> Maybe we should be making PrefixMapStd the default implementation for
> output as well now?  I may do some more experimenting to try and
> characterize whether one is better than another in certain cases.

I'd leave it as-is.  The fastpath in PrefixMapStd only covers some cases 
(copied from earlier FastAbbreviatingPrefixMap code!) and the Trie is 
more general now.

Changing the size of the warm-up changes the test results ... but not in 
a simple way.  Large N for warm-up may favour one over the other but 
increasing N again can reverse the result.

	Andy

Re: Prefixes

Posted by Rob Vesse <rv...@yarcdata.com>.

Comments inline:


On 1/29/13 6:42 PM, "Andy Seaborne" <an...@apache.org> wrote:

>On 29/01/13 17:35, Rob Vesse wrote:
>> I refactored the implementation of Trie to keep the Trie sparse wherever
>> possible, the performance difference between this and the purely Map
>>based
>> version is quite marked. I also added a bunch of useful helper methods
>>to
>> Trie for doing things like longest and shortest partial matches and all
>> partial matches.  I expanded the tests for Trie to cover the new
>> functionality and some other corner cases.
>>
>>
>> PrefixMap is now the interface and PrefixMapStd the default
>> implementation, I also moved more of the common functionality up into
>> PrefixMapBase.  FastAbbreviatingPrefixMap has been improved to use
>> longestMatch() when doing abbreviating.  PrefixMap2 has been renamed to
>> PrefixMapExtended and rewritten to be a true wrapper around an arbitrary
>> PrefixMap, tests for PrefixMapExtended are added to the test suite.
>>
>> There is now a PrefixMapFactory that provides a variety of create
>>methods,
>> plain create() will give the default implementation (currently
>> PrefixMapStd), createForInput() will give an implementation suited to
>> input tasks (currently PrefixMapStd) and createForOutput() will give an
>> implementation suited to output tasks (currently
>> FastAbbreviatingPrefixMap).  There are overloads for all these to
>>create a
>> prefix map which copies from an existing PrefixMap or PrefixMapping.
>>
>> I didn't have a go at any of the refactoring around local name
>>validation
>> to make it extensible, I may do that tomorrow or someone else can take a
>> crack at it if they are interested.
>>
>> Please review and enhance/feedback as desired
>
>Looks great.
>
>I added a reverse map to PrefixMapStd for registered prefix uri to
>prefix string.
>
>This is cheap to manage - it's only an additional map.put during map
>building/parsing.

Ok good, always useful to improve performance on anything

>
>The performance test isn't stable - I added a static BeforeClass to warm
>the JIT which partially helped.  It gave stability but some tests are
>failing with the PrefixMapStd now of roughly comparable speed :-|
>
>The performance test is commented out ATM because of the residual
>instability and failures.  This may be machine dependent.

Yes I had a lot of issues with instability of the performance tests, I
kept adding and removing them and trying to tweak them but yes they are
unstable and somewhat machine dependent.

If the two implementations are comparable then that is a good thing.  To
be honest now it looks like PrefixMapStd will likely outperform
FastAbbreviatingPrefixMap for the common use case since the single map
lookup in the extra map is likely faster than the Trie lookup in most
cases :-S

Maybe we should be making PrefixMapStd the default implementation for
output as well now?  I may do some more experimenting to try and
characterize whether one is better than another in certain cases.

Rob


>
>	Andy
>
>>
>> Rob
>>
>> On 1/29/13 2:46 PM, "Rob Vesse" <rv...@cray.com> wrote:
>>
>>> Comments inline as usual:
>>>
>>>
>>> On 1/29/13 12:50 PM, "Andy Seaborne" <an...@apache.org> wrote:
>>>
>>>> There aren't many compatibility issues at the moment and preserving
>>>>the
>>>> name as the interface would be better compatibility anyway.  IMO The
>>>> interface name is more important then implementation name so I think
>>>> that it gets priority.
>>>>
>>>> 1/ shall we make the interface PrefixMap, and call current PrefixFix
>>>> PrefixFixStd, and have a factory?
>>>
>>> Yes this is what I would like to do, I didn't do this initially since I
>>> wanted to avoid breaking anything to start with.  Since we're doing a
>>> minor version bump I think the change is acceptable.  I will aim to get
>>> this done this afternoon.
>>>
>>>>
>>>> Other future implementations include one for XML-valid prefixing for
>>>> example as Turtle and XML are diverging.
>>>>
>>>> The cost of finding a perfect Turtle one vs fast may become
>>>>significant
>>>> as well.  There are escapes in the local par tof  a prefix name in
>>>> Turtle-1.1.
>>>
>>> Yes that's an interesting challenge, I may refactor the code slightly
>>>to
>>> make it easier to swap in and out different local name validation
>>> routines.
>>>
>>>>
>>>>
>>>> The abbrevKey used in FastPrefixMap is the URI upto the last '#' or
>>>> failing that '/'.
>>>>
>>>> This is a prefix look up only when two prefix URI abbrevKeys share a
>>>> prefix.
>>>>
>>>> This is rare (I claim!). I added some tests to
>>>> AbstractTestLightweightPrefixMap to investigate ...
>>>> and the two implementations differ here.  The FastPrefixMap does not
>>>> abbreviate where the old code does.
>>>
>>> This was the case when the IRI was precisely the namespace IRI so you
>>> expect an abbreviation of prefix:  ?
>>>
>>> I added a fix for that already after I saw your extra tests.
>>>
>>> I have an idea for a better fix which is to add a couple of additional
>>> methods to the Trie class which will find either the shortest or
>>>longest
>>> match possible.  That would give behaviour more akin to the current
>>> PrefixMap implementation.
>>>
>>> Although to be honest the fact that the Map based implementation picks
>>>the
>>> shorter over the longer abbreviation is going to be down to luck and
>>>the
>>> underlying Map implementation.
>>>
>>>>
>>>> 2/ In the general purpose PrefixMapStd class, we could keep a Map from
>>>> registered URI string to prefix to speed up that common case.
>>>>
>>>> A Map is smaller than the Trie (which has a map at each level, so
>>>> http:// is 7 maps).  It does not optimize cases where one prefix
>>>> possibility is a prefix of another, and it falls back to brute force
>>>> here.  Does this case matter?  (Using the abbrevKey and stripping back
>>>> the possibilities would be a possibility.)
>>>
>>> My proposed additions to Trie and modifying the code to use that should
>>> take care of that.  I suspect that the brute force approach is likely
>>>not
>>> necessary once the other changes are in place.
>>>
>>>>
>>>>
>>>> This all gets a lot more complicated for RDF 1.1 where the local part
>>>> can have escaped / and # in it.
>>>
>>> Yes, that'll be a bridge to cross when we get to it.
>>>
>>>>
>>>> Over-engineering 1:
>>>>
>>>> Delay the Tries creation until the first call of abbrev or at the end
>>>>of
>>>> parsing (the case of many small files concatenated might have had a
>>>>lot
>>>> of prefixes)
>>>>
>>>> Or build the Trie during output only?  if there is an abbrevKey, find
>>>> the abbreviate old style, and install it.  This is using in a cache
>>>> fashion.
>>>
>>> I will likely do some refactoring to make Trie sparse where possible.
>>>
>>>>
>>>> Otherwise the suggestion of OutputPrefixMap looks good.
>>>>
>>>> Over-engineering 2:
>>>>
>>>> I thought a bit about two interfaces - one for parsers, one for
>>>>writers
>>>> but it seem to be quite complicated for little or no value.  What's
>>>>more
>>>> a prefix map may start by being used for parsing and then used for
>>>> writing (if and when old PrefixMapping gets sorted which would be good
>>>> at Jena3 - it's XML centric).
>>>
>>> Yes I think the only real difference is that one would have expand()
>>>and
>>> the other abbreviate() but otherwise they would be identical so seems
>>>odd
>>> to separate.
>>>
>>>>
>>>>
>>>> I'm happy to put time in to help with changes.
>>>
>>> Let me have a go at refactoring a bit based on your suggestions and my
>>> ideas and then you can review the changes and see what further
>>>suggestions
>>> or room for improvements you have.
>>>
>>> Rob
>>>
>>>>
>>>> 	Andy
>>>>
>>>
>>
>

Re: Prefixes

Posted by Andy Seaborne <an...@apache.org>.

On 29/01/13 17:35, Rob Vesse wrote:
> I refactored the implementation of Trie to keep the Trie sparse wherever
> possible, the performance difference between this and the purely Map based
> version is quite marked. I also added a bunch of useful helper methods to
> Trie for doing things like longest and shortest partial matches and all
> partial matches.  I expanded the tests for Trie to cover the new
> functionality and some other corner cases.
>
>
> PrefixMap is now the interface and PrefixMapStd the default
> implementation, I also moved more of the common functionality up into
> PrefixMapBase.  FastAbbreviatingPrefixMap has been improved to use
> longestMatch() when doing abbreviating.  PrefixMap2 has been renamed to
> PrefixMapExtended and rewritten to be a true wrapper around an arbitrary
> PrefixMap, tests for PrefixMapExtended are added to the test suite.
>
> There is now a PrefixMapFactory that provides a variety of create methods,
> plain create() will give the default implementation (currently
> PrefixMapStd), createForInput() will give an implementation suited to
> input tasks (currently PrefixMapStd) and createForOutput() will give an
> implementation suited to output tasks (currently
> FastAbbreviatingPrefixMap).  There are overloads for all these to create a
> prefix map which copies from an existing PrefixMap or PrefixMapping.
>
> I didn't have a go at any of the refactoring around local name validation
> to make it extensible, I may do that tomorrow or someone else can take a
> crack at it if they are interested.
>
> Please review and enhance/feedback as desired

Looks great.

I added a reverse map to PrefixMapStd for registered prefix uri to 
prefix string.

This is cheap to manage - it's only an additional map.put during map 
building/parsing.

The performance test isn't stable - I added a static BeforeClass to warm 
the JIT which partially helped.  It gave stability but some tests are 
failing with the PrefixMapStd now of roughly comparable speed :-|

The performance test is commented out ATM because of the residual 
instability and failures.  This may be machine dependent.

	Andy

>
> Rob
>
> On 1/29/13 2:46 PM, "Rob Vesse" <rv...@cray.com> wrote:
>
>> Comments inline as usual:
>>
>>
>> On 1/29/13 12:50 PM, "Andy Seaborne" <an...@apache.org> wrote:
>>
>>> There aren't many compatibility issues at the moment and preserving the
>>> name as the interface would be better compatibility anyway.  IMO The
>>> interface name is more important then implementation name so I think
>>> that it gets priority.
>>>
>>> 1/ shall we make the interface PrefixMap, and call current PrefixFix
>>> PrefixFixStd, and have a factory?
>>
>> Yes this is what I would like to do, I didn't do this initially since I
>> wanted to avoid breaking anything to start with.  Since we're doing a
>> minor version bump I think the change is acceptable.  I will aim to get
>> this done this afternoon.
>>
>>>
>>> Other future implementations include one for XML-valid prefixing for
>>> example as Turtle and XML are diverging.
>>>
>>> The cost of finding a perfect Turtle one vs fast may become significant
>>> as well.  There are escapes in the local par tof  a prefix name in
>>> Turtle-1.1.
>>
>> Yes that's an interesting challenge, I may refactor the code slightly to
>> make it easier to swap in and out different local name validation
>> routines.
>>
>>>
>>>
>>> The abbrevKey used in FastPrefixMap is the URI upto the last '#' or
>>> failing that '/'.
>>>
>>> This is a prefix look up only when two prefix URI abbrevKeys share a
>>> prefix.
>>>
>>> This is rare (I claim!). I added some tests to
>>> AbstractTestLightweightPrefixMap to investigate ...
>>> and the two implementations differ here.  The FastPrefixMap does not
>>> abbreviate where the old code does.
>>
>> This was the case when the IRI was precisely the namespace IRI so you
>> expect an abbreviation of prefix:  ?
>>
>> I added a fix for that already after I saw your extra tests.
>>
>> I have an idea for a better fix which is to add a couple of additional
>> methods to the Trie class which will find either the shortest or longest
>> match possible.  That would give behaviour more akin to the current
>> PrefixMap implementation.
>>
>> Although to be honest the fact that the Map based implementation picks the
>> shorter over the longer abbreviation is going to be down to luck and the
>> underlying Map implementation.
>>
>>>
>>> 2/ In the general purpose PrefixMapStd class, we could keep a Map from
>>> registered URI string to prefix to speed up that common case.
>>>
>>> A Map is smaller than the Trie (which has a map at each level, so
>>> http:// is 7 maps).  It does not optimize cases where one prefix
>>> possibility is a prefix of another, and it falls back to brute force
>>> here.  Does this case matter?  (Using the abbrevKey and stripping back
>>> the possibilities would be a possibility.)
>>
>> My proposed additions to Trie and modifying the code to use that should
>> take care of that.  I suspect that the brute force approach is likely not
>> necessary once the other changes are in place.
>>
>>>
>>>
>>> This all gets a lot more complicated for RDF 1.1 where the local part
>>> can have escaped / and # in it.
>>
>> Yes, that'll be a bridge to cross when we get to it.
>>
>>>
>>> Over-engineering 1:
>>>
>>> Delay the Tries creation until the first call of abbrev or at the end of
>>> parsing (the case of many small files concatenated might have had a lot
>>> of prefixes)
>>>
>>> Or build the Trie during output only?  if there is an abbrevKey, find
>>> the abbreviate old style, and install it.  This is using in a cache
>>> fashion.
>>
>> I will likely do some refactoring to make Trie sparse where possible.
>>
>>>
>>> Otherwise the suggestion of OutputPrefixMap looks good.
>>>
>>> Over-engineering 2:
>>>
>>> I thought a bit about two interfaces - one for parsers, one for writers
>>> but it seem to be quite complicated for little or no value.  What's more
>>> a prefix map may start by being used for parsing and then used for
>>> writing (if and when old PrefixMapping gets sorted which would be good
>>> at Jena3 - it's XML centric).
>>
>> Yes I think the only real difference is that one would have expand() and
>> the other abbreviate() but otherwise they would be identical so seems odd
>> to separate.
>>
>>>
>>>
>>> I'm happy to put time in to help with changes.
>>
>> Let me have a go at refactoring a bit based on your suggestions and my
>> ideas and then you can review the changes and see what further suggestions
>> or room for improvements you have.
>>
>> Rob
>>
>>>
>>> 	Andy
>>>
>>
>

Re: Prefixes

Posted by Rob Vesse <rv...@yarcdata.com>.

I refactored the implementation of Trie to keep the Trie sparse wherever
possible, the performance difference between this and the purely Map based
version is quite marked. I also added a bunch of useful helper methods to
Trie for doing things like longest and shortest partial matches and all
partial matches.  I expanded the tests for Trie to cover the new
functionality and some other corner cases.


PrefixMap is now the interface and PrefixMapStd the default
implementation, I also moved more of the common functionality up into
PrefixMapBase.  FastAbbreviatingPrefixMap has been improved to use
longestMatch() when doing abbreviating.  PrefixMap2 has been renamed to
PrefixMapExtended and rewritten to be a true wrapper around an arbitrary
PrefixMap, tests for PrefixMapExtended are added to the test suite.

There is now a PrefixMapFactory that provides a variety of create methods,
plain create() will give the default implementation (currently
PrefixMapStd), createForInput() will give an implementation suited to
input tasks (currently PrefixMapStd) and createForOutput() will give an
implementation suited to output tasks (currently
FastAbbreviatingPrefixMap).  There are overloads for all these to create a
prefix map which copies from an existing PrefixMap or PrefixMapping.

I didn't have a go at any of the refactoring around local name validation
to make it extensible, I may do that tomorrow or someone else can take a
crack at it if they are interested.

Please review and enhance/feedback as desired

Rob

On 1/29/13 2:46 PM, "Rob Vesse" <rv...@cray.com> wrote:

>Comments inline as usual:
>
>
>On 1/29/13 12:50 PM, "Andy Seaborne" <an...@apache.org> wrote:
>
>>There aren't many compatibility issues at the moment and preserving the
>>name as the interface would be better compatibility anyway.  IMO The
>>interface name is more important then implementation name so I think
>>that it gets priority.
>>
>>1/ shall we make the interface PrefixMap, and call current PrefixFix
>>PrefixFixStd, and have a factory?
>
>Yes this is what I would like to do, I didn't do this initially since I
>wanted to avoid breaking anything to start with.  Since we're doing a
>minor version bump I think the change is acceptable.  I will aim to get
>this done this afternoon.
>
>>
>>Other future implementations include one for XML-valid prefixing for
>>example as Turtle and XML are diverging.
>>
>>The cost of finding a perfect Turtle one vs fast may become significant
>>as well.  There are escapes in the local par tof  a prefix name in
>>Turtle-1.1.
>
>Yes that's an interesting challenge, I may refactor the code slightly to
>make it easier to swap in and out different local name validation
>routines.
>
>>
>>
>>The abbrevKey used in FastPrefixMap is the URI upto the last '#' or
>>failing that '/'.
>>
>>This is a prefix look up only when two prefix URI abbrevKeys share a
>>prefix.
>>
>>This is rare (I claim!). I added some tests to
>>AbstractTestLightweightPrefixMap to investigate ...
>>and the two implementations differ here.  The FastPrefixMap does not
>>abbreviate where the old code does.
>
>This was the case when the IRI was precisely the namespace IRI so you
>expect an abbreviation of prefix:  ?
>
>I added a fix for that already after I saw your extra tests.
>
>I have an idea for a better fix which is to add a couple of additional
>methods to the Trie class which will find either the shortest or longest
>match possible.  That would give behaviour more akin to the current
>PrefixMap implementation.
>
>Although to be honest the fact that the Map based implementation picks the
>shorter over the longer abbreviation is going to be down to luck and the
>underlying Map implementation.
>
>>
>>2/ In the general purpose PrefixMapStd class, we could keep a Map from
>>registered URI string to prefix to speed up that common case.
>>
>>A Map is smaller than the Trie (which has a map at each level, so
>>http:// is 7 maps).  It does not optimize cases where one prefix
>>possibility is a prefix of another, and it falls back to brute force
>>here.  Does this case matter?  (Using the abbrevKey and stripping back
>>the possibilities would be a possibility.)
>
>My proposed additions to Trie and modifying the code to use that should
>take care of that.  I suspect that the brute force approach is likely not
>necessary once the other changes are in place.
>
>>
>>
>>This all gets a lot more complicated for RDF 1.1 where the local part
>>can have escaped / and # in it.
>
>Yes, that'll be a bridge to cross when we get to it.
>
>>
>>Over-engineering 1:
>>
>>Delay the Tries creation until the first call of abbrev or at the end of
>>parsing (the case of many small files concatenated might have had a lot
>>of prefixes)
>>
>>Or build the Trie during output only?  if there is an abbrevKey, find
>>the abbreviate old style, and install it.  This is using in a cache
>>fashion.
>
>I will likely do some refactoring to make Trie sparse where possible.
>
>>
>>Otherwise the suggestion of OutputPrefixMap looks good.
>>
>>Over-engineering 2:
>>
>>I thought a bit about two interfaces - one for parsers, one for writers
>>but it seem to be quite complicated for little or no value.  What's more
>>a prefix map may start by being used for parsing and then used for
>>writing (if and when old PrefixMapping gets sorted which would be good
>>at Jena3 - it's XML centric).
>
>Yes I think the only real difference is that one would have expand() and
>the other abbreviate() but otherwise they would be identical so seems odd
>to separate.
>
>>
>>
>>I'm happy to put time in to help with changes.
>
>Let me have a go at refactoring a bit based on your suggestions and my
>ideas and then you can review the changes and see what further suggestions
>or room for improvements you have.
>
>Rob
>
>>
>>	Andy
>>
>

Re: Prefixes

Posted by Rob Vesse <rv...@yarcdata.com>.

Comments inline as usual:

On 1/29/13 12:50 PM, "Andy Seaborne" <an...@apache.org> wrote:

>There aren't many compatibility issues at the moment and preserving the
>name as the interface would be better compatibility anyway.  IMO The
>interface name is more important then implementation name so I think
>that it gets priority.
>
>1/ shall we make the interface PrefixMap, and call current PrefixFix
>PrefixFixStd, and have a factory?

Yes this is what I would like to do, I didn't do this initially since I
wanted to avoid breaking anything to start with.  Since we're doing a
minor version bump I think the change is acceptable.  I will aim to get
this done this afternoon.

>
>Other future implementations include one for XML-valid prefixing for
>example as Turtle and XML are diverging.
>
>The cost of finding a perfect Turtle one vs fast may become significant
>as well.  There are escapes in the local par tof  a prefix name in
>Turtle-1.1.

Yes that's an interesting challenge, I may refactor the code slightly to
make it easier to swap in and out different local name validation routines.

>
>
>The abbrevKey used in FastPrefixMap is the URI upto the last '#' or
>failing that '/'.
>
>This is a prefix look up only when two prefix URI abbrevKeys share a
>prefix.
>
>This is rare (I claim!). I added some tests to
>AbstractTestLightweightPrefixMap to investigate ...
>and the two implementations differ here.  The FastPrefixMap does not
>abbreviate where the old code does.

This was the case when the IRI was precisely the namespace IRI so you
expect an abbreviation of prefix:  ?

I added a fix for that already after I saw your extra tests.

I have an idea for a better fix which is to add a couple of additional
methods to the Trie class which will find either the shortest or longest
match possible.  That would give behaviour more akin to the current
PrefixMap implementation.

Although to be honest the fact that the Map based implementation picks the
shorter over the longer abbreviation is going to be down to luck and the
underlying Map implementation.

>
>2/ In the general purpose PrefixMapStd class, we could keep a Map from
>registered URI string to prefix to speed up that common case.
>
>A Map is smaller than the Trie (which has a map at each level, so
>http:// is 7 maps).  It does not optimize cases where one prefix
>possibility is a prefix of another, and it falls back to brute force
>here.  Does this case matter?  (Using the abbrevKey and stripping back
>the possibilities would be a possibility.)

My proposed additions to Trie and modifying the code to use that should
take care of that.  I suspect that the brute force approach is likely not
necessary once the other changes are in place.

>
>
>This all gets a lot more complicated for RDF 1.1 where the local part
>can have escaped / and # in it.

Yes, that'll be a bridge to cross when we get to it.

>
>Over-engineering 1:
>
>Delay the Tries creation until the first call of abbrev or at the end of
>parsing (the case of many small files concatenated might have had a lot
>of prefixes)
>
>Or build the Trie during output only?  if there is an abbrevKey, find
>the abbreviate old style, and install it.  This is using in a cache
>fashion.

I will likely do some refactoring to make Trie sparse where possible.

>
>Otherwise the suggestion of OutputPrefixMap looks good.
>
>Over-engineering 2:
>
>I thought a bit about two interfaces - one for parsers, one for writers
>but it seem to be quite complicated for little or no value.  What's more
>a prefix map may start by being used for parsing and then used for
>writing (if and when old PrefixMapping gets sorted which would be good
>at Jena3 - it's XML centric).

Yes I think the only real difference is that one would have expand() and
the other abbreviate() but otherwise they would be identical so seems odd
to separate.

>
>
>I'm happy to put time in to help with changes.

Let me have a go at refactoring a bit based on your suggestions and my
ideas and then you can review the changes and see what further suggestions
or room for improvements you have.

Rob

>
>	Andy
>