You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2011/06/16 13:01:48 UTC
[jira] [Issue Comment Edited] (LUCENE-3206) FST package API
refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050347#comment-13050347 ]
Dawid Weiss edited comment on LUCENE-3206 at 6/16/11 11:01 AM:
---------------------------------------------------------------
This is my take at the revamped FST API. My changes are mostly aiming at having a bit clearer code (especially wrt. to loops), but also detach the "algebra" of a transition's output from the actual output. This should allow us to create an output algebra that would work directly on mutable integers, for example (to save on autoboxing). I also just like the way it reads after the changes:
{code}
FST<Integer> fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
.add("abc", 10)
.add("abc, 5)
.add("def", 0, 3), 2)
.build();
{code}
or a loop over all arcs of a state:
{code}
Arc<Integer> arc = fst.getRoot();
for (Arc<Integer> tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
int label = tmp.getLabel(); // transition label here.
Integer output = tmp.getOutput();
}
{code}
I definitely didn't consider all the use cases that FSTs are used for currently (in particular the "stop" bit indicating non-accepted input sequences that are also dead ends), but I think these could be integrated... I think :)
Arcs now also store the pointer to the FST object, which may seem like an overhead, but I doubt it really will be (it's a single pointer and we buffer arcs whenever we can; a larger waste is having an object on each arc's output, even if it can be a primitive type or reused buffer).
was (Author: dweiss):
This is my take at the revamped FST API. My changes are mostly aiming at having a bit clearer code (especially wrt. to loops), but also detach the "algebra" of a transition's output from the actual output. This should allow us to create an output algebra that would work directly on mutable integers, for example (to save on autoboxing). I also just like the way it reads after the changes:
{code}
FST<Integer> fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
.add("abc", 10)
.add("abc, 5)
.add("def", 0, 3), 2)
.build();
{code}
or a loop over all arcs of a state:
{code}
Arc<Integer> arc = fst.getRoot();
for (Arc<Integer> tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
int label = tmp.getLabel(); // transition label here.
Integer output = tmp.getOutput(); // FSAs have a constant empty output.
}
{code}
I definitely didn't consider all the use cases that FSTs are used for currently (in particular the "stop" bit indicating non-accepted input sequences that are also dead ends), but I think these could be integrated... I think :)
Arcs now also store the pointer to the FST object, which may seem like an overhead, but I doubt it really will be (it's a single pointer and we buffer arcs whenever we can; a larger waste is having an object on each arc's output, even if it can be a primitive type or reused buffer).
> FST package API refactoring
> ---------------------------
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Affects Versions: 3.2
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org