You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by donghai tang <ta...@gmail.com> on 2022/07/07 04:31:32 UTC

Some questions about implementation of LevenshteinAutomata

Hi,
    Recently,I have read the paper <<Fast String Correction with
Levenshtein-Automata>>. And then I tried to read its implementation of the
Lucene.But I found it's too difficult to figure it out.
I want to ask some questions about the implementation of
Levenshtein-Automata.

*1. What does state mean in LevenshteinAutomata?*
I found the  comment of the state in Lev1ParametricDescription.java is like
below

> // state map
> //   0 -> [(0, 0)]
> //   1 -> [(0, 1)]
> //   2 -> [(0, 1), (1, 1)]
> //   3 -> [(0, 1), (1, 1), (2, 1)]
> //   4 -> [(0, 1), (2, 1)]
>
> Does state mean A^i, B^i .... in the paper?
eg.
0 -> A , 1 -> B, 2 -> C.
[image: 截屏2022-07-07 12.20.02.png]
If so, what does [(0, 0)]  [(0, 1), (1, 1), (2, 1)] in the comments mean?

*2. Why can minErrors have a negative number?*

> Lev1ParametricDescription minErrors is [0, 1, 0, -1, -1]
>
> minErrors is used to calculate if a state is final or not.
the code is

boolean isAccept(int absState) {
  // decode absState -> state, offset
  int state = absState / (w + 1);
  int offset = absState % (w + 1);
  assert offset >= 0;
  return w - offset + minErrors[state] <= n;
}

In the paper the final definition is wordlen - offset <=
max-edit-operations - current-used-edit-operations

So minErrrors should be the min current-used-edit-operations? But why
could it be negative?

*Final Question*
Any good resources could help me better understand the implementation?