Posted to regexp-dev@jakarta.apache.org by Jacob Eckel <ec...@dealfusion.com> on 2001/02/14 10:47:17 UTC

A major bug with non-greedy closures (.*?)

The bug is reproducible in Regexp 1.2 using the statement:

  RE re = new RE("ABC.*?X+Z");

This will cause an ArrayIndexOutOfBoundsException in
RECompiler.setNextOfEnd().
The actual cause of the exception is the direct cast from char to int,
which turns a value that was meant to be negative into a large positive int.
This can be fixed by casting to short first:

(short)instruction[node + RE.offsetNext]

The same problem exists also in RECompiler.expr() and
REProgram.setInstructions().
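
To illustrate the cast issue described above, here is a minimal sketch. It assumes the instruction array is a char[] holding relative offsets. OFFSET_NEXT is a hypothetical stand-in for RE.offsetNext (its real value is not given in this thread), and the class and method names are made up for the example.

```java
// Sketch only: not the actual RECompiler code.
class CharOffsetDemo {
    // Hypothetical stand-in for RE.offsetNext.
    static final int OFFSET_NEXT = 2;

    // Plain widening: char is an unsigned 16-bit type, so the result is never negative.
    static int readWidened(char[] instruction, int node) {
        return instruction[node + OFFSET_NEXT];
    }

    // Cast to short first: the 16 bits are reinterpreted as signed, then widened to int.
    static int readSigned(char[] instruction, int node) {
        return (short) instruction[node + OFFSET_NEXT];
    }

    public static void main(String[] args) {
        char[] instruction = new char[3];
        instruction[OFFSET_NEXT] = (char) -3;              // store a negative relative offset
        System.out.println(readWidened(instruction, 0));   // prints 65533
        System.out.println(readSigned(instruction, 0));    // prints -3
    }
}
```

Used as an array index, 65533 is far outside a small instruction array, which would match the ArrayIndexOutOfBoundsException seen here.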

However fixing those problems only brings us to the next one -
an infinite loop in RECompiler.setNextOfEnd(). This is caused by a loop
existing in the instruction linked list. I tried to work on this
but unfortunately was unable to find the source of the problem.
It seems that the line "setNextOfEnd(ret, lenInstruction);" found in
RECompiler.closure() is somehow responsible for the creation of the loop.
Please help...
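
For anyone trying to track this down, the following is a rough sketch of how a loop in the chain could be detected instead of spinning forever. It is a simplified model, not the real setNextOfEnd(): it assumes each node stores a signed relative offset to its successor at node + OFFSET_NEXT and that an offset of 0 ends the chain. OFFSET_NEXT, ChainWalkDemo and findEnd are hypothetical names.

```java
// Sketch only: a simplified model of walking a chain of relative "next" offsets.
class ChainWalkDemo {
    // Hypothetical stand-in for RE.offsetNext.
    static final int OFFSET_NEXT = 2;

    // Follows the chain from 'node' and returns the index of the last node
    // (the one whose next offset is 0). Throws instead of looping forever
    // if a node is visited twice or an offset leaves the array.
    static int findEnd(char[] instruction, int node) {
        boolean[] seen = new boolean[instruction.length];
        while (true) {
            if (node < 0 || node + OFFSET_NEXT >= instruction.length) {
                throw new IllegalStateException("offset leaves the program at node " + node);
            }
            if (seen[node]) {
                throw new IllegalStateException("cycle in next-chain at node " + node);
            }
            seen[node] = true;
            int next = (short) instruction[node + OFFSET_NEXT]; // read as signed
            if (next == 0) {
                return node;
            }
            node += next;
        }
    }
}
```

A guard like this only reports the first node that gets revisited. It does not explain which call linked the chain back on itself.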

Jacob Eckel


Re: A major bug with non-greedy closures (.*?)

Posted by Michael McCallum <gh...@xtra.co.nz>.
OK, I have a fix.

It's not perfect though. I think there is a small design flaw in the way
reluctant matches are handled, which makes a proper fix very difficult.

I have made sure that programs get dumped on test failures and also for
interactive tests. (A small helper script is attached; if put into the
build directory, it can be used to run RETest.)

I also fixed a bug that caused various failures when the same compiler was
used for multiple compilations. It did a bit of cruising past the end of its
program, chaining all sorts of goodies together. (This usually did not affect
the program, but occasionally caused the compiler to throw an
ArrayIndexOutOfBoundsException.)

Michael


Re: A major bug with non-greedy closures (.*?)

Posted by Michael McCallum <gh...@xtra.co.nz>.
On Wednesday 14 February 2001 09:47, you wrote:
} The bug is reproducible in Regexp 1.2 using the statement:
}
}   RE re = new RE("ABC.*?X+Z");
}
} This will cause an ArrayIndexOutOfBoundsException in
} RECompiler.setNextOfEnd().
} The actual reason for the exception is the direct casting from char to int
} which
} causes a negative value to be set as a large positive value into the int.
} This may be fixed using a (short) casting:
}
} (short)instruction[node + RE.offsetNext]
}
} The same problem exists also in RECompiler.expr() and
} REProgram.setInstructions().
As you discovered, the problem is not the cast, since I think the offsets
should always be positive. The gotos are for negative offsets.

}
} However fixing those problems only brings us to the next one -
} an infinite loop in RECompiler.setNextOfEnd(). This is caused by a loop
} existing in the instruction linked list. I tried to work on this
} but unfortunately was unable to find the source of the problem.
} It seems that the line "setNextOfEnd(ret, lenInstruction);" found in
} RECompiler.closure() is somehow responsible for the creation of the loop.
} Please help...

The reluctant closures definitely have something fishy going on.
I will have another look tomorrow.

Michael