Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2011/12/01 01:04:03 UTC
Re: [lucy-dev] Implementing a tokenizer in core
On Wed, Nov 30, 2011 at 11:29:26PM +0100, Nick Wellnhofer wrote:
> OK, things are getting a little more complicated. I'd also like to
> generate some #defines along with the tables, so I could either generate
> a separate .h file, or I could simply create a single .c file that gets
> included by another .c file. This is not very tasteful but it would
> simplify things.
All of those sound fine to me. Sounds like you like the .h file option best,
so +1 to that.
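For concreteness, a generated .h along these lines might carry both the #defines and the static tables in one place. This is only a sketch: the `WB_*` names, values, and table shape are hypothetical, not Lucy's actual generated output, and a real generator would emit multi-stage tables covering all of Unicode rather than ASCII.

```c
#include <stdint.h>

/* Hypothetical sketch of a generated uniprops header: #defines for
 * word-break property values alongside the static lookup table.
 * Names and values are illustrative, not Lucy's generated code. */
#define WB_OTHER   0
#define WB_NUMERIC 1
#define WB_ALETTER 2

/* Toy single-stage table covering ASCII only; a real generator
 * would emit complete multi-stage tables for all code points. */
static const uint8_t wb_props[128] = {
    ['0'] = WB_NUMERIC, ['1'] = WB_NUMERIC, ['2'] = WB_NUMERIC,
    ['3'] = WB_NUMERIC, ['4'] = WB_NUMERIC, ['5'] = WB_NUMERIC,
    ['6'] = WB_NUMERIC, ['7'] = WB_NUMERIC, ['8'] = WB_NUMERIC,
    ['9'] = WB_NUMERIC,
    ['A'] = WB_ALETTER, ['a'] = WB_ALETTER, ['z'] = WB_ALETTER,
    /* ...the generator would fill in the remaining letters... */
};
```

Entries not listed default to zero (WB_OTHER), which is one reason designated initializers suit generated tables.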
> Another question: The perl script that generates the tables uses text
> files from http://www.unicode.org/Public/UNIDATA/. Should we bundle
> these files with Lucy?
How about we provide a link in the script's docs to the monolithic archive of
the version of those files we want to use? For instance:
http://www.unicode.org/Public/6.0.0/ucd/UCD.zip
Then the script can just take an arg to the expanded directory.
perl devel/bin/gen_uniprops.pl /path/to/UCD
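For reference, the UCD text files the script consumes are semicolon-delimited, one record per line; in UnicodeData.txt the first field is the code point in hex and the third is the General_Category. A minimal C sketch of pulling those two fields out of one record (the function name is made up for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Extract the code point (field 1, hex) and General_Category
 * (field 3) from one semicolon-delimited UnicodeData.txt record.
 * Uses strchr rather than strtok so empty fields are handled.
 * Returns 0 on success, -1 on a malformed line. */
static int parse_unidata_line(const char *line, long *cp,
                              char *gc, size_t gc_len) {
    char *end;
    *cp = strtol(line, &end, 16);
    if (*end != ';') return -1;
    const char *p = strchr(end + 1, ';');  /* skip field 2 (name) */
    if (!p) return -1;
    p++;                                   /* start of field 3 */
    const char *q = strchr(p, ';');
    if (!q) q = p + strlen(p);
    size_t n = (size_t)(q - p);
    if (n >= gc_len) n = gc_len - 1;
    memcpy(gc, p, n);
    gc[n] = '\0';
    return 0;
}
```

For example, the real record for U+0041 is `0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;`, which parses to code point 0x41 with category "Lu".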
We can also bundle if you prefer (the license allows it) -- it's just a little
more work and a little more bandwidth.
Marvin Humphrey