Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2011/12/01 01:04:03 UTC

Re: [lucy-dev] Implementing a tokenizer in core

On Wed, Nov 30, 2011 at 11:29:26PM +0100, Nick Wellnhofer wrote:
> OK, things are getting a little more complicated. I'd also like to  
> generate some #defines along with the tables, so I could either generate  
> a separate .h file, or I could simply create a single .c file that gets  
> included by another .c file. This is not very tasteful but it would  
> simplify things.

All of those sound fine to me.  Sounds like you like the .h file option best,
so +1 to that.

> Another question: The perl script that generates the tables uses text  
> files from http://www.unicode.org/Public/UNIDATA/. Should we bundle  
> these files with Lucy?

How about we provide a link in the script's docs to the monolithic archive of
the version of those files we want to use?  For instance:

    http://www.unicode.org/Public/6.0.0/ucd/UCD.zip

Then the script can just take an arg to the expanded directory.

    perl devel/bin/gen_uniprops.pl /path/to/UCD
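[Editorial illustration: a minimal sketch, in Python rather than the actual Perl, of how such a generator script might read one of the UCD text files. UnicodeData.txt uses semicolon-separated fields, with the code point in field 0 and the general category in field 2; the function name below is made up for this example.]

```python
# Hypothetical sketch of parsing UnicodeData.txt lines from the UCD
# directory passed on the command line. Field layout: code point;
# name; general category; ... (semicolon-separated).
def parse_unicodedata(lines):
    """Yield (code_point, general_category) pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        yield int(fields[0], 16), fields[2]

# Two sample lines in UnicodeData.txt format:
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0661;ARABIC-INDIC DIGIT ONE;Nd;0;AN;;1;1;1;N;;;;;",
]
props = dict(parse_unicodedata(sample))
print(props)  # {65: 'Lu', 1633: 'Nd'}
```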

We can also bundle if you prefer (the license allows it) -- it's just a little
more work and a little more bandwidth.

Marvin Humphrey