Posted to java-user@lucene.apache.org by Benson Margulies <be...@basistech.com> on 2013/12/02 18:11:33 UTC

Where is the source for the .dat files in Kuromoji?

There are a handful of binary files
in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
ending in .dat.

Trawling around in the source, it seems as if at least one of these derives
from a source file named "unk.def". In turn, this file comes from a
dependency. Should the build generate the file rather than having it in the
tree and shipped as part of the source release?

Re: Where is the source for the .dat files in Kuromoji?

Posted by Benson Margulies <be...@basistech.com>.
Thanks.


On Mon, Dec 2, 2013 at 12:21 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi Benson,
>
> If you run "ant regenerate", it downloads the source files (which is "ant
> download-dict") and then rebuilds ("ant build-dict") the FSTs and other
> binary stuff stored in the dat file. See also the ivy.xml.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Benson Margulies [mailto:benson@basistech.com]
> > Sent: Monday, December 02, 2013 6:12 PM
> > To: java-user@lucene.apache.org; Christian Moen
> > Subject: Where is the source for the .dat files in Kuromoji?
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: Where is the source for the .dat files in Kuromoji?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Benson,

If you run "ant regenerate", it downloads the source files (which is "ant download-dict") and then rebuilds ("ant build-dict") the FSTs and other binary stuff stored in the dat file. See also the ivy.xml.
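After running "ant regenerate", one way to see whether the checked-in .dat files match what the build produces is to compare the two sets by digest. The following is a minimal Java sketch, not part of Lucene: the class name DatDigestCompare is hypothetical, and it assumes you pass the checked-in dict directory and wherever your regenerated copy ends up. It only reports which .dat files differ. A difference may simply reflect a different dictionary version or a change in the build tools rather than an error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper (not part of Lucene): compares *.dat files in two
// directories by SHA-256, e.g. checked-in resources vs. a regenerated copy.
public class DatDigestCompare {

    // Hex-encoded SHA-256 digest of a file's bytes, read in 8 KiB chunks.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // File name -> digest for every regular *.dat file directly inside dir (no recursion).
    static Map<String, String> datDigests(Path dir) throws IOException, NoSuchAlgorithmException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> s = Files.list(dir)) {
            s.filter(Files::isRegularFile)
             .filter(p -> p.getFileName().toString().endsWith(".dat"))
             .forEach(files::add);
        }
        Map<String, String> digests = new TreeMap<>();
        for (Path p : files) {
            digests.put(p.getFileName().toString(), sha256(p));
        }
        return digests;
    }

    // Sorted names of .dat files that exist on only one side or whose contents differ.
    static List<String> differences(Path checkedIn, Path regenerated)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> a = datDigests(checkedIn);
        Map<String, String> b = datDigests(regenerated);
        TreeSet<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        List<String> diffs = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                diffs.add(name);
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DatDigestCompare <checked-in-dir> <regenerated-dir>");
            System.exit(2);
        }
        List<String> diffs = differences(Paths.get(args[0]), Paths.get(args[1]));
        if (diffs.isEmpty()) {
            System.out.println("all .dat files identical");
        } else {
            diffs.forEach(name -> System.out.println("differs: " + name));
        }
    }
}
```

Comparing whole-file digests keeps the check independent of the .dat internals (FSTs and other binary structures), at the cost of saying nothing about *why* two files differ.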

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Where is the source for the .dat files in Kuromoji?

Posted by Christian Moen <cm...@atilika.com>.
On Dec 3, 2013, at 8:38 AM, Benson Margulies <be...@basistech.com> wrote:

> I'm not clear that there's anything that anyone would complain of. The question is, are the .dat files part of the source bundle that is the 'official release'? I just fetched from git, not from the official release, so I don't know.

I’d say the .dat files are part of the source bundle, which is the ‘official release’, but PMCs feel free to chime in...

The .dat files have been checked into SVN in binary form to make Lucene easy to build and they’re also rather modest in size thanks to squeezing work done by Robert and Uwe.

Best,
Christian


Re: Where is the source for the .dat files in Kuromoji?

Posted by Benson Margulies <be...@basistech.com>.
On Mon, Dec 2, 2013 at 6:27 PM, Christian Moen <cm...@atilika.com> wrote:

> Hello Benson,
>
> The sources for the .dat files are available from
>
> https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
> http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
>
> and a range of other places.
>
> I’m not sure I follow what you’re saying regarding unk.def -- it’s to my
> knowledge used as-is from the above sources when the binary .dat files are
> made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
>
> Perhaps I’m missing something.  Could you clarify how you think things
> should be done?
>

I'm not clear that there's anything that anyone would complain of. The
question is, are the .dat files part of the source bundle that is the
'official release'? I just fetched from git, not from the official release,
so I don't know.

Re: Where is the source for the .dat files in Kuromoji?

Posted by Christian Moen <cm...@atilika.com>.
Hello Benson,

The sources for the .dat files are available from

	https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
	http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz

and a range of other places.

I’m not sure I follow what you’re saying regarding unk.def -- it’s to my knowledge used as-is from the above sources when the binary .dat files are made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
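To illustrate what reading unk.def "as-is" could look like, here is a hedged Java sketch. It is not the code in lucene/analysis/kuromoji/src/tools; UnkDefSketch and UnkEntry are hypothetical names. It assumes each non-blank line has the form category,leftId,rightId,cost,feature,... and that the mecab-ipadic source files are EUC-JP encoded. It splits on plain commas and does not handle quoted CSV fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reader for MeCab's unk.def (not Lucene's own build code).
// Assumes each non-blank line is: category,leftId,rightId,cost,feature,...
public class UnkDefSketch {

    // One entry: a character category, its connection ids, cost, and POS features.
    static final class UnkEntry {
        final String category;
        final int leftId;
        final int rightId;
        final int cost;
        final List<String> features;

        UnkEntry(String category, int leftId, int rightId, int cost, List<String> features) {
            this.category = category;
            this.leftId = leftId;
            this.rightId = rightId;
            this.cost = cost;
            this.features = Collections.unmodifiableList(features);
        }
    }

    // Splits on plain commas; quoted fields containing commas are NOT handled here.
    static UnkEntry parseLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length < 4) {
            throw new IllegalArgumentException("expected at least 4 fields: " + line);
        }
        return new UnkEntry(
                f[0],
                Integer.parseInt(f[1].trim()),
                Integer.parseInt(f[2].trim()),
                Integer.parseInt(f[3].trim()),
                Arrays.asList(Arrays.copyOfRange(f, 4, f.length)));
    }

    // Reads all entries from the reader, skipping blank lines.
    static List<UnkEntry> parse(BufferedReader reader) throws IOException {
        List<UnkEntry> entries = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                entries.add(parseLine(line));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the mecab-ipadic source files are EUC-JP encoded.
        try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]), Charset.forName("EUC-JP"))) {
            for (UnkEntry e : parse(r)) {
                System.out.println(e.category + " left=" + e.leftId + " right=" + e.rightId
                        + " cost=" + e.cost + " features=" + e.features);
            }
        }
    }
}
```

Usage, assuming the tarball above has been unpacked locally: java UnkDefSketch mecab-ipadic-2.7.0-20070801/unk.def. The real builder may treat these fields differently; the sketch only shows that the file is plain text that a build step can consume directly.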

Perhaps I’m missing something.  Could you clarify how you think things should be done?

Many thanks,

Christian Moen
アティリカ株式会社
http://www.atilika.com
