Posted to commits@joshua.apache.org by le...@apache.org on 2016/05/26 05:11:55 UTC

[50/53] [abbrv] [partial] incubator-joshua git commit: Pulled JOSHUA-252 changes and Resolved Merge Conflicts

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/LICENSE
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/LICENSE b/ext/giza-pp/GIZA++-v2/LICENSE
deleted file mode 100644
index 5b2225e..0000000
--- a/ext/giza-pp/GIZA++-v2/LICENSE
+++ /dev/null
@@ -1,282 +0,0 @@
-
-
-Preamble
-
-The licenses for most software are designed to take away your freedom
-to share and change it. By contrast, the GNU General Public License is
-intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
-When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
-To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the
-rights. These restrictions translate to certain responsibilities for
-you if you distribute copies of the software, or if you modify it.
-
-For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
-We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
-Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on,
-we want its recipients to know that what they have is not the
-original, so that any problems introduced by others will not reflect
-on the original authors' reputations.
-
-Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at
-all.
-
-The precise terms and conditions for copying, distribution and
-modification follow.
-
-
-TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
-0. This License applies to any program or other work which contains a
-notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the Program
-(independent of having been made by running the Program). Whether that
-is true depends on what the Program does.
-
-1. You may copy and distribute verbatim copies of the Program's source
-code as you receive it, in any medium, provided that you conspicuously
-and appropriately publish on each copy an appropriate copyright notice
-and disclaimer of warranty; keep intact all the notices that refer to
-this License and to the absence of any warranty; and give any other
-recipients of the Program a copy of this License along with the
-Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a
-fee.
-
-2. You may modify your copy or copies of the Program or any portion of
-it, thus forming a work based on the Program, and copy and distribute
-such modifications or work under the terms of Section 1 above,
-provided that you also meet all of these conditions:
-
-     a) You must cause the modified files to carry prominent notices
-     stating that you changed the files and the date of any change.
-
-     b) You must cause any work that you distribute or publish, that
-     in whole or in part contains or is derived from the Program or
-     any part thereof, to be licensed as a whole at no charge to all
-     third parties under the terms of this License.
-
-     c) If the modified program normally reads commands interactively
-     when run, you must cause it, when started running for such
-     interactive use in the most ordinary way, to print or display an
-     announcement including an appropriate copyright notice and a
-     notice that there is no warranty (or else, saying that you
-     provide a warranty) and that users may redistribute the program
-     under these conditions, and telling the user how to view a copy
-     of this License. (Exception: if the Program itself is interactive
-     but does not normally print such an announcement, your work based
-     on the Program is not required to print an announcement.)
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote
-it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
-3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
-     a) Accompany it with the complete corresponding machine-readable
-     source code, which must be distributed under the terms of
-     Sections 1 and 2 above on a medium customarily used for software
-     interchange; or,
-
-     b) Accompany it with a written offer, valid for at least three
-     years, to give any third party, for a charge no more than your
-     cost of physically performing source distribution, a complete
-     machine-readable copy of the corresponding source code, to be
-     distributed under the terms of Sections 1 and 2 above on a medium
-     customarily used for software interchange; or,
-
-     c) Accompany it with the information you received as to the offer
-     to distribute corresponding source code. (This alternative is
-     allowed only for noncommercial distribution and only if you
-     received the program in object code or executable form with such
-     an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
-4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt otherwise
-to copy, modify, sublicense or distribute the Program is void, and
-will automatically terminate your rights under this License. However,
-parties who have received copies, or rights, from you under this
-License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
-5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
-6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted
-herein. You are not responsible for enforcing compliance by third
-parties to this License.
-
-
-7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
-8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
-9. The Free Software Foundation may publish revised and/or new
-versions of the General Public License from time to time. Such new
-versions will be similar in spirit to the present version, but may
-differ in detail to address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and
-"any later version", you have the option of following the terms and
-conditions either of that version or of any later version published by
-the Free Software Foundation. If the Program does not specify a
-version number of this License, you may choose any version ever
-published by the Free Software Foundation.
-
-10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the
-author to ask for permission. For software which is copyrighted by the
-Free Software Foundation, write to the Free Software Foundation; we
-sometimes make exceptions for this. Our decision will be guided by the
-two goals of preserving the free status of all derivatives of our free
-software and of promoting the sharing and reuse of software generally.
-
-NO WARRANTY
-
-11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
-WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
-LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS
-AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF
-ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
-THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
-PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME
-THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
-
-
-12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
-WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
-AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
-FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
-CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
-PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
-RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
-FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF
-SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
-DAMAGES.
-
-END OF TERMS AND CONDITIONS

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Makefile
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Makefile b/ext/giza-pp/GIZA++-v2/Makefile
deleted file mode 100644
index 0148849..0000000
--- a/ext/giza-pp/GIZA++-v2/Makefile
+++ /dev/null
@@ -1,140 +0,0 @@
-.SUFFIXES: .out .o .c .e .r .f .y .l .s .p .cpp .alpha2o .pentiumo .sgio .alphao
-
-INSTALLDIR ?= /usr/local/bin/
-
-#CXX = g++
-
-CFLAGS = $(CFLAGS_GLOBAL) -Wall -Wno-parentheses -std=c++11
-#CFLAGS_OPT = $(CFLAGS) -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE -ffast-math
-CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DBINARY_SEARCH_FOR_TTABLE -DWORDINDEX_WITH_4_BYTE
-CFLAGS_PRF = $(CFLAGS) -O2 -pg -DNDEBUG -DWORDINDEX_WITH_4_BYTE
-CFLAGS_DBG = $(CFLAGS) -g -DDEBUG -DWORDINDEX_WITH_4_BYTE
-CFLAGS_NRM = $(CFLAGS) -DWORDINDEX_WITH_4_BYTE
-CFLAGS_VDBG = $(CFLAGS) -g -DDEBUG -DWORDINDEX_WITH_4_BYTE -DVDEBUG
-SRC = *.cpp
-TYPE = 
-LDFLAGS =
-
-include Makefile.src
-
-OBJ_DIR_PRF = profile/
-OBJ_DIR_OPT = optimized/
-OBJ_DIR_DBG = debug/
-OBJ_DIR_VDBG = vdebug/
-OBJ_DIR_NRM = norm/
-OBJ_OPT2 = ${SRC2:%.cpp=$(OBJ_DIR_OPT)%.o}
-OBJ_OPT = ${SRC:%.cpp=$(OBJ_DIR_OPT)%.o}
-OBJ_DBG = ${SRC:%.cpp=$(OBJ_DIR_DBG)%.o}
-OBJ_VDBG = ${SRC:%.cpp=$(OBJ_DIR_VDBG)%.o}
-OBJ_NRM = ${SRC:%.cpp=$(OBJ_DIR_NRM)%.o}
-OBJ_PRF = ${SRC:%.cpp=$(OBJ_DIR_PRF)%.o}
-OBJ_DIR = 
-DATE = `date +%d-%m-%Y`
-
-opt: GIZA++ snt2plain.out plain2snt.out snt2cooc.out
-
-GIZA++: $(OBJ_DIR_OPT) $(OBJ_OPT) 
-	$(CXX) $(CFLAGS_OPT) $(OBJ_OPT) $(LDFLAGS) -o GIZA++
-
-prf: GIZA++.prf
-
-GIZA++.prf: $(OBJ_DIR_PRF) $(OBJ_PRF) 
-	$(CXX) $(CFLAGS_PRF) $(OBJ_PRF) -o GIZA++.prf $(LDFLAGS)
-
-dbg: GIZA++.dbg
-
-GIZA++.dbg: $(OBJ_DIR_DBG) $(OBJ_DBG) 
-	$(CXX) $(CFLAGS_DBG) $(OBJ_DBG) -o GIZA++.dbg $(LDFLAGS)
-
-vdbg: GIZA++.vdbg
-
-GIZA++.vdbg: $(OBJ_DIR_VDBG) $(OBJ_VDBG) 
-	$(CXX) $(CFLAGS_VDBG) $(OBJ_VDBG) -o GIZA++.vdbg $(LDFLAGS)
-
-nrm: GIZA++.nrm 
-
-GIZA++.nrm: $(OBJ_DIR_NRM) $(OBJ_NRM) 
-	$(CXX) $(CFLAGS_NRM) $(OBJ_NRM) -o GIZA++.nrm $(LDFLAGS)
-
-all: dbg opt nrm prf 
-
-$(OBJ_DIR_PRF): $(OBJ_DIR)
-	-mkdir $(OBJ_DIR_PRF)
-
-$(OBJ_DIR_OPT): $(OBJ_DIR)
-	-mkdir $(OBJ_DIR_OPT)
-
-$(OBJ_DIR_DBG): $(OBJ_DIR)
-	-mkdir $(OBJ_DIR_DBG)
-
-$(OBJ_DIR_VDBG): $(OBJ_DIR)
-	-mkdir $(OBJ_DIR_VDBG)
-
-$(OBJ_DIR_NRM): $(OBJ_DIR)
-	-mkdir $(OBJ_DIR_NRM)
-
-$(OBJ_DIR):
-	-mkdir $(OBJ_DIR)
-
-$(OBJ_DIR_DBG)%.o: %.cpp
-	$(CXX)  $(CFLAGS_DBG)  -c $< -o $@
-
-$(OBJ_DIR_VDBG)%.o: %.cpp
-	$(CXX)  $(CFLAGS_VDBG)  -c $< -o $@
-
-$(OBJ_DIR_NRM)%.o: %.cpp
-	$(CXX)  $(CFLAGS_NRM)  -c $< -o $@
-
-$(OBJ_DIR_PRF)%.o: %.cpp
-	$(CXX)  $(CFLAGS_PRF) -c $< -o $@
-
-$(OBJ_DIR_OPT)%.o: %.cpp
-	$(CXX)  $(CFLAGS_OPT)  -c $< -o $@
-
-iinstall: opt prf dbg
-	-mkdir $(INSTALLDIR)/$(ARCH)
-	-cp GIZA++ $(INSTALLDIR)/GIZA++
-	-cp GIZA++.prf $(INSTALLDIR)/GIZA++.prf
-	-cp GIZA++.dbg $(INSTALLDIR)/GIZA++.dbg
-
-install: opt 
-	-mkdir $(INSTALLDIR)
-	-cp GIZA++ $(INSTALLDIR)/GIZA++
-
-clean:
-	-rm -f $(OBJ_DIR_NRM)/*.o $(OBJ_DIR_DBG)/*.o $(OBJ_DIR_VDBG)/*.o $(OBJ_DIR_PRF)/*.o $(OBJ_DIR_OPT)/*.o
-	-rm -rf $(OBJ_DIR_NRM) $(OBJ_DIR_DBG) $(OBJ_DIR_VDBG) $(OBJ_DIR_PRF) $(OBJ_DIR_OPT)
-	-rm -f snt2plain.out plain2snt.out snt2cooc.out GIZA++
-
-
-backup: clean
-	tar cf - . | gzip -9 > ../GIZA++src.tar.gz
-
-depend: depend_CLEAN dependencies
-
-depend_CLEAN:
-	rm dependencies
-
-dependencies:
-	@(echo "#Automatically generated dependecy list" >>  dependencies ;\
-	$(CXX) -MM *.cpp $(CFLAGS_OPT) | perl -e 'while(<>){s?^([^\:]+\.o:)?$(OBJ_DIR_OPT)\1?g;print;}'>>  dependencies)
-	@(echo "#Automatically generated dependecy list" >>  dependencies ;\
-	$(CXX) -MM *.cpp $(CFLAGS_DBG) | perl -e 'while(<>){s?^([^\:]+\.o:)?$(OBJ_DIR_DBG)\1?g;print;}'>>  dependencies)
-	@(echo "#Automatically generated dependecy list" >>  dependencies ;\
-	$(CXX) -MM *.cpp $(CFLAGS_VDBG) | perl -e 'while(<>){s?^([^\:]+\.o:)?$(OBJ_DIR_VDBG)\1?g;print;}'>>  dependencies)
-	@(echo "#Automatically generated dependecy list" >>  dependencies ;\
-	$(CXX) -MM *.cpp $(CFLAGS_NRM) | perl -e 'while(<>){s?^([^\:]+\.o:)?$(OBJ_DIR_NRM)\1?g;print;}'>>  dependencies)
-	@(echo "#Automatically generated dependecy list" >>  dependencies ;\
-	$(CXX) -MM *.cpp $(CFLAGS_PRF) | perl -e 'while(<>){s?^([^\:]+\.o:)?$(OBJ_DIR_PRF)\1?g;print;}'>>  dependencies)
-
--include dependencies
-
-snt2plain.out: snt2plain.cpp
-	$(CXX) $(LDFLAGS) -O3 -W -Wall snt2plain.cpp -o snt2plain.out
-
-plain2snt.out: plain2snt.cpp
-	$(CXX) $(LDFLAGS) -O3 -W -Wall plain2snt.cpp -o plain2snt.out
-
-snt2cooc.out: snt2cooc.cpp
-	$(CXX) $(LDFLAGS) -O3 -g -W -Wall snt2cooc.cpp -o snt2cooc.out
-

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Makefile.definitions
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Makefile.definitions b/ext/giza-pp/GIZA++-v2/Makefile.definitions
deleted file mode 100644
index e69de29..0000000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Makefile.src
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Makefile.src b/ext/giza-pp/GIZA++-v2/Makefile.src
deleted file mode 100644
index a6b8be7..0000000
--- a/ext/giza-pp/GIZA++-v2/Makefile.src
+++ /dev/null
@@ -1,2 +0,0 @@
-SRC = Parameter.cpp myassert.cpp Perplexity.cpp model1.cpp model2.cpp model3.cpp getSentence.cpp TTables.cpp ATables.cpp AlignTables.cpp main.cpp NTables.cpp model2to3.cpp collCounts.cpp alignment.cpp vocab.cpp MoveSwapMatrix.cpp transpair_model3.cpp transpair_model5.cpp transpair_model4.cpp utility.cpp parse.cpp reports.cpp model3_viterbi.cpp model3_viterbi_with_tricks.cpp Dictionary.cpp model345-peg.cpp hmm.cpp HMMTables.cpp ForwardBackward.cpp
-

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.cpp
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.cpp b/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.cpp
deleted file mode 100644
index 2b0c3a3..0000000
--- a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.cpp
+++ /dev/null
@@ -1,231 +0,0 @@
-/*
-
-Copyright (C) 1999,2000,2001  Franz Josef Och (RWTH Aachen - Lehrstuhl fuer Informatik VI)
-
-This file is part of GIZA++ ( extension of GIZA ).
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#include "MoveSwapMatrix.h"
-
-template<class TRANSPAIR>
-MoveSwapMatrix<TRANSPAIR>::MoveSwapMatrix(const TRANSPAIR&_ef, const alignment&_a)
-  : alignment(_a), ef(_ef), l(ef.get_l()), m(ef.get_m()), _cmove(l+1, m+1), _cswap(m+1, m+1), 
-  delmove(l+1, m+1,0),delswap(m+1, m+1,0),changed(l+2, 0), changedCounter(1), 
-  modelnr(_ef.modelnr()),lazyEvaluation(0),centerDeleted(0)
-{
-  double thisValue=ef.scoreOfAlignmentForChange((*this));
-  if( lazyEvaluation==0)
-    for(WordIndex j=1;j<=m;j++)updateJ(j, 0,thisValue);
-}
-
-template<class TRANSPAIR>
-void MoveSwapMatrix<TRANSPAIR>::updateJ(WordIndex j, bool useChanged,double thisValue)
-{
-  massert( lazyEvaluation==0 );
-  for(WordIndex i=0;i<=l;i++)
-    if( (useChanged==0||changed[i]!=changedCounter) )
-      if( get_al(j)!=i ) 
-	_cmove(i, j)=ef.scoreOfMove((*this), i, j,thisValue);
-      else
-	_cmove(i, j)=1.0;
-  for(WordIndex j2=j+1;j2<=m;j2++)
-    if( get_al(j)!=get_al(j2) )
-      _cswap(j, j2)=ef.scoreOfSwap((*this), j, j2,thisValue);
-    else
-      _cswap(j, j2)=1.0;
-  for(WordIndex j2=1;j2<j;j2++)
-    if( get_al(j)!=get_al(j2) )
-      _cswap(j2, j)=ef.scoreOfSwap((*this), j2, j,thisValue);
-    else
-      _cswap(j2, j)=1.0;
-}
-template<class TRANSPAIR>
-void MoveSwapMatrix<TRANSPAIR>::updateI(WordIndex i,double thisValue)
-{
-  massert( lazyEvaluation==0);
-  for(WordIndex j=1;j<=m;j++)
-    if( get_al(j)!=i )
-      _cmove(i, j)=ef.scoreOfMove((*this), i, j,thisValue);
-    else
-      _cmove(i, j)=1.0;
-}
-
-template<class TRANSPAIR>
-void MoveSwapMatrix<TRANSPAIR>::printWrongs()const{
-  for(WordIndex i=0;i<=l;i++)
-    {
-      for(WordIndex j=1;j<=m;j++)
-	if( get_al(j)==i)
-	  cout << "A";
-	else
-	  {
-	    LogProb real=_cmove(i, j), wanted=ef.scoreOfMove((*this), i, j);
-	    if( fabs(1.0-real/wanted)>1e-3 )
-	      cout << 'b';
-	    else if(fabs(1.0-real/wanted)>1e-10 )
-	      cout << 'e';
-	    else if(real!=wanted)
-	      cout << 'E';
-	    else
-	      cout << ' ';
-	  }
-      cout << endl;
-    }
-  cout << endl;
-  for(WordIndex j=1;j<=m;j++)
-    {
-      for(WordIndex j1=1;j1<=m;j1++)
-	if( j1>j )
-	  {
-	    if( get_al(j)==get_al(j1) )
-	      cout << 'A';
-	    else 
-	      cout << (_cswap(j, j1)==ef.scoreOfSwap((*this), j, j1));
-	  }
-	else
-	  cout << ' ';
-      cout << endl;  
-    }
-  massert(0);
-}
-template<class TRANSPAIR>
-bool MoveSwapMatrix<TRANSPAIR>::isRight()const{
-  if( lazyEvaluation ) 
-    return 1;
-  for(WordIndex i=0;i<=l;i++)
-    for(WordIndex j=1;j<=m;j++)
-      if( get_al(j)!=i && (!(doubleEqual(_cmove(i, j), ef.scoreOfMove((*this), i, j)))) )
-	{
-	  cerr << "DIFF: " << i << " " << j << " " << _cmove(i, j) << " " << ef.scoreOfMove((*this), i, j) << endl;
-	  return 0;
-	}
-  for(WordIndex j=1;j<=m;j++)
-    for(WordIndex j1=1;j1<=m;j1++)
-      if( j1>j&&get_al(j)!=get_al(j1)&&(!doubleEqual(_cswap(j, j1), ef.scoreOfSwap((*this), j, j1))) )
-	{
-	  cerr << "DIFFERENT: " << j << " " << j1 << " " << _cswap(j, j1) << " " << ef.scoreOfSwap((*this), j, j1) << endl;
-	  return 0;
-	}
-  return 1;
-}
-
-template<class TRANSPAIR>
-void MoveSwapMatrix<TRANSPAIR>::doMove(WordIndex _i, WordIndex _j)
-{
-  WordIndex old_i=get_al(_j);
-  if( lazyEvaluation )
-    set(_j,_i);
-  else
-    {
-      if ( modelnr==5||modelnr==6 )
-	{
-	  set(_j, _i);
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  for(WordIndex j=1;j<=m;j++)updateJ(j, 0,thisValue);
-	}
-      else if ( modelnr==4 )
-	{
-	  changedCounter++;
-	  for(unsigned int k=prev_cept(old_i);k<=next_cept(old_i);++k)changed[k]=changedCounter;
-	  for(unsigned int k=prev_cept(_i);k<=next_cept(_i);++k)changed[k]=changedCounter;
-	  set(_j, _i);
-	  for(unsigned int k=prev_cept(old_i);k<=next_cept(old_i);++k)changed[k]=changedCounter;
-	  for(unsigned int k=prev_cept(_i);k<=next_cept(_i);++k)changed[k]=changedCounter;
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  for(unsigned int i=0;i<=l;i++)
-	    if(changed[i]==changedCounter)
-	      updateI(i,thisValue);
-	  for(unsigned int j=1;j<=m;j++)
-	    if( changed[get_al(j)]==changedCounter )
-	      updateJ(j, 1,thisValue);
-	}
-      else
-	{
-	  assert(modelnr==3);
-	  set(_j, _i);
-	  changedCounter++;
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  updateI(old_i,thisValue);
-	  changed[old_i]=changedCounter;
-	  updateI(_i,thisValue);
-	  changed[_i]=changedCounter;
-	  for(WordIndex j=1;j<=m;j++)
-	    if( get_al(j)==_i || get_al(j)==old_i )
-	      updateJ(j, 1,thisValue);
-	}
-    }
-}
-template<class TRANSPAIR>
-void MoveSwapMatrix<TRANSPAIR>::doSwap(WordIndex _j1, WordIndex _j2)
-{
-  assert( cswap(_j1, _j2)>1 );
-  WordIndex i1=get_al(_j1), i2=get_al(_j2);
-  if( lazyEvaluation==1 )
-    {
-      set(_j1, i2);
-      set(_j2, i1);
-    }
-  else
-    {
-      if ( modelnr==5||modelnr==6 )
-	{
-	  set(_j1, i2);
-	  set(_j2, i1);
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  for(WordIndex j=1;j<=m;j++)updateJ(j, 0,thisValue);
-	}
-      else if( modelnr==4 )
-	{
-	  changedCounter++;
-	  for(unsigned int k=prev_cept(i1);k<=next_cept(i1);++k)changed[k]=changedCounter;
-	  for(unsigned int k=prev_cept(i2);k<=next_cept(i2);++k)changed[k]=changedCounter;
-	  set(_j1, i2);
-	  set(_j2, i1);
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  for(unsigned int i=0;i<=l;i++)
-	    if(changed[i]==changedCounter)
-	      updateI(i,thisValue);
-	  for(unsigned int j=1;j<=m;j++)
-	    if( changed[get_al(j)]==changedCounter )
-	      updateJ(j, 1,thisValue);
-	}
-      else
-	{
-	  assert(modelnr==3);
-	  set(_j1, i2);
-	  set(_j2, i1);
-	  changedCounter++;
-	  double thisValue=ef.scoreOfAlignmentForChange((*this));
-	  updateI(i1,thisValue);
-	  changed[i1]=changedCounter;
-	  updateI(i2,thisValue);
-	  changed[i2]=changedCounter;
-	  updateJ(_j1, 1,thisValue);
-	  updateJ(_j2, 1,thisValue);
-	}
-    }
-}
-
-#include "transpair_model3.h"
-#include "transpair_model4.h"
-#include "transpair_model5.h"
-#include "transpair_modelhmm.h"
-template class MoveSwapMatrix<transpair_model3>;
-template class MoveSwapMatrix<transpair_model4>;
-template class MoveSwapMatrix<transpair_model5>;
-template class MoveSwapMatrix<transpair_modelhmm>;
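
Note on the removed MoveSwapMatrix.cpp: the `changed` vector and `changedCounter` appear to implement a generation-stamp idiom. An index counts as "changed in this step" only if its stored stamp equals the current counter, so incrementing the counter invalidates all earlier marks without clearing the vector. The following is a minimal sketch of that idiom only; `GenerationMarks` and its member names are hypothetical and are not part of GIZA++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not GIZA++ code: tracks which indices were touched
// in the current round by comparing a per-index stamp to a round counter.
class GenerationMarks {
 public:
  // Stamps start at 0 and the counter at 1, so nothing is marked initially
  // (this mirrors changed(l+2, 0) and changedCounter(1) in the removed file).
  explicit GenerationMarks(std::size_t n) : stamp_(n, 0), current_(1) {}

  // Start a new round; every earlier mark becomes stale in O(1).
  void nextRound() { ++current_; }

  // Mark index i as touched in the current round.
  void mark(std::size_t i) { stamp_.at(i) = current_; }

  // True only if index i was marked since the last nextRound() call.
  bool isMarked(std::size_t i) const { return stamp_.at(i) == current_; }

 private:
  std::vector<int> stamp_;
  int current_;
};
```

In the removed code, the model 3 and model 4 branches of doMove and doSwap bump the counter once per operation and then recompute only the rows whose stamp matches, which is why updateJ takes a `useChanged` flag.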

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.h
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.h b/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.h
deleted file mode 100644
index b1bbf15..0000000
--- a/ext/giza-pp/GIZA++-v2/MoveSwapMatrix.h
+++ /dev/null
@@ -1,116 +0,0 @@
-/*
-
-EGYPT Toolkit for Statistical Machine Translation
-Written by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz Och, Noah Smith, and David Yarowsky.
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-/*--
-MoveSwapMatrix: Efficient representation for moving and swapping
-around in IBM3 training.
-Franz Josef Och (30/07/99)
---*/
-#ifndef moveswap2_costs_h_defined
-#define moveswap2_costs_h_defined
-#include "alignment.h"
-#include "transpair_model3.h"
-#include "myassert.h"
-
-extern short DoViterbiTraining;
-
-template<class TRANSPAIR>
-class MoveSwapMatrix : public alignment
-{
- private:
-  const TRANSPAIR&ef;
-  const WordIndex l, m;
-  Array2<LogProb, Vector<LogProb> > _cmove, _cswap;
-  Array2<char,Vector<char> > delmove,delswap;
-  Vector<int> changed;
-  int changedCounter;
-  const int modelnr;
-  bool lazyEvaluation;
-  bool centerDeleted;
- public:
-  bool check()const
-    {
-	  return 1;
-    }
-  const TRANSPAIR&get_ef()const
-    {return ef;}
-  bool isCenterDeleted()const
-    {return centerDeleted;}
-  bool isLazy()const
-    {return lazyEvaluation;}
-  MoveSwapMatrix(const TRANSPAIR&_ef, const alignment&_a);
-  void updateJ(WordIndex j, bool,double thisValue);
-  void updateI(WordIndex i,double thisValue);
-  void doMove(WordIndex _i, WordIndex _j);
-  void doSwap(WordIndex _j1, WordIndex _j2);
-  void delCenter()
-    {
-      centerDeleted=1;
-    }
-  void delMove(WordIndex x, WordIndex y)
-    {
-      delmove(x,y)=1;
-    }
-  void delSwap(WordIndex x, WordIndex y)
-    {
-      massert(y>x);
-      delswap(x,y)=1;
-      delswap(y,x)=1;
-    }
-  bool isDelMove(WordIndex x, WordIndex y)const
-    {
-      return DoViterbiTraining||delmove(x,y);
-    }
-  bool isDelSwap(WordIndex x, WordIndex y)const
-    {
-      massert(y>x);
-      return DoViterbiTraining||delswap(x,y);
-    }
-  LogProb cmove(WordIndex x, WordIndex y)const
-    {
-      massert( get_al(y)!=x );
-      massert( delmove(x,y)==0 );
-      if( lazyEvaluation )
-	return ef.scoreOfMove(*this,x,y);
-      else
-	{
-	  return _cmove(x, y);
-	}
-    }
-  LogProb cswap(WordIndex x, WordIndex y)const
-    {
-      massert(x<y);
-      massert(delswap(x,y)==0);
-      massert(get_al(x)!=get_al(y));
-      if( lazyEvaluation )
-	return ef.scoreOfSwap(*this,x,y);
-      else
-	{
-	  massert(y>x);
-	  return _cswap(x, y);
-	}
-    }
-  void printWrongs()const;
-  bool isRight()const;
-  friend ostream&operator<<(ostream&out, const MoveSwapMatrix<TRANSPAIR>&m)
-    {return out << (alignment)m << "\nEF:\n"<< m.ef << "\nCMOVE\n"<<m._cmove << "\nCSWAP\n" << m._cswap << endl;};
-};
-#endif

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/NTables.cpp
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/NTables.cpp b/ext/giza-pp/GIZA++-v2/NTables.cpp
deleted file mode 100644
index e02a7c9..0000000
--- a/ext/giza-pp/GIZA++-v2/NTables.cpp
+++ /dev/null
@@ -1,93 +0,0 @@
-/*
-
-EGYPT Toolkit for Statistical Machine Translation
-Written by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz Och, Noah Smith, and David Yarowsky.
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#include "NTables.h"
-#include <iostream>
-#include "defs.h"
-#include <fstream>
-#include "Parameter.h"
-
-GLOBAL_PARAMETER(double,NTablesFactorGraphemes,"nSmooth","smoothing for fertility parameters (good value: 64): weight for wordlength-dependent fertility parameters",PARLEV_SMOOTH,64.0);
-GLOBAL_PARAMETER(double,NTablesFactorGeneral,"nSmoothGeneral","smoothing for fertility parameters (default: 0): weight for word-independent fertility parameters",PARLEV_SMOOTH,0.0);
-
-template <class VALTYPE>
-void nmodel<VALTYPE>::printNTable(int noEW, const char* filename, 
-				  const Vector<WordEntry>& evlist, 
-				  bool actual) const
-     // prints the fertility table but with actual source words (not their id)
-{
-  cerr << "Dumping nTable to: " << filename <<  '\n';  
-  ofstream of(filename);
-  VALTYPE p ;
-  WordIndex k, i ;
-  for(i=1; int(i) < noEW; i++){
-    if (evlist[i].freq > 0){
-      if (actual)
-	of << evlist[i].word << ' ' ;
-      else 
-	of << i << ' ' ;
-      for( k=0; k < MAX_FERTILITY; k++){
-	p = getValue(i, k);
-	if (p <= PROB_SMOOTH) 
-	  p = 0;
-	of << p << ' ';      
-      } 
-      of << '\n';
-    }
-  }
-}
-
-template <class VALTYPE>
-void nmodel<VALTYPE>::readNTable(const char *filename){
-  /* This function reads the n table from a file.
-     Each line is of the format:  source_word_id p0 p1 p2 ... pn
-     This is the inverse operation of the printNTable function.
-     NAS, 7/11/99
-  */
-  ifstream inf(filename);
-  cerr << "Reading fertility table from " << filename << "\n";
-  if(!inf){
-    cerr << "\nERROR: Cannot open " << filename <<"\n";
-    return;
-  }
-
-  VALTYPE prob;
-  WordIndex tok, i;
-  int nFert=0;
-  while(!inf.eof()){
-    nFert++;
-    inf >> ws >> tok;
-    if (tok > MAX_VOCAB_SIZE){
-      cerr << "NTables:readNTable(): unrecognized token id: " << tok
-    <<'\n';
-    exit(-1);
-  }
-    for(i = 0; i < MAX_FERTILITY; i++){
-      inf >> ws >> prob;
-      getRef(tok, i)=prob;
-    }
-  }
-  cerr << "Read " << nFert << " entries in fertility table.\n";
-  inf.close();
-}
-
-template class nmodel<COUNT>;
-//template class nmodel<PROB>;
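
Note on readNTable above: the loop tests inf.eof() before extracting, so a trailing newline in the input can cause one extra pass in which the extraction of tok fails, nFert is still incremented, and stale or zeroed values may be written. A minimal sketch of a loop driven by extraction success follows. It is not the GIZA++ implementation: readFertilityTable, kMaxFertility, and the std::map table are hypothetical stand-ins for readNTable, MAX_FERTILITY, and the Array2 storage.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <vector>

// Hypothetical stand-in for MAX_FERTILITY (defined in defs.h, not shown here).
const std::size_t kMaxFertility = 3;

// Reads rows of the form "id p0 p1 ... p(kMaxFertility-1)".
// Returns the number of complete rows stored in `table`.
std::size_t readFertilityTable(std::istream& in,
                               std::map<unsigned, std::vector<double> >& table) {
  std::size_t entries = 0;
  unsigned id;
  // The extraction itself is the loop condition, so a trailing newline
  // ends the loop instead of producing a phantom entry.
  while (in >> id) {
    std::vector<double> probs(kMaxFertility);
    for (std::size_t k = 0; k < kMaxFertility; ++k) {
      if (!(in >> probs[k])) return entries;  // truncated row: stop here
    }
    table[id] = probs;
    ++entries;
  }
  return entries;
}
```

With this shape the reported entry count matches the number of rows actually parsed, and a truncated final row is dropped rather than half-applied.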

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/NTables.h
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/NTables.h b/ext/giza-pp/GIZA++-v2/NTables.h
deleted file mode 100644
index 4bb0565..0000000
--- a/ext/giza-pp/GIZA++-v2/NTables.h
+++ /dev/null
@@ -1,145 +0,0 @@
-/*
-
-EGYPT Toolkit for Statistical Machine Translation
-Written by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz Och, Noah Smith, and David Yarowsky.
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#ifndef _ntables_h
-#define _ntables_h 1
-#include "Array2.h"
-#include "Vector.h"
-#include <cassert>
-#include "defs.h"
-#include "vocab.h"
-#include "myassert.h"
-#include "Globals.h"
-
-extern double NTablesFactorGraphemes,NTablesFactorGeneral;
-
-template <class VALTYPE>
-class nmodel
-{
- private:
-  Array2<VALTYPE, Vector<VALTYPE> > ntab;
- public:
-  nmodel(int maxw, int maxn)
-    : ntab(maxw, maxn, 0.0)
-    {}
-  VALTYPE getValue(int w, unsigned int n)const
-    {
-      massert(w!=0);
-      if(n>=ntab.getLen2())
-	return 0.0;
-      else
-	return max(ntab(w, n), VALTYPE(PROB_SMOOTH));
-    }
-  VALTYPE&getRef(int w, int n)
-    {
-      //massert(w!=0);
-      return ntab(w, n);
-    }
-  template<class COUNT>
-  void normalize(nmodel<COUNT>&write,const Vector<WordEntry>* _evlist)const
-{
-  int h1=ntab.getLen1(), h2=ntab.getLen2();
-  int nParams=0;
-  if( _evlist&&(NTablesFactorGraphemes||NTablesFactorGeneral) )
-    {
-      size_t maxlen=0;
-      const Vector<WordEntry>&evlist=*_evlist;
-      for(unsigned int i=1;i<evlist.size();i++)
-	maxlen=max(maxlen,evlist[i].word.length());
-      Array2<COUNT,Vector<COUNT> > counts(maxlen+1,MAX_FERTILITY+1,0.0);
-      Vector<COUNT> nprob_general(MAX_FERTILITY+1,0.0);
-      for(unsigned int i=1;i<min((unsigned int)h1,(unsigned int)evlist.size());i++)
-	{
-	  int l=evlist[i].word.length();
-	  for(int k=0;k<h2;k++)
-	    {
-	      counts(l,k)+=getValue(i,k);
-	      nprob_general[k]+=getValue(i,k);
-	    }
-	}
-      COUNT sum2=0; 
-      for(unsigned int i=1;i<maxlen+1;i++)
-	{
-	  COUNT sum=0.0;
-	  for(int k=0;k<h2;k++)
-	    sum+=counts(i,k);
-	  sum2+=sum;
-	  if( sum )
-	    {
-	      double average=0.0;
-	      //cerr << "l: " << i << " " << sum << " ";
-	      for(int k=0;k<h2;k++)
-		{
-		  counts(i,k)/=sum;
-		  //cerr << counts(i,k) << ' ';
-		  average+=k*counts(i,k);
-		}
-	      //cerr << "avg: " << average << endl;
-	      //cerr << '\n';
-	    }
-	}
-      for(unsigned int k=0;k<nprob_general.size();k++)
-	nprob_general[k]/=sum2;
-      
-      for(int i=1;i<h1;i++)
-	{
-	  int l=-1;
-	  if((unsigned int)i<evlist.size())
-	    l=evlist[i].word.length();
-	  COUNT sum=0.0;
-	  for(int k=0;k<h2;k++)
-	    sum+=getValue(i, k)+((l==-1)?0.0:(counts(l,k)*NTablesFactorGraphemes)) + NTablesFactorGeneral*nprob_general[k];
-	  assert(sum);
-	  for(int k=0;k<h2;k++)
-	    {
-	      write.getRef(i, k)=(getValue(i, k)+((l==-1)?0.0:(counts(l,k)*NTablesFactorGraphemes)))/sum + NTablesFactorGeneral*nprob_general[k];
-	      nParams++;
-	    }
-	}
-    }
-  else
-    for(int i=1;i<h1;i++)
-      {
-	COUNT sum=0.0;
-	for(int k=0;k<h2;k++)
-	  sum+=getValue(i, k);
-	assert(sum);
-	for(int k=0;k<h2;k++)
-	  {
-	    write.getRef(i, k)=getValue(i, k)/sum;
-	    nParams++;
-	  }
-      }
-  cerr << "NTable contains " << nParams << " parameter.\n";
-}
-
-  void clear()
-    {
-      int h1=ntab.getLen1(), h2=ntab.getLen2();
-      for(int i=0;i<h1;i++)for(int k=0;k<h2;k++)
-	ntab(i, k)=0;
-    }
-  void printNTable(int noEW, const char* filename, const Vector<WordEntry>& evlist, bool) const;
-  void readNTable(const char *filename);
-  
-};
-
-#endif

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Parameter.cpp
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Parameter.cpp b/ext/giza-pp/GIZA++-v2/Parameter.cpp
deleted file mode 100644
index 8379a25..0000000
--- a/ext/giza-pp/GIZA++-v2/Parameter.cpp
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
-
-Copyright (C) 1997,1998,1999,2000,2001  Franz Josef Och (RWTH Aachen - Lehrstuhl fuer Informatik VI)
-
-This file is part of GIZA++ ( extension of GIZA ).
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#include "Parameter.h"
-#include <fstream>
-#include <unistd.h>
-#include <sstream>
-
-
-bool absolutePathNames=0;
-string ParameterPathPrefix;
-bool ParameterChangedFlag=0;
-
-bool writeParameters(ofstream&of,const ParSet&parset,int level)
-{
-  if(!of)return 0;
-  for(ParSet::const_iterator i=parset.begin();i!=parset.end();++i)
-    {
-      if(((*i)->getLevel()==level||level==-1)&&(*i)->onlyCopy==0)
-	{
-	  ostringstream os;
-	  (*i)->printValue(os);
-	  os << ends;
-	  string s(os.str());
-	  of << (*i)->getString() << " ";
-	  if( absolutePathNames&&(*i)->isFilename()&&s.length()&&s[0]!='/' )
-	    {
-	      char path[1024];
-	      getcwd(path,1024);
-	      of << path << '/';
-	    }
-	  if( ParameterPathPrefix.length()&&(*i)->isFilename()&&s.length()&&s[0]!='/' )
-	    of << ParameterPathPrefix << '/'; 
-	  (*i)->printValue(of);
-	  of << endl;
-	}
-    }
-  return 1;
-}
-
-bool readParameters(ifstream&f,const ParSet&parset,int verb,int level)
-{
-  string s;
-  if(!f)return 0;
-  while(getline(f,s))
-    {
-      istringstream eingabe(s);
-      string s1,s2;
-      eingabe>>s1>>s2;
-      if(makeSetCommand(s1,s2,parset,verb,level)==0)
-	cerr << "ERROR: could not set: (C) " << s1 << " " << s2 << endl;
-    }
-  return 1;
-}
-
- 
-bool makeSetCommand(string _s1,string s2,const ParSet&parset,int verb,int level)
-{
-  ParPtr anf;
-  int anfset=0;
-  string s1=simpleString(_s1);
-  for(ParSet::const_iterator i=parset.begin();i!=parset.end();++i)
-    {
-      if( *(*i)==s1 )
-	{
-	  if( level==-1 || level==(*i)->getLevel() )
-	    (*i)->setParameter(s2,verb);
-	  else if(verb>1)
-	    cerr << "ERROR: Could not set: (A) " << s1 << " " << s2 << " " << level << " " << (*i)->getLevel() << endl;
-	  return 1;
-	}
-      else if( (*i)->getString().substr(0,s1.length())==s1 )
-	{
-	  anf=(*i);anfset++;
-	}
-    }
-  if(anfset==1)
-    {
-      if( level==-1 || level==anf->getLevel() )
-	anf->setParameter(s2,verb);
-      else if( verb>1 )
-	cerr << "ERROR: Could not set: (B) " << s1 << " " << s2 << " " << level << " " << anf->getLevel() << endl;
-      return 1;
-    }
-  if( anfset>1 )
-    cerr << "ERROR: ambiguous parameter '" << s1 << "'.\n";
-  if( anfset==0 )
-    cerr << "ERROR: parameter '" << s1 << "' does not exist.\n";
-  return 0;
-}
-
-ostream& printPars(ostream&of,const ParSet&parset,int level)
-{
-  if(!of)return of;
-  for(ParSet::const_iterator i=parset.begin();i!=parset.end();++i)
-    {
-      if(((*i)->getLevel()==level||level==-1)&&(*i)->onlyCopy==0)
-	{
-	  (*i)->printAt(of);
-	  of << endl;
-	}
-    }
-  return of;
-}
-
-string simpleString(const string s)
-{
-  string k;
-  for(unsigned int i=0;i<s.length();++i)
-    {
-      char c[2];
-      c[0]=tolower(s[i]);
-      c[1]=0;
-      if( (c[0]>='a'&&c[0]<='z')||(c[0]>='0'&&c[0]<='9') )
-	k += c;
-    }
-  return k;
-}
-
-
-ParSet&getGlobalParSet()
-{
-  static ParSet x;
-  return x;
-}
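
Note on makeSetCommand above: a parameter name is first reduced to lowercase letters and digits, an exact match is applied immediately, and otherwise a prefix is accepted only if it matches exactly one registered name. A minimal sketch of that lookup rule follows. The names simplify and resolve, and the use of std::set<std::string> in place of ParSet, are assumptions for illustration; stored names are assumed to be already simplified.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Keeps only [a-z0-9] after lowercasing, mirroring simpleString.
std::string simplify(const std::string& s) {
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    char c = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) out += c;
  }
  return out;
}

// Returns the matched name, or an empty string if the query is
// unknown or matches more than one name as a prefix.
std::string resolve(const std::set<std::string>& names, const std::string& query) {
  const std::string q = simplify(query);
  std::string prefixHit;
  int prefixHits = 0;
  for (std::set<std::string>::const_iterator it = names.begin();
       it != names.end(); ++it) {
    if (*it == q) return *it;  // an exact match always wins
    if (it->compare(0, q.size(), q) == 0) {
      prefixHit = *it;
      ++prefixHits;
    }
  }
  return prefixHits == 1 ? prefixHit : std::string();
}
```

Under this rule "nSmooth" resolves to nsmooth even though nsmoothgeneral shares the prefix, while a shorter query such as "ns" is rejected as ambiguous.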

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Parameter.h
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Parameter.h b/ext/giza-pp/GIZA++-v2/Parameter.h
deleted file mode 100644
index 9a6239d..0000000
--- a/ext/giza-pp/GIZA++-v2/Parameter.h
+++ /dev/null
@@ -1,200 +0,0 @@
-/*
-
-Copyright (C) 1997,1998,1999,2000,2001  Franz Josef Och (RWTH Aachen - Lehrstuhl fuer Informatik VI)
-
-This file is part of GIZA++ ( extension of GIZA ).
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#ifndef PARAMETER_H_DEFINED
-#define PARAMETER_H_DEFINED
-
-#include "mystl.h"
-#include <set>
-#include "Pointer.h"
-#include <string>
-#include "Globals.h"
-#include <fstream>
-#include <cstring>
-
-inline unsigned int mConvert(const string&s,unsigned int &i)
-{ 
-  if( strcasecmp(s.c_str(),"yes")==0 || strcasecmp(s.c_str(),"y")==0 || strcasecmp(s.c_str(),"true")==0 || strcasecmp(s.c_str(),"t")==0 ) { cerr << "TRUE\n";return i=1; }
-  if( strcasecmp(s.c_str(),"no")==0 ||  strcasecmp(s.c_str(),"n")==0 ||  strcasecmp(s.c_str(),"false")==0 ||  strcasecmp(s.c_str(),"f")==0 ) { cerr << "FALSE\n";return i=0;}
-  return i=atoi(s.c_str()); 
-}
-inline int mConvert(const string&s,int &i){ 
-  if( strcasecmp(s.c_str(),"yes")==0 ||  strcasecmp(s.c_str(),"y")==0 ||  strcasecmp(s.c_str(),"true")==0 ||  strcasecmp(s.c_str(),"t")==0 ) { cerr << "TRUE\n";return i=1;}
-  if( strcasecmp(s.c_str(),"no")==0 ||  strcasecmp(s.c_str(),"n")==0 ||  strcasecmp(s.c_str(),"false")==0 ||  strcasecmp(s.c_str(),"f")==0 ) { cerr << "FALSE\n";return i=0;}
-  return i=atoi(s.c_str()); 
-}
-inline double mConvert(const string&s,double &d) { return d=atof(s.c_str()); }
-inline double mConvert(const string&s,float &d) { return d=atof(s.c_str()); }
-inline string mConvert(const string&s,string&n) { return n=s; }
-inline bool mConvert(const string&s,bool&n) { 
-  if( strcasecmp(s.c_str(),"yes")==0 ||  strcasecmp(s.c_str(),"y")==0 ||  strcasecmp(s.c_str(),"true")==0 ||  strcasecmp(s.c_str(),"t")==0 ) { cerr << "TRUE\n";return n=1;}
-  if( strcasecmp(s.c_str(),"no")==0 ||  strcasecmp(s.c_str(),"n")==0 ||  strcasecmp(s.c_str(),"false")==0 ||  strcasecmp(s.c_str(),"f")==0 ) { cerr << "FALSE\n";return n=0;}
-  return n=atoi(s.c_str()); 
-}
-inline short mConvert(const string&s,short&n) { 
-  if( strcasecmp(s.c_str(),"yes")==0 ||  strcasecmp(s.c_str(),"y")==0 ||  strcasecmp(s.c_str(),"true")==0 ||  strcasecmp(s.c_str(),"t")==0 ) { cerr << "TRUE\n";return n=1;}
-  if( strcasecmp(s.c_str(),"no")==0 ||  strcasecmp(s.c_str(),"n")==0 ||  strcasecmp(s.c_str(),"false")==0 ||  strcasecmp(s.c_str(),"f")==0 ) { cerr << "FALSE\n";return n=0;}
-  return n=atoi(s.c_str()); 
-}
-inline unsigned short mConvert(const string&s,unsigned short&n) { 
-  if( strcasecmp(s.c_str(),"yes")==0 ||  strcasecmp(s.c_str(),"y")==0 ||  strcasecmp(s.c_str(),"true")==0 ||  strcasecmp(s.c_str(),"t")==0 ) { cerr << "TRUE\n";return n=1;}
-  if( strcasecmp(s.c_str(),"no")==0 ||  strcasecmp(s.c_str(),"n")==0 ||  strcasecmp(s.c_str(),"false")==0 ||  strcasecmp(s.c_str(),"f")==0 ) { cerr << "FALSE\n";return n=0;}
-  return n=atoi(s.c_str()); 
-}
-
-string simpleString(const string s);
-
-inline int Hashstring(const string& s)
-{
-  int sum=0;
-  string::const_iterator i=s.begin(),end=s.end();
-  for(;i!=end;i++)sum=5*sum+(*i);
-  return sum;
-}
-
-class _Parameter
-{
- protected:
-  string name;
-  bool *ifChanged;
-  string description;
-  int level;
-  bool filename;
- public:
-  int onlyCopy;
-  _Parameter(string n,bool&b,string desc,int _level,bool _onlyCopy)
-    : name(simpleString(n)),ifChanged(&b),description(desc),level(_level),filename(0),onlyCopy(_onlyCopy) {}
-  virtual ~_Parameter(){};
-  bool operator==(const string&s)const
-    { return name== simpleString(s); }
-  void setChanged() 
-    { *ifChanged=true; }
-  virtual bool setParameter(string s2,int)=0;
-  virtual ostream&printAt(ostream&out)=0;
-  virtual ostream&printValue(ostream&out)=0;
-  const string&getString() const { return name; }
-  int getLevel() const { return level;}
-  bool isFilename() { return filename;}
-  void setFilename(bool x=1) { filename=x;}
-  friend bool operator==(const _Parameter&a,const _Parameter&b)
-    { return a.name==b.name; }
-  friend bool operator<(const _Parameter&a,const _Parameter&b)
-    { return a.name<b.name; }
-  friend int Hash(const _Parameter&aaa)
-    { return Hashstring(aaa.name); }
-  friend ostream&operator<<(ostream&out,const _Parameter&p)
-    { return out<<"Parameter: "<<p.name <<endl;}
-};
-
-template<class T>
-class Parameter : public _Parameter
-{
- private:
-  T*t;
- public:
-  Parameter(string n,bool&b,string desc,T&_t,int level=0,bool onlyCopy=0)
-    : _Parameter(n,b,desc,level,onlyCopy),t(&_t) {}
-  virtual ~Parameter(){}
-  virtual bool setParameter(string s2,int verb)
-    { 
-      T x;  
-      if( !(*t==mConvert(s2,x)))
-	{
-	  bool printedFirst=0;
-	  if( verb>1 ) 
-	    {
-	      cout << "Parameter '"<<name <<"' changed from '"<<*t<<"' to '";
-	      printedFirst=1;
-	    }
-	  mConvert(s2,*t);
-	  if( printedFirst ) 
-	    cout << *t <<"'\n";
-	  setChanged();
-	  return 1;
-	} 
-      return 0;
-    }
-  virtual ostream&printAt(ostream&out)
-    {return out << name << " = " << *t << "  (" << description << ")";}
-  virtual ostream&printValue(ostream&out)
-    {return out << *t;}
-};
-
-typedef MP<_Parameter> ParPtr;
-
-class ParSet : public set<ParPtr>
-{
- public:
-  void insert(const ParPtr&x)
-    {
-      if( count(x)!=0 )
-	cerr << "ERROR: element " << x->getString() << " already inserted.\n";
-      set<ParPtr>::insert(x);
-    }
-};
-
-bool makeSetCommand(string s1,string s2,const ParSet&pars,int verb=1,int level= -1);
-ostream&printPars(ostream&out,const ParSet&pars,int level=-1);
-bool writeParameters(ofstream&of,const ParSet&parset,int level=0);
-bool readParameters(ifstream&f,const ParSet&parset,int verb=2,int level=0);
-ParSet&getGlobalParSet();
-extern bool ParameterChangedFlag;
-template<class T>const T&addGlobalParameter(const char *name,const char *description,int level,T*adr,const T&init)
-{
-  *adr=init;
-  getGlobalParSet().insert(new Parameter<T>(name,ParameterChangedFlag,description,*adr,level));
-  return init;
-}
-template<class T>const T&addGlobalParameter(const char *name,const char *name2,const char *description,int level,T*adr,const T&init)
-{
-  *adr=init;
-  getGlobalParSet().insert(new Parameter<T>(name,ParameterChangedFlag,description,*adr,level));
-  getGlobalParSet().insert(new Parameter<T>(name2,ParameterChangedFlag,description,*adr,-1));
-  return init;
-}
-template<class T>const T&addGlobalParameter(const char *name,const char *name2,const char *name3,const char *description,int level,T*adr,const T&init)
-{
-  *adr=init;
-  getGlobalParSet().insert(new Parameter<T>(name,ParameterChangedFlag,description,*adr,level));
-  getGlobalParSet().insert(new Parameter<T>(name2,ParameterChangedFlag,description,*adr,-1));
-  getGlobalParSet().insert(new Parameter<T>(name3,ParameterChangedFlag,description,*adr,-1));
-  return init;
-}
-template<class T>const T&addGlobalParameter(const char *name,const char *name2,const char *name3,const char *name4,const char *description,int level,T*adr,const T&init)
-{
-  *adr=init;
-  getGlobalParSet().insert(new Parameter<T>(name,ParameterChangedFlag,description,*adr,level));
-  getGlobalParSet().insert(new Parameter<T>(name2,ParameterChangedFlag,description,*adr,-1));
-  getGlobalParSet().insert(new Parameter<T>(name3,ParameterChangedFlag,description,*adr,-1));
-  getGlobalParSet().insert(new Parameter<T>(name4,ParameterChangedFlag,description,*adr,-1));
-  return init;
-}
-void MakeParameterOptimizing(istream&file,string resultingParameters);
-
-#define GLOBAL_PARAMETER(TYP,VARNAME,NAME,DESCRIPTION,LEVEL,INIT) TYP VARNAME=addGlobalParameter< TYP >(NAME,DESCRIPTION,LEVEL,&VARNAME,INIT);
-#define GLOBAL_PARAMETER2(TYP,VARNAME,NAME,NAME2,DESCRIPTION,LEVEL,INIT) TYP VARNAME=addGlobalParameter< TYP >(NAME,NAME2,DESCRIPTION,LEVEL,&VARNAME,INIT);
-#define GLOBAL_PARAMETER3(TYP,VARNAME,NAME,NAME2,NAME3,DESCRIPTION,LEVEL,INIT) TYP VARNAME=addGlobalParameter< TYP >(NAME,NAME2,NAME3,DESCRIPTION,LEVEL,&VARNAME,INIT);
-#define GLOBAL_PARAMETER4(TYP,VARNAME,NAME,NAME2,NAME3,NAME4,DESCRIPTION,LEVEL,INIT) TYP VARNAME=addGlobalParameter< TYP >(NAME,NAME2,NAME3,NAME4,DESCRIPTION,LEVEL,&VARNAME,INIT);
-
-void setParameterLevelName(unsigned int i,string x);
-
-#endif

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Perplexity.cpp
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Perplexity.cpp b/ext/giza-pp/GIZA++-v2/Perplexity.cpp
deleted file mode 100644
index d44dec5..0000000
--- a/ext/giza-pp/GIZA++-v2/Perplexity.cpp
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
-
-EGYPT Toolkit for Statistical Machine Translation
-Written by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz Och, Noah Smith, and David Yarowsky.
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-/* Perplexity.cc
- * =============
- * Mike Jahr, 7/21/99
- * Machine Translation group, WS99
- * Center for Language and Speech Processing
- * 
- * Last Modified by: Yaser Al-Onaizan, August 17, 1999
- *
- * Simple class used to calculate cross entropy and perplexity
- * of models.
- */
-
-#include "Perplexity.h"
-
-void Perplexity::record(string model){
-  modelid.push_back(model);
-  perp.push_back(perplexity());
-  ce.push_back(cross_entropy());
-}

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Perplexity.h
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Perplexity.h b/ext/giza-pp/GIZA++-v2/Perplexity.h
deleted file mode 100644
index 5010280..0000000
--- a/ext/giza-pp/GIZA++-v2/Perplexity.h
+++ /dev/null
@@ -1,108 +0,0 @@
-/*
-
-EGYPT Toolkit for Statistical Machine Translation
-Written by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz Och, Noah Smith, and David Yarowsky.
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-/* Perplexity.h
- * ============
- * Mike Jahr, 7/15/99
- * Machine Translation group, WS99
- * Center for Language and Speech Processing
- * 
- * Last Modified by: Yaser Al-Onaizan, August 17, 1999
- *
- * Simple class used to calculate cross entropy and perplexity
- * of models.
- */
-
-#ifndef _PERPLEXITY_H
-#define _PERPLEXITY_H
-
-#include <cmath>
-#include <fstream>
-#include "Vector.h"
-#include "defs.h"
-#include "Array2.h"
-#include "Globals.h"
-
-#define CROSS_ENTROPY_BASE 2
-
-class Perplexity {
- private:
-    double sum;
-    double wc;
-    Array2<double, Vector<double> > *E_M_L;
-    Vector<string> modelid;
-    Vector<double > perp;
-    Vector<double > ce;
-    Vector<string> name ;
- public:
-    ~Perplexity() { delete E_M_L;}
-    Perplexity() {
-      E_M_L  = new Array2<double, Vector<double> >(MAX_SENTENCE_LENGTH,MAX_SENTENCE_LENGTH);
-      unsigned int l, m ;
-      Vector<double> fact(MAX_SENTENCE_LENGTH, 1.0);
-      for (m = 2 ; m < MAX_SENTENCE_LENGTH ; m++)
-	fact[m] = fact[m-1] * m ;
-      for (m = 1 ; m < MAX_SENTENCE_LENGTH ; m++)
-	for (l = 1 ; l < MAX_SENTENCE_LENGTH ; l++) {
-	  (*E_M_L)(l, m) = log (pow((LAMBDA * l), double(m)) * exp(-LAMBDA * double(l)) / 
-				(fact[m])) ;
-	}
-      sum = 0 ;
-      wc = 0;
-      perp.clear();
-      ce.clear();
-      name.clear();
-    }
-    inline void clear() {
-      sum = 0 ;
-      wc = 0 ;
-    }
-    size_t size() const {return(min(perp.size(), ce.size()));}
-    inline void addFactor(const double p, const double count, const int l, 
-			  const int m,bool withPoisson) {
-      wc += count * m ; // number of french words 
-      sum += count * ( (withPoisson?((*E_M_L)(l, m)):0.0) + p) ;
-    }
-    inline double perplexity() const {
-	return exp( -1*sum / wc);
-    }
-
-    inline double cross_entropy() const {
-      return (-1.0*sum / (log(double(CROSS_ENTROPY_BASE)) * wc)); 
-    }
-
-    inline double word_count() const {
-	return wc;
-    }
-    
-    inline double getSum() const {
-      return sum ;
-    }
-
-    void record(string model);
-    
-    friend void generatePerplexityReport(const Perplexity&, const Perplexity&, 
-					 const Perplexity&, const Perplexity&, 
-					 ostream&, int, int, bool); 
-};
-
-
-#endif
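
Note on the Perplexity class above: sum accumulates count-weighted natural-log probabilities and wc accumulates count-weighted target words, so perplexity is exp(-sum/wc) and cross entropy is the same quantity expressed in bits, which gives perplexity = 2^cross_entropy. A minimal sketch follows; PerplexityAccumulator is a hypothetical name, and the optional Poisson length term (E_M_L, LAMBDA) is left out.

```cpp
#include <cmath>

// Simplified accumulator; omits the Poisson sentence-length term.
struct PerplexityAccumulator {
  double sum;  // count-weighted sum of natural-log probabilities
  double wc;   // count-weighted number of target words

  PerplexityAccumulator() : sum(0.0), wc(0.0) {}

  void addFactor(double logProb, double count, int targetLength) {
    wc += count * targetLength;
    sum += count * logProb;
  }
  double perplexity() const { return std::exp(-sum / wc); }
  // Divide by ln(2) to convert nats to bits (CROSS_ENTROPY_BASE is 2).
  double crossEntropy() const { return -sum / (std::log(2.0) * wc); }
};
```

Both accessors divide by wc, so they are only meaningful after at least one factor with a nonzero count and length has been added.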

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/Pointer.h
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/Pointer.h b/ext/giza-pp/GIZA++-v2/Pointer.h
deleted file mode 100644
index 4892656..0000000
--- a/ext/giza-pp/GIZA++-v2/Pointer.h
+++ /dev/null
@@ -1,175 +0,0 @@
-/*
-
-Copyright (C) 1997,1998,1999,2000,2001  Franz Josef Och (RWTH Aachen - Lehrstuhl fuer Informatik VI)
-
-This file is part of GIZA++ ( extension of GIZA ).
-
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-This program is distributed in the hope that it will be useful, 
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
-USA.
-
-*/
-#ifndef HEADER_Pointer_DEFINED
-#define HEADER_Pointer_DEFINED
-
-#include <cassert>
-#include <ostream>
-
-template<class T>
-class SmartPointer
-{
- protected:
-  T*p;
- public:
-  SmartPointer(T*_p=0) 
-    : p(_p) {}
-  inline T&operator*() const 
-    {return *p;}
-  inline T*operator->() const 
-    {return p;}
-  inline operator bool() const 
-    {return p!=0;}
-  inline T*ptr() const
-    { return p; }
-};
-template<class T> inline ostream &operator<<(ostream&out,const SmartPointer<T>&s)
-{if( s.ptr() )return out << *s;else return out <<"nullpointer";}
-
-
-template<class T>
-class SmartPointerConst
-{
- protected:
-  const T*p;
- public:
-  SmartPointerConst(const T*_p=0) 
-    : p(_p) {}
-  inline const T&operator*() const 
-    {return *p;}
-  inline const T*operator->() const 
-    {return p;}
-  inline operator bool() const
-    {return p!=0;}
-  inline const T*ptr() const
-    { return p; }
-};
-template<class T> inline ostream &operator<<(ostream&out,const SmartPointerConst<T>&s)
-{if( s.ptr() )return out << *s;else return out <<"nullpointer";}
-
-template <class T>
-class UP : public SmartPointer<T>
-{
- public:
-  UP(T*_p=0) 
-    : SmartPointer<T>(_p) {}
-};
-template<class T> inline bool operator==(const UP<T>&s1,const UP<T>&s2)
-{return s1.ptr()==s2.ptr();}
-template<class T>  inline bool operator<(const UP<T>&s1,const UP<T>&s2)
-{return s1.ptr() < s2.ptr();}
-template<class T> inline int Hash(const UP<T> &wp)
-{if(wp.ptr())return Hash(*wp);else return 0;}
-
-
-template <class T>
-class UPConst : public SmartPointerConst<T>
-{
- public:
-  UPConst(const T*_p=0) 
-    : SmartPointerConst<T>(_p) {}
-};
-template<class T> inline bool operator==(const UPConst<T>&s1,const UPConst<T>&s2)
-{return s1.ptr()==s2.ptr();}
-template<class T> inline bool operator<(const UPConst<T>&s1,const UPConst<T>&s2)
-{return s1.ptr()<s2.ptr();}
-template<class T> inline int Hash(const UPConst<T> &wp)
-{if(wp.ptr())return Hash(*wp);else return 0;}
-
-	
-template <class T>
-class MP : public SmartPointer<T>
-{
- public:
-  MP(T*_p=0) 
-    : SmartPointer<T>(_p) {}
-};
-template <class T> inline bool operator==(const MP<T>&s1,const MP<T>&s2)
-{assert(s1);assert(s2);return *s1==*s2;}
-template <class T> inline bool operator<(const MP<T>&s1,const MP<T>&s2)
-{assert(s1);assert(s2);return *s1 < *s2;}
-template <class T> inline int Hash(const MP<T> &wp)
-{if(wp.ptr())return Hash(*wp);else return 0;}
-
-
-template <class T>
-class MPConst : public SmartPointerConst<T>
-{
- public:
-  MPConst(const T*_p=0) 
-    : SmartPointerConst<T>(_p) {}
-};
-template <class T> inline bool operator==(const MPConst<T>&s1,const MPConst<T>&s2)
-{assert(s1);assert(s2);return *s1== *s2;}
-template <class T> inline bool operator<(const MPConst<T>&s1,const MPConst<T>&s2)
-{assert(s1);assert(s2);return *s1 < *s2;}
-template <class T> inline int Hash(const MPConst<T> &wp)
-{if(wp.ptr())return Hash(*wp);else return 0;}
-
-
-template <class T> 
-class DELP : public SmartPointer<T>
-{
- private:
-  DELP(const DELP<T>&x);
- public:
-  const DELP<T>&operator=(DELP<T>&x)
-  {
-    delete this->p;
-    this->p=x.p;x.p=0;
-    return *this;
-  }
-
-  ~DELP()
-    { delete this->p;this->p=0;}
-  DELP(T*_p=0) 
-    : SmartPointer<T>(_p) {}
-  void set(T*_p)
-    {
-      delete this->p;
-      this->p=_p;
-    }
-  friend bool operator==(const DELP<T>&s1,const DELP<T>&s2)
-    {
-      return *(s1.p)== *(s2.p);
-    }
-  friend bool operator<(const DELP<T>&s1,const DELP<T>&s2)
-    {
-      return *(s1.p) < *(s2.p);
-    }
-  friend inline int Hash(const DELP<T> &wp)
-    {
-      if(wp.p)
-	return Hash(*wp.p);
-      else 
-	return 0;
-    }
-};
-#endif
-
-
-
-
-
-
-

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/ef91969a/ext/giza-pp/GIZA++-v2/README
----------------------------------------------------------------------
diff --git a/ext/giza-pp/GIZA++-v2/README b/ext/giza-pp/GIZA++-v2/README
deleted file mode 100644
index 25af288..0000000
--- a/ext/giza-pp/GIZA++-v2/README
+++ /dev/null
@@ -1,508 +0,0 @@
-========================================================================
-GIZA++ is an extension of the program GIZA.
-It is a program for learning statistical translation models from
-bitext. It is an implementation of the models described in
-(Brown et al., 1993), (Vogel et al., 1996), (Och et al., 2000a), 
-(Och et al., 2000b).
-========================================================================
-
-
-
-CONTENTS of this README file:
-
-Part I: GIZA Package Contents
-Part II: How To Compile GIZA
-Part III: How to Run GIZA
-Part IV: Input File Formats
-     A. VOCABULARY FILES
-     B. Bitext Files
-     C. Dictionary File (optional)
-Part V: Output File Formats:
-     A. PROBABILITY TABLES
-	1. T TABLE (translation table)
-	2. N TABLE (Fertility table)
-	3. P0 TABLE
-	4. A TABLE
-	5. D3 TABLE
-	6. D4 TABLE
-	7. D5 TABLE
-	8. HMM TABLE
-     B. ALIGNMENT FILE
-     C. Cross Entropy and Perplexity Files
-     D. Revised Vocabulary files
-Part VI: Literature
-Part VII: New features
-
-HISTORY of this README file:
-
-GIZA++:
-edited: 11 Jan. 2000, Franz Josef Och
-GIZA:
-edited: 16 Aug. 1999, Dan Melamed
-edited: 13 Aug. 1999, Yaser Al-Onaizan
-edited: 20 July 1999, Yaser Al-Onaizan
-edited: 15 July 1999, Yaser Al-Onaizan
-edited: 13 July 1999, Noah Smith
-========================================================================
-
-Part 0: What is GIZA++
-
-GIZA++ is an extension of the program GIZA (part of the SMT toolkit
-EGYPT - http://www.clsp.jhu.edu/ws99/projects/mt/toolkit/ ) which was
-developed by the Statistical Machine Translation team during the
-summer workshop in 1999 at the Center for Language and Speech
-Processing at Johns Hopkins University (CLSP/JHU). GIZA++ includes a
-lot of additional features. The extensions of GIZA++ were designed and
-written by Franz Josef Och.
-
-Features of GIZA++ not in GIZA:
-
-- Implements full IBM-4 alignment model with a dependency of word
-classes as described in (Brown et al. 1993)
-
-- Implements IBM-5: dependency on word classes, smoothing, ...
-
-- Implements HMM alignment model: Baum-Welch training, Forward-Backward
-algorithm, empty word, dependency on word classes, transfer to
-fertility models, ...
-
-- Implementation of a variant of the IBM-3 and IBM-4
-(-deficientDistortionModel 1) models which allow the training of -p0
-
-- Smoothing for fertility, distortion/alignment parameters
-
-- Significantly more efficient training of the fertility models
-
-- Correct implementation of pegging as described in (Brown et
-al. 1993), implemented a series of heuristics in order to make pegging
-sufficiently efficient
-
-- Completely new parameter mechanism: makes it easy to add additional
-parameters
-
-- Improved perplexity calculation for models IBM-1, IBM-2 and HMM (the
-parameter of the Poisson-distribution of the sentence lengths is
-computed automatically from the used training corpus)
-
-========================================================================
-Part I: GIZA++ Package Programs
-
-GIZA++: GIZA++ itself
-
-plain2snt.out: simple tool to transform plain text into GIZA text
-format
-
-snt2plain.out: simple tool to transform GIZA text format into plain
-text
-
-trainGIZA++.sh: Shell script to perform standard training given a
-corpus in GIZA text format
-
-========================================================================
-Part II: How To Compile GIZA++
-
-In order to compile GIZA++ you may need:
-- recent version of the GNU compiler (2.95 or higher)
-- recent version of assembler and linker which do not have restrictions
-  with respect to the length of symbol names
-
-There is a make file in the src directory that will take care of the
-compilation. The most important targets are:
-
-GIZA++: generates an optimized version 
-
-GIZA++.dbg: generates the debug version 
-
-depend: generates the "dependencies" file (make this whenever you add
-source or header files to the package).
-
-========================================================================
-Part III: How To Run GIZA++
-
-It's simple:
-
-GIZA++ [config-file] [options]
-
-All options which expect a parameter can also be set in the config
-file. For example, the command line options
-
-	GIZA++ -S S.vcb -T T.vcb -C ST.snt
-
-corresponds to the config file:
-
-	S: S.vcb
-	T: T.vcb
-	C: ST.snt
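
As a rough illustration of that correspondence, the sketch below turns a
list of "-name value" option pairs into config-file lines. The function
name optionsToConfig is hypothetical and not part of GIZA++, and the
sketch assumes that every option takes exactly one parameter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of GIZA++): converts "-name value" pairs
// into "name: value" lines. Assumes every option takes exactly one parameter.
std::string optionsToConfig(const std::vector<std::string>& args) {
    std::string config;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::string& name = args[i];
        // Stop at the first argument that does not look like "-name".
        if (name.size() < 2 || name[0] != '-') break;
        config += name.substr(1) + ": " + args[i + 1] + "\n";
    }
    return config;
}
```

Called with the arguments -S S.vcb -T T.vcb -C ST.snt, this returns the
three config lines shown above, one per line.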
-
-If you call GIZA++ without a parameter you get a list of all the
-options. The option names from GIZA are normally still valid. The
-default values of the parameters typically are optimized with respect
-to the corpora I use and typically give good results. It is
-nevertheless important that these parameters are always optimized for
-every new task.
-
-==========================================================================
-Part IV: Input File Formats
-
-A. VOCABULARY FILES
-
-Each entry is stored on one line as follows:
-
- uniq_id1 string1 no_occurrences1
- uniq_id2 string2 no_occurrences2
- uniq_id3 string3 no_occurrences3
- ....
-
-Here is a sample from an English vocabulary file:
-
-627 abandon 10
-628 abandoned 17
-629 abandoning 2
-630 abandonment 12
-631 abatement 8
-632 abbotsford 2
-
-uniq_ids are sequential positive integer numbers.  0 is reserved for
-the special token NULL.
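
A minimal sketch of reading one line in this format follows. VocabEntry
and parseVocabLine are hypothetical names chosen for illustration; they
are not taken from the GIZA++ sources.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one vocabulary line (not a GIZA++ type).
struct VocabEntry {
    unsigned long id;     // uniq_id; 0 is reserved for the NULL token
    std::string token;    // the word string
    unsigned long count;  // no_occurrences
};

// Parses "uniq_id string no_occurrences".
// Returns false if a field is missing or extra fields follow.
bool parseVocabLine(const std::string& line, VocabEntry& out) {
    std::istringstream in(line);
    if (!(in >> out.id >> out.token >> out.count)) return false;
    std::string extra;
    return !(in >> extra);  // reject trailing fields
}
```

For the sample line "627 abandon 10" this fills in id 627, token
"abandon" and count 10.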
-
-
-B. Bitext Files
-
-Each sentence pair is stored in three lines. The first line
-is the number of times this sentence pair occurred. The second line is
-the source sentence where each token is replaced by its unique integer
-id from the vocabulary file and the third is the target sentence in
-the same format.
-
-Here's a sample of 3 sentence pairs from an English/French corpus:
-
-1
-1 1 226 5008 621 6492 226 6377 6813 226 9505 5100 6824 226 5100 5222 0 614 10243 613
-2769 155 7989 585 1 578 6503 585 8242 578 8142 8541 578 12328 6595 8550 578 6595 6710 1
-1
-1 1 226 6260 11856 11806 1293
-11 1 1 11 155 14888 2649 11447 9457 8488 4168
-1
-1 1 226 7652 1 226 5337 226 6940 12089 5582 8076 12050
-1 1 155 4140 6812 153 1 154 155 14668 15616 10524 9954 1392
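
The three-line layout can be read with a few lines of code. The sketch
below is only an illustration: SentencePair, parseIds and
readSentencePair are hypothetical names, and the sketch assumes that
every field is a non-negative decimal integer separated by whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one bitext entry (not a GIZA++ type).
struct SentencePair {
    unsigned long count;                // times this pair occurred
    std::vector<unsigned long> source;  // source token ids
    std::vector<unsigned long> target;  // target token ids
};

// Splits a line into ids; fails on any token that is not all digits.
static bool parseIds(const std::string& line, std::vector<unsigned long>& ids) {
    std::istringstream in(line);
    std::string tok;
    ids.clear();
    while (in >> tok) {
        if (tok.find_first_not_of("0123456789") != std::string::npos) return false;
        ids.push_back(std::stoul(tok));
    }
    return true;
}

// Reads the next three lines (count, source, target) from the stream.
// Returns false at end of input or on a malformed entry.
bool readSentencePair(std::istream& in, SentencePair& out) {
    std::string countLine, srcLine, tgtLine;
    if (!std::getline(in, countLine) || !std::getline(in, srcLine) ||
        !std::getline(in, tgtLine))
        return false;
    std::vector<unsigned long> countField;
    if (!parseIds(countLine, countField) || countField.size() != 1) return false;
    out.count = countField[0];
    return parseIds(srcLine, out.source) && parseIds(tgtLine, out.target);
}
```

Calling readSentencePair in a loop until it returns false walks through
the whole file, one sentence pair per call.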
-
-C. Dictionary File
-
-This is optional. The dictionary file is of the format:
-
-target_word_id source_word_id
-
-The list should be sorted by the target_word_id.
-
-If you provide a dictionary and list it in the configuration file,
-GIZA++ will change the cooccurrence counting in the first iteration
-of model 1 to honor the so-called "Dictionary Constraint":
-
-	In parallel sentences "e1 ... en" and "f1 ... fm",
-	ei and fj are counted as a cooccurrence pair if one of two
-	conditions is met:  1.) ei and fj occur as an entry in the
-	dictionary, or 2.) ei does not occur in the dictionary with
-	any fk (1 <= k <= m) and fj does not occur in the dictionary
-	with any ek (1 <= k <= n).
-
-The dictionary must be a list of pairs, one per line:
-
-	F E
-
-where F is an integer of a target token, and E is the integer of a
-source token.  F may be listed with other Es, and vice versa.
-
-Important:  The dictionary must be sorted by the F integers!
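
The Dictionary Constraint described above can be sketched as a small
predicate. This is one reading of the rule, not the GIZA++
implementation: the names WordId, Dictionary and countsAsCooccurrence
are hypothetical, and the dictionary is assumed to be held in memory as
a set of (F, E) pairs.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

typedef unsigned long WordId;
// Hypothetical in-memory dictionary: a set of (F, E) pairs, as in the file.
typedef std::set<std::pair<WordId, WordId> > Dictionary;

// Decides whether source word e and target word f form a cooccurrence
// pair in the sentence pair (eSentence, fSentence).
bool countsAsCooccurrence(const Dictionary& dict, WordId e, WordId f,
                          const std::vector<WordId>& eSentence,
                          const std::vector<WordId>& fSentence) {
    // Condition 1: the pair itself is a dictionary entry.
    if (dict.count(std::make_pair(f, e))) return true;
    // Condition 2 fails if e is listed with any target word of this sentence...
    for (std::size_t k = 0; k < fSentence.size(); ++k)
        if (dict.count(std::make_pair(fSentence[k], e))) return false;
    // ...or if f is listed with any source word of this sentence.
    for (std::size_t k = 0; k < eSentence.size(); ++k)
        if (dict.count(std::make_pair(f, eSentence[k]))) return false;
    return true;
}
```

With a dictionary holding the single entry (F=10, E=1), source sentence
"1 2" and target sentence "10 20", the pairs (1, 10) and (2, 20) are
counted, while (1, 20) and (2, 10) are not.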
-
-==========================================================================
-Part V: Output File Formats:
-
-For file names, we will use the prefix "prob_table".  This can be
-changed using the -o switch.  The default is a combination of user id
-and time stamp.
-
-
-A. PROBABILITY TABLES
-
-Normally, Model1 is trained first, and the result is used to start
-Model2 training.  Then Model2 is transferred to Model3.  Model3 Viterbi
-training follows.  This sequence can be adjusted by the various
-options, either on the command line or in a config file.
-
-1. T TABLE ( *.t3.* )
-
-(translation table)
-
- prob_table.t1.n = t table after n iterations of Model1 training
- prob_table.t2.n = t table after n iterations of Model2 training
- prob_table.t2to3 = t table after transferring Model2 to Model3
- prob_table.t3.n = t table after n iterations of Model3 training
- prob_table.4.n = t table after n iterations of Model4 training
-
-Each line is of the following format:
-
-s_id t_id P(t_id/s_id)
-
-where: 
- s_id: is the unique id for the source token
- t_id: is the unique id for the target token
- P(t_id/s_id): is the probability of translating s_id as t_id
-
-sample part of a file:
-
-3599 5697 0.0628115
-2056 10686 0.000259988
-8227 3738 3.57132e-13
-5141 13720 5.52332e-12
-10798 4102 6.53047e-06
-8227 3750 6.97502e-14
-7712 14080 6.0365e-20
-7712 14082 2.68323e-17
-7713 1083 3.94464e-15
-7712 14084 2.98768e-15
-
-Similar files with the prefix "prob_table.actual.xxx" will also be
-generated; they contain the actual tokens instead of their unique
-ids. This is also true for fertility tables. In addition, the inverse
-probability table will be generated for the final table; it has the
-infix "ti".
-
-2. N TABLE ( *.n3.* )
-
-(Fertility table)
- 
- prob_table.n2to3 = n table estimated during the transfer from M2 to M3
- prob_table.n3.X = n table after X iterations of model3
-
-Each line in this file is of the following format:
-
-source_token_id p0 p1 p2 .... pn
-
-where p0 is the probability that the source token has zero fertility;
-p1, fertility one, ...., and n is the maximum possible fertility as
-defined in the program.
-
-sample:
-
-1 0.475861 0.282418 0.133455 0.0653083 0.0329326 0.00844979 0.0014008
-10 0.249747 0.000107778 0.307767 0.192208 0.0641439 0.15016 0.0358886
-11 0.397111 0.390421 0.19925 0.013382 2.21286e-05 0 0
-12 0.0163432 0.560621 0.374745 0.00231588 0 0 0
-13 1.78045e-07 0.545694 0.299573 0.132127 0.0230494 9.00322e-05 0
-14 1.41918e-18 0.332721 0.300773 0.0334969 0 0 0
-15 0 5.98626e-10 0.47729 0.0230955 0 0 0
-17 0 1.66346e-07 0.895883 0.103948 0 0 0
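
To show how a line of this table can be consumed, the sketch below
parses one row and picks the fertility with the highest probability.
FertilityRow, parseFertilityLine and mostLikelyFertility are
hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one n-table line (not a GIZA++ type).
struct FertilityRow {
    unsigned long sourceId;
    std::vector<double> probs;  // probs[k] = probability of fertility k
};

// Parses "source_token_id p0 p1 ... pn"; needs at least one probability.
bool parseFertilityLine(const std::string& line, FertilityRow& out) {
    std::istringstream in(line);
    if (!(in >> out.sourceId)) return false;
    out.probs.clear();
    double p;
    while (in >> p) out.probs.push_back(p);
    return !out.probs.empty();
}

// Returns the fertility k with the largest probs[k] (first one on ties).
// Returns 0 for a row without probabilities.
std::size_t mostLikelyFertility(const FertilityRow& row) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < row.probs.size(); ++k)
        if (row.probs[k] > row.probs[best]) best = k;
    return best;
}
```

For the sample row of token 12, the most likely fertility is 1; for
token 15 it is 2.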
-
-
-3. P0 TABLE ( *.p0* )
-
-(1 - P0 is the probability of inserting a null after a
-   source word.)
- 
-This file contains only one line with one real number which is the
-value of P0, the probability of not inserting a NULL token.
-
-
-4. A TABLE ( *.a[23].* )
-
-The file names follow the naming conventions above. The format of each
-line is as follows:
-
-i j l m p(i | j, l, m)
-
-where i, j, l, m are all integers and
- j = position in target sentence
- i = position in source sentence
- l = length of source sentence
- m = length of target sentence
-and p(i | j, l, m) is the probability that a source word in position i is
-moved to position j in a pair of sentences of length l and m.
-
-sample:
-
-15 14 15 14 0.630798
-15 14 15 15 0.414137
-15 14 15 16 0.268919
-15 14 15 17 0.23171
-15 14 15 18 0.117311
-15 14 15 19 0.119202
-15 14 15 20 0.111369
-15 14 15 21 0.0358169
-
-
-5. D3 TABLE ( *.d3.* )
-
-distortion table
-
-The format is similar to the A table with a slight difference --- the
-position of i & j are switched:
-
-j i l m p(j/i,l,m)
-
-sample:
-
-15 14 14 15 0.286397
-15 14 14 16 0.138898
-15 14 14 17 0.109712
-15 14 14 18 0.0868322
-15 14 14 19 0.0535823
-
-6. D4 TABLE: ( *.d4.* )
-
-distortion table for IBM-4
-
-7. D5 TABLE: ( *.d5.* )
-
-distortion table for IBM-5
-
-8. HMM TABLE: ( *.hhmm.* )
-
-alignment probability table for HMM alignment model
-
-B. ALIGNMENT FILE ( *.A3.* )
-
-In each iteration of the training, and for each sentence pair in the
-training set, the best alignment (Viterbi alignment) is written to the
-alignment file (if the dump parameters are set accordingly). The
-alignment file is named prob_table.An.i, where n is the model number
-({1,2, 2to3, 3 or 4}), and i is the iteration number. The format of
-the alignments file is illustrated in the following sample:
-
-# Sentence pair (1)
-il s' agit de la même société qui a changé de propriétaires
-NULL ({ }) UNK ({ }) UNK ({ }) ( ({ }) this ({ 4 11 }) is ({ }) the ({ }) same ({ 6 }) agency ({ }) which ({ 8 }) has ({ }) undergone ({ 1 2 3 7 9 10 12 }) a ({ }) change ({ 5 }) of ({ }) UNK ({ })
-# Sentence pair (2)
-UNK UNK , le propriétaire , dit que cela s' est produit si rapidement qu' il n' en connaît pas la cause exacte
-NULL ({ 4 }) UNK ({ 1 2 }) UNK ({ }) , ({ 3 }) the ({ }) owner ({ 5 22 23 }) , ({ 6 }) says ({ 7 8 }) it ({ }) happened ({ 10 11 12 }) so ({ 13 }) fast ({ 14 19 }) he ({ 16 }) is ({ }) not ({ 20 }) sure ({ 15 17 }) what ({ }) went ({ 18 21 }) wrong ({ 9 })
-
-The alignment file is represented by three lines for each sentence
-pair. The first line is a label that can be used, e.g., as a caption
-for alignment visualization tools.  It contains information about the
-sentence sequential number in the training corpus, sentence lengths,
-and alignment probability. The second line is the target sentence, the
-third line is the source sentence. Each token in the source sentence
-is followed by a set of zero or more numbers. These numbers represent
-the positions of the target words to which this source word is
-connected, according to the alignment.
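
The third line of each entry can be split into words and their aligned
positions with a simple token scan. The sketch below is an illustration
only: AlignedWord and parseAlignmentLine are hypothetical names, and
the sketch assumes that "({" and "})" always appear as separate
whitespace-delimited tokens, as they do in the sample above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one source word and the target positions it aligns to.
struct AlignedWord {
    std::string word;
    std::vector<unsigned long> targetPositions;
};

// Parses a line of the form "word ({ 1 2 }) word ({ }) ...".
// Returns false if a word is not followed by a well-formed "({ ... })" group.
bool parseAlignmentLine(const std::string& line, std::vector<AlignedWord>& out) {
    std::istringstream in(line);
    std::string word, tok;
    out.clear();
    while (in >> word) {
        if (!(in >> tok) || tok != "({") return false;
        AlignedWord aw;
        aw.word = word;
        bool closed = false;
        while (in >> tok) {
            if (tok == "})") { closed = true; break; }
            if (tok.find_first_not_of("0123456789") != std::string::npos) return false;
            aw.targetPositions.push_back(std::stoul(tok));
        }
        if (!closed) return false;
        out.push_back(aw);
    }
    return true;
}
```

Because the word is read before the "({" token is checked, a source
token that is itself "(" (as in the first sample pair) is handled like
any other word.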
-
-
-C. Perplexity File ( *.perp )
-
-This file will be generated at the end of training. It summarizes
-perplexity values for each training iteration.  Here is a sample
-perplexity file that illustrates the format. The format is the same
-for cross entropy. If no test corpus was provided, the values for it
-will be set to "N/A".
-
-# train-size test-size iter. model train-perplexity test-perplexity final(y/n) train-viterbi-perp test-viterbi-perp
-	447136 9625 0 1 187067 186722 n 3.34328e+06 3.35352e+06
-	447136 9625 1 1 192.88 248.763 n 909.879 1203.13
-	447136 9625 2 1 99.45 139.214 n 316.363 459.745
-	447136 9625 3 1 83.4746 126.046 n 214.612 341.27
-	447136 9625 4 1 78.6939 124.914 n 179.218 303.169
-	447136 9625 5 2 76.6848 125.986 n 161.874 286.226
-	447136 9625 6 2 50.7452 86.2273 n 84.7227 151.701
-	447136 9625 7 2 42.9178 74.5574 n 63.6644 116.034
-	447136 9625 8 2 40.0651 70.7444 n 56.3186 104.274
-	447136 9625 9 2 38.8471 69.4105 n 53.1277 99.6044
-	447136 9625 10 2to3 38.2561 68.9576 n 51.4856 97.4414
-	447136 9625 11 3 129.993 248.885 n 86.6675 165.012
-	447136 9625 12 3 79.2212 169.902 n 86.4842 171.367
-	447136 9625 13 3 75.0746 164.488 n 84.9647 172.639
-	447136 9625 14 3 73.412 162.765 n 83.5762 172.797
-	447136 9625 15 3 72.6107 162.254 y 82.4575 172.688
-
-
-D. Revised Vocabulary files (*.src.vcb, *.trg.vcb)
-
-The revised vocabulary files are similar in format to the original
-vocabulary files. The only exception is that the frequency for each
-token is calculated from the given corpus (i.e. it is exact), which is
-not required in the input.
-
-E. final parameter file: ( *.gizacfg )
-
-This file includes all the parameter settings that were used in order
-to perform this training. This means that starting GIZA using this
-parameter file produces (should produce) the same training.
-
-
-
-Part VI: LITERATURE
--------------------
-
-The following two articles include a comparison of the alignment
-models implemented in GIZA++:
-
-@INPROCEEDINGS{och00:isa,
-	AUTHOR = {F.~J.~Och and H.~Ney},
-	TITLE ={Improved Statistical Alignment Models},
-	BOOKTITLE = ACL00 ,
-	PAGES ={440--447},
-	ADDRESS={Hong Kong, China},
-	MONTH = {October},
-	YEAR = 2000}
-
-@INPROCEEDINGS{och00:aco,
-	AUTHOR =  {F.~J.~Och and H.~Ney},
-	TITLE = {A Comparison of Alignment Models for Statistical Machine Translation},
-	BOOKTITLE = COLING00,
-	ADDRESS	= {Saarbr\"ucken, Germany},
-	YEAR = {2000},
-	MONTH = {August},
-        PAGES = {1086--1090}
-	}
-
-The following article describes the statistical machine translation
-toolkit EGYPT:
-
-@MISC{ alonaizan99:smt,
-AUTHOR = {Y. Al-Onaizan and J. Curin and M. Jahr and K. Knight and J. Lafferty and I. D. Melamed and F. J. Och and D. Purdy and N. A. Smith and D. Yarowsky},
-TITLE = {Statistical Machine Translation, Final Report, {JHU} Workshop},
-YEAR = {1999},
-ADDRESS = {Baltimore, MD},
-NOTE={{\tt http://www.clsp.jhu.edu/ws99/projects/ mt/final\_report/mt-final-report.ps}}
-}
-
-
-The implemented alignment models IBM-1 to IBM-5 and HMM were originally described in:
-
-@ARTICLE{brown93:tmo,
-	AUTHOR	= {Brown, P. F. and Della Pietra, S. A. and Della Pietra, V. J. and Mercer, R. L.},
-	TITLE	= {The Mathematics of Statistical Machine Translation: Parameter Estimation},
-	JOURNAL	= {Computational Linguistics},
-	YEAR	= 1993,
-	VOLUME	= 19,
-	NUMBER	= 2,
-	PAGES	= {263--311}
-}
-
-@INPROCEEDINGS{ vogel96:hbw,
-	AUTHOR	= {Vogel, S. and Ney, H. and Tillmann, C.},
-	TITLE	= {{HMM}-Based Word Alignment in Statistical Translation},
-	YEAR	= 1996,
-	PAGES	= {836--841},
-	MONTH	= {August},
-	ADDRESS	= {Copenhagen},
-	BOOKTITLE	= COLING96
-}
-
-
-Part VII: New features
-======================
-
-2003-06-09: 
-
-- new parameter "-nbestalignments N": prints an N-best list of
-  alignments into a file *.NBEST
-
-- If program is compiled with "-DBINARY_SEARCH_FOR_TTABLE", it uses
-  more memory-efficient data structures for the t table (vector with
-  binary search instead of hash table). Then, the program expects a
-  parameter "-CoocurrenceFile FILE" which specifies a file which
-  includes all lexical cooccurrences in the training corpus. This file
-  can be produced by the snt2cooc.out tool.
-
-