Posted to solr-user@lucene.apache.org by Fred Gilmore <fg...@mail.utexas.edu> on 2010/11/18 22:09:51 UTC

using DIH with mets/alto file sets

mets/alto is an xml standard for describing physical objects.  In this 
case, we're describing books.  The mets file holds the metadata (author, 
title, etc.), the alto file is the physical description (words on the 
page, formatting of the page).  So it's a one (mets) to many (alto) 
relationship.

the directory structure:

/our/collection/IDxxx/:

IDxxx-mets.xml
ALTO/

/our/collection/IDxxx/ALTO/:

IDxxx-ALTO001.xml
IDxxx-ALTO002.xml

i.e., one xml file per scanned book page.

Beyond the ID number as part of the file names, the mets file contains 
no reference to the alto children.  The alto children do contain a 
reference to the jpg page scan, which is labelled with the ID number as 
part of the name.

The idea is to create a full text index of the alto content, accompanied 
by the author/title info from the mets file for purposes of results 
display.  The first try with this is attempting a recursive 
FileDataSource approach.

It was relatively easy to create a "content" field which holds the text 
of the page (each word is actually an attribute of a separate tag), but 
I'm having difficulty determining how I'm going to conditionally add the 
author and title data from the METS file to the rows created with the 
ALTO content field.  It'll involve regex'ing out the ID number 
associated with both the mets and alto filenames for starters, but even 
at that, I don't see how to keep it straight since it's not one mets=one 
alto and it's also not a static string for the entire index.

thanks for any hints you can provide.

Fred
University of Texas at Austin
==========================================
data-config.xml thus far:

<dataConfig>
<dataSource type="FileDataSource" />
<document>
<entity name="landscapes" rootEntity="false" 
processor="FileListEntityProcessor" fileName=".xml$" recursive="true"
baseDir="/home/utlol/htdocs/lib-landscapes-new/publications/">
<entity name="sample" rootEntity="true"
stream="true"
pk="filename"
url="${landscapes.fileAbsolutePath}"
processor="XPathEntityProcessor"
forEach="/mets | /alto"
transformer="TemplateTransformer,RegexTransformer,LogTransformer"
logTemplate=" processing ${landscapes.fileAbsolutePath}"
logLevel="info"
 >

<!-- use system filename for getting OCLC number -->
<!-- we need it both for linking to results and for referencing the METS 
file -->
<field column="fileAbsPath"     template="${landscapes.fileAbsolutePath}" />


<field column="title"   
xpath="/mets/dmdSec/mdWrap/xmlData/mods/titleInfo/title" />
<!--
<field column="author"  
xpath="/mets/dmdSec/mdWrap/xmlData/mods/name[@ID='MODSMD_PRINT_N1']/namePart[@type='given']" 
/>
-->
<field column="filename" 
xpath="/alto/Description/sourceImageInformation/fileName" />
<field column="content" 
xpath="/alto/Layout/Page/PrintSpace/TextBlock/TextLine/String/@CONTENT" />
</entity>
</entity>
</document>
</dataConfig>
==============================================
METS example:

<?xml version="1.0" encoding="UTF-8"?>
<mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns="http://www.loc.gov/METS/" 
xsi:schemaLocation="http://www.loc.gov/METS/ 
http://schema.ccs-gmbh.com/docworks/version20/mets-docworks.xsd" 
xmlns:MODS="http://www.loc.gov/mods/v3" 
xmlns:mix="http://www.loc.gov/mix/" 
xmlns:xlink="http://www.w3.org/1999/xlink" TYPE="METAe_Monograph" 
LABEL="ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE- 
Kingsville Area">
<metsHdr CREATEDATE="2010-05-06T11:21:18" LASTMODDATE="2010-05-06T11:21:18">
<agent ROLE="CREATOR" TYPE="OTHER" OTHERTYPE="SOFTWARE">
<name>CCS docWORKS/METAe Version 6.3-0</name>
<note>docWORKS-ID: 1677</note>
</agent>
</metsHdr>
<dmdSec ID="MODSMD_PRINT">
<mdWrap MIMETYPE="text/xml" MDTYPE="MODS" LABEL="Bibliographic meta-data 
of the printed version">
<xmlData>
<MODS:mods>
<MODS:titleInfo ID="MODSMD_PRINT_TI1" xml:lang="en">
<MODS:title>ENVIRONMENTAL GEOLOGIC ATLAS OF THE TEXAS COASTAL ZONE- 
Kingsville Area</MODS:title>
</MODS:titleInfo>
<MODS:name ID="MODSMD_PRINT_N1" type="personal">
<MODS:namePart type="given">L F. Brown, Jr., J. H. McGowen, T. J. Evans, 
C. G.</MODS:namePart>
<MODS:namePart type="family">Groat</MODS:namePart>
<MODS:role>
<MODS:roleTerm>aut</MODS:roleTerm>
</MODS:role>
</MODS:name>
<MODS:name ID="MODSMD_PRINT_N2" type="personal">
<MODS:namePart type="given">W. L.</MODS:namePart>
<MODS:namePart type="family">Fisher</MODS:namePart>
<MODS:role>
<MODS:roleTerm>aut</MODS:roleTerm>
</MODS:role>
</MODS:name>

============================================
ALTO example:

<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://schema.ccs-gmbh.com/metae/alto-1-1.xsd" 
xmlns:xlink="http://www.w3.org/TR/xlink">
<Description>
<MeasurementUnit>mm10</MeasurementUnit>
<sourceImageInformation>
<fileName>/Docworks/IN/GeologyBooks/txu-oclc-6917337/txu-oclc-6917337-009.jpg</fileName>
</sourceImageInformation>
<OCRProcessing ID="OCRPROCESSING_1">
<preProcessingStep>
<processingSoftware>
<softwareCreator>CCS Content Conversion Specialists GmbH, 
Germany</softwareCreator>
<softwareName>CCS docWORKS</softwareName>
<softwareVersion>6.3-0.93</softwareVersion>
</processingSoftware>
</preProcessingStep>
<ocrProcessingStep>
<processingSoftware>
<softwareCreator>ABBYY (BIT Software), Russia</softwareCreator>
<softwareName>FineReader</softwareName>
<softwareVersion>7.0</softwareVersion>
</processingSoftware>
</ocrProcessingStep>
</OCRProcessing>
</Description>
<Styles>
<TextStyle ID="TXT_0" FONTSIZE="11" FONTFAMILY="Times New Roman"/>
<ParagraphStyle ID="PAR_CENTER" ALIGN="Center"/>
<ParagraphStyle ID="PAR_BLOCK" ALIGN="Block"/>
<ParagraphStyle ID="PAR_RIGHT" ALIGN="Right"/>
<ParagraphStyle ID="PAR_LEFT" ALIGN="Left"/>
</Styles>
<Layout>
<Page ID="P9" PHYSICAL_IMG_NR="9" HEIGHT="2855" WIDTH="2258">
<TopMargin ID="P9_TM00001" HPOS="0" VPOS="0" WIDTH="2258" HEIGHT="196"/>
<LeftMargin ID="P9_LM00001" HPOS="0" VPOS="196" WIDTH="151" HEIGHT="2345"/>
<RightMargin ID="P9_RM00001" HPOS="2104" VPOS="196" WIDTH="154" 
HEIGHT="2345"/>
<BottomMargin ID="P9_BM00001" HPOS="0" VPOS="2541" WIDTH="2258" 
HEIGHT="314"/>
<PrintSpace ID="P9_PS00001" HPOS="151" VPOS="196" WIDTH="1953" 
HEIGHT="2345">
<TextBlock ID="P9_TB00001" HPOS="1045" VPOS="196" WIDTH="173" 
HEIGHT="28" STYLEREFS="TXT_0 PAR_CENTER">
<TextLine ID="P9_TL00001" HPOS="1045" VPOS="197" WIDTH="173" HEIGHT="27">
<String ID="P9_ST00001" HPOS="1045" VPOS="197" WIDTH="173" HEIGHT="27" 
CONTENT="Preface" WC="0.98" CC="0000000"/>
</TextLine>




Re: using DIH with mets/alto file sets

Posted by Alexey Serba <as...@gmail.com>.
> The idea is to create a full text index of the alto content, accompanied by the author/title info from the mets file for purposes of results display.

- Then you need to list only the alto files in your landscapes entity
(fileName="^ID.{3}-ALTO\d{3}\.xml$" or something like that), because
you don't want to index every mets file as a separate solr document,
right?

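- It also seems you might want to add a RegexTransformer field that
extracts the ID from the alto file name, something like
   <field column="metsId" regex="ID(.{3})-ALTO\d{3}\.xml"
sourceColName="fileAbsPath"/>
(as far as I know sourceColName takes a column name rather than a
${...} expression, so this reuses the fileAbsPath column that your
template field already creates)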

- And finally add a nested entity to process the mets file for every alto record

<entity name="landscapes" ...>
  <entity name="sample" ...>
    <entity name="metsProcessor"
            url="${landscapes.fileDir}/../ID${sample.metsId}-mets.xml"
            processor="XPathEntityProcessor" forEach="/mets"
            transformer="TemplateTransformer,RegexTransformer,LogTransformer">
      ...
    </entity>
  </entity>
</entity>

and extract the mets elements/attributes there, indexing them as separate
fields. (fileDir is the directory of the current alto file, so /../ steps
up from ALTO/ to the folder that holds the mets file.)

P.S. I haven't tried similar scenario, so just speculating
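
To make the filename logic above concrete outside of DIH, here is a minimal Python sketch of the same idea: pull the book ID out of an alto file name and derive the path of the sibling mets file one directory up. The function name mets_path_for and the pattern are my own assumptions, based only on the IDxxx-ALTOnnn.xml / IDxxx-mets.xml layout described in the original post; it is not a DIH feature, just a way to check the regex and path handling before wiring them into data-config.xml.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: assumes alto files are named <bookId>-ALTO<nnn>.xml,
# as in the directory listing from the original post.
ALTO_NAME = re.compile(r"^(?P<book_id>.+)-ALTO\d{3}\.xml$")


def mets_path_for(alto_path: str) -> str:
    """Return the path of the mets file that sits one level above ALTO/."""
    p = PurePosixPath(alto_path)
    m = ALTO_NAME.match(p.name)
    if m is None:
        raise ValueError(f"not an alto file name: {p.name}")
    book_id = m.group("book_id")
    # p.parent is .../<bookId>/ALTO, so p.parent.parent is the book directory.
    return str(p.parent.parent / f"{book_id}-mets.xml")
```

Because the ID is captured with a named group instead of a fixed width, the same pattern should also cope with longer IDs such as the txu-oclc-... names visible in the sample alto file, assuming the alto file names follow the same convention.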


Re: using DIH with mets/alto file sets

Posted by Lance Norskog <go...@gmail.com>.
Some ideas:

XPathEntityProcessor parses a very limited XPath syntax. However, you
can name an XSL stylesheet in the entity's xsl attribute, and it is
applied to the XML as a preprocessing step before the XPath fields
are read.

With this, you might be able to create an XPath that selects out every
combination that you want.

A second option: SOLR-1499 is an entity processor that fetches from a
Solr instance. You could index all of the Alto records in one pass,
then, for each Mets record, fetch back all of the Alto records that
belong to it and re-index those Alto documents with the new data.

But really this sounds like a database join problem.
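
As a rough illustration of the join, here is a small Python sketch that does it in memory: read the title out of each mets file once, keyed by book ID, then attach it to every alto page that carries the same ID. The function names, the dict/tuple shapes and the output field names (id, title, content) are all assumptions of mine for illustration; only the MODS namespace URI and the String/@CONTENT layout come from the samples in the original post.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the mets sample in the original post.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def mets_title(mets_xml: str) -> str:
    """Return the first MODS title, with wrapped lines collapsed to one."""
    root = ET.fromstring(mets_xml)
    el = root.find(".//mods:title", MODS_NS)
    return " ".join(el.text.split()) if el is not None and el.text else ""


def alto_text(alto_xml: str) -> str:
    """Concatenate the CONTENT attribute of every String element."""
    root = ET.fromstring(alto_xml)
    # The alto sample declares no default namespace, so plain tag names match.
    return " ".join(s.get("CONTENT", "") for s in root.iter("String"))


def join_pages(mets_by_id, alto_pages):
    """Yield one document per alto page, carrying its book's title.

    mets_by_id: {book_id: mets_xml}
    alto_pages: iterable of (book_id, page_name, alto_xml)
    """
    titles = {book_id: mets_title(xml) for book_id, xml in mets_by_id.items()}
    for book_id, page_name, alto_xml in alto_pages:
        yield {
            "id": page_name,
            "title": titles.get(book_id, ""),
            "content": alto_text(alto_xml),
        }
```

Each yielded dict corresponds to one page-level Solr document; how it then gets into Solr (a script posting documents, or a database that DIH reads with a SQL join) is left open here.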




-- 
Lance Norskog
goksron@gmail.com