Posted to user@lucenenet.apache.org by William Morgenweck <mo...@gmail.com> on 2015/04/02 17:42:19 UTC

XML file utility

Hi,

I've used Lucene.Net on and off for about 10 years, but now I need to push
my limits. Is there any kind of XML utility that can help with indexing an
XML file? I've seen some code at
http://www.codewrecks.com/blog/index.php/2012/06/21/getting-started-with-lucene-netsearching/
that loops over an XML file, but I was wondering if there might be something
more extensive. I've even thought about putting the data into a NoSQL
database and letting Lucene.Net do its magic. My XML file is made up of
multiple nodes, each containing publicly available information. Can anyone
point me in the most efficient direction? After a search for any item in the
XML file, all I want back is the SERIAL_NUMBER node; I will build the output
on the fly. I would like the TERM values to have a higher score than any
word in the text. Also, should I combine all of the TERMs into one field or
index each separately?

Thanks,

Bill

The following is one node from the XML file, which could have as many as
100,000 nodes.

<SERIAL_NUMBER>xxxx</SERIAL_NUMBER>

<STUDY_SECTION>ZRG1</STUDY_SECTION>

<STUDY_SECTION_NAME>Special Emphasis Panel</STUDY_SECTION_NAME>

<SUPPORT_YEAR>1</SUPPORT_YEAR>

<SUFFIX xsi:nil="true"/>

<SUBPROJECT_ID xsi:nil="true"/>

<TOTAL_COST>179278</TOTAL_COST>

<TOTAL_COST_SUB_PROJECT xsi:nil="true"/>

<CORE_PROJECT_NUM>R43OD020158</CORE_PROJECT_NUM>

<CFDA_CODE>351</CFDA_CODE>

<PROGRAM_OFFICER_NAME>xxxxx.</PROGRAM_OFFICER_NAME>

<ED_INST_TYPE xsi:nil="true"/>

<AWARD_NOTICE_DATE>03/11/2015</AWARD_NOTICE_DATE>

<FUNDING_MECHANISM>SBIR-STTR</FUNDING_MECHANISM>

</row>

<row>

<APPLICATION_ID>8834630</APPLICATION_ID>

<ACTIVITY>R44</ACTIVITY>

<ADMINISTERING_IC>CA</ADMINISTERING_IC>

<APPLICATION_TYPE>1</APPLICATION_TYPE>

<ARRA_FUNDED>N</ARRA_FUNDED>

<BUDGET_START>03/11/2015</BUDGET_START>

<BUDGET_END>02/29/2016</BUDGET_END>

<FOA_NUMBER>PAR-14-088</FOA_NUMBER>

<FULL_PROJECT_NUM>1R44CA192460-01</FULL_PROJECT_NUM>

<FUNDING_ICs>NCI:1072220\</FUNDING_ICs>

<FY>2015</FY>

<NIH_SPENDING_CATS xsi:nil="true"/>

<ORG_CITY>Bethesda</ORG_CITY>

<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>

<ORG_DISTRICT>08</ORG_DISTRICT>

<ORG_DUNS>44444444</ORG_DUNS>

<ORG_DEPT xsi:nil="true"/>

<ORG_FIPS>US</ORG_FIPS>

<ORG_STATE>gg</ORG_STATE>

<ORG_ZIPCODE>0000000000</ORG_ZIPCODE>

<IC_NAME>NATIONAL CANCER INSTITUTE</IC_NAME>

<ORG_NAME>yyyyyy</ORG_NAME>

<PIS>

<PI>

<PI_NAME>xxxxx</PI_NAME>

<PI_ID>9086143</PI_ID>

</PI>

</PIS>

<PROJECT_TERMSX>

<TERM>adenoma</TERM>

<TERM>Adult</TERM>

<TERM>aged</TERM>

<TERM>Algorithms</TERM>

<TERM>Anatomy</TERM>

<TERM>angiogenesis</TERM>

<TERM>Animal Model</TERM>

<TERM>Animals</TERM>

<TERM>base</TERM>

<TERM>Benign</TERM>

<TERM>Cancer Etiology</TERM>

<TERM>Cancerous</TERM>

<TERM>Cessation of life</TERM>

<TERM>Chromosome Mapping</TERM>

<TERM>Clinical</TERM>

<TERM>Colon</TERM>

<TERM>Colon Carcinoma</TERM>

<TERM>Colonic Neoplasms</TERM>

<TERM>Colonic Polyps</TERM>

<TERM>Colonoscopes</TERM>

<TERM>Colonoscopy</TERM>

<TERM>Colorectal Cancer</TERM>

<TERM>colorectal cancer screening</TERM>

<TERM>cost effectiveness</TERM>

<TERM>Data</TERM>

<TERM>Depressed mood</TERM>

<TERM>design</TERM>

<TERM>Detection</TERM>

<TERM>Development</TERM>

<TERM>Devices</TERM>

<TERM>Documentation</TERM>

<TERM>Drug Formulations</TERM>

<TERM>Endoscopy</TERM>

<TERM>Equipment</TERM>

<TERM>Excision</TERM>

<TERM>experience</TERM>

<TERM>Family suidae</TERM>

<TERM>Feasibility Studies</TERM>

<TERM>feeding</TERM>

<TERM>Gene Mutation</TERM>

<TERM>Generations</TERM>

<TERM>Goals</TERM>

<TERM>Gold</TERM>

<TERM>Growth</TERM>

<TERM>Harvest</TERM>

<TERM>Histologic</TERM>

<TERM>Histopathology</TERM>

<TERM>Image</TERM>

<TERM>image registration</TERM>

<TERM>Imaging Techniques</TERM>

<TERM>improved</TERM>

<TERM>in vivo</TERM>

<TERM>Indolent</TERM>

<TERM>instrument</TERM>

<TERM>Intestines</TERM>

<TERM>Lesion</TERM>

<TERM>Life</TERM>

<TERM>Light</TERM>

<TERM>Literature</TERM>

<TERM>Location</TERM>

<TERM>Malignant Conversion</TERM>

<TERM>Malignant Neoplasms</TERM>

<TERM>man</TERM>

<TERM>Maps</TERM>

<TERM>Measurable</TERM>

<TERM>Measures</TERM>

<TERM>meetings</TERM>

<TERM>Methods</TERM>

<TERM>Metric</TERM>

<TERM>Modality</TERM>

<TERM>Modeling</TERM>

<TERM>Molecular Probes</TERM>

<TERM>Monitor</TERM>

<TERM>Morphology</TERM>

<TERM>Motion</TERM>

<TERM>Neoplasms</TERM>

<TERM>neoplastic</TERM>

<TERM>Neoplastic Polyp</TERM>

<TERM>Oxygen</TERM>

<TERM>Oxygen measurement, partial pressure, arterial</TERM>

<TERM>Pathologic</TERM>

<TERM>Pathway interactions</TERM>

<TERM>Patients</TERM>

<TERM>Performance</TERM>

<TERM>Phase</TERM>

<TERM>Physiological</TERM>

<TERM>Polyps</TERM>

<TERM>Population</TERM>

<TERM>Premalignant</TERM>

<TERM>Prevention</TERM>

<TERM>Procedures</TERM>

<TERM>public health relevance</TERM>

<TERM>Rattus</TERM>

<TERM>Research</TERM>

<TERM>Rodent Model</TERM>

<TERM>screening</TERM>

<TERM>Series</TERM>

<TERM>Side</TERM>

<TERM>Simulate</TERM>

<TERM>Staging</TERM>

<TERM>System</TERM>

<TERM>Techniques</TERM>

<TERM>Technology</TERM>

<TERM>Temperature</TERM>

<TERM>Testing</TERM>

<TERM>Time</TERM>

<TERM>tissue oxygenation</TERM>

<TERM>tissue phantom</TERM>

<TERM>Tissues</TERM>

<TERM>Toxic effect</TERM>

<TERM>tumor growth</TERM>

<TERM>Work</TERM>

</PROJECT_TERMSX>

<PROJECT_TITLE>Mapping System for Colonoscopic Neoplasm
Detection</PROJECT_TITLE>

<PROJECT_START>03/11/2015</PROJECT_START>

<PROJECT_END>02/28/2017</PROJECT_END>

<PHR>PUBLIC HEALTH RELEVANCE: Colorectal cancer is preventable if caught as
an early harmless polyp. Video colonoscopy is the best method for screening
patients to identify polyps, but is far from perfect as roughly 25% of
polyps are missed despite current technology. This research develops a
secondary imaging system aimed at improving the identification of polyps
during screening colonoscopy; the system utilizes an imaging bundle that
extends through the working channel of a traditional colonoscope to
quantitatively measure and highlight oxygen differences that exist in polyps
when compared to normal colonic tissue.
</PHR>