Posted to dev@lucene.apache.org by Sarvadnya Mutalik <sa...@renaissance-it.com> on 2006/07/17 09:53:04 UTC

Lucene for XML Not searching tag values

Hi, I'm using Lucene to search tag values in XML.

With the valid XML below, searching for the string "individual" returns
results, but searching for "New York" (which is the value of the sub-tag
"city") returns nothing.

Here is the XML file and the code sample:

XML file:

<?xml version='1.0' encoding='utf-8'?>

<address-book>

    <contact type="individual">
        <name>Zane Pasolini</name>
        <address>999 W. Prince St.</address>
        <city>New York</city>
        <province>NY</province>
        <postalcode>10013</postalcode>
        <country>USA</country>
        <telephone>1-212-345-6789</telephone>
    </contact>

    <contact type="business">
        <name>SAMOFIX d.o.o.</name>
        <address>Ilica 47-2 York</address>
        <city>New Zagreb</city>
        <province>CR</province>
        <postalcode>10000</postalcode>
        <country>Croatia</country>
        <telephone>385-1-123-4567</telephone>
    </contact>

    <contact type="testtype">
        <name>Test name</name>
        <address>Test address</address>
        <city>city test</city>
        <province>NY test</province>
        <postalcode>10013 test</postalcode>
        <country>USA test</country>
        <telephone>1-212-345-6789</telephone>
    </contact>    

</address-book>



AddressBookSearcher class:

package index;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import java.io.IOException;


public class AddressBookSearcher
{
    public static void main(String[] args) throws IOException
    {
        String indexDir ="D:/Work/addressbook"; 
        
        IndexSearcher searcher = new IndexSearcher(indexDir);
        
        String searchFor ="individual"; 	//String to search
        
        Hits hits = null;
        
            //for(int j=1;j<8;j++){
                //hits = searcher.search(generateQuery(j,searchFor));
                hits = searcher.search(new TermQuery(new Term("type",searchFor)));
                if(hits.length() > 0){
                    System.out.println("NUMBER OF MATCHS: " + hits.length()+"\n");
                }
                for (int i = 0; i < hits.length(); i++){
                    System.out.println("NAME: " + hits.doc(i).get("name"));
                    System.out.println("ADDRESS: " + hits.doc(i).get("address"));
                    System.out.println("CITY: " + hits.doc(i).get("city"));
                    System.out.println("PROVINCE: " + hits.doc(i).get("province"));
                    System.out.println("POSTAL CODE: " + hits.doc(i).get("postalcode"));
                    System.out.println("COUNTRY: " + hits.doc(i).get("country"));
                    System.out.println("PHONE: " + hits.doc(i).get("telephone"));
                }
            //}
    }
    
   private static Query generateQuery(int index,String searchFor){
      Query query = null;
      switch (index){
      case 1:
          query = new TermQuery(new Term("name",searchFor));
          break;
      case 2:     
          query = new TermQuery(new Term("address",searchFor));
          break;
      case 3:     
          query = new TermQuery(new Term("city",searchFor));
          break;
      case 4:          
          query = new TermQuery(new Term("province",searchFor));
          break;
      case 5:     
          query = new TermQuery(new Term("postalcode",searchFor));
          break;
      case 6:     
          query = new TermQuery(new Term("country",searchFor));
          break;
      case 7:     
          query = new TermQuery(new Term("telephone",searchFor));
          break;
      default:
          query = new TermQuery(new Term("telephone",searchFor));
          break;
      }      
      return query; 
   }
    
}




Thanks in advance.

Regards,
Sam

Re: Lucene for XML Not searching tag values

Posted by Simon Willnauer <si...@googlemail.com>.
Hi, I guess this kind of question should really be asked on the user
mailing list.
Anyway, the code you posted searches only a single field, named "type".
In your XML, "type" has the value "individual" but never the value
"New York".
Setting aside whether the overall approach makes sense, you should use
your generateQuery() method to search every field for the search string.
Your Hits, i.e. your results, depend on how you indexed your data, so
nobody can tell you how to find a certain string without knowing how the
data was indexed.
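
To illustrate the two points above, here is a minimal sketch in plain Java
of why a term lookup is tied to one field and why the outcome depends on
how the value was indexed. It is a toy model, not Lucene's API: the class
FieldTermIndexSketch, its methods, and the "city_exact" field are made up
for illustration, and it assumes a simple analyzer that splits on
whitespace and lowercases.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a field-scoped inverted index (hypothetical, not Lucene code).
class FieldTermIndexSketch {
    // Maps "field:term" to the ids of the documents that contain the term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the whole value as one term, exactly as given (untokenized).
    void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
    }

    // Assumed analyzer: split on whitespace and lowercase each token.
    void addTokenized(int docId, String field, String value) {
        for (String token : value.trim().split("\\s+")) {
            addKeyword(docId, field, token.toLowerCase(Locale.ROOT));
        }
    }

    // Exact lookup of one term in one field; the query text is not analyzed.
    Set<Integer> termQuery(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    // Three contacts modeled on the address book from the question.
    static FieldTermIndexSketch sample() {
        FieldTermIndexSketch index = new FieldTermIndexSketch();
        String[] types = {"individual", "business", "testtype"};
        String[] cities = {"New York", "New Zagreb", "city test"};
        for (int id = 0; id < types.length; id++) {
            index.addKeyword(id, "type", types[id]);
            index.addTokenized(id, "city", cities[id]);
            index.addKeyword(id, "city_exact", cities[id]);
        }
        return index;
    }

    public static void main(String[] args) {
        FieldTermIndexSketch index = sample();
        System.out.println(index.termQuery("type", "individual"));     // matches the first contact
        System.out.println(index.termQuery("type", "New York"));       // wrong field: no match
        System.out.println(index.termQuery("city", "New York"));       // tokenized field: no such term
        System.out.println(index.termQuery("city", "york"));           // single lowercased token matches
        System.out.println(index.termQuery("city_exact", "New York")); // untokenized field matches
    }
}
```

In this sketch, "New York" is found only in the field that stored the
value as a single untokenized term. If the real index was built with an
analyzer that tokenizes "city", an exact term lookup for "New York" would
likely need to be replaced by lookups for the individual tokens.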

You might also want to have a look at BooleanQuery.
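
One way to read the BooleanQuery suggestion: ask the same term of every
field and keep any document that matches in at least one of them. The
plain-Java sketch below models only that OR-combination; it is not Lucene
code. The class MultiFieldSearchSketch and its methods are hypothetical,
and it assumes each field value was indexed as a single untokenized term,
so only exact whole-value matches count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of OR-ing one search string across several fields (not Lucene code).
class MultiFieldSearchSketch {
    // The seven fields covered by generateQuery() in the question.
    static final List<String> FIELDS = Arrays.asList(
        "name", "address", "city", "province", "postalcode", "country", "telephone");

    // Build one document from alternating field names and values.
    static Map<String, String> contact(String... fieldsAndValues) {
        Map<String, String> doc = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            doc.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return doc;
    }

    // A shortened version of the address book from the question.
    static List<Map<String, String>> sampleDocs() {
        return Arrays.asList(
            contact("type", "individual", "name", "Zane Pasolini",
                    "city", "New York", "telephone", "1-212-345-6789"),
            contact("type", "business", "name", "SAMOFIX d.o.o.",
                    "city", "New Zagreb", "telephone", "385-1-123-4567"),
            contact("type", "testtype", "name", "Test name",
                    "city", "city test", "telephone", "1-212-345-6789"));
    }

    // Return the ids of documents where at least one listed field equals searchFor.
    static List<Integer> searchAllFields(List<Map<String, String>> docs, String searchFor) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String field : FIELDS) {
                if (searchFor.equals(docs.get(id).get(field))) {
                    hits.add(id);
                    break; // one matching field is enough for an OR
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        System.out.println(searchAllFields(docs, "New York"));
        System.out.println(searchAllFields(docs, "1-212-345-6789"));
        System.out.println(searchAllFields(docs, "individual"));
    }
}
```

Note that "type" is not among the seven fields handled by generateQuery(),
so in this sketch "individual" is found only if "type" is added to the
field list as well.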

simon
