Posted to java-user@lucene.apache.org by KK <di...@gmail.com> on 2009/05/20 11:43:11 UTC

How to create a new index

How do I create a new index? Every time I need one, I have to create a new
directory and pass its path, right? How can I automate the creation of the
new directory?

I'm a new user of Lucene. Please help me out.

Thanks,
KK.

Re: How to create a new index

Posted by KK <di...@gmail.com>.

Thank you again, @John.
This is even better. I don't have to bother about the 3rd argument, right?
I'll use the same constructor every time, both for registering a new core
and for adding docs to an existing one.

Thanks,
KK.

On Wed, May 20, 2009 at 6:54 PM, John Byrne <jo...@propylon.com> wrote:

> Hi KK,
>
> You're welcome!
>
> BTW, I had a quick look at the Javadoc for IndexWriter and noticed this
> constructor:
>
> public IndexWriter(Directory d, Analyzer a)
> "Constructs an IndexWriter for the index in d, first creating it if it does
> not already exist."
>
> I think that might solve your problem and simplify the code a little - I
> think you could just use that constructor every time, because it will only
> create the index if it does not already exist.
>
> -John
>
>
> KK wrote:
>
>> Thanks a lot, @John. That solved the problem, and the other advice is really
>> helpful. I'd have tripped over that otherwise.
>> This clears up my doubt: every time I have to create a new index, I just
>> call the IndexWriter with "true", thereby creating the directory, then start
>> adding docs with "false" as the 3rd argument instead of "true", right?
>> Lucene is pretty simple and gives you full control of whatever you are
>> doing. I've been trying to automate the creation of new Solr cores for the
>> last two days without any luck. Finally today I moved to Lucene and it fixed
>> my problem very quickly. Thank you all, and special thanks to the Lucene guys.
>>
>> Thanks,
>> KK.
>>
>> On Wed, May 20, 2009 at 6:28 PM, John Byrne <jo...@propylon.com>
>> wrote:
>>
>>
>>
>>> I think the problem is that you are creating a new index every time you
>>> add a document:
>>>
>>> IndexWriter writer = new IndexWriter(trueIndexPath, new
>>> StandardAnalyzer(), true);
>>>
>>> The last argument, the boolean 'true' tells IndexWriter to overwrite any
>>> existing index in that directory. If you set that to false, it will not
>>> overwrite the previous index, but will add to it.
>>>
>>> How, then, do you create it in the first place? You call the IndexWriter
>>> constructor once with 'true' as the 3rd argument, creating the index, then
>>> subsequently use 'false'. You could do this in your main method, right
>>> after you create an instance of SimpleIndexer, but before you call
>>> createIndex.
>>>
>>> -John
>>>
>>>
>>>
>>> KK wrote:
>>>
>>>
>>>
>>>> Thank you very much.
>>>> I'm using the one mentioned by @Anshum, but the problem is that after
>>>> indexing a number of docs, I see only the last one indexed, which
>>>> clearly indicates that the index is getting overwritten. I'm posting my
>>>> simple indexer and searcher herewith. I'm trying to crawl web pages and
>>>> add each page's content under a field called "content", against a field
>>>> called "id"; for this id I'm using the page URL. Here is the code.
>>>>
>>>> The indexer:
>>>> --------------------------------------------
>>>> package solrSearch;
>>>>
>>>> import org.apache.lucene.analysis.SimpleAnalyzer;
>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.index.IndexWriter;
>>>>
>>>> public class SimpleIndexer {
>>>>
>>>>  // Base Path to the index directory
>>>>  private static final String baseIndexPath = "/opt/lucene/index/";
>>>>
>>>>
>>>>  public void createIndex(String pageContent, String pageId, String
>>>> coreId)
>>>> throws Exception {
>>>>   String trueIndexPath = baseIndexPath + coreId ;
>>>>   String contentField = "content";
>>>>   String contentId    = "id";
>>>>
>>>>   // Create a writer
>>>>   IndexWriter writer = new IndexWriter(trueIndexPath, new
>>>> StandardAnalyzer(), true);
>>>>
>>>>   System.out.println("Adding page to lucene " + pageId);
>>>>   Document doc = new Document();
>>>>   doc.add(new Field(contentField, pageContent, Field.Store.YES,
>>>> Field.Index.TOKENIZED));
>>>>   doc.add(new Field(contentId, pageId, Field.Store.YES,
>>>> Field.Index.TOKENIZED));
>>>>
>>>>   // Add documents to the index
>>>>   writer.addDocument(doc);
>>>>
>>>>   // Lucene recommends calling optimize upon completion of indexing
>>>>   writer.optimize();
>>>>
>>>>   // clean up
>>>>   writer.close();
>>>>  }
>>>>
>>>>  public static void main(String args[]) throws Exception{
>>>>      SimpleIndexer empIndex = new SimpleIndexer();
>>>>   empIndex.createIndex("this is sample test content", "test0", "core0");
>>>>   System.out.println("Data indexed by lucene");
>>>>  }
>>>>
>>>> }
>>>>
>>>> and the searcher:
>>>> ---------------------------------------
>>>> package solrSearch;
>>>>
>>>> import java.io.FileReader;
>>>> import java.io.IOException;
>>>> import java.io.InputStreamReader;
>>>> import java.util.Date;
>>>>
>>>> import org.apache.lucene.analysis.Analyzer;
>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.index.FilterIndexReader;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.queryParser.QueryParser;
>>>> import org.apache.lucene.search.HitCollector;
>>>> import org.apache.lucene.search.Hits;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.Query;
>>>> import org.apache.lucene.search.ScoreDoc;
>>>> import org.apache.lucene.search.Searcher;
>>>> import org.apache.lucene.search.TopDocCollector;
>>>>
>>>> /** Simple command-line based search demo. */
>>>> public class SimpleSearcher {
>>>>   private static final String baseIndexPath = "/opt/lucene/index/" ;
>>>>
>>>>   private void searchIndex(String queryString, String coreId) throws
>>>> Exception{
>>>>       String trueIndexPath = baseIndexPath + coreId;
>>>>       String searchField = "content";
>>>>        IndexSearcher searcher = new IndexSearcher(trueIndexPath);
>>>>       QueryParser queryParser = null;
>>>>       try {
>>>>           queryParser = new QueryParser(searchField, new
>>>> StandardAnalyzer());
>>>>       } catch (Exception ex) {
>>>>            ex.printStackTrace();
>>>>       }
>>>>
>>>>       Query query = queryParser.parse(queryString);
>>>>
>>>>       Hits hits = null;
>>>>       try {
>>>>            hits = searcher.search(query);
>>>>       } catch (Exception ex) {
>>>>            ex.printStackTrace();
>>>>       }
>>>>
>>>>       int hitCount = hits.length();
>>>>       System.out.println("Results found :" + hitCount);
>>>>
>>>>       for (int ix=0; (ix<hitCount && ix<10); ix++) {
>>>>            Document doc = hits.doc(ix);
>>>>           System.out.println(doc.get("id"));
>>>>           System.out.println(doc.get("content"));
>>>>       }
>>>>   }
>>>>
>>>>   public static void main(String args[]) throws Exception{
>>>>        SimpleSearcher searcher = new SimpleSearcher();
>>>>       String queryString = args[0];
>>>>       System.out.println("Querying for: " + queryString);
>>>>       searcher.searchIndex(queryString, "core0");
>>>>   }
>>>>
>>>> }
>>>>
>>>> ---------------
>>>> When I first tried without having the core0 directory, it automatically
>>>> created that. That's fine, but I'm not able to figure out what the issue
>>>> is, why the data is getting overwritten. Some silly mistake somewhere.
>>>> Can someone point it out?
>>>> And this is the code snippet that I'm using to post to the Lucene index.
>>>>
>>>> public void postToSolr(String rawText, String pageId) throws Exception{
>>>>       // Which solr core are we posting to???
>>>>       //String solrCoreId = getCoreId(pageId);
>>>>       String coreId = "core0";
>>>>       SimpleIndexer indexer = new SimpleIndexer();
>>>>       indexer.createIndex(rawText, pageId, coreId);
>>>>
>>>>   }
>>>>
>>>> NB: I didn't pay attention to changing the names, so you might find the
>>>> word "solr" here and there. I was using that earlier, but because it lacks
>>>> a facility for creating new separate indexes, I moved to Lucene just today.
>>>> I guess trying to create a new index in a non-existent directory will
>>>> automatically create it, which is what I want. Correct me if I'm wrong.
>>>> As I mentioned earlier, for each domain [say www.bcd.co.uk] I want to have
>>>> a separate index, and coreId is a map of this URL to a unique number. Do
>>>> let me know if I'm going wrong anywhere or if you feel it can be done in
>>>> a better way.
>>>>
>>>>
>>>> Thanks,
>>>> KK.
>>>>
>>>>
>>>> On Wed, May 20, 2009 at 4:10 PM, Anshum <an...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Hi KK,
>>>>>
>>>>> Easier still, you could just open the IndexWriter with the last (3rd)
>>>>> argument as true; this way the IndexWriter would create a new index as
>>>>> soon as you start indexing. Also, if you leave out the 3rd argument,
>>>>> it'd conditionally create a new index, i.e. only if the index dir
>>>>> doesn't exist at that location would it create a new index; otherwise
>>>>> it would append to the already existing index at that location.
>>>>> Coming to the 2nd point, if you are talking about the index name, as
>>>>> mentioned by John you could simply use the timestamp as the index name.
>>>>>
>>>>> --
>>>>> Anshum Gupta
>>>>> Naukri Labs!
>>>>> http://ai-cafe.blogspot.com
>>>>>
>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>> distinction is yours to draw............
>>>>>
>>>>>
>>>>> On Wed, May 20, 2009 at 3:23 PM, John Byrne <jo...@propylon.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> You can do this with pure Java. Create a File object with the path you
>>>>>> want, check if it exists, and if not, create it:
>>>>>>
>>>>>> File newIndexDir = new File("/foo/bar");
>>>>>>
>>>>>> if (!newIndexDir.exists()) {
>>>>>>     newIndexDir.mkdirs();
>>>>>> }
>>>>>>
>>>>>> The 'mkdirs()' method creates any necessary parent directories.
>>>>>>
>>>>>> If you want to automate the generation of the path itself, then there
>>>>>> are
>>>>>> several ways to do it, but the best way really depends on *why* you're
>>>>>> generating a new index. For instance, you could just create a
>>>>>> timestamped
>>>>>> name, but that name might not be very meaningful.
>>>>>>
>>>>>> Hope that helps!
>>>>>>
>>>>>> -John
>>>>>>
>>>>>> KK wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> How do I create a new index? Every time I need one, I have to create
>>>>>>> a new directory and pass its path, right? How can I automate the
>>>>>>> creation of the new directory?
>>>>>>>
>>>>>>> I'm a new user of Lucene. Please help me out.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> KK.
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to create a new index

Posted by John Byrne <jo...@propylon.com>.

Hi KK,

You're welcome!

BTW, I had a quick look at the Javadoc for IndexWriter and noticed this 
constructor:

public IndexWriter(Directory d, Analyzer a)
"Constructs an IndexWriter for the index in d, first creating it if it 
does not already exist."

I think that might solve your problem and simplify the code a little - I 
think you could just use that constructor every time, because it will 
only create the index if it does not already exist.

-John
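
The "create it only if it does not already exist" behaviour quoted above can also be spelled out by hand when using the three-argument constructor from earlier in the thread. What follows is only a minimal sketch in plain Java. It assumes that an existing Lucene index can be recognised by a file whose name starts with "segments" (an assumption about the on-disk layout, not something stated in this thread), and IndexDirs and needsCreate are hypothetical names, not part of Lucene. If memory of the Javadoc serves, Lucene of this era also offers IndexReader.indexExists for the same purpose, which would be preferable where available.

```java
import java.io.File;

/** Hypothetical helper: decides the value of IndexWriter's 'create' flag. */
final class IndexDirs {

    private IndexDirs() {}

    /**
     * Returns true when the directory listing shows no existing index.
     * Assumption: an existing Lucene index contains a file whose name
     * starts with "segments" (for example "segments_2").
     */
    static boolean needsCreate(String[] fileNames) {
        if (fileNames == null) {
            return true; // directory is missing or unreadable
        }
        for (String name : fileNames) {
            if (name.startsWith("segments")) {
                return false; // looks like an index, so append to it
            }
        }
        return true; // directory exists but holds no index yet
    }

    /** Same check against a directory on disk; File.list() returns null if it is absent. */
    static boolean needsCreate(File indexDir) {
        return needsCreate(indexDir.list());
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "/opt/lucene/index/core0");
        System.out.println("create flag for " + dir + ": " + needsCreate(dir));
    }
}
```

The result could then be passed as the 3rd argument, for instance new IndexWriter(trueIndexPath, new StandardAnalyzer(), IndexDirs.needsCreate(new File(trueIndexPath))), so that the first call creates the index and later calls append to it.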


Re: How to create a new index

Posted by Erick Erickson <er...@gmail.com>.

Unless something about your problem space *requires* that you reopen the
index, you're better off just opening it once, writing all your documents to
it, then closing it. Although what you're doing will work, it's not very
efficient.

And the same thing is *especially* true of the searcher. There's considerable
overhead in warming up a new searcher, and doing it for every search does
not scale at all well (but this is demo code so that's probably irrelevant).

Best
Erick
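
The "open once, write everything, close once" shape can be sketched without any Lucene dependency. In the sketch below, DocWriter is a hypothetical stand-in for the two IndexWriter calls used in the thread (addDocument and close), and BatchIndexer and indexAll are likewise made-up names. It illustrates the pattern under those assumptions and is not the thread's actual indexer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the IndexWriter calls used in the thread. */
interface DocWriter {
    void addDocument(String id, String content);
    void close();
}

final class BatchIndexer {

    private BatchIndexer() {}

    /**
     * Adds every page through one already-open writer, then closes it once.
     * Each page is a two-element array: {id, content}.
     * Returns the number of pages added.
     */
    static int indexAll(DocWriter writer, List<String[]> pages) {
        int added = 0;
        try {
            for (String[] page : pages) {
                writer.addDocument(page[0], page[1]);
                added++;
            }
        } finally {
            writer.close(); // single close, even if an add fails
        }
        return added;
    }

    public static void main(String[] args) {
        // A writer that only prints, so the call sequence is visible.
        DocWriter printing = new DocWriter() {
            public void addDocument(String id, String content) {
                System.out.println("add " + id);
            }
            public void close() {
                System.out.println("close");
            }
        };
        List<String[]> pages = Arrays.asList(
            new String[] {"test0", "this is sample test content"},
            new String[] {"test1", "another page"});
        System.out.println("indexed " + indexAll(printing, pages));
    }
}
```

With a real IndexWriter in place of DocWriter, the writer would be constructed once before the crawl loop rather than inside createIndex, so the open and close cost is paid once per batch instead of once per page.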

On Wed, May 20, 2009 at 9:13 AM, KK <di...@gmail.com> wrote:

> Thanks a lot @John. That solved the problem and the other advice is really
> helpful. I'd have bumped over that otherwise.
> This clarifies my doubt, that everytime I've to create a new index just
> call
> the indexwriter with "true" thereby creating the directory, then start
> adding docs with "false" as the 3rd argument instead of "true", right?
> Lucene is pretty simple and gives you the full control of whatever you are
> doing. I've been trying to automate the creation of new solr cores for last
> two days without any luck. Finally today moved to Lucene and it fixed my
> problem very soon. Thank you all and special thanks to Lucene guys.
>
> Thanks,
> KK.
>
> On Wed, May 20, 2009 at 6:28 PM, John Byrne <jo...@propylon.com>
> wrote:
>
> > I think the problem is that you are creating an new index every time you
> > add a document:
> >
> > IndexWriter writer = new IndexWriter(trueIndexPath, new
> > StandardAnalyzer(), true);
> >
> > The last argument, the boolean 'true' tells IndexWriter to overwrite any
> > existing index in that directory. If you set that to false, it will not
> > overwrite the previous index, but will add to it.
> >
> > How, then do you create it in the first place? You call the IndexWriter's
> > constructor once with 'true' as the 3rd argumrent, creating the index,
> then
> > subsequently use 'false'. You could do this in your main method, right
> after
> > you create an instance of SimpleIndexer, but before you call createIndex.
> >
> > -John
> >
> >
> >
> > KK wrote:
> >
> >> Thank you very much.
> >> I'm using the one mentioned by @Anshum ..but the problem is that after
> >> indexing some no of docs what I see is only the last one indexed which
> >> clearly indicates that the index is getting overwritten. I'm posing my
> >> simple indexer and searcher herewith. Actually I'm trying to crawl web
> >> pages
> >> and add each pages content under a filed called "content" againts a
> field
> >> called "id" and for this id I'm using the page URL. These are the codes
> >>
> >> The indexer:
> >> --------------------------------------------
> >> package solrSearch;
> >>
> >> import org.apache.lucene.analysis.SimpleAnalyzer;
> >> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >> import org.apache.lucene.document.Document;
> >> import org.apache.lucene.document.Field;
> >> import org.apache.lucene.index.IndexWriter;
> >>
> >> public class SimpleIndexer {
> >>
> >>  // Base Path to the index directory
> >>  private static final String baseIndexPath = "/opt/lucene/index/";
> >>
> >>
> >>  public void createIndex(String pageContent, String pageId, String
> coreId)
> >> throws Exception {
> >>    String trueIndexPath = baseIndexPath + coreId ;
> >>    String contentField = "content";
> >>    String contentId    = "id";
> >>
> >>    // Create a writer
> >>    IndexWriter writer = new IndexWriter(trueIndexPath, new
> >> StandardAnalyzer(), true);
> >>
> >>    System.out.println("Adding page to lucene " + pageId);
> >>    Document doc = new Document();
> >>    doc.add(new Field(contentField, pageContent, Field.Store.YES,
> >> Field.Index.TOKENIZED));
> >>    doc.add(new Field(contentId, pageId, Field.Store.YES,
> >> Field.Index.TOKENIZED));
> >>
> >>    // Add documents to the index
> >>    writer.addDocument(doc);
> >>
> >>    // Lucene recommends calling optimize upon completion of indexing
> >>    writer.optimize();
> >>
> >>    // clean up
> >>    writer.close();
> >>  }
> >>
> >>  public static void main(String args[]) throws Exception{
> >>       SimpleIndexer empIndex = new SimpleIndexer();
> >>    empIndex.createIndex("this is sample test content", "test0",
> "core0");
> >>    System.out.println("Data indexed by lucene");
> >>  }
> >>
> >> }
> >>
> >> and the searcher:
> >> ---------------------------------------
> >> package solrSearch;
> >>
> >> import java.io.FileReader;
> >> import java.io.IOException;
> >> import java.io.InputStreamReader;
> >> import java.util.Date;
> >>
> >> import org.apache.lucene.analysis.Analyzer;
> >> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >> import org.apache.lucene.document.Document;
> >> import org.apache.lucene.index.FilterIndexReader;
> >> import org.apache.lucene.index.IndexReader;
> >> import org.apache.lucene.queryParser.QueryParser;
> >> import org.apache.lucene.search.HitCollector;
> >> import org.apache.lucene.search.Hits;
> >> import org.apache.lucene.search.IndexSearcher;
> >> import org.apache.lucene.search.Query;
> >> import org.apache.lucene.search.ScoreDoc;
> >> import org.apache.lucene.search.Searcher;
> >> import org.apache.lucene.search.TopDocCollector;
> >>
> >> /** Simple command-line based search demo. */
> >> public class SimpleSearcher {
> >>    private static final String baseIndexPath = "/opt/lucene/index/" ;
> >>
> >>    private void searchIndex(String queryString, String coreId) throws
> >> Exception{
> >>        String trueIndexPath = baseIndexPath + coreId;
> >>        String searchField = "content";
> >>         IndexSearcher searcher = new IndexSearcher(trueIndexPath);
> >>        QueryParser queryParser = null;
> >>        try {
> >>            queryParser = new QueryParser(searchField, new
> >> StandardAnalyzer());
> >>        } catch (Exception ex) {
> >>             ex.printStackTrace();
> >>        }
> >>
> >>        Query query = queryParser.parse(queryString);
> >>
> >>        Hits hits = null;
> >>        try {
> >>             hits = searcher.search(query);
> >>        } catch (Exception ex) {
> >>             ex.printStackTrace();
> >>        }
> >>
> >>        int hitCount = hits.length();
> >>        System.out.println("Results found :" + hitCount);
> >>
> >>        for (int ix=0; (ix<hitCount && ix<10); ix++) {
> >>             Document doc = hits.doc(ix);
> >>            System.out.println(doc.get("id"));
> >>            System.out.println(doc.get("content"));
> >>        }
> >>    }
> >>
> >>    public static void main(String args[]) throws Exception{
> >>         SimpleSearcher searcher = new SimpleSearcher();
> >>        String queryString = args[0];
> >>        System.out.println("Quering for :" + queryString);
> >>        searcher.searchIndex(queryString, "core0");
> >>    }
> >>
> >> }
> >>
> >> ---------------
> >> When I tried intially without having the core0 directory, it
> automatically
> >> created that. Its fine, but I'm not able to figure what is the issue,
> why
> >> the data is getting overwritten. Some silly mistakes some where. Can
> some
> >> one point me that?
> >> And this is the code snip that I'm using to post to lucene index.
> >>
> >> public void postToSolr(String rawText, String pageId) throws Exception{
> >>        // Which solr core are we posting to???
> >>        //String solrCoreId = getCoreId(pageId);
> >>        String coreId = "core0";
> >>        SimpleIndexer indexer = new SimpleIndexer();
> >>        indexer.createIndex(rawText, pageId, coreId);
> >>
> >>    }
> >>
> >> NB: I didn't pay attention to changing the names, so you might find the
> >> word "solr" here and there. I was using that earlier, but because it
> >> lacks a facility for creating new separate indexes I moved to Lucene
> >> only today. I guess trying to create a new index in a non-existent
> >> directory will automatically create it, which is what I want. Correct me
> >> if I'm wrong. As I mentioned earlier, for each domain [say www.bcd.co.uk]
> >> I want to have a separate index, and coreId is a map of this URL to a
> >> unique number. Do let me know if I'm going wrong anywhere or if you feel
> >> it can be done in any better way.
> >>
> >>
> >> Thanks,
> >> KK.
> >>
> >>

Re: How to create a new index

Posted by KK <di...@gmail.com>.

Thanks a lot @John. That solved the problem, and the other advice is really
helpful. I'd have stumbled over that otherwise.
This clears up my doubt: every time I have to create a new index, I just
call the IndexWriter with "true" as the 3rd argument, thereby creating the
directory, then add docs with "false" instead of "true", right?
Lucene is pretty simple and gives you full control of whatever you are
doing. I had been trying to automate the creation of new Solr cores for the
last two days without any luck. Finally today I moved to Lucene and it fixed
my problem very quickly. Thank you all, and special thanks to the Lucene guys.

Thanks,
KK.

On Wed, May 20, 2009 at 6:28 PM, John Byrne <jo...@propylon.com> wrote:

> I think the problem is that you are creating a new index every time you
> add a document:
>
> IndexWriter writer = new IndexWriter(trueIndexPath, new
> StandardAnalyzer(), true);
>
> The last argument, the boolean 'true', tells IndexWriter to overwrite any
> existing index in that directory. If you set that to false, it will not
> overwrite the previous index, but will add to it.
>
> How, then, do you create it in the first place? You call the IndexWriter's
> constructor once with 'true' as the 3rd argument, creating the index, then
> subsequently use 'false'. You could do this in your main method, right after
> you create an instance of SimpleIndexer, but before you call createIndex.
>
> -John
>
>
>

Re: How to create a new index

Posted by John Byrne <jo...@propylon.com>.

I think the problem is that you are creating a new index every time you
add a document:

IndexWriter writer = new IndexWriter(trueIndexPath, new
StandardAnalyzer(), true);

The last argument, the boolean 'true', tells IndexWriter to overwrite any
existing index in that directory. If you set that to false, it will not
overwrite the previous index, but will add to it.

How, then, do you create it in the first place? You call the
IndexWriter's constructor once with 'true' as the 3rd argument,
creating the index, then subsequently use 'false'. You could do this in
your main method, right after you create an instance of SimpleIndexer,
but before you call createIndex.
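
One way to avoid hard-coding 'true' or 'false' is to work out the flag from
the state of the index directory. The following is only a rough sketch using
plain java.io; the class CreateFlag and the method shouldCreate are made-up
names for illustration, not part of Lucene. It assumes that a missing or empty
directory means "no index yet", so a directory holding unrelated files would
be treated as if it already contained an index.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene: decides the value of the
// 3rd IndexWriter argument from the state of the index directory.
class CreateFlag {

    /** Returns true only when the directory is missing or empty. */
    static boolean shouldCreate(File indexDir) {
        // list() returns null when the path is not an existing directory
        String[] entries = indexDir.list();
        return entries == null || entries.length == 0;
    }
}

// Intended use with the constructor discussed in this thread:
//   boolean create = CreateFlag.shouldCreate(new File(trueIndexPath));
//   IndexWriter writer =
//       new IndexWriter(trueIndexPath, new StandardAnalyzer(), create);
```

With this, the first call for a new core would pass 'true' and every later
call 'false', without the caller having to track which case applies.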

-John




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to create a new index

Posted by KK <di...@gmail.com>.

Thank you very much.
I'm using the approach mentioned by @Anshum, but the problem is that after
indexing a number of docs, all I see is the last one indexed, which clearly
indicates that the index is getting overwritten. I'm posting my simple
indexer and searcher herewith. I'm trying to crawl web pages and add each
page's content under a field called "content" against a field called "id",
and for this id I'm using the page URL. Here is the code.

The indexer:
--------------------------------------------
package solrSearch;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SimpleIndexer {

  // Base Path to the index directory
  private static final String baseIndexPath = "/opt/lucene/index/";


  public void createIndex(String pageContent, String pageId, String coreId)
      throws Exception {
    String trueIndexPath = baseIndexPath + coreId;
    String contentField = "content";
    String contentId    = "id";

    // Create a writer
    IndexWriter writer = new IndexWriter(trueIndexPath,
        new StandardAnalyzer(), true);

    System.out.println("Adding page to lucene " + pageId);
    Document doc = new Document();
    doc.add(new Field(contentField, pageContent, Field.Store.YES,
        Field.Index.TOKENIZED));
    doc.add(new Field(contentId, pageId, Field.Store.YES,
        Field.Index.TOKENIZED));

    // Add documents to the index
    writer.addDocument(doc);

    // Lucene recommends calling optimize upon completion of indexing
    writer.optimize();

    // clean up
    writer.close();
  }

  public static void main(String args[]) throws Exception {
    SimpleIndexer empIndex = new SimpleIndexer();
    empIndex.createIndex("this is sample test content", "test0", "core0");
    System.out.println("Data indexed by lucene");
  }

}

and the searcher:
---------------------------------------
package solrSearch;

import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocCollector;

/** Simple command-line based search demo. */
public class SimpleSearcher {
    private static final String baseIndexPath = "/opt/lucene/index/" ;

    private void searchIndex(String queryString, String coreId)
            throws Exception {
        String trueIndexPath = baseIndexPath + coreId;
        String searchField = "content";
        IndexSearcher searcher = new IndexSearcher(trueIndexPath);
        QueryParser queryParser = null;
        try {
            queryParser = new QueryParser(searchField,
                    new StandardAnalyzer());
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        Query query = queryParser.parse(queryString);

        Hits hits = null;
        try {
            hits = searcher.search(query);
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        int hitCount = hits.length();
        System.out.println("Results found :" + hitCount);

        // Print the id and content of at most the first 10 hits
        for (int ix = 0; (ix < hitCount && ix < 10); ix++) {
            Document doc = hits.doc(ix);
            System.out.println(doc.get("id"));
            System.out.println(doc.get("content"));
        }
    }

    public static void main(String args[]) throws Exception {
        SimpleSearcher searcher = new SimpleSearcher();
        String queryString = args[0];
        System.out.println("Querying for :" + queryString);
        searcher.searchIndex(queryString, "core0");
    }

}

---------------
When I tried initially without having the core0 directory, it was created
automatically. That's fine, but I'm not able to figure out why the data is
getting overwritten. Some silly mistake somewhere. Can someone point it out?
And this is the code snippet that I'm using to post to the Lucene index.

public void postToSolr(String rawText, String pageId) throws Exception{
        // Which solr core are we posting to???
        //String solrCoreId = getCoreId(pageId);
        String coreId = "core0";
        SimpleIndexer indexer = new SimpleIndexer();
        indexer.createIndex(rawText, pageId, coreId);

    }

NB: I didn't pay attention to changing the names, so you might find the word
"solr" here and there. I was using that earlier, but because it lacks a
facility for creating new separate indexes I moved to Lucene only today. I
guess trying to create a new index in a non-existent directory will
automatically create it, which is what I want. Correct me if I'm wrong. As I
mentioned earlier, for each domain [say www.bcd.co.uk] I want to have a
separate index, and coreId is a map of this URL to a unique number. Do let me
know if I'm going wrong anywhere or if you feel it can be done in any
better way.
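
To make the domain-to-core mapping concrete, here is a rough sketch of what
the commented-out getCoreId might look like. Everything in it is an
assumption for illustration: the class name CoreRegistry, the "core" + number
naming, and the choice to number hosts in the order they are first seen. The
map lives only in memory, so it would have to be persisted somewhere for the
numbering to survive a restart.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry: gives each host a stable core number,
// assigned in the order the hosts are first seen.
class CoreRegistry {
    private static final String BASE_INDEX_PATH = "/opt/lucene/index/";
    private final Map<String, Integer> coreIds = new HashMap<String, Integer>();

    /** Returns "core0" for the first host seen, "core1" for the next, etc. */
    synchronized String coreIdFor(String pageUrl) {
        String host = URI.create(pageUrl).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + pageUrl);
        }
        // Host names are case-insensitive, so normalise before the lookup
        host = host.toLowerCase(Locale.ROOT);
        Integer id = coreIds.get(host);
        if (id == null) {
            id = coreIds.size();
            coreIds.put(host, id);
        }
        return "core" + id;
    }

    /** Index directory for the page, e.g. /opt/lucene/index/core0. */
    String indexPathFor(String pageUrl) {
        return BASE_INDEX_PATH + coreIdFor(pageUrl);
    }
}
```

postToSolr could then call coreIdFor(pageId) instead of hard-coding "core0",
since pageId is the page URL.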


Thanks,
KK.


On Wed, May 20, 2009 at 4:10 PM, Anshum <an...@gmail.com> wrote:

> Hi KK,
>
> Easier still, you could just open the IndexWriter with the last (3rd)
> argument as true; this way the IndexWriter would create a new index as
> soon as you start indexing. Also, if you leave out the 3rd argument, it
> would conditionally create a new index, i.e. only if the index dir doesn't
> exist at that location would it create a new index; otherwise it would
> append to the already existing index at that location.
> Coming to the 2nd point, if you are talking about the index name, as
> mentioned by John you could simply use the timestamp as the index name.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>

Re: How to create a new index

Posted by Anshum <an...@gmail.com>.

Hi KK,

Easier still, you could just open the IndexWriter with the last (3rd)
argument set to true; that way the IndexWriter creates a new index as soon
as you start indexing (overwriting any index that already exists there).
Also, if you leave out the 3rd argument, it creates the index conditionally:
only if no index exists at that location does it create a new one; otherwise
it appends to the existing index at that location.
Coming to the 2nd point, if you are talking about the index name, as
John mentioned, you could simply use a timestamp as the index name.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Wed, May 20, 2009 at 3:23 PM, John Byrne <jo...@propylon.com> wrote:

> You can do this with pure Java. Create a File object with the path you
> want, check if it exists, and if not, create it:
>
> File newIndexDir = new File("/foo/bar");
>
> if (!newIndexDir.exists()) {
>   newIndexDir.mkdirs();
> }
>
> The 'mkdirs()' method creates any necessary parent directories.
>
> If you want to automate the generation of the path itself, then there are
> several ways to do it, but the best way really depends on *why* you're
> generating a new index. For instance, you could just create a timestamped
> name, but that name might not be very meaningful.
>
> Hope that helps!
>
> -John
>
> KK wrote:
>
>> How to create a new index? everytime I need to do so, I've to create a
>> new directory and put the path to that, right? how to automate the
>> creation of new directory?
>>
>> I'm a new user of lucene. Please help me out.
>>
>> Thanks,
>> KK.

Re: How to create a new index

Posted by John Byrne <jo...@propylon.com>.

You can do this with pure Java. Create a File object with the path you
want, check if it exists, and if not, create it:

File newIndexDir = new File("/foo/bar");

if (!newIndexDir.exists()) {
    newIndexDir.mkdirs();
}

The 'mkdirs()' method creates any necessary parent directories.

If you want to automate the generation of the path itself, then there 
are several ways to do it, but the best way really depends on *why* 
you're generating a new index. For instance, you could just create a 
timestamped name, but that name might not be very meaningful.
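
For illustration, here is a minimal sketch of the timestamped-name idea combined with the mkdirs() check above. The class name, the "products" prefix, the base directory under java.io.tmpdir, and the timestamp pattern are all assumptions made up for this example; pick whatever naming scheme is meaningful for your indexes.

```java
import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: names and layout are illustrative assumptions only.
final class TimestampedIndexDir {

    /** Builds a name of the form prefix-yyyyMMdd-HHmmss for the given time. */
    static String timestampedName(String prefix, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss");
        return prefix + "-" + fmt.format(when);
    }

    /** Creates baseDir/prefix-timestamp, including any missing parent directories. */
    static File create(File baseDir, String prefix) throws IOException {
        File indexDir = new File(baseDir, timestampedName(prefix, new Date()));
        // mkdirs() returns false when nothing was created, so check exists() first
        // and treat a false result on a missing directory as a failure.
        if (!indexDir.exists() && !indexDir.mkdirs()) {
            throw new IOException("Could not create " + indexDir);
        }
        return indexDir;
    }

    public static void main(String[] args) throws IOException {
        // Assumed base location; replace with wherever your indexes should live.
        File base = new File(System.getProperty("java.io.tmpdir"), "indexes");
        File indexDir = create(base, "products");
        System.out.println("Index directory: " + indexDir.getAbsolutePath());
    }
}
```

The returned File (or its path) is what you would then hand to Lucene when opening the index. Note that a second-resolution timestamp can collide if two indexes are created within the same second, in which case this sketch simply reuses the existing directory.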

Hope that helps!

-John

KK wrote:
> How to create a new index? everytime I need to do so , I've to create a new
> directory and put the path to that, right? how to automate the creation of
> new directory?
>
> I'm a new user of lucene. Please help me out.
>
> Thanks,
> KK.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org