Posted to dev@lucene.apache.org by Sebastin Naveen <se...@gmail.com> on 2007/06/06 16:19:01 UTC

Lucene Compression

Hi All,
I am a Lucene developer. I saw your benchmark on the Lucene website,
http://lucene.apache.org
I have up to 45 GB of records. When I compress the records, the result grows
to 80 GB. How can I compress them to 10 GB or less?
Please help me in this regard.

I have included the source code that I use:



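import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class MediationIndexer {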

 public static void main(String[] args) throws Exception{
        String slNo="";
         String fileName="";
         String callType="";
         String callingPartyNumber="";
         String calledPartyNumber="";
         String dateSc="";
         String timeSc="";
         String chargDur="";
         String outgoingRoute="";
         String incomingRoute="";
         String orgCalledNumber="";
         String redirectingNumber="";
         String imsiNumber="";



        File   indexDir = new File("C:/Sample/Mediatio/Index");
            Analyzer analyzer = new StandardAnalyzer();
        IndexWriter indexWriter = new IndexWriter(indexDir,analyzer,true);
           // indexWriter.setUseCompoundFile(true);

        File mediationFiles=new File("C:/mediation files");
        File   fileDir  = new File("C:/mediation files");
            long startTime = new Date().getTime();
        String mediFiles[]=mediationFiles.list();
        for(int j=0;j<mediFiles.length;j++)
        {
        File   file = new File("C:/mediation files"+ "/" +mediFiles[j]);

        //indexDir is the directory that hosts Lucene's index files




        String myFiles[] = file.list();
        System.out.println(myFiles.length);

        for (int i = 0; i <myFiles.length ; i++){

        int recCount = 0;
             try {
                  FileReader fr     = new FileReader(file+"/"+ myFiles[i]);
                  BufferedReader br = new BufferedReader(fr);

        //Add documents to the index

                  String record = br.readLine();
                      System.out.println("First:"+record);
                  while (record  != null){

                       System.out.println("Current:"+record);

                       System.out.println(record);

                       String[] afterSplit = record.split(",");
                       for(int p=0;p<1;p++) {

                       slNo = afterSplit[0];
                       fileName= afterSplit[1];
                       callType=  afterSplit[2];
                       callingPartyNumber= afterSplit[3];
                       calledPartyNumber=afterSplit[4];
                       dateSc= afterSplit[5];
                       timeSc=afterSplit[6];
                       chargDur= afterSplit[7];
                       outgoingRoute=afterSplit[8];
                       incomingRoute=afterSplit[9];
                       orgCalledNumber=afterSplit[10];
                       redirectingNumber=afterSplit[11];
                       imsiNumber=afterSplit[12];

                       String contents =
                          new String(callType + callingPartyNumber +
                                     calledPartyNumber + dateSc +
                                     timeSc + chargDur + outgoingRoute +
                                     incomingRoute +
                                     imsiNumber);
                       recCount++;
                       System.out.println(recCount + ": " + record);
                       System.out.println(recCount + ": " + contents);

                       Document document = new Document();
                       // Combined field: tokenized for search and also stored.
                       document.add(new Field("contents", contents,
                               Field.Store.YES, Field.Index.TOKENIZED));
                       // Stored-only fields: returned with hits but not searchable.
                       document.add(new Field("callType", callType,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("callingPartyNumber", callingPartyNumber,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("calledPartyNumber", calledPartyNumber,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("dateSc", dateSc,
                               Field.Store.YES, Field.Index.TOKENIZED));
                       document.add(new Field("timeSc", timeSc,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("chargDur", chargDur,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("outgoingRoute", outgoingRoute,
                               Field.Store.YES, Field.Index.NO));
                       document.add(new Field("incomingRoute", incomingRoute,
                               Field.Store.YES, Field.Index.NO));
                       // Without this call the document never reaches the index.
                       indexWriter.addDocument(document);
                       }



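                       record = br.readLine();
                       // readLine() returns null at end of file, so check for
                       // null before testing for a blank line.
                       if (record != null && record.length() == 0) {
                           record = null;
                       }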


                   }


                  }catch (IOException e) {
                               // catch possible io errors from readLine()
                               e.printStackTrace();
                            }
        }
        }
            // Flush buffered documents and release the index write lock.
            indexWriter.close();
            long endTime = new Date().getTime();
            System.out.println("It took " + (endTime - startTime)
             + " milliseconds to create an index for the files in the directory "
             + fileDir.getPath());
        }


}
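
The record-splitting step in the listing above reads afterSplit[0] through afterSplit[12] without checking how many columns the line actually has. A minimal sketch of a more defensive split follows, assuming each record is a plain comma-separated line with no quoted commas; the class name RecordParser, the helper parse, and the sample records are hypothetical and not part of the original code.

```java
import java.util.Arrays;

public class RecordParser {
    // Number of columns the indexer reads (afterSplit[0]..afterSplit[12]).
    public static final int EXPECTED_FIELDS = 13;

    /**
     * Splits one record into its columns, or returns null when the line
     * is blank or has too few columns to index safely.
     */
    public static String[] parse(String record) {
        if (record == null || record.trim().length() == 0) {
            return null;
        }
        // The -1 limit keeps trailing empty columns, which split(",") would drop.
        String[] fields = record.split(",", -1);
        if (fields.length < EXPECTED_FIELDS) {
            return null;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample records; the last two columns are empty.
        String complete = "1,f.csv,MOC,111,222,20070606,161901,35,R1,R2,333,,";
        String truncated = "2,f.csv,MOC";
        System.out.println(parse(complete).length);
        System.out.println(Arrays.toString(parse(truncated)));
    }
}
```

With plain split(","), a record whose last columns are empty comes back shorter than expected, and reading afterSplit[12] then throws ArrayIndexOutOfBoundsException; skipping such lines is one option, and logging them is another.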


-- 
Regards,
Sebastin Naveen

Re: Lucene Compression

Posted by Chris Hostetter <ho...@fucit.org>.

:    I removed that field, but I still cannot compress the index to 20% of
: the original size. When I index some fields, it takes almost 2:10 of the
: size. How can I compress it? Please help me in this regard.

Please reread the first two sentences of the message you replied to...

: > First off: please don't send questions about *using* the Lucene Java
: > library to the java-dev list ... it is for discussion about *developing*
: > the internals of the Lucene Java API. The java-user@lucene list is for
: > discussion about *using* the API in your own applications.
: >
: > If you have any followup questions, please start a new thread on that list.

You may also find this thread of interest...

http://www.nabble.com/Need-Lucene-Compression-help----can-pay-nominal-fee-tf3881801.html#a11013878

-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene Compression

Posted by Sebastin <se...@gmail.com>.

Hi,
   I removed that field, but I still cannot compress the index to 20% of
the original size. When I index some fields, it takes almost 2:10 of the
size. How can I compress it? Please help me in this regard.

hossman_lucene wrote:
> 
> 
> First off: please don't send questions about *using* the Lucene Java
> library to the java-dev list ... it is for discussion about *developing*
> the internals of the Lucene Java API. The java-user@lucene list is for
> discussion about *using* the API in your own applications.
> 
> If you have any followup questions, please start a new thread on that list.
> 
> : document.add(new Field("contents",contents,
> : Field.Store.YES,Field.Index.TOKENIZED));
> 
> I'm going to guess that you don't really need to "store" that field since
> it seems to be a meaningless concatenation of a bunch of data that you've
> already stored in other fields.  Field.Store.NO will probably cut down
> your index size by a significant amount.
> 
> 
> -Hoss

-- 
View this message in context: http://www.nabble.com/Lucene-Compression-tf3878281.html#a11188311
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



Re: Lucene Compression

Posted by Chris Hostetter <ho...@fucit.org>.

First off: please don't send questions about *using* the Lucene Java
library to the java-dev list ... it is for discussion about *developing*
the internals of the Lucene Java API. The java-user@lucene list is for
discussion about *using* the API in your own applications.

If you have any followup questions, please start a new thread on that list.

: document.add(new Field("contents",contents,
: Field.Store.YES,Field.Index.TOKENIZED));

I'm going to guess that you don't really need to "store" that field since
it seems to be a meaningless concatenation of a bunch of data that you've
already stored in other fields.  Field.Store.NO will probably cut down
your index size by a significant amount.
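
As a rough illustration of why storing the concatenated field is costly, the sketch below counts the characters stored per record with and without it. It is only back-of-the-envelope arithmetic under stated assumptions: the sample values and the class name StoredSizeEstimate are hypothetical, and the count ignores Lucene's own per-field overhead and the size of the inverted index.

```java
public class StoredSizeEstimate {
    /** Sums the character lengths of the given values. */
    public static int totalLength(String... values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i].length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values for the eight fields the posted code stores
        // individually: callType, callingPartyNumber, calledPartyNumber,
        // dateSc, timeSc, chargDur, outgoingRoute, incomingRoute.
        String[] individuallyStored = {
            "MOC", "9840012345", "9840054321", "20070606",
            "161901", "35", "R1", "R2"
        };
        // "contents" concatenates the same eight values plus imsiNumber.
        String imsiNumber = "404001234567890";

        int individual = totalLength(individuallyStored);
        int contents = individual + imsiNumber.length();

        System.out.println("stored in individual fields: " + individual + " chars");
        System.out.println("stored again in contents:    " + contents + " chars");
        System.out.println("total with Store.YES on contents: " + (individual + contents));
        System.out.println("total with Store.NO on contents:  " + individual);
    }
}
```

Because "contents" repeats every individually stored value, storing it at least doubles the stored characters per record in this sketch; the field can remain tokenized and searchable with Field.Store.NO.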


-Hoss

