Posted to java-user@lucene.apache.org by Ankit Murarka <an...@rancoretech.com> on 2013/08/29 07:51:07 UTC
Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Hello all,
I am facing a typical issue. I have many files which I am indexing.
Problem faced:
a. Files smaller than 20 MB are successfully indexed and merged.
b. Files larger than 20 MB are not getting indexed. No exception is
thrown; only a lock file is created in the index directory.
The indexing process for a single file exceeding 20 MB runs for more
than 8 minutes, after which my code merges the generated index into
the existing index.
Since no index is generated now, I get an exception during the merging
process.
Why are files larger than 20 MB not being indexed? I am indexing each
line of the file. Why is IndexWriter not throwing any error?
Do I need to change any parameter in Lucene or tweak the Lucene settings?
The Lucene version is 4.4.0.
My current deployment for Lucene is on a server running with a 128 MB
initial and 512 MB maximum heap.
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Any help would be highly appreciated. I am kind of stuck and unable to
find a possible solution.
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ian Lea <ia...@gmail.com>.
The exact point at which a Java program hits OOM is pretty random, and
it isn't always fair to blame the piece of code that triggers the
exception. For example:

callMethodThatAllocatesNearlyAllMemory();
callMethodThatAllocatesABitOfMemory();

Here the second call hits OOM, but it is not to blame.
Anyway, your base problem seems to be that you are trying to index a
20 MB chunk of text in one go and, despite appearing to have plenty of
memory, failing.

What happens with different -Xmx values? Are you sure they are taking
effect? What does code like this:

Runtime rt = Runtime.getRuntime();
long maxMemory = rt.maxMemory() / 1024 / 1024;
long totalMemory = rt.totalMemory() / 1024 / 1024;
long freeMemory = rt.freeMemory() / 1024 / 1024;

say before and after loading a 10 MB file, a 15 MB file, a 20 MB file,
etc.? Have you run it with a memory profiler? With verbose GC? Googled
for ways of diagnosing Java memory problems? What happens when you
don't index the 20 MB in one chunk? When you index only the 20 MB
chunk, not the individual lines? Does it work when you are adding a
document but not when you are updating one? The other way round? Does
it fail on all 20 MB files or just some? If some, what's the
difference between them?
Good luck. Have a nice weekend.
--
Ian.
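[Editor's note: the snippet Ian quotes can be wrapped into a small self-contained program; the class name, label text, and the simulated 20 MB allocation below are illustrative, not from the thread.]

```java
public class MemoryReport {
    static final long MB = 1024 * 1024;

    // Print the JVM's max, total, and free heap figures in megabytes.
    static void report(String label) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(label + ": max=" + (rt.maxMemory() / MB)
                + "MB total=" + (rt.totalMemory() / MB)
                + "MB free=" + (rt.freeMemory() / MB) + "MB");
    }

    public static void main(String[] args) {
        report("before");
        // Simulate slurping a ~20 MB file into memory in one go.
        byte[] buffer = new byte[(int) (20 * MB)];
        buffer[buffer.length - 1] = 1; // touch it so it is really allocated
        report("after");
    }
}
```

Running it with different -Xmx values should show the max= figure changing accordingly; if it does not, the flag is not taking effect.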
On Fri, Aug 30, 2013 at 4:14 PM, Ankit Murarka
<an...@rancoretech.com> wrote:
> Can someone please suggest a possible resolution for the issue
> mentioned in the trailing mail?
>
> Also, on changing some settings for IndexWriterConfig and
> LiveIndexWriterConfig, I now get the following exception:
>
>
> 20:31:23,540 INFO java.lang.OutOfMemoryError: Java heap space
> 20:31:23,540 INFO at
> org.apache.lucene.util.UnicodeUtil.UTF16toUTF8WithHash(UnicodeUtil.java:136)
> 20:31:23,540 INFO at
> org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.fillBytesRef(CharTermAttributeImpl.java:91)
> 20:31:23,541 INFO at
> org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
> 20:31:23,541 INFO at
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
> 20:31:23,541 INFO at
> org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
> 20:31:23,542 INFO at
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
> 20:31:23,542 INFO at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
> 20:31:23,542 INFO at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
> 20:31:23,542 INFO at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
> 20:31:23,543 INFO at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
> 20:31:23,543 INFO at
> com.rancore.MainClass1.indexDocs(MainClass1.java:220)
> 20:31:23,543 INFO at
> com.rancore.MainClass1.indexDocs(MainClass1.java:167)
> 20:31:23,543 INFO at com.rancore.MainClass1.main(MainClass1.java:110)
> 20:31:23,546 INFO java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
> 20:31:23,546 INFO at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
> 20:31:23,546 INFO at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
> 20:31:23,546 INFO at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
> 20:31:23,547 INFO at com.rancore.MainClass1.main(MainClass1.java:136)
>
> Can anyone please guide?
> There has to be some way a file of, say, 20 MB can be properly indexed.
>
> Any guidance is highly appreciated.
>
>
>
> On 8/30/2013 6:49 PM, Ankit Murarka wrote:
>>
>> Hello,
>>
>> The following exception is being printed on the server console when trying
>> to index. As usual, indexes are not getting created.
>>
>>
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
>> at
>> org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
>> 18:42:21,764 INFO at
>> org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
>> 18:42:21,765 INFO at
>> org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
>> 18:42:21,765 INFO at
>> org.apache.lucene.document.Field.tokenStream(Field.java:541)
>> 18:42:21,765 INFO at
>> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
>> 18:42:21,766 INFO at
>> org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
>> 18:42:21,766 INFO at
>> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
>> 18:42:21,766 INFO at
>> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
>> 18:42:21,767 INFO at
>> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
>> 18:42:21,767 INFO at
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
>> 18:42:21,767 INFO at
>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
>> 18:42:21,768 INFO at
>> com.rancore.MainClass1.indexDocs(MainClass1.java:197)
>> 18:42:21,768 INFO at
>> com.rancore.MainClass1.indexDocs(MainClass1.java:153)
>> 18:42:21,768 INFO at com.rancore.MainClass1.main(MainClass1.java:95)
>> 18:42:21,771 INFO java.lang.IllegalStateException: this writer hit an
>> OutOfMemoryError; cannot commit
>> 18:42:21,772 INFO at
>> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
>> 18:42:21,911 INFO at
>> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
>> 18:42:21,911 INFO at
>> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
>> 18:42:21,912 INFO at com.rancore.MainClass1.main(MainClass1.java:122)
>> 18:42:22,008 INFO Indexing to directory
>>
>>
>> Any guidance will be highly appreciated! The server opts are
>> -server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m
>>
>> On 8/30/2013 3:13 PM, Ankit Murarka wrote:
>>>
>>> Hello.
>>> The server has much more memory. I have given a minimum of 8 GB to the
>>> application server.
>>>
>>> The Java opts of interest are: -server -Xms8192m -Xmx16384m
>>> -XX:MaxPermSize=8192m
>>>
>>> Even after giving this much memory to the server, how come I am hitting
>>> OOM exceptions? No other activity is being performed on the server apart
>>> from this.
>>>
>>> Checking from JConsole, the maximum heap during indexing was close to
>>> 1.2 GB, whereas the memory allocated is as mentioned above.
>>>
>>> I did mention 128 MB also, but that is when I start the server on a
>>> normal Windows machine.
>>>
>>> Isn't there any property or configuration in Lucene which I should set
>>> in order to index large files, say about 30 MB? I read something about
>>> MergeFactor etc., but was not able to set any value for it, and don't
>>> even know whether doing that will help.
>>>
>>>
>>> On 8/29/2013 7:04 PM, Ian Lea wrote:
>>>>
>>>> Well, I use neither Eclipse nor your application server and can offer
>>>> no advice on any differences in behaviour between the two. Maybe you
>>>> should try Eclipse or app server forums.
>>>>
>>>> If you are going to index the complete contents of a file as one field
>>>> you are likely to hit OOM exceptions. How big is the largest file you
>>>> are ever going to index?
>>>>
>>>> The server may have 8GB but how much memory are you allowing the JVM?
>>>> What are the command line flags? I think you mentioned 128Mb in an
>>>> earlier email. That isn't much.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>>
>>>> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka
>>>> <an...@rancoretech.com> wrote:
>>>>>
>>>>> Hello,
>>>>> I get an exception only when the code is fired from Eclipse.
>>>>> When it is deployed on an application server, I get no exception at
>>>>> all. This forced me to invoke the same code from Eclipse and check
>>>>> what the issue is.
>>>>>
>>>>> I ran the code on a server with 8 GB memory. Even then no exception
>>>>> occurred! Only write.lock is formed.
>>>>>
>>>>> Removing the contents field is not desirable, as it is needed for
>>>>> search to work perfectly.
>>>>>
>>>>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>>>>>
>>>>>> So you do get an exception after all, OOM.
>>>>>>
>>>>>> Try it without this line:
>>>>>>
>>>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>>>> InputStreamReader(fis, "UTF-8"))));
>>>>>>
>>>>>> I think that will slurp the whole file in one go which will obviously
>>>>>> need more memory on larger files than on smaller ones.
>>>>>>
>>>>>> Or just run the program with more memory.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
>>>>>> <an...@rancoretech.com> wrote:
>>>>>>
>>>>>>> Yes, I know that Lucene should not have any document size limits.
>>>>>>> All I get is a lock file inside my index folder; along with this
>>>>>>> there is no other file inside the index folder. Then I get an OOM
>>>>>>> exception.
>>>>>>> Please provide some guidance...
>>>>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>>>>
>>>>>>>> Lucene doesn't have document size limits.
>>>>>>>>
>>>>>>>> There are default limits for how many tokens the highlighters will
>>>>>>>> process
>>>>>>>> ...
>>>>>>>>
>>>>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>>>>
>>>>>>>> Can you boil this down to a small test showing the problem?
>>>>>>>>
>>>>>>>> Mike McCandless
>>>>>>>>
>>>>>>>> http://blog.mikemccandless.com
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello.
Would like to mention that I am able to index documents containing close
to 3.5 lakh (350,000) lines, whereas indexing is not happening when the
number of lines is anything greater than 5 lakh (500,000); I get an
OutOfMemoryError from Java.
Would sincerely appreciate some help and guidance.
Code sample for the issue:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LiveIndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.index.MergePolicy.OneMerge;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.util.Date;

public class D {

    /** Index all text files under a directory. */

    static String[] filenames;

    public static void main(String[] args) {

        //String indexPath = args[0];

        String indexPath = "D:\\Issue"; // Place where indexes will be created
        String docsPath = "Issue";      // Place where the files are kept.
        boolean create = true;

        String ch = "OverAll";

        final File docDir = new File(docsPath);
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("Document directory '" + docDir.getAbsolutePath()
                    + "' does not exist or is not readable, please check the path");
            System.exit(1);
        }

        Date start = new Date();
        try {
            Directory dir = FSDirectory.open(new File(indexPath));
            Analyzer analyzer =
                new com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

            IndexWriter writer = new IndexWriter(dir, iwc);
            if (ch.equalsIgnoreCase("OverAll")) {
                indexDocs(writer, docDir, true);
            } else {
                filenames = args[2].split(",");
                // indexDocs(writer, docDir);
            }
            writer.commit();
            writer.close();

        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass()
                    + "\n with message: " + e.getMessage());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Over All
    static void indexDocs(IndexWriter writer, File file, boolean flag)
            throws IOException {

        FileInputStream fis = null;
        if (file.canRead()) {

            if (file.isDirectory()) {
                String[] files = file.list();
                // an IO error could occur
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]), flag);
                    }
                }
            } else {
                try {
                    fis = new FileInputStream(file);
                } catch (FileNotFoundException fnfe) {
                    fnfe.printStackTrace();
                }

                try {
                    Document doc = new Document();

                    Field pathField = new StringField("path", file.getPath(),
                            Field.Store.YES);
                    doc.add(pathField);

                    doc.add(new LongField("modified", file.lastModified(),
                            Field.Store.NO));

                    doc.add(new StringField("name", file.getName(), Field.Store.YES));

                    doc.add(new TextField("contents", new BufferedReader(
                            new InputStreamReader(fis, "UTF-8"))));

                    LineNumberReader lnr = new LineNumberReader(new FileReader(file));

                    String line = null;
                    while (null != (line = lnr.readLine())) {
                        doc.add(new StringField("SC", line.trim(), Field.Store.YES));
                        // doc.add(new Field("contents", line, Field.Store.YES,
                        //         Field.Index.ANALYZED));
                    }

                    if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                        writer.addDocument(doc);
                        writer.commit();
                        fis.close();
                    } else {
                        try {
                            writer.updateDocument(new Term("path", file.getPath()), doc);
                            fis.close();
                        } catch (Exception e) {
                            writer.close();
                            fis.close();
                            e.printStackTrace();
                        }
                    }

                } catch (Exception e) {
                    writer.close();
                    fis.close();
                    e.printStackTrace();
                } finally {
                    // writer.close();
                    fis.close();
                }
            }
        }
    }
}
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Adrien Grand <jp...@gmail.com>.
Ankit,
The stack traces you are showing only say there was an out-of-memory
error. In those cases the stack trace is unfortunately not always
helpful, since the allocation may fail on a small object because
another object is taking all the memory of the JVM. Can you come up
with a small piece of code that reproduces the error you are
encountering? This would help us see if there is something wrong in
the indexing code, and to debug it otherwise.
--
Adrien
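[Editor's note: the posted indexDocs builds a single Document holding every line of a file, so memory grows with file size. Below is a minimal, Lucene-free sketch of the alternative of reading line by line so that only one line is in memory at a time; the class name, file name, and line counts are made up for illustration.]

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class LineStreamDemo {
    // Process a file one line at a time: only the current line is held in
    // memory, so heap usage stays flat regardless of file size.
    static long countLines(String path) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // In an indexer, this loop body is where each line could
                // become its own small document, instead of accumulating
                // every line as fields on one giant Document.
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file (path is illustrative).
        String path = "sample.txt";
        try (FileWriter w = new FileWriter(path)) {
            for (int i = 0; i < 1000; i++) {
                w.write("line " + i + "\n");
            }
        }
        System.out.println(countLines(path)); // prints 1000
    }
}
```

This is the shape Mike McCandless's question assumes: a stream of tiny documents rather than one document per file.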
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Can someone please suggest a possible resolution for the issue
mentioned in the trailing mail?
Also, on changing some settings for IndexWriterConfig and
LiveIndexWriterConfig, I now get the following exception:
20:31:23,540 INFO java.lang.OutOfMemoryError: Java heap space
20:31:23,540 INFO at
org.apache.lucene.util.UnicodeUtil.UTF16toUTF8WithHash(UnicodeUtil.java:136)
20:31:23,540 INFO at
org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.fillBytesRef(CharTermAttributeImpl.java:91)
20:31:23,541 INFO at
org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
20:31:23,541 INFO at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
20:31:23,541 INFO at
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
20:31:23,542 INFO at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
20:31:23,542 INFO at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
20:31:23,542 INFO at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
20:31:23,542 INFO at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
20:31:23,543 INFO at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
20:31:23,543 INFO at
com.rancore.MainClass1.indexDocs(MainClass1.java:220)
20:31:23,543 INFO at
com.rancore.MainClass1.indexDocs(MainClass1.java:167)
20:31:23,543 INFO at com.rancore.MainClass1.main(MainClass1.java:110)
20:31:23,546 INFO java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
20:31:23,546 INFO at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
20:31:23,546 INFO at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
20:31:23,546 INFO at
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
20:31:23,547 INFO at com.rancore.MainClass1.main(MainClass1.java:136)
Can anyone please guide me? There has to be some way to properly index a
file of, say, 20 MB.
Any guidance is highly appreciated.
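The direction suggested further down the thread (pass each line to Lucene as a separate tiny document, instead of accumulating every line as a field on one huge Document) can be sketched in plain Java. This is a hypothetical sketch, not Lucene API: `LineIndexer` and `addLineDocument` are stand-ins for the real `IndexWriter.addDocument(...)` call, chosen only to show that the streaming pattern keeps memory bounded by the longest line rather than the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: instead of adding every line of a large file as a
// field on a single Document (which keeps the whole file's terms in memory
// at once), hand each line to the index as its own tiny document.
// LineIndexer stands in for the real IndexWriter.addDocument(...) call.
public class PerLineIndexingSketch {

    interface LineIndexer {
        void addLineDocument(String path, int lineNumber, String line);
    }

    static int indexByLine(String path, Reader source, LineIndexer indexer)
            throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            // One small document per line; memory use is bounded by the
            // longest line, not by the file size.
            indexer.addLineDocument(path, lineNumber, line.trim());
        }
        return lineNumber;
    }

    public static void main(String[] args) throws IOException {
        final int[] count = {0};
        int lines = indexByLine("demo.log",
                new StringReader("first line\nsecond line\nthird line\n"),
                (path, n, line) -> count[0]++);
        System.out.println(lines + " line documents added");
    }
}
```

With this shape, a 30 MB file produces many small documents instead of one enormous one, which is exactly the case Lucene handles well.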
On 8/30/2013 6:49 PM, Ankit Murarka wrote:
> Hello,
>
> The following exception is being printed on the server console when
> trying to index. As usual, indexes are not getting created.
>
>
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
> at
> org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
> 18:42:21,764 INFO at
> org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
> 18:42:21,765 INFO at
> org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
> 18:42:21,765 INFO at
> org.apache.lucene.document.Field.tokenStream(Field.java:541)
> 18:42:21,765 INFO at
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
>
> 18:42:21,766 INFO at
> org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
>
> 18:42:21,766 INFO at
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
>
> 18:42:21,766 INFO at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
>
> 18:42:21,767 INFO at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
> 18:42:21,767 INFO at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
> 18:42:21,767 INFO at
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
> 18:42:21,768 INFO at
> com.rancore.MainClass1.indexDocs(MainClass1.java:197)
> 18:42:21,768 INFO at
> com.rancore.MainClass1.indexDocs(MainClass1.java:153)
> 18:42:21,768 INFO at com.rancore.MainClass1.main(MainClass1.java:95)
> 18:42:21,771 INFO java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
> 18:42:21,772 INFO at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
>
> 18:42:21,911 INFO at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
> 18:42:21,911 INFO at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
> 18:42:21,912 INFO at com.rancore.MainClass1.main(MainClass1.java:122)
> 18:42:22,008 INFO Indexing to directory
>
>
> Any guidance will be highly appreciated...>!!!!... Server Opts are
> -server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m
>
> On 8/30/2013 3:13 PM, Ankit Murarka wrote:
>> Hello.
>> The server has much more memory. I have given minimum 8 GB to
>> Application Server..
>>
>> The Java opts which are of interest is : -server -Xms8192m
>> -Xmx16384m -XX:MaxPermSize=8192m
>>
>> Even after giving this much memory to the server, how come i am
>> hitting OOM exceptions. No other activity is being performed on the
>> server apart from this.
>>
>> Checking from JConsole, the maximum Heap during indexing was close to
>> 1.2 GB whereas the memory allocated is as mentioned above,.
>>
>> I did mentioned 128MB also but this is when I start the server on a
>> normal windows machine.
>>
>> Isn't there any property/configuration in LUCENE which I should do in
>> order to index large files. Say about 30 MB.. I read something
>> MergeFactor and etc. but was not able to set any value for it. Don't
>> even know whether doing that will help the cause..
>>
>>
>> On 8/29/2013 7:04 PM, Ian Lea wrote:
>>> Well, I use neither Eclipse nor your application server and can offer
>>> no advice on any differences in behaviour between the two. Maybe you
>>> should try Eclipse or app server forums.
>>>
>>> If you are going to index the complete contents of a file as one field
>>> you are likely to hit OOM exceptions. How big is the largest file you
>>> are ever going to index?
>>>
>>> The server may have 8GB but how much memory are you allowing the JVM?
>>> What are the command line flags? I think you mentioned 128Mb in an
>>> earlier email. That isn't much.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>>
>>> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka
>>> <an...@rancoretech.com> wrote:
>>>> Hello,
>>>> I get exception only when the code is fired from Eclipse.
>>>> When it is deployed on an application server, I get no exception at
>>>> all.
>>>> This forced me to invoke the same code from Eclipse and check what
>>>> is the
>>>> issue.,.
>>>>
>>>> I ran the code on server with 8 GB memory.. Even then no exception
>>>> occurred....!!.. Only write.lock is formed..
>>>>
>>>> Removing contents field is not desirable as this is needed for
>>>> search to
>>>> work perfectly...
>>>>
>>>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>>>> So you do get an exception after all, OOM.
>>>>>
>>>>> Try it without this line:
>>>>>
>>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>>> InputStreamReader(fis, "UTF-8"))));
>>>>>
>>>>> I think that will slurp the whole file in one go which will obviously
>>>>> need more memory on larger files than on smaller ones.
>>>>>
>>>>> Or just run the program with more memory,
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
>>>>> <an...@rancoretech.com> wrote:
>>>>>
>>>>>> Yes I know that Lucene should not have any document size limits.
>>>>>> All I
>>>>>> get
>>>>>> is a lock file inside my index folder. Along with this there's no
>>>>>> other
>>>>>> file
>>>>>> inside the index folder. Then I get OOM exception.
>>>>>> Please provide some guidance...
>>>>>>
>>>>>> Here is the example:
>>>>>>
>>>>>> package com.issue;
>>>>>>
>>>>>>
>>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>>> import org.apache.lucene.document.Document;
>>>>>> import org.apache.lucene.document.Field;
>>>>>> import org.apache.lucene.document.LongField;
>>>>>> import org.apache.lucene.document.StringField;
>>>>>> import org.apache.lucene.document.TextField;
>>>>>> import org.apache.lucene.index.IndexCommit;
>>>>>> import org.apache.lucene.index.IndexWriter;
>>>>>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>>>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>>>> import org.apache.lucene.index.LiveIndexWriterConfig;
>>>>>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>>>> import org.apache.lucene.index.MergePolicy;
>>>>>> import org.apache.lucene.index.SerialMergeScheduler;
>>>>>> import org.apache.lucene.index.MergePolicy.OneMerge;
>>>>>> import org.apache.lucene.index.MergeScheduler;
>>>>>> import org.apache.lucene.index.Term;
>>>>>> import org.apache.lucene.store.Directory;
>>>>>> import org.apache.lucene.store.FSDirectory;
>>>>>> import org.apache.lucene.util.Version;
>>>>>>
>>>>>>
>>>>>> import java.io.BufferedReader;
>>>>>> import java.io.File;
>>>>>> import java.io.FileInputStream;
>>>>>> import java.io.FileNotFoundException;
>>>>>> import java.io.FileReader;
>>>>>> import java.io.IOException;
>>>>>> import java.io.InputStreamReader;
>>>>>> import java.io.LineNumberReader;
>>>>>> import java.util.Date;
>>>>>>
>>>>>> public class D {
>>>>>>
>>>>>> /** Index all text files under a directory. */
>>>>>>
>>>>>>
>>>>>> static String[] filenames;
>>>>>>
>>>>>> public static void main(String[] args) {
>>>>>>
>>>>>> //String indexPath = args[0];
>>>>>>
>>>>>> String indexPath="D:\\Issue";//Place where indexes will be
>>>>>> created
>>>>>> String docsPath="Issue"; //Place where the files are kept.
>>>>>> boolean create=true;
>>>>>>
>>>>>> String ch="OverAll";
>>>>>>
>>>>>>
>>>>>> final File docDir = new File(docsPath);
>>>>>> if (!docDir.exists() || !docDir.canRead()) {
>>>>>> System.out.println("Document directory '"
>>>>>> +docDir.getAbsolutePath()+
>>>>>> "' does not exist or is not readable, please check the path");
>>>>>> System.exit(1);
>>>>>> }
>>>>>>
>>>>>> Date start = new Date();
>>>>>> try {
>>>>>> Directory dir = FSDirectory.open(new File(indexPath));
>>>>>> Analyzer analyzer=new
>>>>>> com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>>>> IndexWriterConfig iwc = new
>>>>>> IndexWriterConfig(Version.LUCENE_44,
>>>>>> analyzer);
>>>>>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>>>
>>>>>> IndexWriter writer = new IndexWriter(dir, iwc);
>>>>>> if(ch.equalsIgnoreCase("OverAll")){
>>>>>> indexDocs(writer, docDir,true);
>>>>>> }else{
>>>>>> filenames=args[2].split(",");
>>>>>> // indexDocs(writer, docDir);
>>>>>>
>>>>>> }
>>>>>> writer.commit();
>>>>>> writer.close();
>>>>>>
>>>>>> } catch (IOException e) {
>>>>>> System.out.println(" caught a " + e.getClass() +
>>>>>> "\n with message: " + e.getMessage());
>>>>>> }
>>>>>> catch(Exception e)
>>>>>> {
>>>>>>
>>>>>> e.printStackTrace();
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> //Over All
>>>>>> static void indexDocs(IndexWriter writer, File file,boolean
>>>>>> flag)
>>>>>> throws IOException {
>>>>>>
>>>>>> FileInputStream fis = null;
>>>>>> if (file.canRead()) {
>>>>>>
>>>>>> if (file.isDirectory()) {
>>>>>> String[] files = file.list();
>>>>>> // an IO error could occur
>>>>>> if (files != null) {
>>>>>> for (int i = 0; i< files.length; i++) {
>>>>>> indexDocs(writer, new File(file, files[i]),flag);
>>>>>> }
>>>>>> }
>>>>>> } else {
>>>>>> try {
>>>>>> fis = new FileInputStream(file);
>>>>>> } catch (FileNotFoundException fnfe) {
>>>>>>
>>>>>> fnfe.printStackTrace();
>>>>>> }
>>>>>>
>>>>>> try {
>>>>>>
>>>>>> Document doc = new Document();
>>>>>>
>>>>>> Field pathField = new StringField("path",
>>>>>> file.getPath(),
>>>>>> Field.Store.YES);
>>>>>> doc.add(pathField);
>>>>>>
>>>>>> doc.add(new LongField("modified", file.lastModified(),
>>>>>> Field.Store.NO));
>>>>>>
>>>>>> doc.add(new
>>>>>> StringField("name",file.getName(),Field.Store.YES));
>>>>>>
>>>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>>>> InputStreamReader(fis, "UTF-8"))));
>>>>>>
>>>>>> LineNumberReader lnr=new LineNumberReader(new
>>>>>> FileReader(file));
>>>>>>
>>>>>>
>>>>>> String line=null;
>>>>>> while( null != (line = lnr.readLine()) ){
>>>>>> doc.add(new
>>>>>> StringField("SC",line.trim(),Field.Store.YES));
>>>>>> // doc.add(new
>>>>>> Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>>>>> }
>>>>>>
>>>>>> if (writer.getConfig().getOpenMode() ==
>>>>>> OpenMode.CREATE_OR_APPEND)
>>>>>> {
>>>>>>
>>>>>> writer.addDocument(doc);
>>>>>> writer.commit();
>>>>>> fis.close();
>>>>>> } else {
>>>>>> try
>>>>>> {
>>>>>> writer.updateDocument(new Term("path",
>>>>>> file.getPath()),
>>>>>> doc);
>>>>>>
>>>>>> fis.close();
>>>>>>
>>>>>> }catch(Exception e)
>>>>>> {
>>>>>> writer.close();
>>>>>> fis.close();
>>>>>>
>>>>>> e.printStackTrace();
>>>>>>
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> }catch (Exception e) {
>>>>>> writer.close();
>>>>>> fis.close();
>>>>>>
>>>>>> e.printStackTrace();
>>>>>> }finally {
>>>>>> // writer.close();
>>>>>>
>>>>>> fis.close();
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>>>
>>>>>>> Lucene doesn't have document size limits.
>>>>>>>
>>>>>>> There are default limits for how many tokens the highlighters will
>>>>>>> process
>>>>>>> ...
>>>>>>>
>>>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>>>
>>>>>>> Can you boil this down to a small test showing the problem?
>>>>>>>
>>>>>>> Mike McCandless
>>>>>>>
>>>>>>> http://blog.mikemccandless.com
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> Faced with a typical issue.
>>>>>>>> I have many files which I am indexing.
>>>>>>>>
>>>>>>>> Problem Faced:
>>>>>>>> a. File having size less than 20 MB are successfully indexed and
>>>>>>>> merged.
>>>>>>>>
>>>>>>>> b. File having size>20MB are not getting INDEXED.. No Exception is
>>>>>>>> being
>>>>>>>> thrown. Only a lock file is being created in the index
>>>>>>>> directory. The
>>>>>>>> indexing process for a single file exceeding 20 MB size
>>>>>>>> continues for
>>>>>>>> more
>>>>>>>> than 8 minutes after which I have a code which merge the generated
>>>>>>>> index
>>>>>>>> to
>>>>>>>> existing index.
>>>>>>>>
>>>>>>>> Since no index is being generated now, I get an exception during
>>>>>>>> merging
>>>>>>>> process.
>>>>>>>>
>>>>>>>> Why Files having size greater than 20 MB are not being
>>>>>>>> indexed..??. I
>>>>>>>> am
>>>>>>>> indexing each line of the file. Why IndexWriter is not throwing
>>>>>>>> any
>>>>>>>> error.
>>>>>>>>
>>>>>>>> Do I need to change any parameter in Lucene or tweak the Lucene
>>>>>>>> settings
>>>>>>>> ??
>>>>>>>> Lucene version is 4.4.0
>>>>>>>>
>>>>>>>> My current deployment for Lucene is on a server running with
>>>>>>>> 128 MB and
>>>>>>>> 512
>>>>>>>> MB heap.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Ankit Murarka
>>>>>>>>
>>>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>>>> compared
>>>>>>>> with
>>>>>>>> what lies within us"
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>>
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>>
>>>>>> Ankit Murarka
>>>>>>
>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>> compared
>>>>>> with
>>>>>> what lies within us"
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Ankit Murarka
>>>>
>>>> "What lies behind us and what lies before us are tiny matters
>>>> compared with
>>>> what lies within us"
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello,
The following exception is being printed on the server console when
trying to index. As usual, indexes are not getting created.
java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
at
org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
18:42:21,764 INFO at
org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
18:42:21,765 INFO at
org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
18:42:21,765 INFO at
org.apache.lucene.document.Field.tokenStream(Field.java:541)
18:42:21,765 INFO at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
18:42:21,766 INFO at
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
18:42:21,766 INFO at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
18:42:21,766 INFO at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
18:42:21,767 INFO at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
18:42:21,767 INFO at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
18:42:21,767 INFO at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
18:42:21,768 INFO at
com.rancore.MainClass1.indexDocs(MainClass1.java:197)
18:42:21,768 INFO at
com.rancore.MainClass1.indexDocs(MainClass1.java:153)
18:42:21,768 INFO at com.rancore.MainClass1.main(MainClass1.java:95)
18:42:21,771 INFO java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
18:42:21,772 INFO at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
18:42:21,911 INFO at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
18:42:21,911 INFO at
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
18:42:21,912 INFO at com.rancore.MainClass1.main(MainClass1.java:122)
18:42:22,008 INFO Indexing to directory
Any guidance will be highly appreciated. Server opts are:
-server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m
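Given that JConsole reports heap usage topping out near 1.2 GB despite -Xmx16384m being configured, it is worth checking, from inside the same JVM that runs the indexing code, how much heap it actually received. If the flags are set on the wrong process, profile, or launch script, `Runtime.maxMemory()` will reveal the real limit:

```java
// Minimal check of the heap the running JVM actually received.
// If -Xmx16384m were in effect, maxMemory() would report roughly 16 GB;
// a figure near 1-2 GB would mean the flags are not reaching this JVM.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: "
                + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Running this once inside the application server would confirm whether the OOM is a genuine memory shortfall or a misapplied startup option.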
On 8/30/2013 3:13 PM, Ankit Murarka wrote:
> Hello.
> The server has much more memory. I have given minimum 8 GB to
> Application Server..
>
> The Java opts which are of interest is : -server -Xms8192m -Xmx16384m
> -XX:MaxPermSize=8192m
>
> Even after giving this much memory to the server, how come i am
> hitting OOM exceptions. No other activity is being performed on the
> server apart from this.
>
> Checking from JConsole, the maximum Heap during indexing was close to
> 1.2 GB whereas the memory allocated is as mentioned above,.
>
> I did mentioned 128MB also but this is when I start the server on a
> normal windows machine.
>
> Isn't there any property/configuration in LUCENE which I should do in
> order to index large files. Say about 30 MB.. I read something
> MergeFactor and etc. but was not able to set any value for it. Don't
> even know whether doing that will help the cause..
>
>
> On 8/29/2013 7:04 PM, Ian Lea wrote:
>> Well, I use neither Eclipse nor your application server and can offer
>> no advice on any differences in behaviour between the two. Maybe you
>> should try Eclipse or app server forums.
>>
>> If you are going to index the complete contents of a file as one field
>> you are likely to hit OOM exceptions. How big is the largest file you
>> are ever going to index?
>>
>> The server may have 8GB but how much memory are you allowing the JVM?
>> What are the command line flags? I think you mentioned 128Mb in an
>> earlier email. That isn't much.
>>
>>
>> --
>> Ian.
>>
>>
>>
>> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka
>> <an...@rancoretech.com> wrote:
>>> Hello,
>>> I get exception only when the code is fired from Eclipse.
>>> When it is deployed on an application server, I get no exception at
>>> all.
>>> This forced me to invoke the same code from Eclipse and check what
>>> is the
>>> issue.,.
>>>
>>> I ran the code on server with 8 GB memory.. Even then no exception
>>> occurred....!!.. Only write.lock is formed..
>>>
>>> Removing contents field is not desirable as this is needed for
>>> search to
>>> work perfectly...
>>>
>>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>>> So you do get an exception after all, OOM.
>>>>
>>>> Try it without this line:
>>>>
>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>> InputStreamReader(fis, "UTF-8"))));
>>>>
>>>> I think that will slurp the whole file in one go which will obviously
>>>> need more memory on larger files than on smaller ones.
>>>>
>>>> Or just run the program with more memory,
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
>>>> <an...@rancoretech.com> wrote:
>>>>
>>>>> Yes I know that Lucene should not have any document size limits.
>>>>> All I
>>>>> get
>>>>> is a lock file inside my index folder. Along with this there's no
>>>>> other
>>>>> file
>>>>> inside the index folder. Then I get OOM exception.
>>>>> Please provide some guidance...
>>>>>
>>>>> Here is the example:
>>>>>
>>>>> package com.issue;
>>>>>
>>>>>
>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>> import org.apache.lucene.document.Document;
>>>>> import org.apache.lucene.document.Field;
>>>>> import org.apache.lucene.document.LongField;
>>>>> import org.apache.lucene.document.StringField;
>>>>> import org.apache.lucene.document.TextField;
>>>>> import org.apache.lucene.index.IndexCommit;
>>>>> import org.apache.lucene.index.IndexWriter;
>>>>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>>> import org.apache.lucene.index.LiveIndexWriterConfig;
>>>>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>>> import org.apache.lucene.index.MergePolicy;
>>>>> import org.apache.lucene.index.SerialMergeScheduler;
>>>>> import org.apache.lucene.index.MergePolicy.OneMerge;
>>>>> import org.apache.lucene.index.MergeScheduler;
>>>>> import org.apache.lucene.index.Term;
>>>>> import org.apache.lucene.store.Directory;
>>>>> import org.apache.lucene.store.FSDirectory;
>>>>> import org.apache.lucene.util.Version;
>>>>>
>>>>>
>>>>> import java.io.BufferedReader;
>>>>> import java.io.File;
>>>>> import java.io.FileInputStream;
>>>>> import java.io.FileNotFoundException;
>>>>> import java.io.FileReader;
>>>>> import java.io.IOException;
>>>>> import java.io.InputStreamReader;
>>>>> import java.io.LineNumberReader;
>>>>> import java.util.Date;
>>>>>
>>>>> public class D {
>>>>>
>>>>> /** Index all text files under a directory. */
>>>>>
>>>>>
>>>>> static String[] filenames;
>>>>>
>>>>> public static void main(String[] args) {
>>>>>
>>>>> //String indexPath = args[0];
>>>>>
>>>>> String indexPath="D:\\Issue";//Place where indexes will be
>>>>> created
>>>>> String docsPath="Issue"; //Place where the files are kept.
>>>>> boolean create=true;
>>>>>
>>>>> String ch="OverAll";
>>>>>
>>>>>
>>>>> final File docDir = new File(docsPath);
>>>>> if (!docDir.exists() || !docDir.canRead()) {
>>>>> System.out.println("Document directory '"
>>>>> +docDir.getAbsolutePath()+
>>>>> "' does not exist or is not readable, please check the path");
>>>>> System.exit(1);
>>>>> }
>>>>>
>>>>> Date start = new Date();
>>>>> try {
>>>>> Directory dir = FSDirectory.open(new File(indexPath));
>>>>> Analyzer analyzer=new
>>>>> com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>>> IndexWriterConfig iwc = new
>>>>> IndexWriterConfig(Version.LUCENE_44,
>>>>> analyzer);
>>>>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>>
>>>>> IndexWriter writer = new IndexWriter(dir, iwc);
>>>>> if(ch.equalsIgnoreCase("OverAll")){
>>>>> indexDocs(writer, docDir,true);
>>>>> }else{
>>>>> filenames=args[2].split(",");
>>>>> // indexDocs(writer, docDir);
>>>>>
>>>>> }
>>>>> writer.commit();
>>>>> writer.close();
>>>>>
>>>>> } catch (IOException e) {
>>>>> System.out.println(" caught a " + e.getClass() +
>>>>> "\n with message: " + e.getMessage());
>>>>> }
>>>>> catch(Exception e)
>>>>> {
>>>>>
>>>>> e.printStackTrace();
>>>>> }
>>>>> }
>>>>>
>>>>> //Over All
>>>>> static void indexDocs(IndexWriter writer, File file,boolean flag)
>>>>> throws IOException {
>>>>>
>>>>> FileInputStream fis = null;
>>>>> if (file.canRead()) {
>>>>>
>>>>> if (file.isDirectory()) {
>>>>> String[] files = file.list();
>>>>> // an IO error could occur
>>>>> if (files != null) {
>>>>> for (int i = 0; i< files.length; i++) {
>>>>> indexDocs(writer, new File(file, files[i]),flag);
>>>>> }
>>>>> }
>>>>> } else {
>>>>> try {
>>>>> fis = new FileInputStream(file);
>>>>> } catch (FileNotFoundException fnfe) {
>>>>>
>>>>> fnfe.printStackTrace();
>>>>> }
>>>>>
>>>>> try {
>>>>>
>>>>> Document doc = new Document();
>>>>>
>>>>> Field pathField = new StringField("path", file.getPath(),
>>>>> Field.Store.YES);
>>>>> doc.add(pathField);
>>>>>
>>>>> doc.add(new LongField("modified", file.lastModified(),
>>>>> Field.Store.NO));
>>>>>
>>>>> doc.add(new
>>>>> StringField("name",file.getName(),Field.Store.YES));
>>>>>
>>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>>> InputStreamReader(fis, "UTF-8"))));
>>>>>
>>>>> LineNumberReader lnr=new LineNumberReader(new
>>>>> FileReader(file));
>>>>>
>>>>>
>>>>> String line=null;
>>>>> while( null != (line = lnr.readLine()) ){
>>>>> doc.add(new
>>>>> StringField("SC",line.trim(),Field.Store.YES));
>>>>> // doc.add(new
>>>>> Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>>>> }
>>>>>
>>>>> if (writer.getConfig().getOpenMode() ==
>>>>> OpenMode.CREATE_OR_APPEND)
>>>>> {
>>>>>
>>>>> writer.addDocument(doc);
>>>>> writer.commit();
>>>>> fis.close();
>>>>> } else {
>>>>> try
>>>>> {
>>>>> writer.updateDocument(new Term("path", file.getPath()),
>>>>> doc);
>>>>>
>>>>> fis.close();
>>>>>
>>>>> }catch(Exception e)
>>>>> {
>>>>> writer.close();
>>>>> fis.close();
>>>>>
>>>>> e.printStackTrace();
>>>>>
>>>>> }
>>>>> }
>>>>>
>>>>> }catch (Exception e) {
>>>>> writer.close();
>>>>> fis.close();
>>>>>
>>>>> e.printStackTrace();
>>>>> }finally {
>>>>> // writer.close();
>>>>>
>>>>> fis.close();
>>>>> }
>>>>> }
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>>
>>>>>> Lucene doesn't have document size limits.
>>>>>>
>>>>>> There are default limits for how many tokens the highlighters will
>>>>>> process
>>>>>> ...
>>>>>>
>>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>>
>>>>>> Can you boil this down to a small test showing the problem?
>>>>>>
>>>>>> Mike McCandless
>>>>>>
>>>>>> http://blog.mikemccandless.com
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>>>>> <an...@rancoretech.com> wrote:
>>>>>>
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Faced with a typical issue.
>>>>>>> I have many files which I am indexing.
>>>>>>>
>>>>>>> Problem Faced:
>>>>>>> a. File having size less than 20 MB are successfully indexed and
>>>>>>> merged.
>>>>>>>
>>>>>>> b. File having size>20MB are not getting INDEXED.. No Exception is
>>>>>>> being
>>>>>>> thrown. Only a lock file is being created in the index
>>>>>>> directory. The
>>>>>>> indexing process for a single file exceeding 20 MB size
>>>>>>> continues for
>>>>>>> more
>>>>>>> than 8 minutes after which I have a code which merge the generated
>>>>>>> index
>>>>>>> to
>>>>>>> existing index.
>>>>>>>
>>>>>>> Since no index is being generated now, I get an exception during
>>>>>>> merging
>>>>>>> process.
>>>>>>>
>>>>>>> Why Files having size greater than 20 MB are not being
>>>>>>> indexed..??. I
>>>>>>> am
>>>>>>> indexing each line of the file. Why IndexWriter is not throwing any
>>>>>>> error.
>>>>>>>
>>>>>>> Do I need to change any parameter in Lucene or tweak the Lucene
>>>>>>> settings
>>>>>>> ??
>>>>>>> Lucene version is 4.4.0
>>>>>>>
>>>>>>> My current deployment for Lucene is on a server running with 128
>>>>>>> MB and
>>>>>>> 512
>>>>>>> MB heap.
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>>
>>>>>>> Ankit Murarka
>>>>>>>
>>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>>> compared
>>>>>>> with
>>>>>>> what lies within us"
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>>
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>>
>>>>> Ankit Murarka
>>>>>
>>>>> "What lies behind us and what lies before us are tiny matters
>>>>> compared
>>>>> with
>>>>> what lies within us"
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters
>>> compared with
>>> what lies within us"
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello.
The server has much more memory; I have given the application server a
minimum of 8 GB.
The Java opts which are of interest is : -server -Xms8192m -Xmx16384m
-XX:MaxPermSize=8192m
Even after giving the server this much memory, how am I still hitting OOM
exceptions? No other activity is being performed on the server apart from
this.
Checking from JConsole, the maximum heap usage during indexing was close to
1.2 GB, whereas the memory allocated is as mentioned above.
I did mention 128 MB earlier, but that was when I started the server on an
ordinary Windows machine.
Isn't there some property/configuration in Lucene that I should set in order
to index large files, say about 30 MB? I read about MergeFactor and similar
settings but was not able to set any value for them, and I don't even know
whether doing that would help.
On 8/29/2013 7:04 PM, Ian Lea wrote:
> Well, I use neither Eclipse nor your application server and can offer
> no advice on any differences in behaviour between the two. Maybe you
> should try Eclipse or app server forums.
>
> If you are going to index the complete contents of a file as one field
> you are likely to hit OOM exceptions. How big is the largest file you
> are ever going to index?
>
> The server may have 8GB but how much memory are you allowing the JVM?
> What are the command line flags? I think you mentioned 128Mb in an
> earlier email. That isn't much.
>
>
> --
> Ian.
>
>
>
> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka
> <an...@rancoretech.com> wrote:
>
>> Hello,
>> I get exception only when the code is fired from Eclipse.
>> When it is deployed on an application server, I get no exception at all.
>> This forced me to invoke the same code from Eclipse and check what is the
>> issue.,.
>>
>> I ran the code on server with 8 GB memory.. Even then no exception
>> occurred....!!.. Only write.lock is formed..
>>
>> Removing contents field is not desirable as this is needed for search to
>> work perfectly...
>>
>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>
>>> So you do get an exception after all, OOM.
>>>
>>> Try it without this line:
>>>
>>> doc.add(new TextField("contents", new BufferedReader(new
>>> InputStreamReader(fis, "UTF-8"))));
>>>
>>> I think that will slurp the whole file in one go which will obviously
>>> need more memory on larger files than on smaller ones.
>>>
>>> Or just run the program with more memory,
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
>>> <an...@rancoretech.com> wrote:
>>>
>>>
>>>> Yes I know that Lucene should not have any document size limits. All I
>>>> get
>>>> is a lock file inside my index folder. Along with this there's no other
>>>> file
>>>> inside the index folder. Then I get OOM exception.
>>>> Please provide some guidance...
>>>>
>>>> Here is the example:
>>>>
>>>> package com.issue;
>>>>
>>>>
>>>> import org.apache.lucene.analysis.Analyzer;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.document.LongField;
>>>> import org.apache.lucene.document.StringField;
>>>> import org.apache.lucene.document.TextField;
>>>> import org.apache.lucene.index.IndexCommit;
>>>> import org.apache.lucene.index.IndexWriter;
>>>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>> import org.apache.lucene.index.LiveIndexWriterConfig;
>>>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>> import org.apache.lucene.index.MergePolicy;
>>>> import org.apache.lucene.index.SerialMergeScheduler;
>>>> import org.apache.lucene.index.MergePolicy.OneMerge;
>>>> import org.apache.lucene.index.MergeScheduler;
>>>> import org.apache.lucene.index.Term;
>>>> import org.apache.lucene.store.Directory;
>>>> import org.apache.lucene.store.FSDirectory;
>>>> import org.apache.lucene.util.Version;
>>>>
>>>>
>>>> import java.io.BufferedReader;
>>>> import java.io.File;
>>>> import java.io.FileInputStream;
>>>> import java.io.FileNotFoundException;
>>>> import java.io.FileReader;
>>>> import java.io.IOException;
>>>> import java.io.InputStreamReader;
>>>> import java.io.LineNumberReader;
>>>> import java.util.Date;
>>>>
>>>> public class D {
>>>>
>>>> /** Index all text files under a directory. */
>>>>
>>>>
>>>> static String[] filenames;
>>>>
>>>> public static void main(String[] args) {
>>>>
>>>> //String indexPath = args[0];
>>>>
>>>> String indexPath="D:\\Issue";//Place where indexes will be created
>>>> String docsPath="Issue"; //Place where the files are kept.
>>>> boolean create=true;
>>>>
>>>> String ch="OverAll";
>>>>
>>>>
>>>> final File docDir = new File(docsPath);
>>>> if (!docDir.exists() || !docDir.canRead()) {
>>>> System.out.println("Document directory '"
>>>> +docDir.getAbsolutePath()+
>>>> "' does not exist or is not readable, please check the path");
>>>> System.exit(1);
>>>> }
>>>>
>>>> Date start = new Date();
>>>> try {
>>>> Directory dir = FSDirectory.open(new File(indexPath));
>>>> Analyzer analyzer=new
>>>> com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>>>> analyzer);
>>>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>
>>>> IndexWriter writer = new IndexWriter(dir, iwc);
>>>> if(ch.equalsIgnoreCase("OverAll")){
>>>> indexDocs(writer, docDir,true);
>>>> }else{
>>>> filenames=args[2].split(",");
>>>> // indexDocs(writer, docDir);
>>>>
>>>> }
>>>> writer.commit();
>>>> writer.close();
>>>>
>>>> } catch (IOException e) {
>>>> System.out.println(" caught a " + e.getClass() +
>>>> "\n with message: " + e.getMessage());
>>>> }
>>>> catch(Exception e)
>>>> {
>>>>
>>>> e.printStackTrace();
>>>> }
>>>> }
>>>>
>>>> //Over All
>>>> static void indexDocs(IndexWriter writer, File file,boolean flag)
>>>> throws IOException {
>>>>
>>>> FileInputStream fis = null;
>>>> if (file.canRead()) {
>>>>
>>>> if (file.isDirectory()) {
>>>> String[] files = file.list();
>>>> // an IO error could occur
>>>> if (files != null) {
>>>> for (int i = 0; i< files.length; i++) {
>>>> indexDocs(writer, new File(file, files[i]),flag);
>>>> }
>>>> }
>>>> } else {
>>>> try {
>>>> fis = new FileInputStream(file);
>>>> } catch (FileNotFoundException fnfe) {
>>>>
>>>> fnfe.printStackTrace();
>>>> }
>>>>
>>>> try {
>>>>
>>>> Document doc = new Document();
>>>>
>>>> Field pathField = new StringField("path", file.getPath(),
>>>> Field.Store.YES);
>>>> doc.add(pathField);
>>>>
>>>> doc.add(new LongField("modified", file.lastModified(),
>>>> Field.Store.NO));
>>>>
>>>> doc.add(new
>>>> StringField("name",file.getName(),Field.Store.YES));
>>>>
>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>> InputStreamReader(fis, "UTF-8"))));
>>>>
>>>> LineNumberReader lnr=new LineNumberReader(new
>>>> FileReader(file));
>>>>
>>>>
>>>> String line=null;
>>>> while( null != (line = lnr.readLine()) ){
>>>> doc.add(new
>>>> StringField("SC",line.trim(),Field.Store.YES));
>>>> // doc.add(new
>>>> Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>>> }
>>>>
>>>> if (writer.getConfig().getOpenMode() ==
>>>> OpenMode.CREATE_OR_APPEND)
>>>> {
>>>>
>>>> writer.addDocument(doc);
>>>> writer.commit();
>>>> fis.close();
>>>> } else {
>>>> try
>>>> {
>>>> writer.updateDocument(new Term("path", file.getPath()),
>>>> doc);
>>>>
>>>> fis.close();
>>>>
>>>> }catch(Exception e)
>>>> {
>>>> writer.close();
>>>> fis.close();
>>>>
>>>> e.printStackTrace();
>>>>
>>>> }
>>>> }
>>>>
>>>> }catch (Exception e) {
>>>> writer.close();
>>>> fis.close();
>>>>
>>>> e.printStackTrace();
>>>> }finally {
>>>> // writer.close();
>>>>
>>>> fis.close();
>>>> }
>>>> }
>>>> }
>>>> }
>>>> }
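The code above adds a stored StringField per line to a single Document, so the whole file's contents sit in memory at once. A Lucene-free sketch contrasting that with processing one line at a time — the file name and line count are made up, and the counter stands in for a per-line writer.addDocument call:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineMemoryDemo {
    public static void main(String[] args) throws IOException {
        // Write a small sample file (stand-in for a large log file).
        Path file = Files.createTempFile("sample", ".log");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (int i = 0; i < 1000; i++) {
                w.write("log line " + i);
                w.newLine();
            }
        }

        // Pattern A (what the posted code does): collect every line into one
        // in-memory "document" -- heap use grows with file size.
        List<String> wholeDoc = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(file)) {
            String line;
            while ((line = r.readLine()) != null) {
                wholeDoc.add(line.trim());
            }
        }

        // Pattern B (one document per line): each line is handed off and can
        // be garbage-collected immediately; the counter stands in for
        // writer.addDocument(...).
        int indexed = 0;
        try (BufferedReader r = Files.newBufferedReader(file)) {
            while (r.readLine() != null) {
                indexed++;
            }
        }

        System.out.println(wholeDoc.size() + " lines buffered, "
                + indexed + " lines streamed");
        Files.delete(file);
    }
}
```

With pattern B the peak heap is one line's worth of text, regardless of file size, which is why indexing each line as its own document avoids the OOM that one all-in-one document hits.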
>>>>
>>>>
>>>>
>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>
>>>>
>>>>> Lucene doesn't have document size limits.
>>>>>
>>>>> There are default limits for how many tokens the highlighters will
>>>>> process
>>>>> ...
>>>>>
>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>
>>>>> Can you boil this down to a small test showing the problem?
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>>
>>>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>>>> <an...@rancoretech.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Faced with a typical issue.
>>>>>> I have many files which I am indexing.
>>>>>>
>>>>>> Problem Faced:
>>>>>> a. File having size less than 20 MB are successfully indexed and
>>>>>> merged.
>>>>>>
>>>>>> b. File having size>20MB are not getting INDEXED.. No Exception is
>>>>>> being
>>>>>> thrown. Only a lock file is being created in the index directory. The
>>>>>> indexing process for a single file exceeding 20 MB size continues for
>>>>>> more
>>>>>> than 8 minutes after which I have a code which merge the generated
>>>>>> index
>>>>>> to
>>>>>> existing index.
>>>>>>
>>>>>> Since no index is being generated now, I get an exception during
>>>>>> merging
>>>>>> process.
>>>>>>
>>>>>> Why Files having size greater than 20 MB are not being indexed..??. I
>>>>>> am
>>>>>> indexing each line of the file. Why IndexWriter is not throwing any
>>>>>> error.
>>>>>>
>>>>>> Do I need to change any parameter in Lucene or tweak the Lucene
>>>>>> settings
>>>>>> ??
>>>>>> Lucene version is 4.4.0
>>>>>>
>>>>>> My current deployment for Lucene is on a server running with 128 MB and
>>>>>> 512
>>>>>> MB heap.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> There are default limits for how many tokens the highlighters will process
>> ...
>>
>> But, if you are passing each line as a separate document to Lucene,
>> then Lucene only sees a bunch of tiny documents, right?
>>
>> Can you boil this down to a small test showing the problem?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>> <an...@rancoretech.com> wrote:
>>
>>>
>>> Hello all,
>>>
>>> Faced with a typical issue.
>>> I have many files which I am indexing.
>>>
>>> Problem Faced:
>>> a. File having size less than 20 MB are successfully indexed and merged.
>>>
>>> b. File having size>20MB are not getting INDEXED.. No Exception is being
>>> thrown. Only a lock file is being created in the index directory. The
>>> indexing process for a single file exceeding 20 MB size continues for
>>> more
>>> than 8 minutes after which I have a code which merge the generated index
>>> to
>>> existing index.
>>>
>>> Since no index is being generated now, I get an exception during merging
>>> process.
>>>
>>> Why Files having size greater than 20 MB are not being indexed..??. I am
>>> indexing each line of the file. Why IndexWriter is not throwing any
>>> error.
>>>
>>> Do I need to change any parameter in Lucene or tweak the Lucene settings
>>> ??
>>> Lucene version is 4.4.0
>>>
>>> My current deployment for Lucene is on a server running with 128 MB and
>>> 512
>>> MB heap.
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters compared
>>> with
>>> what lies within us"
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
> "What lies behind us and what lies before us are tiny matters compared with
> what lies within us"
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Ankit Murarka <an...@rancoretech.com>.
Yes, I know that Lucene should not have any document size limits. All I
get is a lock file inside the index folder; no other files are created
there. Then I get an OOM exception.
Please provide some guidance...
Here is the example:
package com.issue;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;

public class D {

    /** Index all text files under a directory. */
    static String[] filenames;

    public static void main(String[] args) {
        String indexPath = "D:\\Issue"; // place where the index will be created
        String docsPath = "Issue";      // place where the source files are kept
        String ch = "OverAll";

        final File docDir = new File(docsPath);
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("Document directory '" + docDir.getAbsolutePath()
                    + "' does not exist or is not readable, please check the path");
            System.exit(1);
        }

        try {
            Directory dir = FSDirectory.open(new File(indexPath));
            Analyzer analyzer =
                    new com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

            IndexWriter writer = new IndexWriter(dir, iwc);
            try {
                if (ch.equalsIgnoreCase("OverAll")) {
                    indexDocs(writer, docDir, true);
                } else {
                    filenames = args[2].split(",");
                    // indexDocs(writer, docDir);
                }
                writer.commit();
            } finally {
                writer.close();
            }
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass()
                    + "\n with message: " + e.getMessage());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Over All
    static void indexDocs(IndexWriter writer, File file, boolean flag)
            throws IOException {
        if (!file.canRead()) {
            return;
        }
        if (file.isDirectory()) {
            String[] files = file.list(); // may be null if an IO error occurs
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    indexDocs(writer, new File(file, files[i]), flag);
                }
            }
            return;
        }

        FileInputStream fis = new FileInputStream(file);
        LineNumberReader lnr = new LineNumberReader(new FileReader(file));
        try {
            Document doc = new Document();
            doc.add(new StringField("path", file.getPath(), Field.Store.YES));
            doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
            doc.add(new StringField("name", file.getName(), Field.Store.YES));
            doc.add(new TextField("contents",
                    new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

            // NOTE: every line of the file becomes another stored field on this
            // ONE document, so the entire file is held in memory before
            // addDocument() is ever called.
            String line;
            while (null != (line = lnr.readLine())) {
                doc.add(new StringField("SC", line.trim(), Field.Store.YES));
            }

            if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                writer.addDocument(doc);
                writer.commit();
            } else {
                writer.updateDocument(new Term("path", file.getPath()), doc);
            }
        } finally {
            lnr.close();
            fis.close();
        }
    }
}
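The loop above adds every line of the file as another stored field on a single Document, so the whole file sits on the heap before addDocument() runs. Mike's suggestion of one small document per line keeps memory bounded by the longest line instead. Below is a minimal, self-contained sketch of that shape; the LineSink interface is hypothetical, standing in for the real IndexWriter so the sketch runs without Lucene on the classpath:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PerLineIndexer {

    /** Hypothetical stand-in for IndexWriter.addDocument: one small unit per line. */
    public interface LineSink {
        void addLine(String fileName, int lineNumber, String line);
    }

    /**
     * Streams the source line by line, handing each line to the sink
     * immediately. Nothing accumulates, so memory use is bounded by the
     * longest line rather than by the size of the whole file.
     */
    public static int indexLines(String fileName, Reader source, LineSink sink) {
        try {
            BufferedReader reader = new BufferedReader(source);
            int lineNo = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                sink.addLine(fileName, lineNo, line.trim());
            }
            return lineNo;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        final List<String> collected = new ArrayList<String>();
        int count = indexLines("demo.log",
                new StringReader("first line\nsecond line\n"),
                new LineSink() {
                    public void addLine(String f, int n, String l) {
                        collected.add(n + ":" + l);
                    }
                });
        System.out.println(count);            // prints 2
        System.out.println(collected.get(1)); // prints 2:second line
    }
}
```

With the real API, the sink body would presumably build a fresh Document per line (e.g. add a StringField for the line plus the file name and line number, then call writer.addDocument), so no single document ever holds the whole file.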
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Files greater than 20 MB not getting Indexed. No files generated
except write.lock even after 8-9 minutes.
Posted by Michael McCandless <lu...@mikemccandless.com>.
Lucene doesn't have document size limits.
There are default limits for how many tokens the highlighters will process ...
But, if you are passing each line as a separate document to Lucene,
then Lucene only sees a bunch of tiny documents, right?
Can you boil this down to a small test showing the problem?
Mike McCandless
http://blog.mikemccandless.com
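Boiling it down as Mike asks needs no Lucene at all: the posted indexDocs appends one retained field per input line to a single in-memory container before anything reaches the writer. The Document stand-in below is hypothetical, not the Lucene class, but it reproduces the same memory growth pattern:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class OneDocPerFileDemo {

    /**
     * Stand-in for the posted loop: every line read becomes another retained
     * field, so the "document" grows linearly with the input file.
     */
    public static List<String> buildSingleDocument(BufferedReader reader) {
        List<String> fields = new ArrayList<String>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Mirrors doc.add(new StringField("SC", line.trim(), Field.Store.YES))
                fields.add(line.trim());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Simulate a large log file: 100,000 lines in one "file".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append("log line ").append(i).append('\n');
        }
        List<String> doc = buildSingleDocument(
                new BufferedReader(new StringReader(sb.toString())));
        // One retained field per input line: a 20 MB file of short lines means
        // hundreds of thousands of strings alive at once before addDocument().
        System.out.println(doc.size()); // prints 100000
    }
}
```

On a 128 MB heap, holding every line of a 20 MB file (plus per-field object overhead) in one document is plausibly enough to exhaust memory before any index files are written, which would match the symptom of seeing only write.lock.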