Posted to solr-user@lucene.apache.org by wojtekpia <wo...@hotmail.com> on 2009/09/16 18:52:29 UTC

FileListEntityProcessor and LineEntityProcessor

Hi,

I'm trying to import data from a list of files using the
FileListEntityProcessor. Here is my import configuration:

  <dataSource type="FileDataSource" name="fileDataSource"/>
  <document name="dict-entries">
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="d:\my\directory\" fileName=".*WRK" recursive="false"
            rootEntity="false">
      <entity name="jc"
        processor="LineEntityProcessor"
        url="${f.fileAbsolutePath}"
        dataSource="fileDataSource"
        transformer="myTransformer">
      </entity>
    </entity>
  </document>

If I have only one file in d:\my\directory\ then everything works correctly.
If I have multiple files then I get the following exception: 

Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more

Note that my input files have 53812 lines, which is the same as the document
number that I'm choking on. Does anyone know what I'm doing wrong?
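
For reference, the root cause in the trace, "java.io.IOException: Stream closed", is the message BufferedReader.readLine() throws when it is called after the reader has been closed. The sketch below reproduces only that symptom in isolation. It is not DataImportHandler code, and the class and method names (StreamClosedDemo, readPastClose) are made up for illustration. A failure at a document number equal to the line count would be consistent with one more read being attempted after a file was exhausted and closed, but that is an assumption, not something the trace proves.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class: shows the "Stream closed" symptom on its own.
public class StreamClosedDemo {

    // Reads all lines, closes the reader, then attempts one more read and
    // returns the message of the exception that the extra read raises.
    static String readPastClose(String content) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(content));
        while (reader.readLine() != null) {
            // consume every line until end of input
        }
        reader.close();
        try {
            reader.readLine(); // a read on an already closed reader
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readPastClose("line 1\nline 2\n"));
    }
}
```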

Thanks,

Wojtek
-- 
View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25476443.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FileListEntityProcessor and LineEntityProcessor

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
I have opened an issue for this: SOLR-1440.

On Thu, Sep 17, 2009 at 2:46 AM, wojtekpia <wo...@hotmail.com> wrote:
>
> Note that if I change my import file to explicitly list all my files (instead
> of using the FileListEntityProcessor) as below then everything works as I
> expect.
>
>  <dataSource type="FileDataSource" name="fileDataSource"
> basePath="d:\my\directory\"/>
>  <document name="dict-entries">
>    <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
>    <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
>    <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
>
> ...
>
> </document>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: FileListEntityProcessor and LineEntityProcessor

Posted by wojtekpia <wo...@hotmail.com>.
Note that if I change my import file to explicitly list all my files (instead
of using the FileListEntityProcessor) as below then everything works as I
expect.

  <dataSource type="FileDataSource" name="fileDataSource"
              basePath="d:\my\directory\"/>
  <document name="dict-entries">
    <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>

    ...

  </document>
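
Listing every file by hand gets tedious with many files, so the entity lines could be generated instead. The following is a minimal sketch under stated assumptions: the class and method names (EntityListSketch, entityLines) are hypothetical, the attribute values are copied from the configuration above, and file names are assumed to need no XML escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds one <entity> line per matching file name.
public class EntityListSketch {

    // Keeps only names matching the ".*WRK" pattern from the original
    // configuration and wraps each one in an <entity> element.
    static List<String> entityLines(List<String> fileNames) {
        List<String> lines = new ArrayList<>();
        for (String name : fileNames) {
            if (!name.matches(".*WRK")) {
                continue; // skip files the original fileName pattern would skip
            }
            lines.add("<entity name=\"jc\" processor=\"LineEntityProcessor\""
                    + " url=\"" + name + "\""
                    + " dataSource=\"fileDataSource\""
                    + " transformer=\"myTransformer\"></entity>");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("file1.WRK", "notes.txt", "file2.WRK");
        for (String line : entityLines(names)) {
            System.out.println(line);
        }
    }
}
```

The generated lines would still have to be pasted between the document tags of the data config by hand.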


Re: FileListEntityProcessor and LineEntityProcessor

Posted by wojtekpia <wo...@hotmail.com>.


Fergus McMenemie-2 wrote:
> 
> 
> Can you provide more detail on what you are trying to do? ...
> You seem to be listing all files "d:\my\directory\.*WRK". Do 
> these WRK files contain lists of files to be indexed?
> 
> 

That is my complete data config file. I have a directory containing a bunch
of files that have one entity per line. Each line contains "blocks" of data.
I parse out each block and process it appropriately using myTransformer. Is
this use of FileListEntityProcessor with LineEntityProcessor not supported?
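
To make the per-line processing concrete, here is a rough sketch of what a transformer of this kind might look like. It is not the actual myTransformer. It assumes that LineEntityProcessor puts each line into a field named rawLine and that DIH will call a transformRow(Map) method on the configured class, as I understand the DIH documentation. The "|" block delimiter, the "block" output field, and the class name BlockTransformerSketch are purely hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a custom DIH transformer; no Solr classes needed.
public class BlockTransformerSketch {

    // Splits the raw line into blocks and stores the non-empty ones in the row.
    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get("rawLine"); // field assumed to hold the input line
        if (raw == null) {
            return row; // nothing to parse, pass the row through unchanged
        }
        List<String> blocks = new ArrayList<>();
        // Assumption: blocks are separated by '|'. The real format is unknown.
        for (String part : raw.toString().split("\\|", -1)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                blocks.add(trimmed);
            }
        }
        row.put("block", blocks); // hypothetical multi-valued output field
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("rawLine", "first block | second block | third block");
        System.out.println(new BlockTransformerSketch().transformRow(row));
    }
}
```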


Re: FileListEntityProcessor and LineEntityProcessor

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>Hi,
>
>I'm trying to import data from a list of files using the
>FileListEntityProcessor. Here is my import configuration:
>
>  <dataSource type="FileDataSource" name="fileDataSource"/>
>  <document name="dict-entries">
>    <entity name="f" processor="FileListEntityProcessor"
>baseDir="d:\my\directory\" fileName=".*WRK" recursive="false"
>rootEntity="false">
>      <entity name="jc"
>        processor="LineEntityProcessor"
>        url="${f.fileAbsolutePath}"
>        dataSource="fileDataSource"
>        transformer="myTransformer">
>      </entity>
>    </entity>
>  </document>
>
>If I have only one file in d:\my\directory\ then everything works correctly.
>If I have multiple files then I get the following exception: 

Sorry, but I don't quite follow this. FileListEntityProcessor and
LineEntityProcessor are somewhat similar in that they provide
a list of filenames which the likes of XPathEntityProcessor
then open and parse.

Is the above your complete data-config.xml?

Can you provide more detail on what you are trying to do? ...
You seem to be listing all files "d:\my\directory\.*WRK". Do 
these WRK files contain lists of files to be indexed?






-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================