Posted to solr-user@lucene.apache.org by wojtekpia <wo...@hotmail.com> on 2009/09/16 18:52:29 UTC
FileListEntityProcessor and LineEntityProcessor
Hi,
I'm trying to import data from a list of files using the
FileListEntityProcessor. Here is my import configuration:
<dataSource type="FileDataSource" name="fileDataSource"/>
<document name="dict-entries">
  <entity name="f" processor="FileListEntityProcessor"
          baseDir="d:\my\directory\" fileName=".*WRK" recursive="false"
          rootEntity="false">
    <entity name="jc"
            processor="LineEntityProcessor"
            url="${f.fileAbsolutePath}"
            dataSource="fileDataSource"
            transformer="myTransformer">
    </entity>
  </entity>
</document>
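For readers unfamiliar with how these two processors nest, here is a rough Python model of the data flow the configuration describes. It is a sketch under assumptions, not DIH code: the outer loop stands in for FileListEntityProcessor (a non-recursive listing of baseDir, keeping names matched by the fileName regex), and the inner generator stands in for LineEntityProcessor, which emits one row per line in a field named rawLine. Because the outer entity has rootEntity="false", documents come from the inner rows. The helper names, the substring-style regex match and the sorted file order are assumptions of this model.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, List


def list_files(base_dir: str, file_name_regex: str) -> List[str]:
    """Stand-in for FileListEntityProcessor with recursive="false".

    Returns absolute paths of regular files directly inside base_dir whose
    names contain a match for the regex (assumed substring-style match).
    Sorted only to make this model deterministic.
    """
    pattern = re.compile(file_name_regex)
    return sorted(
        str(p.resolve())
        for p in Path(base_dir).iterdir()
        if p.is_file() and pattern.search(p.name)
    )


def read_lines(path: str) -> Iterator[Dict[str, str]]:
    """Stand-in for LineEntityProcessor: one row per line, in a 'rawLine' field.

    A new file handle is opened for each file and closed once it is exhausted.
    """
    with open(path, encoding="utf-8") as reader:
        for line in reader:
            yield {"rawLine": line.rstrip("\n")}


def import_rows(base_dir: str, file_name_regex: str) -> Iterator[Dict[str, str]]:
    """Nested entities with rootEntity="false": each inner row becomes a document."""
    for file_absolute_path in list_files(base_dir, file_name_regex):
        # ${f.fileAbsolutePath} is resolved once per outer (file) row.
        for row in read_lines(file_absolute_path):
            yield {"fileAbsolutePath": file_absolute_path, **row}
```

The point of the model is that every file gets its own fresh reader; the total row count should simply be the sum of the line counts of all matching files.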
If I have only one file in d:\my\directory\ then everything works correctly.
If I have multiple files then I get the following exception:
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
    at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
    at java.io.BufferedReader.ensureOpen(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
    ... 8 more
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
    at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
    at java.io.BufferedReader.ensureOpen(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
    ... 8 more
Note that my input files have 53812 lines, which is the same as the document
number that I'm choking on. Does anyone know what I'm doing wrong?
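The matching numbers suggest the import fails right at the boundary between the first and second file. One way to picture a "Stream closed" error at exactly that point is a processor that keeps a single reader across parent rows, closes it at end-of-file, and never clears it before the next file. The Python model below is hypothetical: the classes and methods are invented for illustration, it is not the LineEntityProcessor source, and Python signals a read from a closed stream with ValueError rather than IOException.

```python
from typing import Callable, Dict, Iterable, Optional, TextIO


class CachedReaderLineProcessor:
    """Hypothetical line processor that reuses one reader across files (buggy)."""

    def __init__(self, open_url: Callable[[str], TextIO]) -> None:
        self.open_url = open_url
        self.url = ""
        self.reader: Optional[TextIO] = None

    def init(self, url: str) -> None:
        # Called once per parent row (file). Bug: the old reader is kept.
        self.url = url

    def next_row(self) -> Optional[Dict[str, str]]:
        if self.reader is None:
            self.reader = self.open_url(self.url)
        line = self.reader.readline()  # ValueError if the reader is closed
        if line == "":
            self.reader.close()  # closed at EOF, but still cached
            return None
        return {"rawLine": line.rstrip("\n")}


class ResettingLineProcessor(CachedReaderLineProcessor):
    """Same model, but each init() discards the previous reader."""

    def init(self, url: str) -> None:
        super().init(url)
        if self.reader is not None:
            self.reader.close()
        self.reader = None


def count_rows(processor: CachedReaderLineProcessor, urls: Iterable[str]) -> int:
    """Drive the processor like a parent entity: init per file, then read rows."""
    count = 0
    for url in urls:
        processor.init(url)
        while processor.next_row() is not None:
            count += 1
    return count
```

With two in-memory files of three and two lines, ResettingLineProcessor reads five rows, while CachedReaderLineProcessor raises on its first read of the second file, after exactly as many rows as the first file has lines.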
Thanks,
Wojtek
--
View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25476443.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: FileListEntityProcessor and LineEntityProcessor
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
I have opened an issue SOLR-1440
On Thu, Sep 17, 2009 at 2:46 AM, wojtekpia <wo...@hotmail.com> wrote:
>
> Note that if I change my import file to explicitly list all my files (instead
> of using the FileListEntityProcessor) as below then everything works as I
> expect.
>
> <dataSource type="FileDataSource" name="fileDataSource"
> basePath="d:\my\directory\"/>
> <document name="dict-entries">
> <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
> <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
> <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
> dataSource="fileDataSource" transformer="myTransformer"></entity>
>
> ...
>
> </document>
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: FileListEntityProcessor and LineEntityProcessor
Posted by wojtekpia <wo...@hotmail.com>.
Note that if I change my import file to explicitly list all my files (instead
of using the FileListEntityProcessor) as below then everything works as I
expect.
<dataSource type="FileDataSource" name="fileDataSource"
            basePath="d:\my\directory\"/>
<document name="dict-entries">
  <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
          dataSource="fileDataSource" transformer="myTransformer"></entity>
  <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
          dataSource="fileDataSource" transformer="myTransformer"></entity>
  <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
          dataSource="fileDataSource" transformer="myTransformer"></entity>
  ...
</document>
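Until the nested setup works, the explicit list of entities can be generated rather than maintained by hand. A small sketch, assuming all files sit directly in one directory and end in .WRK; the helper name explicit_entities is hypothetical, and the attribute values are copied from the configuration in this message. The url values are relative, so they rely on the basePath set on the FileDataSource.

```python
from pathlib import Path
from typing import List
from xml.sax.saxutils import quoteattr


def explicit_entities(base_dir: str, suffix: str = ".WRK") -> List[str]:
    """Build one LineEntityProcessor <entity> element per matching file.

    Names are relative, matching a FileDataSource that sets basePath.
    Sorted so the generated config is stable between runs.
    """
    names = sorted(
        p.name for p in Path(base_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    return [
        f'<entity name="jc" processor="LineEntityProcessor" url={quoteattr(name)} '
        f'dataSource="fileDataSource" transformer="myTransformer"></entity>'
        for name in names
    ]
```

Joining the result with newlines and pasting it inside the <document> element reproduces the hand-written list; it has to be regenerated whenever files are added or removed.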
--
View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25480830.html
Re: FileListEntityProcessor and LineEntityProcessor
Posted by wojtekpia <wo...@hotmail.com>.
Fergus McMenemie-2 wrote:
>
>
> Can you provide more detail on what you are trying to do? ...
> You seem to be listing all files "d:\my\directory\.*WRK". Do
> these WRK files contain lists of files to be indexed?
>
>
That is my complete data config file. I have a directory containing a bunch
of files that have one entity per line. Each line contains "blocks" of data.
I parse out each block and process it appropriately using myTransformer. Is
this use of FileListEntityProcessor with LineEntityProcessor not supported?
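For context on the per-line processing: a DIH transformer receives each row as a map and returns it with fields added, and with LineEntityProcessor the line text arrives in rawLine. The real myTransformer and the .WRK block layout are not shown in the thread, so the Python sketch below assumes blocks separated by "|" and maps them onto hypothetical field names. It models the idea only; it is not the Java Transformer API.

```python
from typing import Dict, List

# Hypothetical layout: the thread does not describe the real .WRK format.
BLOCK_SEPARATOR = "|"
FIELD_NAMES: List[str] = ["id", "headword", "definition"]  # hypothetical fields


def transform_row(row: Dict[str, str]) -> Dict[str, str]:
    """Split rawLine into blocks and copy them into named fields.

    Extra blocks beyond FIELD_NAMES are ignored; missing ones are left unset.
    The original rawLine is kept in the returned row.
    """
    raw = row.get("rawLine", "")
    blocks = [block.strip() for block in raw.split(BLOCK_SEPARATOR)]
    out = dict(row)
    for name, value in zip(FIELD_NAMES, blocks):
        out[name] = value
    return out
```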
--
View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25477613.html
Re: FileListEntityProcessor and LineEntityProcessor
Posted by Fergus McMenemie <fe...@twig.me.uk>.
>Hi,
>
>I'm trying to import data from a list of files using the
>FileListEntityProcessor. Here is my import configuration:
>
> <dataSource type="FileDataSource" name="fileDataSource"/>
> <document name="dict-entries">
> <entity name="f" processor="FileListEntityProcessor"
>baseDir="d:\my\directory\" fileName=".*WRK" recursive="false"
>rootEntity="false">
> <entity name="jc"
> processor="LineEntityProcessor"
> url="${f.fileAbsolutePath}"
> dataSource="fileDataSource"
> transformer="myTransformer">
> </entity>
> </entity>
> </document>
>
>If I have only one file in d:\my\directory\ then everything works correctly.
>If I have multiple files then I get the following exception:
Sorry, but I don't quite follow this. FileListEntityProcessor and
LineEntityProcessor are somewhat similar in that they provide
a list of filenames which the likes of XPathEntityProcessor
then open and parse.
Is the above your complete data-config.xml?
Can you provide more detail on what you are trying to do? ...
You seem to be listing all files "d:\my\directory\.*WRK". Do
these WRK files contain lists of files to be indexed?
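The pattern described here, where each listed entry is a file name that another processor then opens and parses, can be sketched roughly as follows. This is a Python model under assumptions: a manifest file with one XML path per line (relative to the manifest), xml.etree.ElementTree standing in for XPathEntityProcessor, and a hypothetical record element name and helper name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterator


def rows_from_manifest(manifest: str, record_tag: str = "record") -> Iterator[Dict[str, str]]:
    """Outer loop: one file path per manifest line (the line-reader role).

    Inner loop: parse each listed XML file and yield one row per record
    element (roughly the XPathEntityProcessor role).
    """
    base = Path(manifest).parent
    with open(manifest, encoding="utf-8") as listing:
        for line in listing:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the manifest
            tree = ET.parse(str(base / name))
            for record in tree.getroot().iter(record_tag):
                # Flatten direct child elements into field -> text.
                yield {child.tag: (child.text or "") for child in record}
```

In this shape the listed files hold the data, whereas in the configuration under discussion the .WRK files are themselves the data, one entity per line.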
>Note that my input files have 53812 lines, which is the same as the document
>number that I'm choking on. Does anyone know what I'm doing wrong?
>
>Thanks,
>
>Wojtek
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================