Posted to solr-user@lucene.apache.org by zakaria benzidalmal <za...@gmail.com> on 2012/11/07 16:07:58 UTC

[SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

Hi all,

Could someone provide a clear example of using this processor
(a data-config.xml example)?

I ran into this problem after patching and building my code:

GRAVE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
        ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:542)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
        ... 5 more
Caused by: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.LineEntityProcessor.initDelimitedOrFixedWidth(LineEntityProcessor.java:142)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.init(LineEntityProcessor.java:115)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:74)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:430)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:498)
        ... 6 more

Regards.

zakibenz.

Re: [SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

Posted by zakaria benzidalmal <za...@gmail.com>.
Hi James,

Yes, it was that parameter that made the request fail.

I've edited the patch and added the new version to JIRA.

Thank you.

RE: [SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Try specifying the "escape" parameter.  This is the character your file uses to escape delimiters occurring in the data.  If this fixes your problem and you do not want to have to specify "escape", you can alter the patch at line 113:

change: (escape == '\\')
to: (escape != null && escape == '\\')

If this works for you, please upload the modified patch to the JIRA issue for future reference.  Thanks.
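
To see why the extra null check matters, here is a minimal Java sketch. It assumes the patch keeps the escape character in a boxed Character that stays null when the "escape" attribute is missing from data-config.xml; that is an inference from the suggested change, not something confirmed against the patch source. The class and method names are made up for illustration.

```java
public class EscapeNullCheck {

    // Unguarded comparison: when escape is null, Java auto-unboxes it to a
    // char for the == comparison and throws a NullPointerException.
    static boolean isBackslashUnguarded(Character escape) {
        return escape == '\\';
    }

    // Guarded comparison, in the spirit of the suggested change: the null
    // test short-circuits, so a missing "escape" attribute simply gives false.
    static boolean isBackslashGuarded(Character escape) {
        return escape != null && escape == '\\';
    }

    public static void main(String[] args) {
        Character missing = null; // hypothetical: attribute not configured

        System.out.println(isBackslashGuarded(missing));
        System.out.println(isBackslashGuarded('\\'));

        try {
            isBackslashUnguarded(missing);
        } catch (NullPointerException e) {
            System.out.println("unguarded comparison threw NullPointerException");
        }
    }
}
```

The guarded form relies only on the short-circuit behaviour of &&, so it does not change the result when "escape" is actually set.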

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


Re: [SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

Posted by zakaria benzidalmal <za...@gmail.com>.
James,

Thank you for your quick answer.
This is my data-config.xml file:

<dataConfig>
    <dataSource name="dfs" type="FileDataSource"/>
    <document>
        <entity name="sourcefile"
                processor="FileListEntityProcessor"
                fileName="rocinter.csv"
                rootEntity="false"
                baseDir="/user/work/solr/example/example-DIH/solr/csv/in"
        >

            <entity name="entryline"
                    processor="LineEntityProcessor"
                    url="${sourcefile.fileAbsolutePath}"
                    rootEntity="true"
                    dataSource="fds"
                    separator=","
            >
            </entity>
        </entity>
    </document>
</dataConfig>

Am I doing something wrong?
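
As background on what separator="," asks the processor to do, here is a minimal Java sketch of splitting one line on a separator while honouring an optional escape character. It is not the code from the SOLR-2549 patch; the class and method names are invented, the sample line is made up, and treating a null escape as "no escaping" is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimitedLineSplitter {

    // Splits a line on `separator`. When `escape` is non-null, the character
    // after an escape is taken literally, so an escaped separator stays
    // inside the current field. A null escape disables escaping.
    static List<String> split(String line, char separator, Character escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != null && c == escape && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the next char verbatim
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,Smith\\, John,42"; // made-up sample data
        System.out.println(split(line, ',', '\\').size() + " fields with escape");
        System.out.println(split(line, ',', null).size() + " fields without escape");
    }
}
```

With the backslash configured as the escape character, the sample line yields three fields; with no escape character, the comma after "Smith" splits the name and the line yields four.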

Regards.
______________________
Zakaria BENZIDALMAL
mobile: 06 31 40 04 33



RE: [SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Zakaria,

You might want to post your data-config.xml, or at least the part that uses SOLR-2549.  If it's throwing an NPE, it certainly has a bug (if you were doing something wrong, it should at least give you a sensible error message).  Also, unless you need to use DIH for some other reason, you might want to consider the CSV request handler for your imports; it is a mature Solr feature for importing whole documents from delimited (not just CSV) files.  See http://wiki.apache.org/solr/UpdateCSV
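
For the CSV request handler route, a hedged Java sketch that posts a file to /update/csv might look like the following. The base URL http://localhost:8983/solr is an assumption about a default example install, and the class and method names are hypothetical; separator, escape and commit are parameters described on the UpdateCSV wiki page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvUpdateClient {

    // URL-encodes a single parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the handler URL with commit, separator and escape parameters.
    static String buildUrl(String solrBase, char separator, char escape) {
        return solrBase + "/update/csv?commit=true"
                + "&separator=" + encode(String.valueOf(separator))
                + "&escape=" + encode(String.valueOf(escape));
    }

    // Sends the file content as the request body and returns the HTTP status.
    static int post(String url, String csvPath) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(Files.readAllBytes(Paths.get(csvPath)));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Assumed base URL and file location; adjust both to the local setup.
        String url = buildUrl("http://localhost:8983/solr", ',', '\\');
        System.out.println("POST " + url);
        System.out.println("HTTP " + post(url, "rocinter.csv"));
    }
}
```

By default the handler reads the field names from the first line of the file, so a file without a header line would also need the fieldnames parameter.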

Here is an example that loads a fixed-width file using DIH and SOLR-2549 (actually, it uses the code that SOLR-2549 was based on; I haven't tried it with the exact code in SOLR-2549):

<dataConfig>
	<dataSource name="URL" baseUrl="${dataimporter.request.fileBasepath}" type="URLDataSource" />
	<document name="FixedWidthCounts">
		<entity
			name="Counts"
			processor="org.apache.solr.handler.dataimport.LineEntityProcessor"
			dataSource="URL"
			url="incoming/COUNTS.txt"
			colDef1="ID,0,9,BIGDECIMAL,0,LEFT"
			colDef2="COUNT,9,19,INTEGER,0,LEFT"
		/>
	</document>
</dataConfig>
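
A rough Java sketch of how a fixed-width line might be cut up from definitions like colDef1 and colDef2 above. The colDef format is not documented in this thread, so the sketch assumes the second and third comma-separated values are a start offset (inclusive) and an end offset (exclusive); the remaining values (the type, the 0 and LEFT) are ignored. The class name, method name and sample line are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FixedWidthSketch {

    // Extracts one trimmed field per definition of the form "NAME,start,end,...".
    static Map<String, String> parse(String line, String... colDefs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String def : colDefs) {
            String[] parts = def.split(",");
            String name = parts[0];
            int start = Integer.parseInt(parts[1]);
            int end = Math.min(Integer.parseInt(parts[2]), line.length());
            // Short lines yield an empty value instead of an exception.
            String raw = start < end ? line.substring(start, end) : "";
            row.put(name, raw.trim());
        }
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample line: 9 characters of ID followed by 10 of COUNT.
        String line = "123456789" + "        42";
        System.out.println(parse(line,
                "ID,0,9,BIGDECIMAL,0,LEFT",
                "COUNT,9,19,INTEGER,0,LEFT"));
    }
}
```

Under that assumption the two definitions describe adjacent, non-overlapping columns: characters 0 to 8 for ID and 9 to 18 for COUNT.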


James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

