Posted to solr-user@lucene.apache.org by Florian Aumeier <fa...@mediaventures.de> on 2008/10/14 13:35:17 UTC
error with delta import
Hi,
I have some problems with delta-import. Here is the information I have.
The result from the web API; apparently everything is fine:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-psql-data-config.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:29:30.615</str>
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">16194</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Delta Dump started">2008-10-14 11:23:31</str>
    <str name="Identifying Delta">2008-10-14 11:23:31</str>
    <str name="Deltas Obtained">2008-10-14 11:32:16</str>
    <str name="Building documents">2008-10-14 11:32:16</str>
    <str name="Total Changed Documents">16194</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
From the log:
INFO: Starting Delta Import
Oct 14, 2008 11:23:31 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={command=delta-import}
status=0 QTime=1
Oct 14, 2008 11:23:31 AM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Oct 14, 2008 11:23:31 AM org.apache.solr.handler.dataimport.DocBuilder
doDelta
INFO: Starting delta collection.
Oct 14, 2008 11:23:31 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: articles
Oct 14, 2008 11:23:31 AM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity articles with URL:
jdbc:postgresql://bm02:5432/bm
Oct 14, 2008 11:23:35 AM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 3694
Oct 14, 2008 11:29:16 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 14, 2008 11:32:16 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 16194
Oct 14, 2008 11:32:16 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running DeletedRowKey() for Entity: articles
Oct 14, 2008 11:32:16 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
Oct 14, 2008 11:32:16 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: articles
Oct 14, 2008 11:32:16 AM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:136)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Any help or hints are appreciated.
Florian
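Editorial sketch: the status response above says "idle" with 16194 changed documents but 0 processed, while only the log shows the failure. A minimal Python check for that mismatch could look like the following. The helper names (status_messages, looks_stalled) and the heuristic itself are assumptions made for illustration and are not part of DIH; the XML is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the status response shown above.
STATUS_XML = """
<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Rows Fetched">16194</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Changed Documents">16194</str>
  </lst>
</response>
"""


def status_messages(xml_text):
    """Return the statusMessages entries as a name -> value dict."""
    root = ET.fromstring(xml_text)
    lst = root.find("./lst[@name='statusMessages']")
    if lst is None:
        return {}
    return {el.get("name"): (el.text or "") for el in lst.findall("str")}


def looks_stalled(messages):
    """Hypothetical heuristic: deltas were identified but no document was built."""
    changed = int(messages.get("Total Changed Documents", "0"))
    processed = int(messages.get("Total Documents Processed", "0"))
    return changed > 0 and processed == 0


print(looks_stalled(status_messages(STATUS_XML)))
```

A True result only says that the counters disagree; the actual cause still has to be read from the server log, as in the stack trace above.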
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Lance Norskog wrote:
> If you make a database view with the query, it is easy to examine the data you want to index. Then, your solr import query would just pull the view. The Solr setup file is much simpler this way.
>
I will try and let you know.
RE: error with delta import
Posted by Lance Norskog <go...@gmail.com>.
If you make a database view with the query, it is easy to examine the data you want to index. Then, your solr import query would just pull the view. The Solr setup file is much simpler this way.
-----Original Message-----
From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.paul@gmail.com]
Sent: Wednesday, October 15, 2008 2:46 AM
To: solr-user@lucene.apache.org
Subject: Re: error with delta import
The delta implementation in DIH is a bit fragile for complex queries.
I recommend you do the delta-import using a full-import.
.................
Re: where's the bottleneck
Posted by Yonik Seeley <yo...@apache.org>.
On Thu, Oct 30, 2008 at 1:02 AM, Barnett, Jeffrey
<je...@yale.edu> wrote:
> I thought it was turned off already. ( Lucene vs Solr ?) Where do I make this change?
Comment out this part in your solrconfig.xml
<autoCommit>
<maxDocs>20000</maxDocs>
<maxTime>40000</maxTime>
</autoCommit>
-Yonik
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, October 29, 2008 11:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: where's the bottleneck
>
> On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
> <je...@yale.edu> wrote:
>> Reported import rates start a 70 docs per second, and decrease as more records are added.
>
> It might just be segment merges (that takes more time as segments grow in size).
> From the solrconfig.xml I see you have autocommit turned on... try
> with it off and see if it helps.
>
> -Yonik
>
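Editorial sketch: one way to confirm that the autoCommit block is really disabled after commenting it out is to parse the config and look for an active element. This is a minimal Python sketch; the function name autocommit_enabled and the two tiny config strings are illustrative stand-ins, not a real solrconfig.xml.

```python
import xml.etree.ElementTree as ET


def autocommit_enabled(solrconfig_text):
    """True if an uncommented <autoCommit> element exists anywhere in the config."""
    root = ET.fromstring(solrconfig_text)
    # ElementTree's default parser drops XML comments, so a commented-out
    # <autoCommit> block is not found by this search.
    return root.find(".//autoCommit") is not None


# Simplified stand-ins for a solrconfig.xml (assumption: real files are much larger).
ACTIVE = """<config><updateHandler class="solr.DirectUpdateHandler2">
<autoCommit><maxDocs>20000</maxDocs><maxTime>40000</maxTime></autoCommit>
</updateHandler></config>"""

COMMENTED = """<config><updateHandler class="solr.DirectUpdateHandler2">
<!--
<autoCommit><maxDocs>20000</maxDocs><maxTime>40000</maxTime></autoCommit>
-->
</updateHandler></config>"""

print(autocommit_enabled(ACTIVE))
print(autocommit_enabled(COMMENTED))
```

Solr reads the config at startup, so the check only describes the file on disk; a restart (or core reload) is presumably still needed before the change takes effect.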
RE: where's the bottleneck
Posted by "Barnett, Jeffrey" <je...@yale.edu>.
I thought it was turned off already. ( Lucene vs Solr ?) Where do I make this change?
-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, October 29, 2008 11:28 PM
To: solr-user@lucene.apache.org
Subject: Re: where's the bottleneck
On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
<je...@yale.edu> wrote:
> Reported import rates start at 70 docs per second, and decrease as more records are added.
It might just be segment merges (which take more time as segments grow in size).
From the solrconfig.xml I see you have autocommit turned on... try
with it off and see if it helps.
-Yonik
Re: where's the bottleneck
Posted by Yonik Seeley <yo...@apache.org>.
On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
<je...@yale.edu> wrote:
> Reported import rates start at 70 docs per second, and decrease as more records are added.
It might just be segment merges (which take more time as segments grow in size).
From the solrconfig.xml I see you have autocommit turned on... try
with it off and see if it helps.
-Yonik
where's the bottleneck
Posted by "Barnett, Jeffrey" <je...@yale.edu>.
I saw a similar subject posted earlier. This is not a continuation of that thread, but the problem is similar. I have a large, fast, dedicated machine, that despite boosting various parameters in solrconfig.xml (attached) and in the JVM, utilizes at most 10% of the cpu while importing: (from top)
5817 vufind 46 17 4 4721M 4691M cpu/35 18.8H 8.85% /usr/jdk/instances/jdk1.6.0/bin/sparcv9/java -Xms4096m -Xmx4096m -Xmn2g -XX:+UseParallelGC -XX:+AggressiveOpts
There is 0.0% reported iowait time, 32GB real memory, and virtually no other processes running. The index is relatively large (8M docs, 30GB), but not extreme by the standards of others I see on this list. Reported import rates start at 70 docs per second, and decrease as more records are added. Why is the program not using the resources it has been given?
OS: Solaris 10
Java 1.6
Solr: 1.3
Re: error with delta import
Posted by Chris Hostetter <ho...@fucit.org>.
: The case in point is DIH. DIH uses the standard DOM parser that comes
: w/ JDK. If it reads the XML properly, do we need to complain? I guess
: that data-config.xml may not be used for any other purposes.
That's a vague statement as well ... there is no such thing as "the
standard DOM parser that comes w/ JDK" ... that's an implementation detail
of the JRE, and different JRE providers might use different parsers in
their DocumentBuilders, some of which might be stricter than others.
*AND* even the choice of DocumentBuilder and DocumentBuilderFactory can
be decided at runtime -- so even if someone uses the same JRE as you,
their servlet container might be registering its own
DocumentBuilderFactory.
So it's not safe to assume that just because the
javax.xml.parsers.DocumentBuilder used in one Solr deployment cleanly
parses a malformed XML file, it will work on any other machine.
:
:
: On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood
: <wu...@netflix.com> wrote:
: > On 10/22/08 8:57 AM, "Steven A Rowe" <sa...@syr.edu> wrote:
: >
: >> Telling people that it's not a problem (or required!) to write non-well-formed
: >> XML, because a particular XML parser can't accept well-formed XML is kind of
: >> insidious.
: >
: > I'm with you all the way on this.
: >
: > A parser which accepts non-well-formed XML is not an XML parser, since the
: > XML spec requires reporting a fatal error.
: >
: > It is really easy to test these things. Modern browsers have good XML
: > parsers, so put your test case in a "test.xml" file and open it in a
: > browser. If it isn't well-formed, you'll get an error.
: >
: > Here is my test XML:
: >
: > <root attribute="<"/>
: >
: > Here is what Firefox 3.0.3 says about that:
: >
: > XML Parsing Error: not well-formed
: > Location: file:///Users/wunderwood/Desktop/test.xml
: > Line Number 1, Column 18:
: >
: > <root attribute="<"/>
: > -----------------^
: >
: > wunder
: >
: >
:
:
:
: --
: --Noble Paul
:
-Hoss
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
The case in point is DIH. DIH uses the standard DOM parser that comes
w/ JDK. If it reads the XML properly, do we need to complain? I guess
that data-config.xml may not be used for any other purposes.
On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood
<wu...@netflix.com> wrote:
> On 10/22/08 8:57 AM, "Steven A Rowe" <sa...@syr.edu> wrote:
>
>> Telling people that it's not a problem (or required!) to write non-well-formed
>> XML, because a particular XML parser can't accept well-formed XML is kind of
>> insidious.
>
> I'm with you all the way on this.
>
> A parser which accepts non-well-formed XML is not an XML parser, since the
> XML spec requires reporting a fatal error.
>
> It is really easy to test these things. Modern browsers have good XML
> parsers, so put your test case in a "test.xml" file and open it in a
> browser. If it isn't well-formed, you'll get an error.
>
> Here is my test XML:
>
> <root attribute="<"/>
>
> Here is what Firefox 3.0.3 says about that:
>
> XML Parsing Error: not well-formed
> Location: file:///Users/wunderwood/Desktop/test.xml
> Line Number 1, Column 18:
>
> <root attribute="<"/>
> -----------------^
>
> wunder
>
>
--
--Noble Paul
Re: error with delta import
Posted by Walter Underwood <wu...@netflix.com>.
On 10/22/08 8:57 AM, "Steven A Rowe" <sa...@syr.edu> wrote:
> Telling people that it's not a problem (or required!) to write non-well-formed
> XML, because a particular XML parser can't accept well-formed XML is kind of
> insidious.
I'm with you all the way on this.
A parser which accepts non-well-formed XML is not an XML parser, since the
XML spec requires reporting a fatal error.
It is really easy to test these things. Modern browsers have good XML
parsers, so put your test case in a "test.xml" file and open it in a
browser. If it isn't well-formed, you'll get an error.
Here is my test XML:
<root attribute="<"/>
Here is what Firefox 3.0.3 says about that:
XML Parsing Error: not well-formed
Location: file:///Users/wunderwood/Desktop/test.xml
Line Number 1, Column 18:
<root attribute="<"/>
-----------------^
wunder
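Editorial sketch: the same test can be run without a browser. The snippet below is a minimal Python example that feeds the test document above, plus an escaped variant, to the standard library's ElementTree parser. The helper name is_well_formed is an assumption for illustration, and the result only reflects that one parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The test document from the message above: a raw '<' inside an attribute value.
print(is_well_formed('<root attribute="<"/>'))

# The same document with the character escaped as an entity reference.
print(is_well_formed('<root attribute="&lt;"/>'))
```

The first call reports False and the second True, which matches the Firefox error quoted above.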
RE: error with delta import
Posted by Steven A Rowe <sa...@syr.edu>.
Hi Shalin,
I wasn't talking about the behavior of parsers in the wild, but rather about the XML specification (paraphrasing):
1. An XML document is not well-formed unless it matches the production labeled document.
2. Violations of well-formedness constraints are fatal errors.
3. Once a fatal error is detected, an XML parser MUST NOT continue normal processing.
So although there are undoubtedly parsers that will parse '<' in attribute values, in so doing, these parsers are non-conformant with the XML specification. This is important only to the extent that people who create documents that target non-conforming features of parsers can't reliably expect these documents to be parsed by conformant parsers; XML's write-once-parse-anywhere promise thereby inexorably evaporates.
Telling people that it's not a problem (or required!) to write non-well-formed XML, because a particular XML parser can't accept well-formed XML is kind of insidious. I for one will not stand idly by and permit this outrage to remain unchallenged!!!
:)
Steve
On 10/22/2008 at 4:01 AM, Shalin Shekhar Mangar wrote:
> Actually, most XML parsers don't require you to escape such
> characters in attributes. You are welcome to try this out,
> just look at the example-DIH :)
>
> On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe
> <sa...@syr.edu> wrote:
>
> > Wow, I really should read more closely before I respond - I see now,
> > Noble, that you were talking about DIH's ability to parse escaped '<'s
> > in attribute values, rather than about whether '<' was an acceptable
> > character in attribute values.
> >
> > I should repurpose my remarks to note to Shalin, though, that all
> > (conformant) XML parsers have to be able to handle escaped '<'s in
> > attribute values, since an XML document with a '<' in an attribute
> > value is not well-formed.
> >
> > Steve
> >
> > On 10/21/2008 at 1:10 PM, Steven A Rowe wrote:
> > > On 10/21/2008 at 12:14 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> > > > On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar <sh...@gmail.com> wrote:
> > > > > Your data-config looks fine except for one thing -- you do not need to
> > > > > escape the '<' character in an XML attribute. It may be throwing off the
> > > > > parsing code in DataImportHandler.
> > > >
> > > > not really '<' is fine in attribute
> > >
> > > Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not*
> > > fine in an attribute value - from
> > > <http://www.w3.org/TR/REC-xml/#NT-AttValue>:
> > >
> > > [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
> > > | "'" ([^<&'] | Reference)* "'"
> > >
> > > where an attribute <http://www.w3.org/TR/REC-xml/#dt-stag> is:
> > >
> > > [41] Attribute ::= Name Eq AttValue
> > >
> > > Steve
Re: error with delta import
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Actually, most XML parsers don't require you to escape such characters in
attributes. You are welcome to try this out, just look at the example-DIH :)
On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe <sa...@syr.edu> wrote:
> Wow, I really should read more closely before I respond - I see now, Noble,
> that you were talking about DIH's ability to parse escaped '<'s in attribute
> values, rather than about whether '<' was an acceptable character in
> attribute values.
>
> I should repurpose my remarks to note to Shalin, though, that all
> (conformant) XML parsers have to be able to handle escaped '<'s in attribute
> values, since an XML document with a '<' in an attribute value is not
> well-formed.
>
> Steve
>
> On 10/21/2008 at 1:10 PM, Steven A Rowe wrote:
> > On 10/21/2008 at 12:14 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> > > On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar <sh...@gmail.com> wrote:
> > > > Your data-config looks fine except for one thing -- you do not need to
> > > > escape the '<' character in an XML attribute. It may be throwing off the
> > > > parsing code in DataImportHandler.
> > >
> > > not really '<' is fine in attribute
> >
> > Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not*
> > fine in an attribute value - from
> > <http://www.w3.org/TR/REC-xml/#NT-AttValue>:
> >
> > [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
> > | "'" ([^<&'] | Reference)* "'"
> >
> > where an attribute <http://www.w3.org/TR/REC-xml/#dt-stag> is:
> >
> > [41] Attribute ::= Name Eq AttValue
> >
> > Steve
>
--
Regards,
Shalin Shekhar Mangar.
RE: error with delta import
Posted by Steven A Rowe <sa...@syr.edu>.
Wow, I really should read more closely before I respond - I see now, Noble, that you were talking about DIH's ability to parse escaped '<'s in attribute values, rather than about whether '<' was an acceptable character in attribute values.
I should repurpose my remarks to note to Shalin, though, that all (conformant) XML parsers have to be able to handle escaped '<'s in attribute values, since an XML document with a '<' in an attribute value is not well-formed.
Steve
On 10/21/2008 at 1:10 PM, Steven A Rowe wrote:
> On 10/21/2008 at 12:14 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> > On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar <sh...@gmail.com> wrote:
> > > Your data-config looks fine except for one thing -- you do not need to
> > > escape the '<' character in an XML attribute. It may be throwing off the
> > > parsing code in DataImportHandler.
> >
> > not really '<' is fine in attribute
>
> Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not*
> fine in an attribute value - from
> <http://www.w3.org/TR/REC-xml/#NT-AttValue>:
>
> [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
> | "'" ([^<&'] | Reference)* "'"
>
> where an attribute <http://www.w3.org/TR/REC-xml/#dt-stag> is:
>
> [41] Attribute ::= Name Eq AttValue
>
> Steve
RE: error with delta import
Posted by Steven A Rowe <sa...@syr.edu>.
On 10/21/2008 at 12:14 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar <sh...@gmail.com> wrote:
> > Your data-config looks fine except for one thing -- you do not need to
> > escape the '<' character in an XML attribute. It may be throwing off the
> > parsing code in DataImportHandler.
>
> not really '<' is fine in attribute
Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not* fine in an attribute value - from <http://www.w3.org/TR/REC-xml/#NT-AttValue>:
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
where an attribute <http://www.w3.org/TR/REC-xml/#dt-stag> is:
[41] Attribute ::= Name Eq AttValue
Steve
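Editorial sketch: applied to the data-config discussed in this thread, the AttValue production means a query containing '<' has to be written with an entity reference inside the attribute. A minimal Python sketch using the standard library's quoteattr follows; the query string and entity attributes are copied from the thread, while wrapping them in a single self-closing element is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Query text as a human would write it, including the raw '<'.
delta_query = "SELECT * FROM full_text_view WHERE article_id < 300"

# quoteattr escapes '&', '<' and '>' and adds the surrounding quotes.
attr = quoteattr(delta_query)
print(attr)

# Build a (simplified) entity element and parse it back.
entity_xml = '<entity name="articles-delta" pk="article_ref" deltaQuery=%s/>' % attr
parsed = ET.fromstring(entity_xml)

# A conformant parser hands the application the unescaped query again.
print(parsed.get("deltaQuery") == delta_query)
```

The escaped form is what belongs in the file; the consumer of the parsed document still sees the original SQL with '<'.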
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> Your data-config looks fine except for one thing -- you do not need to
> escape the '<' character in an XML attribute. It may be throwing off the parsing
> code in DataImportHandler.
not really '<' is fine in attribute
>
> Another question, does the full-import work fine?
>
> On Mon, Oct 20, 2008 at 7:31 PM, Florian Aumeier
> <fa...@mediaventures.de> wrote:
>
>> sorry to bother you again, but the delta import still does not work for me
>> :-(
>>
>> We tried:
>> * delta-import by full-import
>> <entity name="articles-delta" rootEntity="false"
>> query="<your-delta-query-here>"> with entity=articles-delta&clean=false
>>
>> * delta-import by full-import with simplified query
>>
>> * delta-import with simplified query
>> <entity name="articles-delta" pk="article_ref" deltaQuery="SELECT *
>> FROM full_text_view WHERE article_id < 300">
>>
>> * replaced files below with files from nightly-build 15.10.08 and rerun the
>> delta and full imports as described above
>> dist/apache-solr-dataimporthandler-1.3.0.jar
>> dist/solrj-lib/slf4j-api-1.5.3.jar
>> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>>
>>
>> No matter what we do, we always end up in a situation, when the dataimport
>> status looks fine:
>>
>> <lst name="statusMessages">
>> <str name="Time Elapsed">0:0:8.442</str>
>> <str name="Total Requests made to DataSource">1</str>
>> <str name="Total Rows Fetched">218</str>
>> <str name="Total Documents Skipped">0</str>
>> <str name="Delta Dump started">2008-10-20 15:31:54</str>
>> <str name="Identifying Delta">2008-10-20 15:31:54</str>
>> <str name="Deltas Obtained">2008-10-20 15:31:57</str>
>> <str name="Building documents">2008-10-20 15:31:57</str>
>> <str name="Total Changed Documents">218</str>
>>
>> but the log reads:
>> Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute
>> INFO: [test] webapp=/solr path=/dataimport params={command=delta-import}
>> status=0 QTime=0
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> INFO: Starting Delta Import
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter
>> readIndexerProperties
>> INFO: Read dataimport.properties
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
>> doDelta
>> INFO: Starting delta collection.
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Running ModifiedRowKey() for Entity: articles-full
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Creating a connection for entity articles-full with URL:
>> jdbc:postgresql://blogmonitor02:5432/blogmonitor
>> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Time taken for getConnection(): 5
>> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained :
>> 218
>> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Running DeletedRowKey() for Entity: articles-full
>> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0
>> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed parentDeltaQuery for Entity: articles-full
>> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> SEVERE: Delta Import Failed
>> java.lang.NullPointerException
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>> at
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>
>> here is the full data-config:
>>
>> <dataConfig>
>> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
>> url="jdbc:postgresql://bm02:5432/bm" user="bm" />
>>
>> <document name="articles">
>> <entity name="articles-full" pk="id" query="SELECT * FROM full_text_view
>> where article_id < 200" deltaQuery="SELECT * FROM full_text_view WHERE
>> article_id < 300">
>> <field column="article_id" name="a_id" />
>> <field column="normalized_text" name="norm_text" />
>> <field column="article_ref" name="id" />
>> <field column="article_stub" name="stub" />
>> <field column="id_blogs" name="blog_id" />
>> <field column="article_title" name="a_title" />
>> <field column="article_url" name="article_url" />
>> <field column="ts" name="ts" />
>> <field column="rank" name="rank" />
>> <field column="blog_ref" name="blog_ref" />
>> <field column="blog_title" name="b_title" />
>> <field column="blog_subtitle" name="subtitle" />
>> <field column="blog_url" name="blog_url" />
>> </entity>
>>
>> </document>
>>
>> </dataConfig>
>>
>> what are we doing wrong?
>> Florian
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
--
--Noble Paul
Re: error with delta import
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Your data-config looks fine except for one thing -- you do not need to
escape the '<' character in an XML attribute. It may be throwing off the parsing
code in DataImportHandler.
Another question, does the full-import work fine?
On Mon, Oct 20, 2008 at 7:31 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> sorry to bother you again, but the delta import still does not work for me
> :-(
>
> We tried:
> * delta-import by full-import
> <entity name="articles-delta" rootEntity="false"
> query="<your-delta-query-here>"> with entity=articles-delta&clean=false
>
> * delta-import by full-import with simplified query
>
> * delta-import with simplified query
> <entity name="articles-delta" pk="article_ref" deltaQuery="SELECT *
> FROM full_text_view WHERE article_id < 300">
>
> * replaced files below with files from nightly-build 15.10.08 and rerun the
> delta and full imports as described above
> dist/apache-solr-dataimporthandler-1.3.0.jar
> dist/solrj-lib/slf4j-api-1.5.3.jar
> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>
>
> No matter what we do, we always end up in a situation, when the dataimport
> status looks fine:
>
> <lst name="statusMessages">
> <str name="Time Elapsed">0:0:8.442</str>
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">218</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Delta Dump started">2008-10-20 15:31:54</str>
> <str name="Identifying Delta">2008-10-20 15:31:54</str>
> <str name="Deltas Obtained">2008-10-20 15:31:57</str>
> <str name="Building documents">2008-10-20 15:31:57</str>
> <str name="Total Changed Documents">218</str>
>
> but the log reads:
> Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport params={command=delta-import}
> status=0 QTime=0
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Starting delta collection.
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: articles-full
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity articles-full with URL:
> jdbc:postgresql://blogmonitor02:5432/blogmonitor
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 5
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained :
> 218
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running DeletedRowKey() for Entity: articles-full
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: articles-full
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> SEVERE: Delta Import Failed
> java.lang.NullPointerException
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
> at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>
> here is the full data-config:
>
> <dataConfig>
> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
> url="jdbc:postgresql://bm02:5432/bm" user="bm" />
>
> <document name="articles">
> <entity name="articles-full" pk="id" query="SELECT * FROM full_text_view
> where article_id < 200" deltaQuery="SELECT * FROM full_text_view WHERE
> article_id < 300">
> <field column="article_id" name="a_id" />
> <field column="normalized_text" name="norm_text" />
> <field column="article_ref" name="id" />
> <field column="article_stub" name="stub" />
> <field column="id_blogs" name="blog_id" />
> <field column="article_title" name="a_title" />
> <field column="article_url" name="article_url" />
> <field column="ts" name="ts" />
> <field column="rank" name="rank" />
> <field column="blog_ref" name="blog_ref" />
> <field column="blog_title" name="b_title" />
> <field column="blog_subtitle" name="subtitle" />
> <field column="blog_url" name="blog_url" />
> </entity>
>
> </document>
>
> </dataConfig>
>
> what are we doing wrong?
> Florian
>
>
--
Regards,
Shalin Shekhar Mangar.
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Hello everybody,
thank you all for your help and ideas. It works now.
>> what are we doing wrong?
>> Florian
>>
Actually, I am not sure what we did wrong. After we started it again
from scratch and with the simplified query, it all worked as expected.
Regards
Florian
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
You are still doing a delta import. With the modified data-config you
must do a command=full-import.
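Editorial sketch: the request described here combines command=full-import with the entity and clean=false parameters mentioned in the quoted message. A minimal Python sketch that builds such a URL follows. The host, port, and exact handler path in BASE_URL are assumptions (a multi-core setup would likely include the core name), and full_import_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: a local Solr with the handler registered at /dataimport.
BASE_URL = "http://localhost:8983/solr/dataimport"


def full_import_url(entity, base_url=BASE_URL):
    """Build a full-import request that keeps existing documents (clean=false)."""
    params = {
        "command": "full-import",  # run the entity's query, not its deltaQuery
        "entity": entity,          # restrict the run to the delta-style entity
        "clean": "false",          # do not wipe the index before importing
    }
    return base_url + "?" + urlencode(params)


print(full_import_url("articles-delta"))
```

Leaving out clean=false is the risky part: as far as the thread indicates, a plain full-import would otherwise start by clearing the index.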
On Mon, Oct 20, 2008 at 7:31 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> sorry to bother you again, but the delta import still does not work for me
> :-(
>
> We tried:
> * delta-import by full-import
> <entity name="articles-delta" rootEntity="false"
> query="<your-delta-query-here>"> with entity=articles-delta&clean=false
>
> * delta-import by full-import with simplified query
>
> * delta-import with simplified query
> <entity name="articles-delta" pk="article_ref" deltaQuery="SELECT *
> FROM full_text_view WHERE article_id < 300">
>
> * replaced files below with files from nightly-build 15.10.08 and rerun the
> delta and full imports as described above
> dist/apache-solr-dataimporthandler-1.3.0.jar
> dist/solrj-lib/slf4j-api-1.5.3.jar
> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>
>
> No matter what we do, we always end up in a situation, when the dataimport
> status looks fine:
>
> <lst name="statusMessages">
> <str name="Time Elapsed">0:0:8.442</str>
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">218</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Delta Dump started">2008-10-20 15:31:54</str>
> <str name="Identifying Delta">2008-10-20 15:31:54</str>
> <str name="Deltas Obtained">2008-10-20 15:31:57</str>
> <str name="Building documents">2008-10-20 15:31:57</str>
> <str name="Total Changed Documents">218</str>
>
> but the log reads:
> Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport params={command=delta-import}
> status=0 QTime=0
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Starting delta collection.
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: articles-full
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity articles-full with URL:
> jdbc:postgresql://blogmonitor02:5432/blogmonitor
> Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 5
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained : 218
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running DeletedRowKey() for Entity: articles-full
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: articles-full
> Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> SEVERE: Delta Import Failed
> java.lang.NullPointerException
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
> at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>
> here is the full data-config:
>
> <dataConfig>
> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
> url="jdbc:postgresql://bm02:5432/bm" user="bm" />
>
> <document name="articles">
> <entity name="articles-full" pk="id" query="SELECT * FROM full_text_view
> where article_id < 200" deltaQuery="SELECT * FROM full_text_view WHERE
> article_id < 300">
> <field column="article_id" name="a_id" />
> <field column="normalized_text" name="norm_text" />
> <field column="article_ref" name="id" />
> <field column="article_stub" name="stub" />
> <field column="id_blogs" name="blog_id" />
> <field column="article_title" name="a_title" />
> <field column="article_url" name="article_url" />
> <field column="ts" name="ts" />
> <field column="rank" name="rank" />
> <field column="blog_ref" name="blog_ref" />
> <field column="blog_title" name="b_title" />
> <field column="blog_subtitle" name="subtitle" />
> <field column="blog_url" name="blog_url" />
> </entity>
>
> </document>
>
> </dataConfig>
>
> what are we doing wrong?
> Florian
>
>
--
--Noble Paul
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
sorry to bother you again, but the delta import still does not work for
me :-(
We tried:
* delta-import by full-import
<entity name="articles-delta" rootEntity="false"
query="<your-delta-query-here>"> with entity=articles-delta&clean=false
* delta-import by full-import with simplified query
* delta-import with simplified query
<entity name="articles-delta" pk="article_ref"
deltaQuery="SELECT * FROM full_text_view WHERE article_id < 300">
* replaced files below with files from nightly-build 15.10.08 and rerun
the delta and full imports as described above
dist/apache-solr-dataimporthandler-1.3.0.jar
dist/solrj-lib/slf4j-api-1.5.3.jar
dist/solrj-lib/slf4j-jdk14-1.5.3.jar
No matter what we do, we always end up in a situation where the
dataimport status looks fine:
<lst name="statusMessages">
<str name="Time Elapsed">0:0:8.442</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">218</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-10-20 15:31:54</str>
<str name="Identifying Delta">2008-10-20 15:31:54</str>
<str name="Deltas Obtained">2008-10-20 15:31:57</str>
<str name="Building documents">2008-10-20 15:31:57</str>
<str name="Total Changed Documents">218</str>
but the log reads:
Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute
INFO: [test] webapp=/solr path=/dataimport params={command=delta-import}
status=0 QTime=0
Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
doDelta
INFO: Starting delta collection.
Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: articles-full
Oct 20, 2008 3:56:44 PM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity articles-full with URL:
jdbc:postgresql://blogmonitor02:5432/blogmonitor
Oct 20, 2008 3:56:44 PM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 5
Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained : 218
Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running DeletedRowKey() for Entity: articles-full
Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0
Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: articles-full
Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
here is the full data-config:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
url="jdbc:postgresql://bm02:5432/bm" user="bm" />
<document name="articles">
<entity name="articles-full" pk="id" query="SELECT * FROM
full_text_view where article_id < 200" deltaQuery="SELECT * FROM
full_text_view WHERE article_id < 300">
<field column="article_id" name="a_id" />
<field column="normalized_text" name="norm_text" />
<field column="article_ref" name="id" />
<field column="article_stub" name="stub" />
<field column="id_blogs" name="blog_id" />
<field column="article_title" name="a_title" />
<field column="article_url" name="article_url" />
<field column="ts" name="ts" />
<field column="rank" name="rank" />
<field column="blog_ref" name="blog_ref" />
<field column="blog_title" name="b_title" />
<field column="blog_subtitle" name="subtitle" />
<field column="blog_url" name="blog_url" />
</entity>
</document>
</dataConfig>
what are we doing wrong?
Florian
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Noble Paul നോബിള് नोब्ळ् schrieb:
> the last_index_time is available only from the second run onwards; that
> is, it expects a full-import to be done first.
> It knows that by the presence of dataimport.properties in the config
> directory. Did you check if it is present?
>
>
Yes, I checked and the file is still present. Is it the same file that
the delta-import uses?
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
The last_index_time is available only from the second run onwards; that
is, it expects a full-import to be done first.
It knows that by the presence of dataimport.properties in the config
directory. Did you check if it is present?
On Thu, Oct 16, 2008 at 5:33 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> Noble Paul നോബിള് नोब्ळ् schrieb:
>>>
>>> Well, when doing the way you described below (full-import with the delta
>>> query), the '${dataimporter.last_index_time}' timestamp is empty:
>>>
>>
>> I guess this was fixed after 1.3. You can probably take
>> dataimporthandler.jar from a nightly build (you may also need to add
>> slf4j.jar)
>>
>>>
> I replaced
> dist/apache-solr-dataimporthandler-1.3.0.jar
> dist/solrj-lib/slf4j-api-1.5.3.jar
> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>
> with their counterparts from the nightly build, but it did not help. Then I
> tried hard-coding the date (now() - '12 hours'::interval).
> Everything looks fine, but there are no new documents in the index.
>
> here is the log:
>
> INFO: Starting Full Import
> Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport
> params={command=full-import&clean=false&entity=articles-delta} status=0
> QTime=0
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity articles-delta with URL:
> jdbc:postgresql://bm02:5432/bm
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 45
> Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
> persistStartTime
> INFO: Wrote last indexed time to dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder
> commit
> INFO: Full Import completed successfully
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
> Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@3cd0d12e main
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> ... (autowarming)
> Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder
> execute
> INFO: Time taken = 0:9:3.231
>
>
--
--Noble Paul
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Noble Paul നോബിള് नोब्ळ् schrieb:
>> Well, when doing the way you described below (full-import with the delta
>> query), the '${dataimporter.last_index_time}' timestamp is empty:
>>
> I guess this was fixed after 1.3. You can probably take
> dataimporthandler.jar from a nightly build (you may also need to add
> slf4j.jar)
>
>>
I replaced
dist/apache-solr-dataimporthandler-1.3.0.jar
dist/solrj-lib/slf4j-api-1.5.3.jar
dist/solrj-lib/slf4j-jdk14-1.5.3.jar
with their counterparts from the nightly build, but it did not help.
Then I tried hard-coding the date (now() - '12 hours'::interval).
Everything looks fine, but there are no new documents in the index.
Here is the log:
INFO: Starting Full Import
Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute
INFO: [test] webapp=/solr path=/dataimport
params={command=full-import&clean=false&entity=articles-delta} status=0
QTime=0
Oct 16, 2008 1:07:08 PM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity articles-delta with URL:
jdbc:postgresql://bm02:5432/bm
Oct 16, 2008 1:07:08 PM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 45
Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute
INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
persistStartTime
INFO: Wrote last indexed time to dataimport.properties
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder
commit
INFO: Full Import completed successfully
Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@3cd0d12e main
Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
... (autowarming)
Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder
execute
INFO: Time taken = 0:9:3.231
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
On Thu, Oct 16, 2008 at 2:08 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> Noble Paul നോബിള് नोब्ळ् schrieb:
>>
>> The delta implementation is a bit fragile in DIH for complex queries
>>
>>
>
> that's too bad. It's a nice interface and less complex to configure than to
> go the XML /update way.
>
>
> Well, when doing the way you described below (full-import with the delta
> query), the '${dataimporter.last_index_time}' timestamp is empty:
I guess this was fixed after 1.3. You can probably take
dataimporthandler.jar from a nightly build (you may also need to add
slf4j.jar)
>
> Oct 16, 2008 10:14:53 AM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> SEVERE: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
> article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url
> AS article_url, bu.url AS blog_url, b.title AS blog_title,b.subtitle AS
> blog_subtitle, r.rank, coalesce(a.updated,a.published,a.added) as ts, a.stub
> as article_stub FROM articles a join blogs b on a.id_blogs = b.id join urls
> au on a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN
> ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden is
> false AND b.hidden is false AND a.ref is not null AND b.ref is not null and
> (rankid in (SELECT rankid FROM ranks order by rankid desc limit 1) OR rankid
> is null) AND coalesce(a.updated,a.published,a.added) > '' Processing
> Document # 1
>
> Regards
> Florian
>
>
>> I recommend you do delta-import using a full-import
>>
>> it can be done as follows
>> define a different entity
>>
>> <dataConfig>
>> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
>> url="jdbc:postgresql://bm02:5432/bm" user="user" />
>>
>> <document name="articles">
>> <entity name="articles-full" ..>
>> </entity>
>>
>> <entity name="articles-delta" rootEntity="false"
>> query="<your-delta-query-here>">
>> <!-- this following entity can be a copy articles-full entity
>> without any delta query because rootEntity=false for
>> articles-delta the following will be used for creating
>> documents. all other rules are same-->
>> <entity name="anyname" ..>
>> </entity>
>> </entity>
>> </document>
>>
>> when you wish to do a full-import pass the request parameter
>> entity=articles-full
>>
>> for delta-import use the request parameter
>> entity=articles-delta&clean=false (command has to be full-import only)
>>
>>
>>
>> On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
>> <fa...@mediaventures.de> wrote:
>>
>>>
>>> Shalin Shekhar Mangar schrieb:
>>>
>>>>
>>>> You are missing the "pk" field (primary key). This is used for delta
>>>> imports.
>>>>
>>>>
>>>
>>> I added the pk field and rebuild the index yesterday. However, when I run
>>> the delta-import, I still have this error message in the log:
>>>
>>> INFO: Starting delta collection.
>>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder
>>> collectDelta
>>> INFO: Running ModifiedRowKey() for Entity: articles
>>> Oct 15, 2008 9:37:27 AM
>>> org.apache.solr.handler.dataimport.JdbcDataSource$1
>>> call
>>> INFO: Creating a connection for entity articles with URL:
>>> jdbc:postgresql://bm02:5432/bm
>>> Oct 15, 2008 9:37:27 AM
>>> org.apache.solr.handler.dataimport.JdbcDataSource$1
>>> call
>>> INFO: Time taken for getConnection(): 43
>>> Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
>>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>> Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
>>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>>> collectDelta
>>> INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>>> collectDelta
>>> INFO: Running DeletedRowKey() for Entity: articles
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>>> collectDelta
>>> INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>>> collectDelta
>>> INFO: Completed parentDeltaQuery for Entity: articles
>>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter
>>> doDeltaImport
>>> SEVERE: Delta Import Failed
>>> java.lang.NullPointerException
>>> at
>>>
>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>>> at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>> Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
>>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>>
>>> Regards
>>> Florian
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Media Ventures GmbH Entwicklung Blogmonitor.de
>
> Jabber-ID faumeier@mabber.de
> Telefon +49 (0) 2236 480 10 22
>
>
--
--Noble Paul
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Noble Paul നോബിള് नोब्ळ् schrieb:
> The delta implementation is a bit fragile in DIH for complex queries
>
>
That's too bad. It's a nice interface and less complex to configure than
going the XML /update route.
Well, when doing it the way you described below (full-import with the delta
query), the '${dataimporter.last_index_time}' timestamp is empty:
Oct 16, 2008 10:14:53 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
article_ref,a.id_blogs,a.title AS article_title, a.normalized_text,
au.url AS article_url, bu.url AS blog_url, b.title AS
blog_title,b.subtitle AS blog_subtitle, r.rank,
coalesce(a.updated,a.published,a.added) as ts, a.stub as article_stub
FROM articles a join blogs b on a.id_blogs = b.id join urls au on
a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN
ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden
is false AND b.hidden is false AND a.ref is not null AND b.ref is not
null and (rankid in (SELECT rankid FROM ranks order by rankid desc limit
1) OR rankid is null) AND coalesce(a.updated,a.published,a.added) > ''
Processing Document # 1
Regards
Florian
> I recommend you do delta-import using a full-import
>
> it can be done as follows
> define a different entity
>
> <dataConfig>
> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
> url="jdbc:postgresql://bm02:5432/bm" user="user" />
>
> <document name="articles">
> <entity name="articles-full" ..>
> </entity>
>
> <entity name="articles-delta" rootEntity="false"
> query="<your-delta-query-here>">
> <!-- this following entity can be a copy articles-full entity
> without any delta query because rootEntity=false for
> articles-delta the following will be used for creating
> documents. all other rules are same-->
> <entity name="anyname" ..>
> </entity>
> </entity>
> </document>
>
> when you wish to do a full-import pass the request parameter
> entity=articles-full
>
> for delta-import use the request parameter
> entity=articles-delta&clean=false (command has to be full-import only)
>
>
>
> On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
> <fa...@mediaventures.de> wrote:
>
>> Shalin Shekhar Mangar schrieb:
>>
>>> You are missing the "pk" field (primary key). This is used for delta
>>> imports.
>>>
>>>
>> I added the pk field and rebuild the index yesterday. However, when I run
>> the delta-import, I still have this error message in the log:
>>
>> INFO: Starting delta collection.
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Running ModifiedRowKey() for Entity: articles
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Creating a connection for entity articles with URL:
>> jdbc:postgresql://bm02:5432/bm
>> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
>> call
>> INFO: Time taken for getConnection(): 43
>> Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>> Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Running DeletedRowKey() for Entity: articles
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>> INFO: Completed parentDeltaQuery for Entity: articles
>> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> SEVERE: Delta Import Failed
>> java.lang.NullPointerException
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
>> at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
>> at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
>> at
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
>> at
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>> Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
>> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>>
>> Regards
>> Florian
>>
>>
>
>
>
>
--
Media Ventures GmbH
Entwicklung Blogmonitor.de
Jabber-ID faumeier@mabber.de
Telefon +49 (0) 2236 480 10 22
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
The delta implementation in DIH is a bit fragile for complex queries.
I recommend you do the delta-import using a full-import.
It can be done as follows: define a different entity.
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
url="jdbc:postgresql://bm02:5432/bm" user="user" />
<document name="articles">
<entity name="articles-full" ..>
</entity>
<entity name="articles-delta" rootEntity="false"
query="<your-delta-query-here>">
<!-- The following entity can be a copy of the articles-full entity
without any delta query. Because rootEntity="false" is set on
articles-delta, the following entity is used for creating
documents. All other rules are the same. -->
<entity name="anyname" ..>
</entity>
</entity>
</document>
</dataConfig>
When you wish to do a full-import, pass the request parameter
entity=articles-full
For a delta-import, use the request parameters
entity=articles-delta&clean=false (the command still has to be full-import).
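A minimal, well-formed sketch of the layout described above, filled in for
illustration only. The view name full_text_view and the columns article_ref
and ts come from the configs posted in this thread. The inner entity name
articles-changed and both WHERE clauses are hypothetical. It also assumes that
article_ref is a string column and that an earlier full-import has already
written last_index_time. Note that a literal < inside an XML attribute value
has to be written as &lt; (and > is safest written as &gt;).

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://bm02:5432/bm" user="user" />
  <document name="articles">

    <!-- Full build: command=full-import&entity=articles-full -->
    <entity name="articles-full" query="SELECT * FROM full_text_view">
      <field column="article_ref" name="id" />
      <!-- remaining field mappings as in the existing config -->
    </entity>

    <!-- Incremental run: command=full-import&entity=articles-delta&clean=false
         The outer entity only collects the keys of changed rows (assumed query). -->
    <entity name="articles-delta" rootEntity="false"
            query="SELECT article_ref FROM full_text_view
                   WHERE ts &gt; '${dataimporter.last_index_time}'">

      <!-- Hypothetical inner entity: builds one document per changed key. -->
      <entity name="articles-changed"
              query="SELECT * FROM full_text_view
                     WHERE article_ref = '${articles-delta.article_ref}'">
        <field column="article_ref" name="id" />
        <!-- remaining field mappings as in articles-full -->
      </entity>
    </entity>

  </document>
</dataConfig>
```

Because rootEntity="false" is set on the outer entity, each row of the inner
entity becomes a document, and clean=false keeps the existing index in place.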
On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> Shalin Shekhar Mangar schrieb:
>>
>> You are missing the "pk" field (primary key). This is used for delta
>> imports.
>>
>
> I added the pk field and rebuilt the index yesterday. However, when I run
> the delta-import, I still have this error message in the log:
>
> INFO: Starting delta collection.
> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: articles
> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity articles with URL:
> jdbc:postgresql://bm02:5432/bm
> Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 43
> Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
> Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running DeletedRowKey() for Entity: articles
> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: articles
> Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> SEVERE: Delta Import Failed
> java.lang.NullPointerException
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
> at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
> INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
>
> Regards
> Florian
>
--
--Noble Paul
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Shalin Shekhar Mangar schrieb:
> You are missing the "pk" field (primary key). This is used for delta
> imports.
>
I added the pk field and rebuilt the index yesterday. However, when I
run the delta-import, I still have this error message in the log:
INFO: Starting delta collection.
Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: articles
Oct 15, 2008 9:37:27 AM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity articles with URL:
jdbc:postgresql://bm02:5432/bm
Oct 15, 2008 9:37:27 AM
org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 43
Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running DeletedRowKey() for Entity: articles
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: articles
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Regards
Florian
Re: error with delta import
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
You are missing the "pk" field (primary key). This is used for delta
imports.
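As a rough sketch of what that could look like (an assumption for
illustration, not a verified config): pk is taken to name the database column
that identifies a row, assumed here to be article_ref, rather than the Solr
field it is mapped to. The deltaQuery returns that column for the rows changed
since the last import. The view full_text_view and the ts column are taken
from later messages in this thread, and the WHERE clause is hypothetical.

```xml
<!-- pk is assumed to be a column that the deltaQuery result actually contains -->
<entity name="articles" pk="article_ref"
        query="SELECT * FROM full_text_view"
        deltaQuery="SELECT article_ref FROM full_text_view
                    WHERE ts &gt; '${dataimporter.last_index_time}'">
  <field column="article_ref" name="id" />
  <!-- remaining field mappings unchanged -->
</entity>
```

To my understanding, DIH in 1.3 then re-runs query restricted to each key
returned by deltaQuery, so the pk column has to be present in both result sets.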
On Tue, Oct 14, 2008 at 6:16 PM, Florian Aumeier
<fa...@mediaventures.de> wrote:
> Noble Paul നോബിള് नोब्ळ् schrieb:
>
>> apparently you have not specified the deltaQuery attribute in the entity.
>> Check the delta-import section in the wiki
>> http://wiki.apache.org/solr/DataImportHandler
>> or you can share your data-config file and we can take a quick look
>>
>>
>>
> here is my data-config. I configured both, the deltaQuery and query entity
> in one data-config. Is this the correct usecase?
> Also, I found it easier to join the document on the database level instead
> of leaving it to solr.
>
> <dataConfig>
> <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
> url="jdbc:postgresql://bm02:5432/bm" user="user" />
>
> <document name="articles">
> <entity name="articles" deltaQuery="SELECT a.id AS article_id,a.stub AS
> article_stub,a.ref AS article_ref,a.id_blogs,a.title AS article_title,
> a.normalized_text, au.url AS article_url, bu.url AS blog_url, b.title AS
> blog_title,b.subtitle AS blog_subtitle, r.rank,
> coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs b
> on a.id_blogs = b.id join urls au on a.id_urls = au.id join urls bu on
> b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id = r.id_articles WHERE
> b.id_urls is not null AND a.hidden is false AND b.hidden is false AND a.ref
> is not null AND b.ref is not null AND (rankid in (SELECT rankid FROM ranks
> order by rankid desc limit 1) OR rankid is null) AND
> coalesce(a.updated,a.published,a.added) >
> '${dataimporter.last_index_time}'"
> query="SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
> article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url
> AS article_url, bu.url AS blog_url, b.t\
> itle AS blog_title,b.subtitle AS blog_subtitle, r.rank,
> coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs b
> on a.id_blogs = b.id join urls au on a.id_urls = au\
> .id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id =
> r.id_articles WHERE b.id_urls is not null AND a.hidden is false AND b.hidden
> is false AND a.ref is not null AN\
> D b.ref is not null AND (rankid in (SELECT rankid FROM ranks order by
> rankid desc limit 1) OR rankid is null) AND
> coalesce(a.updated,a.published,a.added)">
> <field column="article_id" name="a_id" />
> <field column="normalized_text" name="norm_text" />
> <field column="article_ref" name="id" />
> <field column="article_stub" name="stub" />
> <field column="id_blogs" name="blog_id" />
> <field column="article_title" name="a_title" />
> <field column="article_url" name="article_url" />
> <field column="ts" name="ts" />
> <field column="rank" name="rank" />
> <field column="blog_ref" name="blog_ref" />
> <field column="blog_title" name="b_title" />
> <field column="blog_subtitle" name="subtitle" />
>
> <field column="blog_url" name="blog_url" />
>
> </entity>
>
> </document>
>
> </dataConfig>
>
> Florian
>
>
--
Regards,
Shalin Shekhar Mangar.
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
The query makes my head spin.
Joining in SQL does not let you populate multivalued fields.
Otherwise it is all fine.
The pk attribute is missing in the entity.
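
For the multivalued case, a hedged sketch of the nested-entity alternative to a
SQL join. The article_tags table, its columns, and the tag field are
hypothetical and do not appear in the posted config. The idea is that the child
entity runs once per parent row and each row it returns adds one value to the
same Solr field, which would have to be declared multiValued in schema.xml.

```xml
<!-- Sketch only: article_tags, tag and id_articles here are hypothetical names. -->
<entity name="articles" query="SELECT id, title FROM articles">
  <field column="title" name="a_title" />
  <!-- Child entity: ${articles.id} refers to the id column of the current parent row -->
  <entity name="tags"
          query="SELECT tag FROM article_tags WHERE id_articles = '${articles.id}'">
    <field column="tag" name="tag" />
  </entity>
</entity>
```

The trade-off is one extra query per parent row, which is the cost the single
joined query avoids.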
--
--Noble Paul
Re: error with delta import
Posted by Florian Aumeier <fa...@mediaventures.de>.
Noble Paul നോബിള്‍ नोब्ळ् wrote:
> apparently you have not specified the deltaQuery attribute in the entity.
> Check the delta-import section in the wiki
> http://wiki.apache.org/solr/DataImportHandler
> or you can share your data-config file and we can take a quick look
>
>
Here is my data-config. I configured both the deltaQuery and the query
on the entity in one data-config. Is this the correct use case?
Also, I found it easier to join the document at the database level
instead of leaving it to Solr.
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
url="jdbc:postgresql://bm02:5432/bm" user="user" />
<document name="articles">
<entity name="articles" deltaQuery="SELECT a.id AS article_id,a.stub AS
article_stub,a.ref AS article_ref,a.id_blogs,a.title AS article_title,
a.normalized_text, au.url AS article_url, bu.url AS blog_url, b.title AS
blog_title,b.subtitle AS blog_subtitle, r.rank,
coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs
b on a.id_blogs = b.id join urls au on a.id_urls = au.id join urls bu on
b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id = r.id_articles WHERE
b.id_urls is not null AND a.hidden is false AND b.hidden is false AND
a.ref is not null AND b.ref is not null AND (rankid in (SELECT rankid
FROM ranks order by rankid desc limit 1) OR rankid is null) AND
coalesce(a.updated,a.published,a.added) >
'${dataimporter.last_index_time}'"
query="SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS
article_ref,a.id_blogs,a.title AS article_title, a.normalized_text,
au.url AS article_url, bu.url AS blog_url, b.title AS
blog_title,b.subtitle AS blog_subtitle, r.rank,
coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs
b on a.id_blogs = b.id join urls au on a.id_urls = au.id join urls bu on
b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id = r.id_articles WHERE
b.id_urls is not null AND a.hidden is false AND b.hidden is false AND
a.ref is not null AND b.ref is not null AND (rankid in (SELECT rankid
FROM ranks order by rankid desc limit 1) OR rankid is null) AND
coalesce(a.updated,a.published,a.added)">
<field column="article_id" name="a_id" />
<field column="normalized_text" name="norm_text" />
<field column="article_ref" name="id" />
<field column="article_stub" name="stub" />
<field column="id_blogs" name="blog_id" />
<field column="article_title" name="a_title" />
<field column="article_url" name="article_url" />
<field column="ts" name="ts" />
<field column="rank" name="rank" />
<field column="blog_ref" name="blog_ref" />
<field column="blog_title" name="b_title" />
<field column="blog_subtitle" name="subtitle" />
<field column="blog_url" name="blog_url" />
</entity>
</document>
</dataConfig>
Florian
Re: error with delta import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
Apparently you have not specified the deltaQuery attribute in the entity.
Check the delta-import section in the wiki:
http://wiki.apache.org/solr/DataImportHandler
Alternatively, share your data-config file and we can take a quick look.
--
--Noble Paul