Posted to dev@manifoldcf.apache.org by ju...@francelabs.com on 2021/07/13 13:39:27 UTC
Solr output connector - behavior on some exceptions
Hi,
I would like to change the behavior of the Solr output connector in
two exception-handling cases:
1. In the current « handleIOException » method of the HttpPoster class,
the « unknown » case looks like this:
As the comment says, we don't know the type of IOException, so it is not
necessary to make the ServiceInterruption fail after a period, especially
since all « Solr down » exceptions have been handled upstream.
2. The current « handleSolrServerException » method of the HttpPoster
class. As above, this method is called for an unknown exception that
cannot be related to a « Solr down » issue; it can only be related to a
misconfiguration or a document-specific issue. It is therefore not necessary
to throw a ManifoldCFException that will stop the job with a failure state.
What do you think? If you agree, I can create a ticket for that and
submit a patch. This would allow the job to keep running gracefully while
properly skipping the identified exceptions.
Regards,
Julien
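The two cases above can be pictured as a small decision function that maps an exception to what should happen to the document. This is only a minimal sketch, not the connector's code: `OutcomeSketch`, `Outcome`, `classify`, and `DocumentRejectedException` are hypothetical names, and treating `ConnectException` and `SocketTimeoutException` as the « Solr down » cases is an assumption made for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

class OutcomeSketch {
  /** What happens to the document that triggered the exception (hypothetical names). */
  public enum Outcome { RETRY_UNTIL_AVAILABLE, RETRY_THEN_SKIP, SKIP_DOCUMENT }

  /** Hypothetical stand-in for a Solr-side rejection of a single document. */
  public static class DocumentRejectedException extends Exception {
    public DocumentRejectedException(String message) { super(message); }
  }

  /** Maps an exception to an outcome. No branch aborts the whole job. */
  public static Outcome classify(Exception e) {
    // Assumption: these two types stand for the "Solr down" cases handled upstream.
    if (e instanceof ConnectException || e instanceof SocketTimeoutException) {
      return Outcome.RETRY_UNTIL_AVAILABLE;
    }
    // Unknown I/O trouble: retries might fix it, but the document is skipped if they do not.
    if (e instanceof IOException) {
      return Outcome.RETRY_THEN_SKIP;
    }
    // Anything else is treated as a misconfiguration or document-specific issue.
    return Outcome.SKIP_DOCUMENT;
  }

  public static void main(String[] args) {
    System.out.println(classify(new ConnectException("connection refused")));
    System.out.println(classify(new IOException("unknown I/O problem")));
    System.out.println(classify(new DocumentRejectedException("bad field value")));
  }
}
```

The point of the sketch is that the last two branches cost at most one document, whereas the behavior described above stops the job.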
RE: Solr output connector - behavior on some exceptions
Posted by ju...@francelabs.com.
Here is the code snippet that I mentioned in point 1 of my previous
mail:
// Otherwise, no idea what the trouble is, so presume that retries might fix it.
String message3 = "IO exception during " + context + ": " + e.getMessage();
Logging.ingest.warn(message3, e);
throw new ServiceInterruption(message3,
  e,
  currentTime + interruptionRetryTime,
  -1L,
  3,
  false);
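The idea behind the comment in this snippet (retries might fix it, and if they do not, only the document should be lost) can be sketched in isolation. The following is a simplified illustration under stated assumptions, not ManifoldCF code: `RetryThenSkipSketch`, `Poster`, and `postOrSkip` are hypothetical names, and the retries run in a plain loop here, whereas a real connector would hand rescheduling back to the framework.

```java
import java.io.IOException;

class RetryThenSkipSketch {
  /** Hypothetical stand-in for the call that posts one document to Solr. */
  public interface Poster {
    void post(String documentId) throws IOException;
  }

  /**
   * Tries to post a document, retrying on IOException up to maxRetries times.
   * Returns true if the document was indexed, false if it was skipped.
   * An IOException never escapes, so one bad document cannot stop the caller.
   */
  public static boolean postOrSkip(Poster poster, String documentId, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        poster.post(documentId);
        return true;
      } catch (IOException e) {
        // A real connector would log through its own logger and wait before retrying.
        System.err.println("IO exception during post of " + documentId + ": " + e.getMessage());
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Fails twice, then succeeds: indexed within the retry budget.
    boolean indexed = postOrSkip(id -> {
      if (calls[0]++ < 2) throw new IOException("flaky");
    }, "doc-1", 3);
    // Always fails: skipped once the retry budget is used up.
    boolean skipped = !postOrSkip(id -> {
      throw new IOException("broken");
    }, "doc-2", 3);
    System.out.println(indexed + " " + skipped);
  }
}
```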
Re: Solr output connector - behavior on some exceptions
Posted by Karl Wright <da...@gmail.com>.
It is called the "Lucene/Solr Connector" component.
Karl
On Tue, Jul 13, 2021 at 10:11 AM <ju...@francelabs.com> wrote:
> OK, there is no "Solr connector" component in JIRA; can you add it, please?
RE: Solr output connector - behavior on some exceptions
Posted by ju...@francelabs.com.
OK, there is no "Solr connector" component in JIRA; can you add it, please?
-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Tuesday, July 13, 2021 16:04
To: dev <de...@manifoldcf.apache.org>
Subject: Re: Solr output connector - behavior on some exceptions
Null values causing exceptions in the output connector should be addressed independently in the output connector. But basically as long as that is done I am fine with your proposal.
Karl
Re: Solr output connector - behavior on some exceptions
Posted by Karl Wright <da...@gmail.com>.
Null values causing exceptions in the output connector should be addressed
independently in the output connector. But basically as long as that is
done I am fine with your proposal.
Karl
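As an illustration of handling null values in the output connector itself, the metadata could be cleaned before the document is sent. This is a minimal sketch under assumptions: `NullMetadataSketch` and `withoutNulls` are hypothetical names, and representing metadata as a map of field names to value lists is a simplification, not the connector's actual data structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullMetadataSketch {
  /**
   * Returns a copy of the metadata with null values removed.
   * A field is dropped entirely when no value is left for it.
   */
  public static Map<String, List<String>> withoutNulls(Map<String, List<String>> metadata) {
    Map<String, List<String>> cleaned = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> field : metadata.entrySet()) {
      if (field.getValue() == null) {
        continue; // the whole value list is missing
      }
      List<String> kept = new ArrayList<>();
      for (String value : field.getValue()) {
        if (value != null) {
          kept.add(value);
        }
      }
      if (!kept.isEmpty()) {
        cleaned.put(field.getKey(), kept);
      }
    }
    return cleaned;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("title", Arrays.asList("Report"));
    metadata.put("author", Arrays.asList((String) null));
    metadata.put("tags", Arrays.asList("a", null, "b"));
    System.out.println(withoutNulls(metadata));
  }
}
```

Filtering like this keeps a null value from ever reaching the exception handlers, which is why it can be treated separately from the fallback logic.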
On Tue, Jul 13, 2021 at 9:59 AM <ju...@francelabs.com> wrote:
> I ended up in that part of the code while debugging after we had a
> crawling job stopped because of an exception concerning a document having a
> null value for a specific metadata and another one with a value that
> triggered a request parsing issue on Solr side.
>
> Julien
RE: Solr output connector - behavior on some exceptions
Posted by ju...@francelabs.com.
I ended up in that part of the code while debugging after a crawling job stopped because of exceptions concerning one document that had a null value for a specific metadata field and another whose value triggered a request-parsing issue on the Solr side.
Julien
-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Tuesday, July 13, 2021 15:48
To: dev <de...@manifoldcf.apache.org>
Subject: Re: Solr output connector - behavior on some exceptions
If the "solr is down" exceptions are indeed caught upstream, I'm tentatively in agreement that this fallback logic can be changed. But I would like to understand what specifically you are seeing this happen for.
What cases are you hoping to improve?
Karl
Re: Solr output connector - behavior on some exceptions
Posted by Karl Wright <da...@gmail.com>.
If the "solr is down" exceptions are indeed caught upstream, I'm
tentatively in agreement that this fallback logic can be changed. But I
would like to understand what specifically you are seeing this happen for.
What cases are you hoping to improve?
Karl