Posted to dev@manifoldcf.apache.org by ju...@francelabs.com on 2021/07/13 13:39:27 UTC

Solr output connector - behavior on some exceptions

Hi, 

 

I would like to change the behavior of the Solr output connector in two
exception handling cases:

 

1.	In the current "handleIOException" method of the HttpPoster class,
the "unknown" case looks like this:



As the comment says, we don't know what kind of IOException this is, so it
is not necessary to make the ServiceInterruption fail after a period,
especially since all "Solr down" exceptions have already been handled
upstream.


2.	The current "handleSolrServerException" method of the HttpPoster
class. As above, this method is called for an unknown exception that
cannot be related to a "Solr down" issue; it can only be related to a
misconfiguration or a document-specific issue. It is therefore not necessary
to throw a ManifoldCFException, which stops the job in a failure state.
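
To make the second point concrete, here is a minimal sketch of the "skip instead of abort" idea. It is only an illustration under stated assumptions: DocumentRejectedException, Indexer and indexAll are hypothetical stand-ins invented for this example, not ManifoldCF or SolrJ classes, and the real connector would report the skipped document through its own framework calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch with stand-in types; this is not ManifoldCF code. */
public class SkipOnDocumentError {

  /** Stand-in for an exception caused by one document or by misconfiguration. */
  public static class DocumentRejectedException extends Exception {
    public DocumentRejectedException(String message) {
      super(message);
    }
  }

  /** Stand-in for the call that posts a single document to the index. */
  public interface Indexer {
    void index(String documentId) throws DocumentRejectedException;
  }

  /** Indexes every document and records rejected ones instead of aborting the run. */
  public static List<String> indexAll(List<String> documentIds, Indexer indexer) {
    List<String> skipped = new ArrayList<>();
    for (String id : documentIds) {
      try {
        indexer.index(id);
      } catch (DocumentRejectedException e) {
        // Log the problem and move on; the remaining documents are still processed.
        System.err.println("Skipping " + id + ": " + e.getMessage());
        skipped.add(id);
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    Indexer indexer = id -> {
      if (id.startsWith("bad")) {
        throw new DocumentRejectedException("request could not be parsed");
      }
    };
    System.out.println(indexAll(Arrays.asList("doc1", "bad2", "doc3"), indexer));
  }
}
```

The design choice being illustrated is that a failure tied to one document is recorded and logged, while the loop over the other documents continues.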

 

 

What do you think? If you agree with me, I can create a ticket for that and
submit a patch. This would allow the job to keep running gracefully while
properly skipping the identified exceptions.

 

 

Regards,
Julien

 




RE: Solr output connector - behavior on some exceptions

Posted by ju...@francelabs.com.
Here is the code snippet that I mentioned in section 1 of the previous
mail:

 

    // Otherwise, no idea what the trouble is, so presume that retries might fix it.
    String message3 = "IO exception during "+context+": "+e.getMessage();
    Logging.ingest.warn(message3,e);
    throw new ServiceInterruption(message3,
      e,
      currentTime + interruptionRetryTime,
      -1L,
      3,
      false);
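
One possible reading of the proposed change for this fallback branch is sketched below. It rests on assumptions: the class, the Action enum and the classify method are hypothetical names made up for illustration, and the two connection-level exception types merely stand in for the "Solr down" cases that the real connector is said to handle upstream.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical sketch; none of these names come from ManifoldCF. */
public class UnknownIoExceptionPolicy {

  /** What the caller should do with the document being indexed. */
  public enum Action { RETRY_LATER, SKIP_DOCUMENT }

  /**
   * Classifies an IOException. Connection-level failures stand in for the
   * "Solr down" cases; anything else is treated as specific to the current
   * document, which is then skipped rather than retried.
   */
  public static Action classify(IOException e) {
    if (e instanceof ConnectException || e instanceof SocketTimeoutException) {
      return Action.RETRY_LATER;
    }
    return Action.SKIP_DOCUMENT;
  }

  public static void main(String[] args) {
    System.out.println(classify(new ConnectException("connection refused")));
    System.out.println(classify(new IOException("unexpected end of stream")));
  }
}
```

Whether skipping is safe depends on the claim in the original mail that every "Solr down" exception really is caught before this branch is reached.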

 


 


Re: Solr output connector - behavior on some exceptions

Posted by Karl Wright <da...@gmail.com>.
It is called the "Lucene/Solr Connector" component.
Karl


On Tue, Jul 13, 2021 at 10:11 AM <ju...@francelabs.com> wrote:

> OK, there is no "Solr connector" component in JIRA. Can you add it,
> please?

RE: Solr output connector - behavior on some exceptions

Posted by ju...@francelabs.com.
OK, there is no "Solr connector" component in JIRA. Can you add it, please?

-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Tuesday, July 13, 2021 16:04
To: dev <de...@manifoldcf.apache.org>
Subject: Re: Solr output connector - behavior on some exceptions

Null values causing exceptions in the output connector should be addressed independently in the output connector.  But basically as long as that is done I am fine with your proposal.

Karl




Re: Solr output connector - behavior on some exceptions

Posted by Karl Wright <da...@gmail.com>.
Null values causing exceptions in the output connector should be addressed
independently in the output connector.  But basically as long as that is
done I am fine with your proposal.

Karl
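
As an illustration of handling null values separately from the fallback logic, the sketch below removes null metadata values before a document would be sent. It is an assumption-laden example: MetadataSanitizer and dropNulls are hypothetical names, and the map of field names to value lists is a stand-in for whatever structure the connector actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of ManifoldCF or SolrJ. */
public class MetadataSanitizer {

  /** Returns a copy of the metadata without null values and without fields left empty. */
  public static Map<String, List<String>> dropNulls(Map<String, List<String>> metadata) {
    Map<String, List<String>> cleaned = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      if (entry.getValue() == null) {
        continue; // the whole field is null
      }
      List<String> values = new ArrayList<>();
      for (String value : entry.getValue()) {
        if (value != null) {
          values.add(value);
        }
      }
      if (!values.isEmpty()) {
        cleaned.put(entry.getKey(), values);
      }
    }
    return cleaned;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("title", new ArrayList<>(Arrays.asList("Report", null)));
    metadata.put("author", null);
    System.out.println(dropNulls(metadata));
  }
}
```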


On Tue, Jul 13, 2021 at 9:59 AM <ju...@francelabs.com> wrote:

> I ended up in that part of the code while debugging after a crawling job
> stopped because of an exception concerning a document that had a null
> value for a specific metadata field, and another one with a value that
> triggered a request parsing issue on the Solr side.
>
> Julien

RE: Solr output connector - behavior on some exceptions

Posted by ju...@francelabs.com.
I ended up in that part of the code while debugging after a crawling job stopped because of an exception concerning a document that had a null value for a specific metadata field, and another one with a value that triggered a request parsing issue on the Solr side.

Julien

-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Tuesday, July 13, 2021 15:48
To: dev <de...@manifoldcf.apache.org>
Subject: Re: Solr output connector - behavior on some exceptions

If the "solr is down" exceptions are indeed caught upstream, I'm tentatively in agreement that this fallback logic can be changed.  But I would like to understand what specifically you are seeing this happen for.
What cases are you hoping to improve?

Karl



Re: Solr output connector - behavior on some exceptions

Posted by Karl Wright <da...@gmail.com>.
If the "solr is down" exceptions are indeed caught upstream, I'm
tentatively in agreement that this fallback logic can be changed.  But I
would like to understand what specifically you are seeing this happen for.
What cases are you hoping to improve?

Karl

