Posted to dev@manifoldcf.apache.org by Julien Massiera <ju...@francelabs.com> on 2019/09/05 16:05:18 UTC

TagParseState behavior with Web connector

Hi Karl,

I discovered a problematic behavior in the
org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState class when
crawling web pages. This behavior is a problem in particular for
form-based authentication, as explained further below.

The org.apache.manifoldcf.connectorcommon.fuzzyml.HTMLParseState class,
which is called by TagParseState on each noteTag() or noteEndTag()
call, uses the
org.apache.manifoldcf.crawler.connectors.webcrawler.ScriptParseState
class to detect whether parsing is inside or outside a 'script' tag
and then decide whether to act on the incoming data.

The problem is that TagParseState is not aware of the type of tag
currently being parsed, so it keeps analyzing every character it
encounters to detect tags, even inside a script tag.
Imagine a script tag like this in a web page:

<script>if(myvar <= 9) {.......}</script>

When TagParseState parses the '<' character, it considers that a new
tag begins and continues until it encounters a '>' character. So in the
case above, TagParseState never catches the end of the script tag; the
scriptParseState variable in the ScriptParseState class therefore
remains in the SCRIPTPARSESTATE_INSCRIPT state, and the rest of the web
page is not handled correctly by the other parsers.
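
The failure mode described above can be illustrated with a deliberately
naive scanner. This is a minimal sketch, not the actual TagParseState
code: the class NaiveTagScanner and its scanTags method are hypothetical
names, and the only rule implemented is "a '<' opens a tag that lasts
until the next '>'".

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a tag scanner; NOT the real TagParseState. */
class NaiveTagScanner {
  /** Returns the raw contents of every "<...>" span found in the input. */
  static List<String> scanTags(String html) {
    List<String> tags = new ArrayList<>();
    StringBuilder current = null; // non-null while inside a '<' ... '>' span
    for (char c : html.toCharArray()) {
      if (current == null) {
        if (c == '<') {
          current = new StringBuilder(); // any '<' is assumed to open a tag
        }
      } else if (c == '>') {
        tags.add(current.toString()); // the first '>' closes the tag
        current = null;
      } else {
        current.append(c);
      }
    }
    return tags;
  }

  public static void main(String[] args) {
    // The '<' of "<=" opens a bogus tag that swallows the real "</script>".
    System.out.println(scanTags("<script>if(myvar <= 9) {}</script><form>"));
    // prints [script, = 9) {}</script, form]
  }
}
```

No "/script" end tag is ever reported, so any state that is switched on
by the script start tag and off by the script end tag stays switched on.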

As a result, if you configure form authentication for your crawl and
the form page contains this kind of script tag before the form tag,
the form is never handled and authentication fails. This is the case
I encountered, and I worked around it by forcing scriptParseState to
SCRIPTPARSESTATE_NORMAL.

I am having difficulty finding an elegant way to solve this issue, so I
would gladly welcome your thoughts.

To reproduce this behavior, just create an HTML file with the
following content:


<!doctype html><html lang="fr"><head><meta name="Viewport" content="width=device-width, height=device-height"/><meta charset="utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=edge" /><noscript><meta http-equiv="refresh" content="0; URL=error.jsp?errorMessage=error.JavaScriptDisabled"/></noscript><link rel="shortcut icon" type="image/x-icon" href="/form/images/favicon.ico" /><link rel="stylesheet" href="/form/css/jQuery/ui-ilex-theme/jquery-ui-1.10.4.custom.min.css" type="text/css"><link rel="stylesheet" type="text/css" href="/form/css/bootstrap.min.css" /><link rel="stylesheet" type="text/css" href="/form/css/styles_sign_and_go.css" /><link rel="stylesheet" type="text/css" href="/form/css/styles_custom.css" /><script src="/form/js/jQuery/jquery.min.js"></script><script src="/form/js/bootstrap.min.js"></script><script src="/form/js/authenticator.js"></script><script>$(document).ready(function() {$("button, input[type='submit'], input[type='cancel'], input[type='button']").addClass("ui-button ui-widget ui-state-default ui-corner-all");});</script><script src="/form/img_func.js"></script><!--[if lt IE 9]><script src="/form/js/ie_polyfills.js"></script><![endif]--><script src="/form/js/custom.js"></script><title>Redirection to source URL </title>	
			</head>
			<body onload='give_focus_and_verif_cookie_enabled()'><script>var retryCount=0;function getIEVersion() { var match = navigator.userAgent.match(/(?:MSIE |Trident\/.*; rv:)(\d+)/); return match ? parseInt(match[1]) : -1; }function give_focus(){if(retryCount>100){return;}var currentIEVersion = getIEVersion();if(currentIEVersion <= 9){var bFound = false;if(document.forms[0]!=null){for(i=0; i < document.forms[0].length; i++){retryCount = retryCount+1;try{if (document.forms[0][i].type != "hidden") { if (document.forms[0][i].disabled != true) {     document.forms[0][i].focus();     var bFound = true;  } } if (bFound == true)   break; } catch(err) { setTimeout("give_focus()",1000); } }}}}function give_focus_and_verif_cookie_enabled(){give_focus();if(!navigator.cookieEnabled){   window.location.href="error.jsp?errorMessage=error.CookieDisabled";}}</script><div id="wrapper"><div id="header"><div class="container"><div class="logo"></div><h1>Authentication</h1><div class="changeLang"><a href="?displayLang=en-gb">EN</a> | <a href="?displayLang=fr-fr">FR</a></div></div></div>
			
			<form action='login.jsp' method='post' name='theform'>
	 		<input type="hidden" name="csrfAuth" value="-aja2lwx5jf09">
	 		<input type="hidden" name="sng-remember-me-fingerprint" id="sng-remember-me-fingerprint" value="null" >
			
			
			</form>
		    <div id="content">
		      <div class="container">
		        <div id="contenu_specifique_application" >
				 <div class="app msgLoading">
				   <div class="app-description" style="height:64px;">
			<h3>You will be redirected within a few seconds.</h3>
			</div>
			</div>
			</div>
			</div>
			</div>
       <script>
       $(document).ready(function(){
         document.getElementById('sng-remember-me-fingerprint').value = getStoreLocal('sng-remember-me-fingerprint');
         document.theform.submit();
         try {
             history.replaceState(null, "", document.referrer );
         } catch(err) {
           // security error in edge
         }
       });
       </script>
			</div></body></html>




Regards

-- 
Julien MASSIERA
Product Development Director
France Labs – Les experts du Search
www.francelabs.com


Re: TagParseState behavior with Web connector

Posted by Karl Wright <da...@gmail.com>.
If you go the strict override route, then it must be limited to parsing of
HTML, and cannot apply to general parsing of XML.  There is a pathway for
that in the Web Connector but I will need to look at it in depth and I do
not have time this week.  Perhaps this weekend.

Karl


> On Fri, Sep 6, 2019 at 9:34 AM <ju...@francelabs.com> wrote:
>
> > Hi Karl,
> >
> > Thanks for your suggestion. It took me some time to think about it, but I
> > think we have two different approaches to this case:
> > 1. In your case, it seems that if a source is problematic, that is its
> > own problem, not that of the parser/connector, so the latter should
> > just discard the doc.
> > 2. In my case, we start from the principle that in many situations
> > (especially in web or enterprise scenarios), sources cannot be changed
> > as we want, for instance because they belong to another party that has
> > no interest in changing the code (think of any website that does not
> > care who parses it), or because the software is no longer maintained
> > (old versions of CMS systems, for instance).
> >
> > The question then is: do we want to enable connectors to be modified
> > so that they can handle special non-compliant cases (which is our
> > case), or do we want connectors that strictly index only content
> > that respects given specifications?
> > The solutions here would be:
> > 1. Use CDATA
> > 2. Put the javascript code in its own file
> > 3. Encode every problematic character in the javascript
> > Each solution requires modifying the source web page, which may be
> > impossible or refused by the source owner, and the last one would make
> > the javascript code less readable and harder for developers to
> > understand.
> >
> > So, to rephrase my question a bit, I would add to what I wrote in my
> > first email:
> >
> > Assuming that the mentioned source document MUST be parsed in order
> > to perform the form-based authentication, and assuming that it cannot
> > be modified and thus cannot comply with any of the recommendations
> > listed above, what would be your recommended approach to modifying the
> > connector so that it can optionally handle such cases, where
> > we have spotted a given sequence of characters that poses a problem?
> >
> > Regards,
> > Julien
> >
> > -----Original Message-----
> > From: Karl Wright <da...@gmail.com>
> > Sent: Thursday, September 5, 2019 18:30
> > To: dev <de...@manifoldcf.apache.org>
> > Subject: Re: TagParseState behavior with Web connector
> >
> > The parser requires that the document being parsed be valid XML.  Data
> > within non-CDATA sections is *required* to use entity references to
> > include < or > characters.  See:
> >
> >
> > https://stackoverflow.com/questions/330725/use-of-greater-than-symbol-in-xml
> >
> >
> > Karl

RE: TagParseState behavior with Web connector

Posted by ju...@francelabs.com.
Hi Karl, 

I'm not sure we're going in the right direction by trying to apply a strict XML parser in the HTML connector. HTML is not required to be XML compliant (otherwise it would be XHTML), and XML-compliant markup is therefore not what many web pages are made of. Incidentally, the HTML source code I took as an example passes HTML validation.
I've spent some time understanding how the main browsers handle the script tag while building their DOM representation. As a matter of fact, they basically pause DOM construction when they find one and hand the script over to a dedicated engine. See for instance this blog post explaining it: https://hacks.mozilla.org/2017/09/building-the-dom-faster-speculative-parsing-async-defer-and-preload/
If we want to follow a similar approach, one way I have in mind is the following:

Have a "getScriptParseState" method in the TagParseState class:

protected int getScriptParseState()
{
  return 0;
}

that would be overridden by the FormParseState class:

protected int getScriptParseState()
{
  return scriptParseState;
}

Then use this method in the switch statement of the TagParseState class, in the TAGPARSESTATE_SAWLEFTANGLE case (line 271 in MCF v2.12):

....
else if (bTagDepth == 0)
      {
        if (isWhitespace(thisChar) || getScriptParseState() == 1 )
        {
          // Not a tag.
          currentState = TAGPARSESTATE_NORMAL;
....

As the scriptParseState variable would only be set to 1 in the ScriptParseState class, which is specific to the web connector, we can be sure that a connector that needs to parse a standard XML file will not be affected by this HTML-specific method.

What do you think?
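
To make the idea concrete, here is a minimal, self-contained sketch of the hook. It is not ManifoldCF code: SketchTagScanner, SketchScriptAwareScanner and scan() are hypothetical names, and the scanner is far simpler than the real state machine. One assumption goes beyond the snippet above: end tags are still recognized while inside a script, so that "</script>" can close it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical generic scanner; illustrates the hook only, not MCF code. */
class SketchTagScanner {
  /** Hook from the proposal: generic (XML) parsing is never "in a script". */
  protected int getScriptParseState() {
    return 0;
  }

  /** Callbacks a subclass can override to track context. */
  protected void noteTag(String name) {}

  protected void noteEndTag(String name) {}

  /** Returns the tag names reported, with end tags prefixed by '/'. */
  final List<String> scan(String text) {
    List<String> seen = new ArrayList<>();
    int i = 0;
    while (i < text.length()) {
      if (text.charAt(i) != '<' || i + 1 >= text.length()) {
        i++;
        continue;
      }
      char next = text.charAt(i + 1);
      boolean endTag = next == '/';
      // Assumption: end tags are always honored so "</script>" can end a script.
      if (!endTag && (Character.isWhitespace(next) || getScriptParseState() == 1)) {
        i++; // Not a tag: treat '<' as ordinary text.
        continue;
      }
      int close = text.indexOf('>', i);
      if (close < 0) {
        break; // Unterminated tag: stop scanning.
      }
      String body = text.substring(i + (endTag ? 2 : 1), close).trim();
      String name = body.split("\\s+")[0].toLowerCase(Locale.ROOT);
      if (endTag) {
        noteEndTag(name);
        seen.add("/" + name);
      } else {
        noteTag(name);
        seen.add(name);
      }
      i = close + 1;
    }
    return seen;
  }

  public static void main(String[] args) {
    String page = "<script>if(myvar <= 9) {}</script><form action='login.jsp'>";
    System.out.println(new SketchScriptAwareScanner().scan(page)); // prints [script, /script, form]
    System.out.println(new SketchTagScanner().scan(page)); // prints [script, =, form]
  }
}

/** HTML-specific subclass: tracks whether we are inside a script element. */
class SketchScriptAwareScanner extends SketchTagScanner {
  private int scriptParseState = 0;

  @Override
  protected int getScriptParseState() {
    return scriptParseState;
  }

  @Override
  protected void noteTag(String name) {
    if (name.equals("script")) {
      scriptParseState = 1;
    }
  }

  @Override
  protected void noteEndTag(String name) {
    if (name.equals("script")) {
      scriptParseState = 0;
    }
  }
}
```

In this sketch the generic scanner keeps its current behavior because its hook always returns 0; only the HTML-specific subclass changes how '<' is treated. A real implementation would still have to decide what to do with end tags other than "</script>" that appear inside script text.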

Julien


Re: TagParseState behavior with Web connector

Posted by Karl Wright <da...@gmail.com>.
*IF* you wanted to allow broken XML to still be parsed correctly, the first
thing you must do is come up with a list of exceptions to standard XML
parsing that you would want to support.  Presuming that you have a browser
that you think is doing a good job of handling the broken HTML in question,
you can certainly experiment to determine what that browser does with
specific exception cases that you come up with.  Once that is done, then
the state diagram for the tag parser must be modified in the minimal way to
permit your exceptions to work.

This is no small task, because you will be forced to consider certain tags
as applying context, and since you are doing that, you are therefore going
to necessarily break correct XML parsing in a non-HTML situation.  For
example:

<script>if a<b {dostuff};</script>

... would, in a true XML setting, recognize the beginning of a <b> tag, and
you would not want to break the case where it really was a <b> tag:

<something> text <b> bold text </b> </something>

So an exception rule you might propose might be that if you start a tag,
but don't properly complete it, the tag is not considered valid.  But then
there's this case:

<script> if a<b&&c>d {dostuff};</script>

Since & begins an XML entity, what do you do here?  The parser will
correctly detect an invalid entity, but then it also needs to understand
that it is looking at an invalid tag as well.

There are a ton of cases, and they would all have to be handled correctly
for JavaScript to consistently and successfully not be interpreted as tags.
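
As one possible starting point for such a list, HTML itself treats the content of a script element as raw text that ends only at "</script". The Java sketch below contrasts the plain '<' ... '>' rule with an opt-in raw-text rule for script. It is an illustration under stated assumptions, not the ManifoldCF implementation: the class RawTextScriptScanner and its scanTags method are hypothetical names invented for this example and do not exist in TagParseState or anywhere else in the fuzzyml package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of ManifoldCF: a tiny tag scanner that can
// optionally treat the body of a <script> element as opaque raw text.
public class RawTextScriptScanner {

    /**
     * Returns the (lowercased) tag names seen, in order.
     * End tags keep their leading '/'.
     * When scriptIsRawText is true, everything between a script start tag
     * and the next "</script" is skipped without looking for tags.
     */
    public static List<String> scanTags(String html, boolean scriptIsRawText) {
        List<String> tags = new ArrayList<>();
        String lower = html.toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < lower.length()) {
            int lt = lower.indexOf('<', i);
            if (lt < 0) {
                break;                      // no more tag openers
            }
            int gt = lower.indexOf('>', lt + 1);
            if (gt < 0) {
                break;                      // unterminated tag: stop scanning
            }
            String name = tagName(lower.substring(lt + 1, gt));
            tags.add(name);
            i = gt + 1;
            if (scriptIsRawText && name.equals("script")) {
                int end = lower.indexOf("</script", i);
                if (end < 0) {
                    break;                  // script never closed
                }
                i = end;                    // resume at the end tag itself
            }
        }
        return tags;
    }

    // The name runs up to the first whitespace or '/', after an optional leading '/'.
    private static String tagName(String body) {
        int end = body.startsWith("/") ? 1 : 0;
        while (end < body.length()
                && !Character.isWhitespace(body.charAt(end))
                && body.charAt(end) != '/') {
            end++;
        }
        return body.substring(0, end);
    }

    public static void main(String[] args) {
        String page = "<script>if(myvar <= 9) {.......}</script>"
                + "<form name='theform'></form>";
        // Plain rule: "<= 9) ... </script>" is read as one bogus tag,
        // so the end of the script element is never reported.
        System.out.println(scanTags(page, false)); // [script, =, form, /form]
        // Raw-text rule: the script body is skipped as a whole.
        System.out.println(scanTags(page, true));  // [script, /script, form, /form]
    }
}
```

With the flag off, the end of the script element is swallowed, which matches the behavior reported at the start of this thread. With it on, generic XML such as <something> text <b> bold text </b> </something> is scanned exactly as before, because the rule only applies inside a script element; that is why the sketch keeps it behind a flag rather than changing the default.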

I'm willing to look at this but you're going to need to supply that list of
cases.

Karl


On Fri, Sep 6, 2019 at 9:34 AM <ju...@francelabs.com> wrote:

> Hi Karl,
>
> Thanks for your suggestion. It took me some time to think about it, but I
> think we have two different approaches to this case:
> 1. In your view, it seems that if a source is problematic, that is its own
> problem, not the parser's or connector's, so the latter should just
> discard the document.
> 2. In my view, we start from the principle that in many situations
> (especially in web or enterprise scenarios), sources cannot be changed as
> we would like, for instance because they belong to another party that has
> no interest in changing the code (think of any website that does not care
> who parses it), or because the software is no longer maintained (old
> versions of CMS systems, for instance).
>
> The question then is: do we want to allow connectors to be modified so
> that they can handle special non-compliant cases (which is our case), or do
> we want connectors that strictly index only content that respects given
> specifications?
> The solutions here would be:
> 1. Use CDATA
> 2. Put the JavaScript code in its own file
> 3. Encode every problematic character in the JavaScript
> Each solution requires modifying the source web page, which may be
> impossible or refused by the source owner, and the last one would make
> the JavaScript code less readable and harder for developers to understand.
>
> So, to rephrase my question a bit, I would add the following to what I
> wrote in my first email:
>
> Assuming that the source document in question MUST be parsed in order to
> perform the form-based authentication, and assuming that it cannot be
> modified and thus cannot comply with any of the recommendations listed
> above, what would be your recommended approach to modifying the connector
> so that it can optionally handle such cases, where we have spotted a given
> sequence of characters that poses a problem?
>
> Regards,
> Julien
>

RE: TagParseState behavior with Web connector

Posted by ju...@francelabs.com.
Hi Karl,

Thanks for your suggestion. It took me some time to think about it, but I think we have two different approaches to this case:
1. In your view, it seems that if a source is problematic, that is its own problem, not the parser's or connector's, so the latter should just discard the document.
2. In my view, we start from the principle that in many situations (especially in web or enterprise scenarios), sources cannot be changed as we would like, for instance because they belong to another party that has no interest in changing the code (think of any website that does not care who parses it), or because the software is no longer maintained (old versions of CMS systems, for instance).

The question then is: do we want to allow connectors to be modified so that they can handle special non-compliant cases (which is our case), or do we want connectors that strictly index only content that respects given specifications?
The solutions here would be:
1. Use CDATA
2. Put the JavaScript code in its own file
3. Encode every problematic character in the JavaScript
Each solution requires modifying the source web page, which may be impossible or refused by the source owner, and the last one would make the JavaScript code less readable and harder for developers to understand.

So, to rephrase my question a bit, I would add the following to what I wrote in my first email:

Assuming that the source document in question MUST be parsed in order to perform the form-based authentication, and assuming that it cannot be modified and thus cannot comply with any of the recommendations listed above, what would be your recommended approach to modifying the connector so that it can optionally handle such cases, where we have spotted a given sequence of characters that poses a problem?

Regards,
Julien

-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Thursday, September 5, 2019 18:30
To: dev <de...@manifoldcf.apache.org>
Subject: Re: TagParseState behavior with Web connector

The parser requires that the document being parsed be valid XML.  Data within non-CDATA sections is *required* to use entity references to include < or > characters.  See:

https://stackoverflow.com/questions/330725/use-of-greater-than-symbol-in-xml


Karl




Re: TagParseState behavior with Web connector

Posted by Karl Wright <da...@gmail.com>.
The parser requires that the document being parsed be valid XML.  Data
within non-CDATA sections is *required* to use entity references to include
< or > characters.  See:

https://stackoverflow.com/questions/330725/use-of-greater-than-symbol-in-xml
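
To make the entity-reference requirement concrete, here is a minimal Java sketch of escaping character data before it is placed in a non-CDATA section. The class XmlTextEscaper and its escapeText method are hypothetical names chosen for this example, not an existing ManifoldCF or JDK API. It escapes &, < and >; escaping > everywhere is the conservative choice.

```java
// Hypothetical helper, not part of ManifoldCF: escapes the characters that
// would otherwise be read as markup inside XML character data.
public class XmlTextEscaper {

    public static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");   // must come out as an entity, never a bare '&'
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The script body from the example in this thread.
        System.out.println(escapeText("if(myvar <= 9) {.......}"));
        // prints: if(myvar &lt;= 9) {.......}
    }
}
```

A script body escaped this way no longer contains a bare < for the tag parser to trip over, but it has to be applied where the page is generated, which is exactly the constraint discussed elsewhere in this thread.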


Karl

