Posted to issues@solr.apache.org by "QualiteSys QualiteSys (Jira)" <ji...@apache.org> on 2021/04/28 17:31:00 UTC

[jira] [Created] (SOLR-15381) SimplePostTool.java PageFetcher error

QualiteSys QualiteSys created SOLR-15381:
--------------------------------------------

             Summary: SimplePostTool.java PageFetcher error
                 Key: SOLR-15381
                 URL: https://issues.apache.org/jira/browse/SOLR-15381
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SimplePostTool
            Reporter: QualiteSys QualiteSys


The SimplePostTool fails to crawl web pages even in simple cases.

The getLinksFromWebPage method fails to detect URLs within the HTML page at line 1252. This seems to happen when the HTML page is not well-formed from an XML point of view.
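The following Java sketch illustrates the failure mode described above. It is not SimplePostTool's code. It assumes the link extraction behaves like a strict XML parse of the page. `LinkExtractionDemo`, `strictXmlLinks`, `lenientLinks` and the sample HTML are all hypothetical names chosen for illustration. `strictXmlLinks` parses the page with the JDK's `DocumentBuilder` and collects `<a href>` values. It throws on everyday HTML such as an undeclared `&nbsp;` entity or an unquoted attribute. `lenientLinks` shows a simplistic regex-based alternative that tolerates such markup. A regex is only a stand-in here: a real fix would more likely use a proper HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class: contrasts strict XML parsing with lenient link scanning.
class LinkExtractionDemo {

  // Strict approach: treat the page as XML and read href attributes of <a> elements.
  // Throws (e.g. SAXParseException) if the page is not well-formed XML.
  static Set<String> strictXmlLinks(String html) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
    NodeList anchors = doc.getElementsByTagName("a");
    Set<String> links = new LinkedHashSet<>();
    for (int i = 0; i < anchors.getLength(); i++) {
      String href = ((Element) anchors.item(i)).getAttribute("href");
      if (!href.isEmpty()) {
        links.add(href);
      }
    }
    return links;
  }

  // Lenient approach (hypothetical helper): find quoted href values on <a ...> tags,
  // ignoring whether the rest of the document is well-formed.
  private static final Pattern HREF = Pattern.compile(
      "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

  static Set<String> lenientLinks(String html) {
    Set<String> links = new LinkedHashSet<>();
    Matcher m = HREF.matcher(html);
    while (m.find()) {
      links.add(m.group(1));
    }
    return links;
  }

  public static void main(String[] args) {
    // Typical real-world HTML: unclosed <br>, an &nbsp; entity, an unquoted attribute.
    String html = "<html><body>Hello&nbsp;world<br>"
        + "<a href=\"/search\">Search</a> <a class=x href='https://example.org/'>Ex</a>"
        + "</body></html>";
    try {
      System.out.println("strict: " + strictXmlLinks(html));
    } catch (Exception e) {
      System.out.println("strict parse failed: " + e.getClass().getSimpleName());
    }
    System.out.println("lenient: " + lenientLinks(html));
  }
}
```

Run on the sample page, the strict parse fails before any link is collected. The lenient scan still finds both `href` values, which matches the symptom in the report: a crawl that stops finding URLs as soon as a page is not valid XML.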

Steps to reproduce:

java -Dc=techproducts -Ddata=web -Drecursive=3 -jar example\exampledocs\post.jar http://www.google.com/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)