Posted to httpclient-users@hc.apache.org by va...@labein.es on 2008/02/27 08:53:55 UTC

beginner doubt

Hi,

I am trying to use HttpClient to get the following page:

http://ted.europa.eu/Exec?DataFlow=ShowPage.dfl&Template=TED/N_one_result_detail_data.htm&docnumber=40361-2006&docId=40361-2006&StatLang=EN


Everything seems to be OK, but the only result I get is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <script type="text/javascript">
      function setRedirectionValue () {
        var redirect = document.location;
        if (redirect == "") {
          redirect = "/";
        }
        document.forms.restoreSession.Name.value = "";
        document.forms.restoreSession.Value.value = "";
        document.forms.restoreSession.Redirect.value = "";
        document.forms.restoreSession.Language.value = "";
        document.forms.restoreSession.RedirectError.value = "";
        document.forms.restoreSession.forwardValue.value = redirect;
        document.forms.restoreSession.submit();
      }
    </script>
  </head>
  <body>
    <form id="restoreSession" action="/RestoreSession" method="post">
      <input name="Name" type ="hidden" value="" />
      <input name="Value" type ="hidden" value="" />
      <input name="Redirect" type ="hidden" value="" />
      <input name="Language" type ="hidden" value="" />
      <input name="RedirectError" type ="hidden" value="" />
      <input name="forwardValue" type="hidden"  value="" />
    </form>
    <script type="text/javascript">setRedirectionValue();</script>
  </body>
</html>

However, when I open the page in a browser and use the "View Source" command,
the result is completely different: I get the real data.

Any idea how to get the same result using HttpClient or any other
library?

Thanks

Valentín Sánchez
Fundación LABEIN
Parque Tecnológico de Bizkaia, Edificio 700
48160 DERIO, Bizkaia (Spain)
e-mail: valen@labein.es
Tel: +34 94 607 33 54
Fax: +34 94 607 33 49
www.labein.es

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Print only what is essential; remember your commitment to the ENVIRONMENT // Before printing, think about the ENVIRONMENT

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
This message is intended exclusively for its addressee and may contain information that is CONFIDENTIAL and protected by professional privilege. If you are not the intended recipient you are hereby notified that any dissemination, copy or disclosure of this communication is strictly prohibited by law. If this message has been received in error, please immediately notify us via e-mail and delete it.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: beginner doubt

Posted by va...@labein.es.
Hello Roland,

Thank you for your answer.
Anyway, I have found a solution.
Here it is, in case anyone needs it.

I used the HtmlUnit library. The code is very simple:


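import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public static void testHomePage(String url) throws Exception {
    // Emulate Firefox 2; WebClient runs the page's JavaScript like a browser would
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2);
    final HtmlPage page = (HtmlPage) webClient.getPage(url);
    String pp = page.asText(); // text of the page after the scripts have run
    pp = page.asXml();         // the resulting markup; both methods work fine
    System.out.println(pp);
}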

Regards,
Valen

Roland Weber <os...@dubioso.net> wrote on 01/03/2008 06:37:08:

> Hello Valentin,
>
>
> HttpClient is not a browser. It certainly doesn't interpret JavaScript:
> http://wiki.apache.org/HttpComponents/ForAbsoluteBeginners#head-e5df784207b3082d88f0c254a0b656275c2b2855
>
>
> > Any Idea about how to get the same result using httpclient or any other
> > library?
>
> Figure out what POST request is generated by the JavaScript code and
> send that after your initial GET. The primer linked above might help.
>
> cheers,
>    Roland



Re: beginner doubt

Posted by Roland Weber <os...@dubioso.net>.
Hello Valentin,

> Everything seems to be OK but the only result I get is:
> [page source snipped]
> However, when I use a browser and use the "view/source code" command the
> result is completely different, I get the real data.


HttpClient is not a browser. It certainly doesn't interpret JavaScript:
http://wiki.apache.org/HttpComponents/ForAbsoluteBeginners#head-e5df784207b3082d88f0c254a0b656275c2b2855


> Any Idea about how to get the same result using httpclient or any other
> library?

Figure out what POST request is generated by the JavaScript code and
send that after your initial GET. The primer linked above might help.

cheers,
   Roland
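Roland's suggestion (after the initial GET, send the POST that the stub page's JavaScript would have sent) can be sketched in Java roughly as follows. This is a minimal sketch under several assumptions: it uses the java.net.http.HttpClient bundled with Java 11 and later rather than Apache HttpClient, only to keep it self-contained; the class and method names (RestoreSessionSketch, hiddenFields, restoreSessionBody, fetch) are hypothetical; the hidden fields are scraped with a regex matched to the stub quoted in this thread; and it assumes the server ties the restored session to a cookie.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestoreSessionSketch {

    // Matches the hidden inputs as they appear in the stub page, e.g.
    // <input name="Name" type ="hidden" value="" />. Assumes this attribute
    // order; a real HTML parser would be more robust.
    private static final Pattern HIDDEN_INPUT = Pattern.compile(
            "<input\\s+name=\"([^\"]*)\"\\s+type\\s*=\"hidden\"\\s+value=\"([^\"]*)\"");

    /** Collects the hidden fields of the restoreSession form, in document order. */
    public static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = HIDDEN_INPUT.matcher(html);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Mimics setRedirectionValue(): forwardValue gets the page URL, the rest stay empty. */
    public static String restoreSessionBody(Map<String, String> fields, String pageUrl) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getKey().equals("forwardValue") ? pageUrl : e.getValue();
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Initial GET; if the stub comes back, send the POST its script would have sent. */
    public static String fetch(String pageUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())           // assumption: session tracked by cookie
                .followRedirects(HttpClient.Redirect.NORMAL)  // the POST may answer with a redirect
                .build();
        String html = client.send(
                HttpRequest.newBuilder(URI.create(pageUrl)).GET().build(),
                HttpResponse.BodyHandlers.ofString()).body();
        if (!html.contains("id=\"restoreSession\"")) {
            return html; // already the real page
        }
        // action="/RestoreSession" is taken from the stub's <form> element
        URI action = URI.create(pageUrl).resolve("/RestoreSession");
        HttpRequest post = HttpRequest.newBuilder(action)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        restoreSessionBody(hiddenFields(html), pageUrl)))
                .build();
        return client.send(post, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(args[0]));
    }
}
```

Usage: java RestoreSessionSketch "<page URL>". If the site changes its stub page, or needs more than this single POST, the HtmlUnit approach Valen describes, which actually executes the JavaScript, is likely the more robust option.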

