syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
In Python how do I wget a web page URL?

Linux provides a command-line utility named 'wget'.

It is useful for getting HTML from a web page URL if that page is not too dependent on JavaScript.

Here is a demo of wget:
dan@hp ~ $ 
dan@hp ~ $ cd /tmp/
dan@hp /tmp $ 
dan@hp /tmp $ wget www.syntax611.com
--2014-12-16 23:59:31--  http://www.syntax611.com/
Resolving www.syntax611.com (www.syntax611.com)... 50.17.181.87
Connecting to www.syntax611.com (www.syntax611.com)|50.17.181.87|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

    [ <=>                                   ] 3,692       21.9KB/s   in 0.2s   

2014-12-16 23:59:32 (21.9 KB/s) - ‘index.html’ saved [3692]

dan@hp /tmp $ 
dan@hp /tmp $ head index.html


<!DOCTYPE html>
<html>
<head>
  <title>Syntax611.com</title>
  <link href="/myassets/site.css" media="all" rel="stylesheet" />
  <script src="/myassets/jquery-2.1.0.min.js"></script>
  <script src="/myassets/d3.min.js"></script>
  <script src="/myassets/jquery.tablesorter.min.js"></script>
  <meta charset="utf-8" />
  <link rel="stylesheet" href="/myassets/styles/default.css">
dan@hp /tmp $ 
dan@hp /tmp $ 
dan@hp /tmp $ 
So, wget is useful but how do I wget a web page using Python instead of bash?

Let the syntax do the getting:
# python_wget.py

import urllib2
response = urllib2.urlopen('http://www.syntax611.com')
myhtml = response.read()
response.close()
fh = open('/tmp/syntax611.html','w')
fh.write(myhtml)
fh.close()
from subprocess import call
call(['head','/tmp/syntax611.html'])

dan@hp /tmp $ 
dan@hp /tmp $ python python_wget.py 
<!DOCTYPE html>
<html>
<head>
  <title>Syntax611.com</title>
  <link href="/myassets/site.css" media="all" rel="stylesheet" />
  <script src="/myassets/jquery-2.1.0.min.js"></script>
  <script src="/myassets/d3.min.js"></script>
  <script src="/myassets/jquery.tablesorter.min.js"></script>
  <meta charset="utf-8" />
  <link rel="stylesheet" href="/myassets/styles/default.css">
dan@hp /tmp $ 
dan@hp /tmp $ 
dan@hp /tmp $ 

syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me