
I want to download a webpage using Selenium with Python, using the following code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--save-page-as-mhtml')
d = DesiredCapabilities.CHROME
driver = webdriver.Chrome()

driver.get("http://www.yahoo.com")

saveas = ActionChains(driver).key_down(Keys.CONTROL)\
         .key_down('s').key_up(Keys.CONTROL).key_up('s')
saveas.perform()
print("done")

However, the above code isn't working. I am using Windows 7. Is there any way I can bring up the "Save As" dialog box?

Thanks Karan

CC BY-SA 3.0

1 Answer


You can use the code below to download the page's HTML:

from selenium import webdriver
  
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
with open("/path/to/page_source.html", "w", encoding='utf-8') as f:
    f.write(driver.page_source)

Just replace "/path/to/page_source.html" with the desired file path and name.
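
The explicit encoding matters, especially on Windows: without it, open() typically falls back to the locale code page (cp1252 in the traceback quoted in the comments), which cannot represent every character a page may contain. A minimal sketch of the same write step as a reusable helper follows; the name save_page_source is made up for illustration and is not part of Selenium.

```python
from pathlib import Path


def save_page_source(html: str, path: str) -> Path:
    """Write an HTML string to `path` as UTF-8 and return the resulting Path.

    Hypothetical helper: `html` would normally be driver.page_source.
    """
    target = Path(path)
    # Passing the encoding explicitly avoids the platform-default code page.
    target.write_text(html, encoding="utf-8")
    return target
```

With a live driver this would be called as save_page_source(driver.page_source, "/path/to/page_source.html").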

Update

If you need the complete page (including CSS, JS, ...), you can use the following solution:

pip install pyahk # from command line

Python code:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import ahk

firefox = FirefoxBinary("C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe")

driver = webdriver.Firefox(firefox_binary=firefox)
driver.get("http://www.yahoo.com")
ahk.start()
ahk.ready()
ahk.execute("Send,^s")  # press Ctrl+S in the active browser window
ahk.execute("WinWaitActive, Save As,,2")  # wait up to 2 seconds for the "Save As" dialog
ahk.execute("WinActivate, Save As")
ahk.execute("Send, C:\\path\\to\\file.htm")  # type the target file path
ahk.execute("Send, {Enter}")  # confirm the dialog
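
For Chrome, a possible alternative that avoids the "Save As" dialog is to ask the browser for an MHTML snapshot over the Chrome DevTools Protocol. The sketch below assumes a Chromium-based driver that exposes execute_cdp_cmd (available on Selenium's Chrome driver in recent versions) and relies on the DevTools command Page.captureSnapshot, which is marked experimental, so its behaviour may differ between Chrome versions. The helper name save_mhtml is hypothetical.

```python
def save_mhtml(driver, path):
    """Save the page currently loaded in `driver` as a single MHTML file.

    Assumption: `driver` is a Chromium-based Selenium driver providing
    execute_cdp_cmd(); `save_mhtml` itself is an illustrative name.
    """
    # Page.captureSnapshot returns a dict whose "data" field holds the MHTML text.
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" keeps the line endings exactly as the browser produced them.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(snapshot["data"])
    return path


def demo():
    """Usage sketch; needs Selenium and Chrome installed, so it is not called here."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("http://www.yahoo.com")
        save_mhtml(driver, "page.mhtml")
    finally:
        driver.quit()
```

The resulting .mhtml file bundles the HTML together with the resources the browser captured, so opening it locally should look much closer to the live page than the bare page_source does.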
CC BY-SA 4.0
  • Buddy, thanks for the prompt reply. But I am getting the following error: Traceback (most recent call last): File "C:\Users\karanjuneja\Desktop\Eclipse Workspace\Library\test1.py", line 35, in <module> f.write(driver.page_source) File "C:\Users\karanjuneja\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 106288-106293: character maps to <undefined> Commented Mar 20, 2017 at 10:08
  • I want to save the file in .mhtml format. I used the following code: from selenium import webdriver driver = webdriver.Chrome() driver.get("yahoo.com") with open("/path/to/page_source.html", "w", encoding="utf-8") as f: f.write(driver.page_source) It saved the page but the page just had source code. Cannot view the original content on the page. Any suggestions? Commented Mar 20, 2017 at 10:28
  • You mean you want the browser to open your local (downloaded) copy of the page just as if you got it from the server directly?
    – Andersson
    Commented Mar 20, 2017 at 10:32
  • Yes buddy, just like the webpage appears when we open it. Commented Mar 20, 2017 at 10:33
  • Thanks so much for the answer. Can you please provide the same solution for ChromeDriver, as I am using the Chrome browser? I would be grateful. :) Commented Mar 20, 2017 at 13:41
