Python Program To Download Files From Website
· 4 min read · Updated Sep 2021 · General Python Tutorials
Disclosure: This post may contain affiliate links, meaning when you click the links and make a purchase, we receive a commission.
Downloading files from the Internet is one of the most common daily tasks on the Web, and a lot of successful software lets its users download files from the Internet.
In this tutorial, you will learn how to download files over HTTP in Python using the requests library.
Related: How to Use Hash Algorithms in Python using hashlib.
Let's get started by installing the required dependencies:
pip3 install requests tqdm
We'll use the tqdm module here just to print a good-looking progress bar during the download.
Open up a new Python file and import:
from tqdm import tqdm
import requests
import cgi
import sys
We'll be getting the file URL from the command line arguments:
# the URL of the file you want to download, passed from command line arguments
url = sys.argv[1]
The method we'll use to download content from the web is requests.get(), but the problem is that it downloads the whole file immediately, and we don't want that: it will get stuck on large files and fill up memory. Luckily for us, there is a parameter we can set to True, the stream parameter:
# read 1024 bytes every time
buffer_size = 1024
# download the body of the response by chunk, not immediately
response = requests.get(url, stream=True)
Now only the response headers are downloaded and the connection remains open, allowing us to control the workflow with the iter_content() method. Before we see it in action, we first need to retrieve the total file size and the file name:
# get the total file size
file_size = int(response.headers.get("Content-Length", 0))
# get the default filename
default_filename = url.split("/")[-1]
# get the content disposition header
content_disposition = response.headers.get("Content-Disposition")
if content_disposition:
    # parse the header using cgi
    value, params = cgi.parse_header(content_disposition)
    # extract filename from content disposition
    filename = params.get("filename", default_filename)
else:
    # if content disposition is not available, just use default from URL
    filename = default_filename
We get the file size in bytes from the Content-Length response header. We also get the file name from the Content-Disposition header, but we need to parse it using the cgi.parse_header() function.
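The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the import above fails on recent interpreters. Below is a minimal sketch of one possible replacement, using the standard library's email.message.Message to parse the header. The helper name filename_from_headers and the example URLs are hypothetical and not part of the original script.

```python
from email.message import Message


def filename_from_headers(url, content_disposition=None):
    """Pick a filename from a Content-Disposition value, falling back to the URL.

    Hypothetical helper, shown only to illustrate the parsing step.
    """
    # same default as the tutorial: the last segment of the URL
    default_filename = url.split("/")[-1]
    if not content_disposition:
        return default_filename
    # let the email package parse the header parameters
    message = Message()
    message["Content-Disposition"] = content_disposition
    # get_filename() returns None when no filename parameter is present
    return message.get_filename() or default_filename


# example values, not taken from a real server
print(filename_from_headers("https://example.com/files/a.zip"))
print(filename_from_headers("https://example.com/dl?id=3",
                            'attachment; filename="report.pdf"'))
```

In the script above, this would be called as filename_from_headers(url, response.headers.get("Content-Disposition")).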
Let's download the file now:
# progress bar, changing the unit to bytes instead of iterations (default in tqdm)
progress = tqdm(response.iter_content(buffer_size), f"Downloading {filename}", total=file_size, unit="B", unit_scale=True, unit_divisor=1024)
with open(filename, "wb") as f:
    for data in progress.iterable:
        # write data read to the file
        f.write(data)
        # update the progress bar manually
        progress.update(len(data))
The iter_content() method iterates over the response data, which avoids reading the content into memory all at once for large responses. We specified buffer_size as the number of bytes it should read into memory in each loop.
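To make the idea of chunked reading concrete, here is a small sketch that reads an in-memory stream in blocks of at most 1024 bytes. It is only an illustration of the pattern, not how requests implements iter_content(); the function read_in_chunks and the io.BytesIO stand-in for a response are assumptions made for this example.

```python
import io


def read_in_chunks(stream, chunk_size=1024):
    """Yield successive blocks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # an empty read means the stream is exhausted
            break
        yield chunk


# 2500 bytes of stand-in data instead of a real HTTP response body
source = io.BytesIO(b"x" * 2500)
chunk_sizes = [len(chunk) for chunk in read_in_chunks(source, 1024)]
print(chunk_sizes)  # [1024, 1024, 452]
```

Only one chunk is held in memory at a time, which is why the approach scales to large files.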
We then wrapped the iteration with a tqdm object, which will print a fancy progress bar. We also changed the tqdm default unit from iterations to bytes.
After that, in each iteration, we read a chunk of data, write it to the opened file, and update the progress bar.
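The write loop can also be sketched as a small reusable function that takes any iterable of byte chunks and reports progress through a callback. This is a sketch under stated assumptions rather than the tutorial's own code: save_chunks, on_progress, and the fake data below are hypothetical, and the final size comparison is an extra check that the original script does not perform.

```python
import io


def save_chunks(chunks, fileobj, on_progress=None):
    """Write each chunk to fileobj, report its size, and return the total bytes written."""
    written = 0
    for chunk in chunks:
        fileobj.write(chunk)
        written += len(chunk)
        if on_progress is not None:
            # report the size of the chunk just written
            on_progress(len(chunk))
    return written


# stand-in data instead of a real HTTP response
fake_chunks = [b"a" * 1024, b"b" * 1024, b"c" * 452]
expected_size = 2500  # what Content-Length would have reported in this example
sizes = []
buffer = io.BytesIO()
written = save_chunks(fake_chunks, buffer, on_progress=sizes.append)
# a zero expected_size means the header was missing, so skip the check
if expected_size and written != expected_size:
    print(f"Incomplete download: {written} of {expected_size} bytes")
```

With the objects from the tutorial, the call would look like save_chunks(response.iter_content(buffer_size), f, on_progress=progress.update), since tqdm's update() accepts the number of units to add.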
Here is my result after trying to download a file. You can choose any file you want, but make sure the URL ends with the file extension (.exe, .pdf, etc.):
C:\file-downloader>python download.py https://download.virtualbox.org/virtualbox/6.1.18/VirtualBox-6.1.18-142142-Win.exe
Downloading VirtualBox-6.1.18-142142-Win.exe:   8%|██▍ | 7.84M/103M [00:06<01:14, 1.35MB/s]
It is working!
Alright, we are done. As you can see, downloading files in Python is pretty easy with powerful libraries like requests. You can now use this in your Python applications. Good luck!
Here are some ideas you can implement:
- Downloading all images from a web page.
- A Python script to download compressed archive files from the Internet and extract them automatically.
By the way, if you wish to download torrent files, check this tutorial.
Finally, many of the Python concepts aren't discussed in detail here. If you feel you want to dig deeper into Python, I highly suggest you get one of these amazing courses:
Happy Coding ♥
Posted by: harrisaloost.blogspot.com