I am trying to create a website in Django that scrapes data from Google News and displays it on my site. However, I don't know how to use the data I extracted from Google News in my Django HTML template. Is there a way to do that?
Also, the scraping slows the website down a lot, so is this the best way to do it?
The web scraping code:
from bs4 import BeautifulSoup
import requests

url = "https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en"
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

n = 1
# attrs must be a dict ({'class': ...}); the original {'class', ...} was a set
for link in soup.find_all('h3', {'class': 'ipQwMb ekueJc RD0gLb'}):
    title = link.get_text()
    for a in link.find_all('a', {'class': 'DY5T1d'}):
        href = a.get('href')
        # Remove only the leading "." of the relative link, not every dot in it
        link_href = href.lstrip(".")
        print("(" + str(n) + ")" + title + "\n" + "https://news.google.com" + link_href)
        n += 1
Even though this post is old by now, my answer might help others along their way 😉
You have to use threading so the page doesn't slow down while the scraping (or any other time-consuming process) runs. That means each long-running task should get its own thread instead of running inside the request. Search YouTube and Google for multithreading; there are a lot of tutorials, some of them specifically for Django.
Best of luck and enjoy coding 🙂