SEO Analysis with Python

What Is SEO and Why Does It Matter?

So, you know Python and you have started to blog (like me!), but you want to make sure that what you write is read by people! How are people supposed to find your content? Well, social media sharing can be a good start. You can share what you write on various social media platforms and encourage people to subscribe to an email list where they are informed when you have posted something new.

Another way is to be found on search engines. People are interested in a certain topic, they search for some terms, and they are presented with some results. The position of these results on search engine pages depends on many factors, including the quality of the content and its structure.

SEO, or search engine optimization, is the process of improving the visibility and ranking of a website or web page in search engine results pages (SERPs). This can be done through various techniques, such as optimizing the website’s content and structure, building backlinks, and using relevant keywords.

SEO Analysis with Python

In this article, I will show you how you can build an SEO analyzer with Python to analyze the SEO of your website with regard to several factors: the most common keywords, the title of the post, the meta description, the headings, and the alt attributes of images. You can also watch the video at the end of this post for more explanation.

I am going to use the Requests library and the BeautifulSoup library to extract the relevant SEO features with Python. I will also use the NLTK library to extract the most common keywords of your post. You can read my other post on Web Scraping with BeautifulSoup as well.

The script below uses the requests library to send a GET request to the website and the beautifulsoup4 library to parse the HTML content. The soup.find() method is used to extract the title and the meta description. I use the soup.find_all() method to find all headings and all images without an alt attribute.
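To see find() and find_all() in isolation, here is a minimal sketch that parses a small inline HTML string instead of a live page. The HTML snippet and the variable names are made up for illustration; only the BeautifulSoup calls are the ones used in the script.

```python
from bs4 import BeautifulSoup

# Hypothetical page, used only for illustration
html = """
<html>
  <head>
    <title>My Post</title>
    <meta name="description" content="A short summary of the post.">
  </head>
  <body>
    <h1>My Post</h1>
    <h2>First section</h2>
    <img src="a.png" alt="A chart">
    <img src="b.png">
    <img src="c.png" alt="">
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches
title_tag = soup.find("title")
title = title_tag.get_text(strip=True) if title_tag else None

meta_tag = soup.find("meta", attrs={"name": "description"})
description = meta_tag.get("content") if meta_tag else None

# find_all() accepts a list of tag names and returns every match in document order
heading_names = ["h1", "h2", "h3", "h4", "h5", "h6"]
headings = [(h.name, h.get_text(strip=True)) for h in soup.find_all(heading_names)]

# Collect the images whose alt attribute is missing or empty
images_without_alt = [img["src"] for img in soup.find_all("img") if not img.get("alt")]

print(title)
print(description)
print(headings)
print(images_without_alt)
```

Checking the result of find() for None before reading from it matters on real pages, because a missing title or meta description would otherwise raise an error instead of being reported as a warning.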

The NLTK library is imported to process the text of the webpage. First, I tokenize the text, that is, I split the whole text into tokens (words and punctuation marks) and put them in a list. Then, I use the NLTK stopwords list to get rid of the words that are not helpful in the analysis, and I keep only the alphabetic tokens. These stopwords include words like: in, or, with, the, ….
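The idea behind tokenizing and stopword removal can be sketched with the standard library alone. Note that this is a simplified stand-in, not NLTK: the regular expression replaces word_tokenize, and the tiny STOPWORDS set is hand-written for illustration (NLTK's English list is much longer). The function names are hypothetical.

```python
import re

# Tiny hand-written stopword set, for illustration only
STOPWORDS = {"in", "or", "with", "the", "a", "is", "of", "and"}

def tokenize(text):
    """Lowercase the text and return its alphabetic words as a list."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

text = "Python is great for SEO, and the analysis of a page starts with the text."
tokens = tokenize(text)
filtered = remove_stopwords(tokens)

print(tokens)
print(filtered)
```

In the full script, word_tokenize also returns punctuation marks as tokens, which is why the tokens are additionally filtered with isalpha().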

Finally, I extract the 10 most common words in the list.
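As a small illustration of this last step, the sketch below counts a made-up word list with nltk.FreqDist and asks for the two most common entries. FreqDist needs no downloaded NLTK data and behaves like collections.Counter, which it subclasses; the word list is an assumption for the example.

```python
import nltk

# Hypothetical list of already filtered, lowercased words
words = ["python", "seo", "python", "keyword", "seo", "python", "title"]

# FreqDist counts how often each word occurs
freq = nltk.FreqDist(words)

# most_common(n) returns the n most frequent (word, count) pairs
top = freq.most_common(2)
print(top)
```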


The Final Code for the Python SEO Analyzer

import requests
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import word_tokenize
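# Download the stopword list and the tokenizer data (needed once per machine)
nltk.download('stopwords')
nltk.download('punkt')
# Newer NLTK releases load the tokenizer data from 'punkt_tab' instead
nltk.download('punkt_tab')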

def seo_analysis(url):
    # Collect the passed checks and the warnings in two lists
    good = []
    bad = []
    # Send a GET request to the website (the timeout avoids waiting forever)
    response = requests.get(url, timeout=10)
    # Check the response status code
    if response.status_code != 200:
        print("Error: Unable to access the website.")
        return

    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract the title and description; find() returns None when a tag is missing
    title_tag = soup.find('title')
    title = title_tag.get_text(strip=True) if title_tag else ''
    description_tag = soup.find('meta', attrs={'name': 'description'})
    description = description_tag.get('content', '') if description_tag else ''

    # Check if the title and description exist
    if title:
        good.append("Title Exists! Great!")
    else:
        bad.append("Title does not exist! Add a Title")

    if description:
        good.append("Description Exists! Great!")
    else:
        bad.append("Description does not exist! Add a Meta Description")

    # Grab the headings
    hs = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
    h_tags = []
    for h in soup.find_all(hs):
        good.append(f"{h.name}-->{h.text.strip()}")
        h_tags.append(h.name)

    if 'h1' not in h_tags:
        bad.append("No H1 found!")

    # Extract the images whose alt attribute is missing or empty
    for i in soup.find_all('img'):
        if not i.get('alt'):
            bad.append(f"No Alt: {i}")

    # Extract keywords
    # Grab the text from the body of the HTML
    bod = soup.find('body').text

    # Extract all the words in the body and lowercase them in a list
    words = [i.lower() for i in word_tokenize(bod)]

    # Grab the English stopwords (a set makes the membership test faster)
    sw = set(nltk.corpus.stopwords.words('english'))
    new_words = []

    # Keep the tokens that are not stopwords and are actual words (no punctuation)
    for i in words:
        if i not in sw and i.isalpha():
            new_words.append(i)

    # Extract the frequency of the words and get the 10 most common ones
    freq = nltk.FreqDist(new_words)
    keywords = freq.most_common(10)

    # Print the results
    print("Keywords: ", keywords)
    print("The Good: ", good)
    print("The Bad: ", bad)
    
# Call the function to see the results
seo_analysis("https://pythonology.eu/what-is-syntax-in-programming-and-linguistics/")

Video Tutorial for the Python SEO Analyzer

Here is the first part of the tutorial. Check the channel for the other parts.

Consider subscribing to the email list if you would like to receive tutorials like this.
