An Introduction to Python for Technical SEO via @rvtheverett
Python has been receiving a lot of attention within the SEO community recently.
So, being the curious technical SEO that I am, I started looking into why and before I knew it, I was deep into learning and applying it.
It’s fair to say that I have fallen in love with the language over the past few months that I have been learning it and want to share it with everyone, to show how it can help automate SEO tasks.
I’m not a data scientist and I don’t have a computer science background, but the beauty of Python is you don’t need to have experience in either of these things in order to understand and start using it.
What Is Python?
In short, Python is an open-source, object-oriented interactive programming language that is interpreted line by line.
With simple and easy to learn syntax, as well as advanced readability and support for a number of modules and libraries, Python is well-loved due to the increased productivity it provides.
As a testament to this, Python is used by some of the biggest organizations in the world to power their platforms, perform data analysis, and run their machine learning models.
Companies including Google, YouTube, Netflix, NASA, Spotify, and IBM have publicly stated Python has been an important part of their growth, due to its simplicity, speed, and scalability.
In fact, Google’s first web-crawler was actually written in Python and it remains one of their official server-side languages.
How to Run Python
You can run Python scripts in a number of ways, depending on what works best for you.
Most systems come with Python already installed, although it’s worth noting that this will more than likely be Python 2, which reaches its official end of life on January 1, 2020, now that Python 3 is the stable, supported version.
You can run Python from your terminal or command line, from an IDE (Integrated Development Environment), or from a cloud-based notebook such as Google Colab.
Cloud-based notebooks provide an easier experience for beginners to learn and test elements of code line by line.
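Whichever environment you choose, it is worth confirming which version of Python it is actually running before you go any further. Here is a minimal sketch using only the standard library; the helper name `is_python3` is my own and purely illustrative.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or newer."""
    return version_info[0] >= 3


# Print the full version string and whether it is safe to follow along.
print("Running Python", sys.version.split()[0])
print("Python 3 or newer:", is_python3())
```

If the second line prints `False`, you are on Python 2 and most of the scripts mentioned in this article will not run as written.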
The main power of Python is in its libraries, which add capabilities including:
- Data extraction.
- Analysis and preparation.
- Scientific computing.
- Natural language processing.
- Machine learning.
Some useful libraries for tasks involving data analysis and automation include:
- TensorFlow: An open-source machine learning library.
- NumPy: Useful for scientific computing.
- SciPy: Used for scientific and technical computing.
- scikit-learn: Machine learning for data mining and analysis.
- Pandas: Used for data manipulation and analysis.
- spaCy: A great natural language processing library.
- Requests: A library for making HTTP requests.
How Python Can Help with Technical SEO
Python empowers SEO professionals in a number of ways as it not only enables us to automate repetitive tasks, but also to extract and analyze large data sets.
The amount of data marketers work with is only increasing, so being able to efficiently analyze this will help to solve many complex problems in a shorter amount of time.
This in turn saves valuable time and allows us to be more efficient in undertaking other important SEO tasks.
These factors combined have led to a growth in the popularity of Python amongst SEO professionals.
The ability to better understand data will not only help us do our jobs better, but will also allow us to make data-driven decisions.
These decisions will then enable us to provide concrete insights for our clients and stakeholders and have more confidence in the recommendations we implement.
Automating with Python
While Python will not be able to imitate human, emotion-led strategy, Python scripts can be used to automate a large number of time-consuming tasks.
The list of tasks you can automate with Python is growing continuously:
- Identifying user intent.
- Mapping URLs ahead of a migration.
- Internal link analysis.
- Performing keyword research.
- Optimizing images.
- Scraping websites.
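To make one of these tasks concrete, here is a rough sketch of mapping old URLs to new ones ahead of a migration. It uses `difflib` from the standard library to pick the new URL whose path looks most similar to each old one. The URLs, the `map_urls` helper and the 0.6 similarity cutoff are all assumptions for illustration, and fuzzy matches like these should always be reviewed by a human before any redirects go live.

```python
from difflib import get_close_matches
from urllib.parse import urlparse


def map_urls(old_urls, new_urls, cutoff=0.6):
    """Map each old URL to the new URL whose path is most similar.

    Returns a dict of old URL -> best-matching new URL, or None when
    no candidate is similar enough to clear the cutoff.
    """
    # Compare paths only, so a change of domain does not skew the similarity.
    new_by_path = {urlparse(url).path: url for url in new_urls}
    mapping = {}
    for old in old_urls:
        matches = get_close_matches(
            urlparse(old).path, list(new_by_path), n=1, cutoff=cutoff
        )
        mapping[old] = new_by_path[matches[0]] if matches else None
    return mapping


# Hypothetical URLs, for illustration only.
old_urls = [
    "https://old.example.com/blog/python-for-seo",
    "https://old.example.com/services/technical-audit",
    "https://old.example.com/careers",
]
new_urls = [
    "https://www.example.com/insights/python-for-seo/",
    "https://www.example.com/services/technical-seo-audit/",
]

for old, new in map_urls(old_urls, new_urls).items():
    print(old, "->", new or "NO MATCH (review manually)")
```

Here the first two old URLs are paired with their closest equivalents, while the careers page has nothing similar on the new site and is flagged for manual review.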
Example Scripts to Try
Ready to get started with Python?
Here are a few useful scripts which I have been exploring recently, along with a brief description of how each one works and the challenges they solve.
Image Captioning with Pythia
This is the first script that introduced me to the language and the one that kick-started my desire to learn.
Using Pythia, which is a modular deep learning framework created by Facebook, this script generates a caption for an image URL.
This caption can then be used for images currently missing alt text, which is important for accessibility and image search.
The script is based upon the bottom-up and top-down attention mechanism, which calculates results by focusing attention on different elements within an image.
For each word generated, attention is weighted to individual pixels within the image, outlining the region with the maximum attention.
This script is easy to use because it can be run straight from Google Colab and requires no direct coding.
Once a copy of the necessary code is saved to your personal Google Colab drive, all cells can be run, performing each step for you.
This will download the data sources needed to run the process, as well as automatically completing all of the steps that would typically need to be undertaken manually.
For example, all libraries will be installed, classes will be created and functions assigned.
This will generate an area to add in your image URL and a button to caption the image.
A caption will then be provided for each image, which can be directly used as alt text, or to inspire the creation of it.
Hamlet Batista has written a comprehensive guide to generate text from images with Python which shows this script in action.
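Before generating captions, it helps to know which images actually need them. The sketch below is not part of the Pythia script; it is a small standard-library example of my own that lists the `src` of every image in a piece of HTML that has no usable alt text. The class and function names and the sample HTML are hypothetical. Note that it also flags empty `alt=""` attributes, which are legitimate for purely decorative images, so treat the output as a list to review.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # An absent alt has no key; a valueless alt comes through as None.
        alt = attributes.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def find_images_missing_alt(html):
    """Return the src values of images without alt text, in document order."""
    parser = MissingAltFinder()
    parser.feed(html)
    parser.close()
    return parser.missing


# Hypothetical page fragment, for illustration only.
sample_html = """
<img src="/img/hero.jpg" alt="Team reviewing a crawl report">
<img src="/img/chart.png">
<img src="/img/logo.svg" alt="">
"""

for src in find_images_missing_alt(sample_html):
    print("Needs alt text:", src)
```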
SEO Analyzer
I found this SEO Analyzer script, created by Seth Black, on GitHub. It crawls a site to analyze its structure and report on basic SEO issues.
It requires Python 3.4 or above, as well as the BeautifulSoup and urllib packages. Once installed, you can crawl a website from the homepage or XML sitemap.
Once it has finished crawling the site, it will display data including word count, page titles, and meta descriptions, as well as warnings, where applicable, for missing titles, meta descriptions, and alt text.
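To give a feel for the kind of checks involved, here is a simplified sketch of a single-page audit. This is not Seth Black's implementation: it uses the standard library's `html.parser` rather than BeautifulSoup, works on one HTML string rather than crawling, and all the names (`PageAudit`, `audit_page`) and the sample page are my own assumptions.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect the title, meta description and visible word count of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.word_count = 0
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            # Count whitespace-separated words outside title, script and style.
            self.word_count += len(data.split())


def audit_page(html):
    """Return basic on-page data plus warnings for anything missing."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    warnings = []
    if not title:
        warnings.append("Missing title")
    if not parser.description:
        warnings.append("Missing meta description")
    return {
        "title": title,
        "description": parser.description,
        "word_count": parser.word_count,
        "warnings": warnings,
    }


# Hypothetical page, for illustration only.
sample_page = """<html><head><title>Python for SEO</title></head>
<body><h1>Python for SEO</h1><p>Automate repetitive tasks with scripts.</p>
<script>var x = 1;</script></body></html>"""

print(audit_page(sample_page))
```

A real crawler would fetch each URL, run checks like these on every page, and follow internal links, but the core of each check is no more complicated than this.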
Optimizing Images
Another GitHub find is this script, created by Victor Domingos, which is written in pure Python and is used to reduce the file size of images.
It requires Python 3.6 or above, as well as the Pillow library, in order to run.
Once installed, you will be able to optimize either a single image or a folder with multiple images, using the appropriate command detailed in the GitHub repository.
It is worth noting that this script does optimize images destructively, so it’s recommended that you save a copy before running the operation.
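Taking that copy is itself easy to automate. The sketch below is separate from the optimization script: it simply copies every image in a folder into a backup folder using the standard library. The function name, the folder names in the usage note and the set of file extensions are assumptions you should adjust to your own setup.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; extend it to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def back_up_images(source_dir, backup_dir):
    """Copy every image in source_dir into backup_dir and return the new paths."""
    source, backup = Path(source_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # copy2 preserves file timestamps along with the contents.
            copied.append(Path(shutil.copy2(path, backup / path.name)))
    return copied
```

For example, `back_up_images("images", "images_backup")` would copy the images from a hypothetical `images` folder into `images_backup`, after which the originals can be optimized safely.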
In this example, the image I ran through the script was reduced by 5%, decreasing the file size from 2.8 MB to 2.6 MB.
As you can see below, there is no visible difference between the original and optimized images.
Even a 5% reduction in image weight can have a noticeable impact on page performance, particularly on pages that load many images.
These three examples are just scratching the surface. There are many more automation and optimization possibilities using Python scripts, including:
- Internal linking analysis.
- Log file analysis.
- Hreflang validation.
- Keyword growth calculation.
- Collecting GSC data.
- Performing competitor analysis.
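As a taste of log file analysis, the sketch below counts how many requests Googlebot made for each HTTP status code. It assumes your server writes logs in the common "combined" format, and the sample lines, the regular expression and the function name are all illustrative. It also matches on the user agent string alone, which can be spoofed, so a production version should verify that the requests genuinely come from Google.

```python
import re
from collections import Counter

# Pulls the request, status code and user agent out of a combined-format line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count requests per status code where the user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_lines = [
    '66.249.66.1 - - [10/Oct/2019:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2019:13:56:02 +0000] "GET /old-page HTTP/1.1" 404 312 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2019:13:57:10 +0000] "GET /blog/ HTTP/1.1" 200 5120 '
    '"https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for status, hits in sorted(googlebot_status_counts(sample_lines).items()):
    print(status, hits)
```

Run against a real log file, a spike in 404 or 5xx responses served to Googlebot is often the first sign of wasted crawl budget.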
Powering Machine Learning
Python is also a popular language used to power machine learning applications due to its simple, intuitive and accessible syntax.
It is also open-source, with several developer advocates providing support for users.
In addition, there are a large number of useful libraries which are helpful when working with and training machine learning models.
What Is Machine Learning?
Machine learning is essentially “an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience, without the need to be explicitly programmed” (a full definition can be found here).
Machine learning is typically used to identify patterns in data, upon which predictions can then be made.
Python & Machine Learning
In a machine learning workflow, Python can be used to power scripts that load a dataset, summarize and visualize the data, and then train a model on it.
From here, the candidate algorithms are evaluated against data the model has not seen, so that the best-performing one can be used to make predictions.
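The workflow can be sketched in a few lines, even without a machine learning library. The example below is deliberately tiny: it uses made-up data that follows a perfectly straight line, fits a simple least-squares model by hand, and skips visualization entirely. In practice you would use a library such as scikit-learn and real data, so treat the function names and numbers here as illustrative assumptions only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model, xs, ys):
    """Average distance between the model's predictions and the true values."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


# Made-up data for illustration: every pair follows y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(1, 11)]

# 1. Summarize the data before modelling it.
print("rows:", len(data), "| mean y:", mean(y for _, y in data))

# 2. Train on the first seven rows and hold the rest back for evaluation.
train, test = data[:7], data[7:]
model = fit_line(*zip(*train))

# 3. Evaluate on rows the model has not seen.
error = mean_absolute_error(model, *zip(*test))
print("model (slope, intercept):", model, "| test error:", error)

# 4. Use the trained model to make a prediction for a new input.
slope, intercept = model
print("prediction for x = 20:", slope * 20 + intercept)
```

The important habit is step 3: judging a model on data it was not trained on, which is what lets you trust its predictions.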
Real-World Machine Learning Examples
The use of machine learning on the web is increasing all the time, with new models being created and training data becoming more accessible daily.
Some real-world machine learning examples include:
- Google’s RankBrain algorithm.
- Baidu’s Deep Voice program.
- Twitter’s curated timelines.
- Netflix and Spotify recommendations.
- Salesforce’s Einstein feature.
SEO Possibilities with Machine Learning
Due to their ability to solve complex problems, it is no surprise that machine learning models are being used to help make marketers’ lives easier.
As Britney Muller says:
“Machine Learning is becoming more accessible and will free us up to work on higher-level strategy.”
This will enable you to spend more time finding solutions, rather than just identifying problems.
Some examples of machine learning models used in SEO include:
- Content quality evaluation.
- Identifying keyword gaps and opportunities.
- Gaining insights into user engagement.
- Optimizing title tags.
- Automating meta description creation.
- Transcribing audio.
Google’s NLP Model
One such model worth checking out is Google’s Natural Language Processing API, which uses machine learning to reveal the structure and meaning of text. It analyzes text to understand the sentiment, as well as extract key information.
Not only does this API allow you to train a model personalized to your content, providing results that are relevant to your specific needs, but it also gives you an insight into Google’s understanding of your content.
I hope this has inspired you to start learning Python and explore how it can help you with automating tasks and analyzing complex data in order to increase your efficiency.
To finish, I wanted to share my three biggest tips for getting started and continuing to learn:
Tip 1: Talk to Your Developers
There’s a high chance the developers you work with will have an understanding of Python.
Have a conversation with them, let them know what you’re working on and spark their interest too – there may even be something you can collaborate on!
Tip 2: Join Communities
One of the best things about learning Python is the support available. There are so many online communities (such as this one) with hundreds of supportive individuals willing to provide non-judgmental advice.
Tip 3: Keep Practicing & Have Fun
This is the most important piece of advice a developer friend gave to me.
There is no pressure to become a Python master in weeks.
Take your time to learn the language and start fun side projects to put what you are learning into practice.
Some great resources which helped me get started include:
- How to Use Python to Analyze SEO Data: A Reference Guide
- How to Uncover Powerful Data Stories with Python
- How to Spy on Competitors with Python & Data Studio
All screenshots taken by author, October 2019