Keyword clustering is everywhere in SEO: it’s a very common technique for grouping related keywords.
I will show you several alternative methods and share some code so that even YOU can build a basic app with it.
If you want to implement some of this into your existing workflows, you are in the right place.
Prerequisites + Sample Code
You don’t need much, but you should have at least:
- Some familiarity with Python (or any other programming language)
- Some familiarity with SERP APIs
- A keyword-domain list
You can easily get your keyword-domain list via DataForSEO. I have written a guide for this process but you need to subscribe to my newsletter.
The SERP Clustering script is publicly available on my GitHub (plus more nice features).
My advice is to download the entire folder, open it in Cursor and help yourself with the power of LLMs.
There are many things you can change and toy with, so I don’t recommend this code to newcomers.
Enjoy!

Now LLMs make it even easier for you to understand and learn.
You can run most of these concepts through Claude and get satisfactory detailed explanations.
SERP Clustering
This is the best method for SEO use cases as you are simply following what Google ranks.
It’s based on a set of rules and associations, everything is transparent and clear. Your stakeholders will love the clarity it provides.
There are many variations on this one but the underlying idea is always the same: grouping keywords that share some URLs in common.
This is nothing more than the evergreen set theory you studied in school, in this case the Overlap Coefficient.

Now, there are different opinions on the threshold, namely how many URLs two keywords must have in common to be grouped together.
I usually pick values between 3 and 4, depending on the type of queries I am analyzing.
For Ecommerce, a value like 4 is the safer choice, as you will see a LOT of hybrid SERPs with both informational and commercial intent.
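The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the script from GitHub: the function names, the sample URLs and the choice to merge keywords transitively (connected components) are all assumptions made for the example. It also shows the Overlap Coefficient, |A ∩ B| / min(|A|, |B|), next to the raw count of shared URLs.

```python
from itertools import combinations


def overlap_coefficient(a: set, b: set) -> float:
    """Overlap Coefficient: |A ∩ B| / min(|A|, |B|), 0.0 if a set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_by_shared_urls(serps: dict[str, set[str]], min_shared: int = 3) -> list[list[str]]:
    """Group keywords whose SERPs share at least `min_shared` URLs.

    Assumption: groups are merged transitively (A~B and B~C puts A, B, C
    in one cluster), implemented with a small union-find.
    """
    parent = {kw: kw for kw in serps}

    def find(kw: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    for a, b in combinations(serps, 2):
        if len(serps[a] & serps[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for kw in serps:
        groups.setdefault(find(kw), []).append(kw)
    # Largest clusters first, alphabetical inside and between equal sizes.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical SERP data: keyword -> set of ranking URLs.
serps = {
    "keyword clustering": {"a.com/1", "b.com/1", "c.com/1", "d.com/1"},
    "keyword grouping": {"a.com/1", "b.com/1", "c.com/1", "e.com/1"},
    "serp clustering tool": {"b.com/1", "c.com/1", "d.com/1", "f.com/1"},
    "python tutorial": {"x.com/1", "y.com/1", "z.com/1", "w.com/1"},
}

clusters = cluster_by_shared_urls(serps, min_shared=3)
print(clusters)
```

With a threshold of 3 the first three keywords end up in one cluster; raising it to 4 leaves every keyword on its own, which is why the threshold deserves some testing on your own queries.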
Stat | Value | Considerations |
---|---|---|
Cost | ★★ | The only cost here is scraping the SERPs. |
Speed | ★★★★★ | Blazing fast method. |
Ease of use | ★★★★★ | For a technical person, this is super easy. |
Flexibility | ★★ | The rule is the rule. Nothing too particular going on here. |
Interpretability | ★★★★★ | The easiest to explain and interpret, really. |
The great thing about rules is that they are clear and not open to interpretation.
You have FULL control over your results and can quickly troubleshoot issues.
Cheap, fast and easy to deploy at scale, lovely.
We will see what will change with AI in terms of measurement but as of today, this is how you do it.
P.S. In the example I shared on GitHub I used a graph as a way to connect the different URLs.
Based on that, I apply TF-IDF, a term-weighting method, to identify the main topic of each cluster.
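To give an idea of how TF-IDF can name a cluster, here is a small standard-library sketch. It is an assumption about one way to do it, not the code from the repository: each cluster is treated as one “document”, terms are scored with term frequency times a smoothed IDF, and the top terms become the label. The function name and sample clusters are made up for illustration.

```python
import math
from collections import Counter


def label_clusters(clusters: list[list[str]], top_n: int = 2) -> list[list[str]]:
    """Return the `top_n` highest TF-IDF terms for each cluster."""
    # One bag of words per cluster (naive whitespace tokenization).
    docs = [Counter(" ".join(keywords).lower().split()) for keywords in clusters]
    n_docs = len(docs)
    # Document frequency: in how many clusters does each term appear?
    doc_freq = Counter(term for doc in docs for term in doc)

    labels = []
    for doc in docs:
        total_terms = sum(doc.values())
        scores = {
            # Term frequency times smoothed inverse document frequency.
            term: (count / total_terms)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels.append(ranked[:top_n])
    return labels


# Hypothetical clusters, e.g. the output of a SERP clustering step.
clusters = [
    ["seo keyword clustering", "keyword clustering tool", "serp clustering"],
    ["bigquery seo", "ga4 bigquery export", "bigquery gsc export"],
]

print(label_clusters(clusters))
```

Terms that appear in every cluster (here “seo”) get a lower weight, so the label leans towards what makes each cluster distinctive.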
ML-based Methods
If your goal is to classify a set of keywords together for non-SEO reasons, then these methods are better.
We often refer to them as “Semantic Clustering” because they are based on semantic “connections”.
Unlike the previous class of methods, these ones tend to be more expensive:
Stat | Value | Considerations |
---|---|---|
Cost | ★★★★ | |
Speed | ★★ | Complex methods are also slower. |
Ease of use | ★★★ | Wouldn’t say this is super complex but not necessarily easy. |
Flexibility | ★★★★★ | More options than a set of rules for sure! |
Interpretability | ★★ | Hard to “actually” explain why a certain keyword was assigned to a group. |
The most modern approach involves using vector embeddings to convert your strings into vectors.
Imagine converting words into numbers so that machines have an easier time comparing them.
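Once keywords are vectors, “similar meaning” becomes “small angle between vectors”. The sketch below shows that comparison with cosine similarity and a very simple greedy grouping. The three-dimensional vectors are toy values invented for the example (real embeddings come from a model and have hundreds of dimensions), and the greedy strategy and the 0.8 threshold are assumptions, not a recommended algorithm.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_cluster(vectors: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Assign each keyword to the first cluster whose seed keyword is similar enough."""
    clusters: list[list[str]] = []
    for keyword, vector in vectors.items():
        for cluster in clusters:
            seed_vector = vectors[cluster[0]]  # first member acts as the seed
            if cosine_similarity(vector, seed_vector) >= threshold:
                cluster.append(keyword)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([keyword])
    return clusters


# Toy vectors standing in for real embeddings (made-up numbers).
toy_vectors = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail running shoes": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.1, 0.9],
    "easy chocolate cake": [0.1, 0.0, 0.95],
}

print(greedy_cluster(toy_vectors))
```

Notice there is no rule to point at here: the grouping depends entirely on the vectors and the threshold, which is where the lower interpretability comes from.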
One of the best examples is provided by Lee Foot, who extensively worked and still works on such use cases.
Here is the link to his semantic keyword clustering algorithm.
It does much more than clustering keywords and this is one of the big advantages of going down the ML route.
If you want to do something more complete, you need to go beyond what Google thinks and consider topics instead.
SEO-only content is no longer a thing and this is why identifying topics may be your next step.
For SEO, it’s always better to rely on what Google shows.
For omnichannel purposes, ditch the SERP method as it’s secondary.
Recommended Python libraries: sentence-transformers, KeyBERT
Rules VS Machine Learning
One of the concepts I keep teaching people is that simple algorithms based on rules are almost always better than more complex methods.
There are many reasons why:
- Rules can be explained easily and don’t hold surprises
- They are cheaper and faster
- Simple > Complex
Marketers love to overcomplicate the obvious but remember, these problems are actually tackled by analysts and engineers.
And as a Data Analyst, I tell you that most complex solutions are expensive and subpar.
As explained in my article about Web Analytics Strategy, you should frame the problem first and propose solutions after.
Buy or Build?
Most companies can pay for a tool and call it a day.
If you are working on complex systems and need a hand, you need to have your own solutions to save costs and have more flexibility.
Factor | Buy | Build | Explanation |
---|---|---|---|
Cost | | ✓ | Building is absolutely cheaper. It took me 10 minutes with Cursor for a mockup. |
Convenience | ✓ | | Buying is often better because you get other features too. |
Complexity | ✓ | | Subscribing to the nth tool removes a lot of work on your end. |
Integration | | ✓ | Custom solutions can be integrated wherever you want. |
Being able to integrate your solution into existing workflows is the biggest pro in favor of building something.
This is an example of a workflow using clustering:

This is similar to what I covered when discussing Modern Content Management: having a detailed process to create and later manage your content.
You can bulk extract GSC/Google Ads keywords based on criteria you define and then run them through your script.
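As a rough idea of what “criteria you define” can look like in code, here is a sketch that filters a keyword export before clustering. The column names (`query`, `clicks`, `impressions`, `position`), the thresholds and the sample rows are all assumptions for the example; adapt them to whatever your GSC or Google Ads export actually contains.

```python
import csv
import io


def select_keywords(csv_file, min_impressions: int = 100, max_position: float = 20.0) -> list[str]:
    """Keep queries with enough impressions and a good enough average position.

    Assumes a CSV with `query`, `impressions` and `position` columns.
    """
    selected = []
    for row in csv.DictReader(csv_file):
        enough_demand = int(row["impressions"]) >= min_impressions
        within_reach = float(row["position"]) <= max_position
        if enough_demand and within_reach:
            selected.append(row["query"])
    return selected


# Made-up export standing in for a real file opened with open("export.csv").
sample_export = io.StringIO(
    "query,clicks,impressions,position\n"
    "keyword clustering,12,540,8.2\n"
    "serp clustering python,3,95,5.1\n"
    "bigquery seo,7,310,25.4\n"
    "gsc bulk export,5,150,11.0\n"
)

keywords = select_keywords(sample_export)
print(keywords)  # ['keyword clustering', 'gsc bulk export']
```

The resulting list is what you would then send to your SERP API and clustering script.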
Remember, you don’t need to do clustering every single day, especially for content and informational queries.
It’s more important to get an idea of your topics and THEN actually write about them.
Not many companies are able to act on their content plans, though.
For Ecommerce, it’s more about categories and products and depending on their size, you may want to expand the use cases.
Many people get the timing wrong.
Clustering is something you do once in a while to define your plan.
For sure, SERP Clustering is in the top tier of automation solutions you can think of:

It’s not complex and the business value is clearly there.
Just be sure not to overdo it!
The Why Of Clustering
There is a specific reason why we often talk about this topic:
- Save countless hours of manual grouping
- It can be used to prevent cannibalization
- Not only for SEO… it’s for audience research!
For keyword cannibalization, there are more efficient (and free) methods like the one I teach in my ebook and course.
The big issue with many SEOs is that they think in silos.
The benefits of keyword clustering go beyond SEO and can involve the other channels.
From a Content Marketing perspective, this is ideal for researching what your audience wants and producing content.
An Example On Seotistics
For example, let’s extract some data from Seotistics (aka this very website):

If you have followed my BigQuery guide, you know that it’s the best solution to store your keyword data.
We can export this data as a CSV file and ingest it into my script:

Now that we have a nice list of grouped keywords, it’s possible to think about content!
For small websites like this one, it’s possible to simply pick the best clusters, which isn’t always possible for big players.

For my website, these are some of the topics it detected.
Since it’s a small website at the time of writing, keep in mind that many clusters will contain only a few keywords, if not just one.
(The example above is also showing super broad topics; you want to focus on more bespoke options instead.)
In terms of actions:
- 1 cluster = 1 article idea. Some of these are too “tactical” for me but I got some great ideas for my future content.
- Think about social media too. This is all audience data in the end, so great tips on what people may be interested in.
You can add the total volume and other information if you want… but remember, this will cost you some more money!
How To Go Beyond
That’s not all, you can actually go beyond and improve my basic code with the following:
- Simulate overall search intent (e.g. is it transactional or purely informational?)
- Total Volume (or Impressions if using GSC data)
- Classify clusters even more and create topical clusters (macro-categories)
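The first two ideas can be prototyped with very little code. The sketch below is only a starting point under loud assumptions: intent is guessed from a tiny, hand-picked list of transactional words (a real setup would need something more robust), impressions come from a made-up dictionary, and every name here is hypothetical.

```python
from collections import Counter

# Assumption: a tiny hand-picked list of words hinting at transactional intent.
TRANSACTIONAL_HINTS = {"buy", "price", "cheap", "discount", "shop"}


def guess_intent(keyword: str) -> str:
    """Very rough heuristic: any transactional hint word wins."""
    words = set(keyword.lower().split())
    return "transactional" if words & TRANSACTIONAL_HINTS else "informational"


def summarize_cluster(keywords: list[str], impressions: dict[str, int]) -> dict:
    """Return size, total impressions and the most common guessed intent."""
    intents = Counter(guess_intent(kw) for kw in keywords)
    return {
        "keywords": len(keywords),
        # Keywords missing from the impressions data count as 0.
        "total_impressions": sum(impressions.get(kw, 0) for kw in keywords),
        "intent": intents.most_common(1)[0][0] if intents else "unknown",
    }


# Made-up cluster and GSC impressions for illustration.
cluster = ["buy running shoes", "cheap running shoes", "running shoes review"]
impressions = {
    "buy running shoes": 1200,
    "cheap running shoes": 800,
    "running shoes review": 400,
}

summary = summarize_cluster(cluster, impressions)
print(summary)
```

Running the same summary over every cluster gives you a table you can sort by total impressions to decide what to work on first.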
The opportunities are endless and you can help yourself with Cursor and Claude to add whatever you want.
Just be careful when you run the code as it can be completely wrong.
We will see how it can work with all the recent AI changes but I am sure there will be similar methods even for AI mode.
The main differentiation here is that you can apply this advice to create content for other channels and not only for “SEO”.
Common Industry Tools
Not everyone has the infrastructure or technical know-how to deploy the solutions I described.
So you can just pay for one of the following tools, which also offer more features.
I would rather not evaluate the tools themselves but simply list them below so you can choose which you prefer based on your own criteria:
(All of them offer SERP Clustering).
I don’t recommend relying on tools like Semrush for this task, as clustering is quite a limited feature there.
All of the tools I listed offer more functionality and go beyond clustering, as it’s a simple feature to emulate.
If you need to do non-SEO tasks, it’s much better to build your own solutions.
This also includes
The Era Of LLMs
If you are a technical user, you can replicate whatever you want with Cursor and Claude.
For sure, production-ready code is a different beast and should be handled by professionals.
But if you are like me and don’t want to pay for a quick solution, LLMs are the best choice.
I recommend you edit my code and apply the necessary adjustments based on your use case.
That said, using LLMs to actually do the clustering is out of the question, as you may have well understood from this article.
We will see whether this approach will still make sense in the near future of SEO.
For now enjoy the 10 blue links and the AI Overviews!