The Google Search Console and Google Analytics 4 APIs are no longer the best option, and it’s time to show you what to do instead.
You can store your GSC and GA4 data in BigQuery thanks to their free native connectors.
I’ll walk you through the reasons that actually move the needle from a business perspective.
The short answer is: you get much more data and everything in one place, for (almost) free!
P.S. You folks have asked for it, now you get the article!
The Problem
You need to prepare a report for your boss for the last 2 years.
You query your data but suddenly realize that the GSC API can’t go past 16 months.
A colleague of yours already has some data but it doesn’t match because they applied different filters.
Cool, I can check GA4 at least.
You use its API and notice you’ve exhausted the quota.
No report for you!
Even if you have the data, you need to:
- clean it
- run analysis
- make sense of it
- prepare a nice report
This was my life some years ago and I hated it.
I still hate it today and that’s why I solved the problem by pushing for data warehouses.
The actual setup is available here for GSC and here for GA4.
My previous article on combining GSC and GA4 data in BigQuery contains much more technical information.
I don’t want to write another article that just summarizes Google documentation.
This is for business/strategic purposes!
Why Even BigQuery?
Simply put, you get a free connector to BigQuery from both GA4 and GSC.
BigQuery is a data warehouse solution offered by Google itself.
Cool, but what does that mean?
A data warehouse is an enterprise system storing data from multiple sources.
It’s a storage for your marketing data in this case.
The 2 Most Common Stereotypes
This is what I often read online:
❌ BigQuery is only for big websites.
Wrong, without it, your small website gets even less data.
⚠ Google likes to apply thresholding to low-traffic websites in GA4 so that you can’t track individual users.
This means you lose some page data and your data won’t be that accurate or complete.
❌ BigQuery is expensive.
Storing data costs under $5/month for most normal websites.
For queries, the first 1TB of compute per month is free; you only pay beyond that.
That’s actually a lot for many simple use cases!
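If you want to keep an eye on that free tier, here’s a minimal sketch against BigQuery’s INFORMATION_SCHEMA (assuming your data lives in the US multi-region; swap `region-us` for your own location):

```sql
-- Minimal sketch: query compute billed per month, to compare against the free 1 TB/month.
-- Assumption: US multi-region; change `region-us` to your own location.
SELECT
  FORMAT_TIMESTAMP('%Y-%m', creation_time) AS month,
  ROUND(SUM(total_bytes_billed) / POW(2, 40), 3) AS tb_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
GROUP BY month
ORDER BY month DESC;
```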
What If My Website Is Small?
BigQuery is still cheap, as you can see from one of my not-so-small websites below:
You read a lot of wrong takes on LinkedIn and X like “BigQuery is too expensive for our company”.
Me: “Did you actually use it?”
Them: “No.”
It costs pennies; the smaller the website, the cheaper it gets.
Do it now, because otherwise you’ll have no historical data to compare against and you’ll regret it.
You are forced to use BigQuery for small websites because Google Analytics 4 applies thresholding if you don’t get enough traffic.
They claim it’s done to avoid the identification of specific users.
Search Console, on the other hand, gives you much more data with the Bulk Export, including what you’d otherwise miss from anonymized queries.
No More Inconsistencies
Storing data is a must because you don’t want to download your data every single time and do the manual work…
it’s prone to error and obnoxious.
Databases were created to fit the bill: you put the data there and you can sleep soundly.
If you start today with the exports, you won’t have to do much.
All of this data will flow to BigQuery and you don’t need to do anything, really.
This is why we love having a single source of truth:
Google data is already confusing, there is no need to add other problems.
You are solving a business problem (having all the data somewhere) with engineering.
P.S. Neither connector lets you backfill data. The sooner you store the data, the better.
The Case of Search Console API Calls
The biggest mismatch comes from the GSC API when you make calls with different dimensions.
Let’s get only the page dimension from the API:
I get these 2 pages, fine.
Now, I make another call to the API and ask for the query dimension too:
These aren’t all the rows; I’ve just taken a screenshot of the first few!
Now, let’s create a pivot table based on this data to reproduce what I got in the 1st API call (page only):
Nowhere close to the 1st call, and that’s because Google hid the anonymized queries when I asked for more granularity (i.e. the query dimension).
Would you trust relying on a process like this for your company?
Most likely not.
The boring technical stuff can be found at this nice URL.
Yes, the GSC API also has a “soft” limit on the rows you can get.
Google Search Console VS Google BigQuery
The GSC export data is similar to the API’s, but there are some key differences.
The main advantages are:
- Going beyond the 16-month limit (hello historical data!)
- Recovering the data you’d otherwise miss from anonymized queries
- Getting much more click and impression data
The anonymized queries have a big effect on the clicks you see and can lead you to underestimate SEO performance.
This is even more true for smaller websites, as shown by the 17% below for Seotistics, which is super small!
A lot of data is hidden in the Search Console UI and API, and in BigQuery you can see just how much belongs to those missing queries!
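If you want to compute that share for your own site, here’s a hedged sketch against the Bulk Export’s URL table (the project and dataset names are placeholders for your own):

```sql
-- Hedged sketch: share of clicks hidden behind anonymized queries (last 90 days).
-- `your-project.searchconsole` is a placeholder for your own export dataset.
SELECT
  ROUND(100 * SUM(IF(is_anonymized_query, clicks, 0)) / SUM(clicks), 1)
    AS pct_clicks_from_anonymized_queries
FROM `your-project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
```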
To be fair, the export isn’t powerful just because you get more data… but because it’s the best data you can get.
What To Know
The data is relatively similar to what you’re used to from the GSC API, with some minor differences:
- You have to calculate CTR yourself
- Average position is absent; you have to calculate it from the cryptic sum_position metric (see the sketch after this list)
- Data for anonymized queries is there, but not the actual queries, e.g. you see the clicks but not the query text
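To make that concrete, here’s a minimal sketch that rebuilds CTR and average position per URL from the raw columns (same placeholder dataset as above; note that sum_position is zero-based, hence the +1):

```sql
-- Minimal sketch: rebuild CTR and average position yourself.
SELECT
  url,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions,
  SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS ctr,
  SAFE_DIVIDE(SUM(sum_position), SUM(impressions)) + 1 AS avg_position
FROM `your-project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY url
ORDER BY clicks DESC;
```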
As explained many times, the GSC export has 2 tables (ignore ExportLog):
Search Console is straightforward as always, but remember: you can’t make apples-to-apples comparisons with the UI/API data for URLs and queries.
You will almost always use the URL table, and I’ve summarized its structure below:
Google Analytics 4 VS Google BigQuery
GA4 data is the hardest to understand, but it’s the best way to learn BigQuery.
This export has many more advantages than the GSC one.
The most evident is bypassing the GA4 quota limits that stop you from making API calls after a certain threshold.
Yes, you can’t use Explorations or even the API as you wish; there are quota limits.
You also need to consider that cardinality and sampling affect your reports.
- Cardinality = a dimension with too many unique values, e.g. Page location. Data past the limit gets lumped into (other).
- Sampling = GA4 may base reports on only a subset of your data once certain limits are exceeded.
- Thresholding = data hidden to prevent you from inferring the identity or sensitive information of individual users from demographics, interests, or other signals in the data.
Analytics 360 is the paid version of GA4 with higher quota limits, and it costs a lot… would you pay for it? Hell no!
Having raw data allows you to define any metric you like and create your own attribution models.
You get absolute control over your data, unlike what GA4 gives you by default.
The export doesn’t contain data from Google Signals, but it solves sampling, cardinality, thresholding, and pretty much every other issue.
Explorations in GA4 stop at 14 months, so if you want to do a YoY comparison, you are forced to use the export.
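As a taste, here’s a hedged sketch of a simple YoY-ready query on the raw events export (analytics_XXXXXX is a placeholder for your GA4 property’s dataset; “users” here means distinct user_pseudo_id, which is close to, but not identical to, GA4’s reported users):

```sql
-- Hedged sketch: monthly users over the last 24 months, no 14-month ceiling.
-- analytics_XXXXXX is a placeholder for your GA4 export dataset.
SELECT
  FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', event_date)) AS month,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `your-project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY month
ORDER BY month;
```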
The linking gives you access to user data too.
Since the GA4 schema is quite hard, I’ve prepared tons of material on how to use it.
What To Know
GA4 data is challenging even for experienced engineers so it requires more study and effort.
As explained in my previous article, you get 4 tables at most:
- events
- events_intraday
- pseudonymous_users
- users
Out of all of them, you will almost always use the events table.
The structure is nested as you can see below:
That’s a mess to read, let’s visualize it in Preview mode:
Much better! This table records 1 event for each row. Imagine every row as a subtable for each event.
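If you’re wondering how to get anything out of those subtables, here’s a minimal sketch that flattens one parameter (page_location) with UNNEST; the dataset name is the usual placeholder:

```sql
-- Minimal sketch: pull page_location out of the nested event_params.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `your-project.analytics_XXXXXX.events_*`
WHERE event_name = 'page_view'
LIMIT 100;
```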
Luckily, there are plenty of free resources online to help you work with it, including this nice website.
I have a dedicated course on both data sources, with a particular focus on BigQuery and GA4 now.
If you want to learn more about this topic, I’ve prepared an Analytics for SEO course that covers both theory and practice:
The events table can be summarized like this:
My Personal Experiences
I’m obsessed with the topic and so far I’ve always loved BigQuery.
In ALL of my experiences, this practice has brought clear benefits and fewer headaches.
Imagine the following:
- 60+ domains
- No decision-making process
I stored all the data in BQ and then worked on the other steps, namely cleaning and processing the data.
Documentation is the most crucial step, taking the most time.
You agree on definitions once and you are done; nobody needs to review them every time.
Actual analysis and building dashboards are easy once you tackle the basics.
The advantages are a boost in productivity, money saved, and much faster decision-making.
Without a setup like this, the reality looks something like:
- you query the GSC API and figure out how many dimensions you want
- another person does the same and sets different requirements
- the data doesn’t match
- you don’t get access to past data
What They Don’t Tell You
Using tools like BigQuery isn’t free of hassle, depending on how you use them.
You don’t (always) need a Data Engineer for simpler use cases, like storing your data and running some SQL once in a while.
It’s completely different for medium/larger websites that want to build automation and process data properly.
Or for those who already have a data warehouse and want to integrate BigQuery.
For smaller websites, a single technical person can handle the work and the bill will be super low.
Consider a different scenario for larger websites as you will need to talk to engineers.
dbt and Dataform are your partners in crime here, because you shouldn’t really work with the raw schema Google provides.
There are packages like this one that do most of the job for you.
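To give you an idea, here’s a hedged sketch of what a tiny dbt staging model over the GSC export could look like (the model and source names are made up for illustration):

```sql
-- models/staging/stg_gsc_url_impressions.sql (hypothetical model name)
-- Hedged sketch: turn the raw export into something analysts can query directly.
SELECT
  data_date,
  url,
  query,
  is_anonymized_query,
  clicks,
  impressions,
  SAFE_DIVIDE(clicks, impressions) AS ctr
FROM {{ source('searchconsole', 'searchdata_url_impression') }}
```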
For simpler use cases, you can just use Google Sheets or even Python libraries and analyze data on the go.
This summary table will give you a better idea: