We analyzed 208K web pages. Here's what we learned about Core Web Vitals and UX:
We analyzed 208,085 web pages to learn more about Core Web Vitals.
First, we set benchmarks for Cumulative Layout Shift, First Input Delay, and Largest Contentful Paint.
We then examined the correlation between core web vitals and user experience metrics (such as bounce rate).
Thanks to the data provided by WebCEO, we were able to uncover some interesting results.
Let's dive right into the data.
Here is a summary of our key findings:
1. 53.77% of the websites had a good LCP (Largest Contentful Paint) score. 46.23% of the websites had "poor" or "needs improvement" LCP ratings.
2. 53.85% of the websites in our dataset had optimal FID (First Input Delay) ratings. Only 8.57% of the websites had a "poor" FID score.
3. 65.13% of the analyzed websites had optimal CLS (Cumulative Layout Shift) values.
4. The average LCP of the websites we analyzed clocked in at 2,836 milliseconds.
5. The average FID was 137.4 milliseconds.
6. The mean CLS score was 0.14. This is slightly above Google's "good" threshold of 0.1.
7. The most common problems affecting LCP were a high number of requests and large transfer sizes.
8. Large layout shifts were the main cause of poor CLS values.
9. The most common problem affecting FID was an inefficient cache policy.
10. There was a weak correlation between Core Web Vitals scores and UX metrics.
11. We found that FID correlated slightly with page views.
53.77% of the websites had an optimal Largest Contentful Paint score
Our first goal was to determine how each website performed on the three factors that make up Google's Core Web Vitals: Largest Contentful Paint, Cumulative Layout Shift, and First Input Delay.
Specifically, we wanted to determine the percentage of pages rated "good," "needs improvement," and "poor" in each website's Search Console.
For this purpose, we analyzed anonymized Google Search Console data from 208,085 pages (approximately 20,000 websites in total).
Our first task: analyze LCP (Largest Contentful Paint). In simple terms, LCP measures how long it takes for a page's largest visible content element to load.
Here's how the websites we analyzed performed:
- Good: 53.77%
- Needs improvement: 28.76%
- Poor: 17.47%
As you can see, the majority of the websites we looked at had a "good" LCP rating. This was higher than expected, especially considering other benchmarking efforts (like this one from iProspect).
It's possible that the websites in our dataset monitor page performance especially closely. Or the difference may be partly down to sample size (the iProspect analysis continuously monitors 1,500 sites; we analyzed more than 20,000).
In any case, it's encouraging to see that only about half of all websites need to work on their LCP.
53.85% of the websites we analyzed had good First Input Delay ratings
Next, we looked at the First Input Delay (FID) ratings reported by Search Console. As the name suggests, FID measures the delay between a user's first interaction with a page (for example, clicking a link or tapping a form field) and the moment the browser responds to that interaction.
Here is a breakdown of the FID values from our dataset:
- Good: 53.85%
- Needs improvement: 37.58%
- Poor: 8.57%
Again, just over half of the websites we looked at had “good” FID ratings.
Interestingly, very few (8.57%) had "poor" scores. This suggests that relatively few websites are likely to be negatively impacted when Google incorporates FID into its algorithm.
65.13% of the websites had an optimal cumulative layout shift score
Finally, we looked at the Cumulative Layout Shift (CLS) ratings in Search Console.
CLS measures how much a page's elements move around as it loads. Pages that are relatively stable while loading have low (good) CLS scores.
Here are the CLS ratings among the websites we analyzed:
- Good: 65.13%
- Needs improvement: 17.03%
- Poor: 17.84%
Of the three Core Web Vitals scores, CLS was the least problematic. In fact, only about 35% of the websites we analyzed need to work on their CLS.
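For reference, the rating buckets used throughout follow Google's published thresholds for each metric. Here is a minimal sketch of how a single value gets bucketed (thresholds are from Google's Web Vitals documentation; the helper name is our own):

```python
# Google's published thresholds: (upper bound for "good", upper bound
# for "needs improvement"). LCP and FID are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fid": (100, 300),
    "cls": (0.1, 0.25),
}

def rate_vital(metric: str, value: float) -> str:
    """Bucket a Core Web Vital value the way Search Console does."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"
```

For example, an LCP of 2,400 ms rates "good", while an FID of 137 ms falls into "needs improvement".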
The average LCP is 2,836 milliseconds
Next, we wanted to establish benchmarks for each Core Web Vital metric. As mentioned above, Google has created its own guidelines for each Core Web Vital.
(For example, a "good" LCP is considered less than 2.5 seconds.)
However, we hadn't seen a large-scale analysis attempting to benchmark each Core Web Vital metric "in the wild."
First, we compared the LCP values for the websites in our database.
Among the websites we analyzed, the average LCP was 2,836 milliseconds (2.8 seconds).
Here were the most common issues that negatively impacted LCP performance:
- High number of requests and large transfer sizes (100% of pages)
- High network round-trip time (100% of pages)
- Critical request chains (98.9% of pages)
- High initial server response time (57.4% of pages)
- Images not served in next-gen formats (44.6% of pages)
Overall, 100% of the pages had high LCP values at least partly due to a "high number of requests and large transfer sizes". In other words, pages that are bloated with excess code, large files, or both.
This finding is in line with another analysis we conducted, which found that heavy pages tend to be the culprit behind most slow-loading pages.
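Since request count and transfer size topped the list, a simple audit helper illustrates the idea. The thresholds below are illustrative assumptions for the sketch, not official Google limits:

```python
def flag_page_weight(transfer_sizes, max_requests=50, max_total_bytes=1_500_000):
    """Flag the two most common LCP problems for one page.

    transfer_sizes: list of per-resource transfer sizes in bytes.
    The threshold values are illustrative, not Google's numbers.
    """
    issues = []
    if len(transfer_sizes) > max_requests:
        issues.append("high request count")
    if sum(transfer_sizes) > max_total_bytes:
        issues.append("large transfer size")
    return issues
```

A page with two resources totaling 1.6 MB would be flagged for transfer size but not request count; a page with 60 tiny resources would be flagged for the opposite.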
The average FID is 137.4 milliseconds
We then looked at the FID values on the pages of our dataset.
Overall, the mean First Input Delay was 137.4 milliseconds.
Here are the most common FID problems we discovered:
- Inefficient cache policy (87.4% of pages)
- Long main thread tasks (78.4% of pages)
- Unused CSS (38.7% of pages)
- Excessive DOM size (22.3% of pages)
It was interesting to see that caching issues affected FID more than any other problem. And it's not surprising that poorly optimized code (such as long main-thread tasks and unused CSS) is behind many high FID values.
The average CLS is 0.14
We found that the average CLS score was 0.14.
This metric measures how much a page's content "shifts" as it loads. A score of zero is considered ideal, and anything below 0.1 is rated "good" in Search Console.
The most common problems affecting CLS among the websites we analyzed were:
- Large layout shifts (94.5% of pages)
- Render-blocking resources (86.3% of pages)
- Text invisible while web fonts load (82.6% of pages)
- Key requests not preloaded (26.7% of pages)
- Improperly sized images (24.7% of pages)
How LCP correlates with user behavior
With the benchmarks set, we wanted to find out how well Core Web Vitals reflect real-world user experience.
In fact, this relationship is something that Google itself highlights in its Core Web Vitals report documentation.
To analyze Core Web Vitals and their effect on UX, we looked at three UX metrics that are meant to represent user behavior on websites:
- Bounce rate (the % of users who leave after viewing only one page)
- Page depth per session (how many pages users view before leaving the site)
- Time on site (how much time users spend on a website in a single session)
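For concreteness, here's a minimal sketch of how these three metrics are computed from session records (hypothetical tuples for illustration, not our Google Analytics export):

```python
def ux_metrics(sessions):
    """Compute the three UX metrics from session records.

    sessions: list of (pages_viewed, seconds_on_site) per session.
    Returns (bounce_rate, pages_per_session, avg_time_on_site).
    A bounce is counted here as a single-page session.
    """
    n = len(sessions)
    bounces = sum(1 for pages, _ in sessions if pages == 1)
    pages_per_session = sum(pages for pages, _ in sessions) / n
    avg_time = sum(secs for _, secs in sessions) / n
    return bounces / n, pages_per_session, avg_time
```

Two sessions, one a 10-second bounce and one a 3-page, 2-minute visit, would yield a 50% bounce rate, 2 pages per session, and 65 seconds average time on site.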
Our hypothesis was as follows: If you improve a website's Core Web Vitals, it will have a positive effect on its UX metrics.
In other words, a site with "good" Core Web Vitals will have lower bounce rates, longer sessions, and more page views. Fortunately, in addition to Search Console data, our dataset also contained UX metrics from Google Analytics.
All that remained was to compare each website's Core Web Vitals to each UX metric. Our results for LCP can be found below:
LCP and bounce rate
LCP and pages per session
LCP and time on site
Across all three charts, it was clear that the three segments (good, needs improvement, and poor) were distributed fairly evenly.
In other words, there was no direct relationship between LCP and UX metrics.
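The comparison itself boils down to computing a correlation coefficient between a vital and a UX metric. A minimal sketch with hypothetical values (not our actual data; the function is plain Pearson correlation):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, no dependencies.

    Values near 0 mean little or no linear relationship;
    +1 or -1 means a perfect linear relationship.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical page-level values, for illustration only
lcp_ms = [1800, 2200, 2500, 3100, 4000]
bounce_rate = [0.42, 0.55, 0.40, 0.51, 0.47]
r = pearson_r(lcp_ms, bounce_rate)
```

In our analysis, plots like the ones above paired with weak coefficients are what led us to conclude there was no direct LCP-to-UX relationship.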
FID has a slight relationship with page views
Next, we examined the possible relationship between First Input Delay and user behavior.
As with LCP, it seems logical that a poor FID would negatively impact UX metrics (especially bounce rate).
A user who has to wait to select from a menu or enter their password is likely to get frustrated and bounce. If that experience repeats across multiple pages, it can reduce overall page views.
Here's how FID correlated with the user behavior metrics:
FID and bounce rate
FID and pages per session
Note: We found that a high FID tends to correlate with fewer pages per session, and vice versa.
FID and time on site
Overall, we only saw evidence of a correlation when comparing FID to pages viewed per session. When it comes to bounce rate and time on site, a website's FID doesn't seem to have any impact on user behavior.
How CLS affects user behavior
Next, we wanted to examine a possible relationship between CLS and user activity.
It seems logical that a poor CLS would frustrate users, and could therefore increase bounce rate and shorten session time.
However, we couldn't find any case studies or large-scale analyses showing that high CLS scores affect user behavior. So we decided to conduct our own analysis looking for possible relationships between CLS and bounce rate, dwell time, and pages viewed. We found the following:
CLS and bounce rate
CLS and pages per session
CLS and time on site
Overall, we could not find any significant correlation between CLS and bounce rate, time on site, or page views.
Hope you found this analysis interesting and useful (especially with the Google Page Experience update on the way).
Here is a link to the raw dataset we used, along with our methods.
I would like to thank WebCEO for providing the data that made this industry study possible.
Overall, it was interesting to see that most of the websites we analyzed performed relatively well, and are largely ready for the Google update. It was also interesting that, although Core Web Vitals are designed to measure a site's UX, we couldn't find any correlation between them and the behavior metrics we looked at.
Now I would like to hear from you:
What is your most important finding from today's study? Or maybe you have a question about something from the analysis. In any case, leave a comment below.