How do search engines work and why should you care?
Before we get into the technical issues, let's first make sure we understand what search engines actually are, why they exist, and why this matters in the first place.
What are search engines?
Search engines are tools that find and rank web content that matches a user's search query.
Every search engine consists of two main parts:
- Search index. A digital library of information about websites.
- Search algorithm(s). Computer programs that rank matching results from the search index.
Examples of popular search engines are Google, Bing, and DuckDuckGo.
What is the goal of search engines?
Every search engine aims to provide the best, most relevant results for its users. That is how they gain or keep market share, at least in theory.
How do search engines make money?
Search engines have two types of search results:
- Organic results from the search index. You can't pay to be here.
- Paid results from advertisers. You can pay to be here.
Every time someone clicks on a paid search result, the advertiser pays the search engine. This is called pay-per-click (PPC) advertising.
That is why market share matters. More users mean more ad clicks and more revenue.
Why should you care how search engines work?
Knowing how search engines find, index, and rank content will help you rank your website for relevant and popular keywords in organic search results.
If you can rank high for these queries, you will get more clicks and organic traffic for your content.
Which is the most popular search engine?
Google. It has a 92% market share.
Google is the search engine that most SEO professionals and website owners care about because it has the potential to send more traffic their way than any other search engine.
Well-known search engines like Google and Bing have trillions of pages in their search indexes. So before we dive into ranking algorithms, let's dig deeper into the mechanisms used to create and maintain a web index.
Here is the basic process, courtesy of Google:
Let's break this down step by step:
- URLs
- Crawling
- Processing & rendering
- Indexing
The following process is specific to Google, but it is likely very similar for other web search engines like Bing. Other types of search engines, like Amazon, YouTube, and Wikipedia, only show results from their own websites.
Step 1. URLs
It all starts with a known list of URLs. Google discovers these through various processes, but the three most common are:
From backlinks
Google already has an index of trillions of web pages. If someone adds a link to one of your pages from one of those pages, Google can find it from there.
You can view your website's backlinks for free in Site Explorer using Ahrefs Webmaster Tools.
- Open a free account with Ahrefs Webmaster Tools
- Paste your domain into Site Explorer
- Go to the Backlinks report.
Our crawler is the second most active after Google. Hence, you should see a reasonably complete view of your backlinks here.
From sitemaps
Sitemaps list all the important pages on your website. Submitting your sitemap to Google may help it discover your site more quickly.
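To make this more concrete, here is a minimal sketch of how a crawler could read page URLs out of a sitemap. The sitemap content, the example.com URLs, and the function name are made up for illustration; the only fixed part is the XML namespace from the sitemaps.org protocol. This is not how Google implements it, just the general idea.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for an imaginary site.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc><lastmod>2021-03-01</lastmod></url>
</urlset>
"""


def extract_sitemap_urls(xml_text):
    """Return the page URLs listed in the sitemap's <loc> elements."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(extract_sitemap_urls(SAMPLE_SITEMAP))
```

Every URL returned this way would simply be added to the list of known URLs waiting to be crawled.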
From URL submissions
Google also lets you submit individual URLs via Google Search Console.
Step 2. Crawling
When crawling, a computer bot called a spider (e.g., Googlebot) visits the discovered pages and downloads them.
It's important to note that Google doesn't always crawl pages in the order in which they discover them.
Google queues URLs for crawling based on a few factors including:
- the PageRank of the URL
- how often the URL changes
- whether it's new or not
This is important because search engines may crawl and index some of your pages before others. If you have a large website, it can take search engines a while to fully crawl it.
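The queueing idea can be sketched with a simple priority queue. Google does not publish how it weighs these factors, so the scoring formula, the weights, and the field names below are all invented for illustration. The only point is that URLs come out of the queue by priority, not in the order they were discovered.

```python
import heapq
from dataclasses import dataclass


@dataclass
class KnownUrl:
    url: str
    pagerank: float          # assumed importance estimate between 0 and 1
    changes_per_week: float  # assumed estimate of how often the page changes
    is_new: bool             # True if the URL has never been crawled


def crawl_priority(page):
    """Toy score: higher means 'crawl sooner'. The weights are invented."""
    score = 10.0 * page.pagerank + min(page.changes_per_week, 7.0)
    if page.is_new:
        score += 2.0
    return score


def crawl_order(pages):
    """Return URLs in the order a priority queue would hand them out."""
    # heapq is a min-heap, so store the negated score; the index breaks ties.
    heap = [(-crawl_priority(p), i, p.url) for i, p in enumerate(pages)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


pages = [
    KnownUrl("https://example.com/old-post", 0.1, 0.0, False),
    KnownUrl("https://example.com/", 0.9, 7.0, False),
    KnownUrl("https://example.com/new-post", 0.1, 0.0, True),
]
print(crawl_order(pages))
```

In this toy example the homepage is crawled first even though it was discovered second, and the new post jumps ahead of the old one.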
Step 3. Processing
During processing, Google works to understand and extract key information from crawled pages. Nobody outside of Google knows every detail of this process, but the parts that matter for our purposes are extracting links and storing content for indexing.
Google needs to render pages in order to fully process them. This is where Google runs the page's code to understand what it looks like to users.
That said, some processing is done before and after rendering – as you can see in the diagram.
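Link extraction is the easiest part of processing to illustrate. The sketch below uses Python's standard HTML parser to pull links out of a downloaded page and resolve them against the page's URL. It does not render the page, so links added by JavaScript would be missed; the sample HTML and URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


SAMPLE_HTML = (
    '<p>Read <a href="/blog/seo-basics">our guide</a> or '
    '<a href="https://example.org/">this site</a>.</p>'
)
print(extract_links(SAMPLE_HTML, "https://example.com/page"))
```

Any link found this way that is not yet known goes back into the list of URLs from Step 1, which is how one crawled page leads to the next.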
Step 4. Indexing
Indexing is the process of adding processed information from crawled pages to a large database called a search index. This is essentially a digital library of trillions of web pages, and it is where Google's search results come from.
This is an important point. When you type a query into a search engine, you are not searching the Internet directly for matching results. You are searching the search engine's index of web pages. If a web page is not in the search index, search engine users will not find it. This is why it is so important that your website is indexed by major search engines like Google and Bing.
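A toy version of a search index shows why an unindexed page can never appear in results. The sketch below builds a simple inverted index (a map from each word to the pages containing it) and answers queries from that map alone. Real search indexes store far more than words, and the pages and URLs here are invented.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every word of the query, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Only these two pages were crawled and indexed.
indexed_pages = {
    "https://example.com/pasta": "easy italian pasta recipes",
    "https://example.com/pizza": "italian pizza dough",
}
index = build_index(indexed_pages)

print(search(index, "italian"))        # both pages match
print(search(index, "italian pasta"))  # only the pasta page matches
print(search(index, "sushi"))          # nothing indexed mentions sushi
```

The search function never looks at the web itself. A page about sushi could exist somewhere, but as long as it is not in the index, the result list stays empty.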
Finding, crawling, and indexing content is only the first piece of the puzzle. Search engines also need a way to rank matching results when a user performs a search. This is the job of search engine algorithms.
Every search engine has unique algorithms for ranking websites. However, since Google is by far the most widely used search engine (at least in the western world), we will focus on Google's in the rest of this guide.
Google famously has more than 200 ranking factors.
Nobody knows what all of these ranking factors are, but we know the key factors.
Let's discuss some of them.
- Backlinks
- Relevance
- Freshness
- Topical authority
- Page speed
- Mobile-friendliness
Backlinks are one of the most important ranking factors for Google.
Andrey Lipattsev, Senior Search Quality Strategist at Google, confirmed this during a live webinar in 2016. When asked about the two most important ranking factors, his answer was simple: content and links.
Absolutely. I can tell you what they are (the two most important ranking factors). It is content. And it's links pointing to your site.
Links have been an important ranking factor at Google since 1997, when they introduced PageRank, a formula for assessing the value of a page based on the quantity and quality of backlinks pointing to it.
When analyzing over a billion pages, we found a clear correlation between the number of websites linking to a page and the amount of organic traffic from Google.
However, it's not all about quantity, because not all backlinks are created equal. It is entirely possible for a page with a few high-quality backlinks to outrank a page with many lower-quality backlinks.
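To see how quantity and quality of links can combine into one score, here is a simplified sketch of the PageRank idea as it was originally published: each page splits its score among the pages it links to, so a link from a strong page is worth more than a link from a weak one. The damping factor of 0.85 comes from the original paper. The tiny link graph is invented, and this is certainly not what Google runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to its link targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages "a" and "b" link to "c", and "c" links to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page "c" ends up on top because two pages link to it. Page "a" has only one backlink, but it comes from the strongest page in the graph, so "a" scores far higher than "b", which has none.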
There are six key attributes of a good backlink.
Let's take a closer look at the two most important ones: authority and relevance.
Backlinks from authoritative pages and websites usually have the greatest impact on ranking.
How do you define authority? In the context of SEO, authoritative pages and websites are those that have many backlinks, or "votes".
At Ahrefs, we have two metrics for estimating the relative authority of websites and pages:
- Domain Rating (DR): The relative authority of a website on a scale from 0 to 100.
- URL Rating (UR): The relative authority of a page on a scale from 0 to 100.
You can check the authority of any website or web page in Ahrefs' Site Explorer.
Links from relevant websites and web pages are usually the most valuable.
Google talks about relevance in the context of ranking useful pages on its page about how Search works:
If other prominent websites on the subject link to the page, that's a good sign that the information is of high quality.
If you're wondering why relevance matters, think about how things work in the real world. When looking for the best Italian restaurant, you would likely trust your chef friend's advice over your veterinarian friend's advice. However, if you're looking for cat food recommendations, it's the other way around.
Google has many ways of determining page relevance.
At the simplest level, it looks for pages that contain the same keywords as the search query.
However, relevance goes far beyond keyword matching.
Google also uses interaction data to evaluate whether search results are relevant to queries. In other words, do searchers find the site useful?
That is partly why all of the top results for "apple" are about the tech company, not the fruit. Google knows from interaction data that most searchers are looking for information about the former, not the latter.
However, interaction data is far from the only way Google does this.
Google has invested in many technologies to understand the relationships between entities such as people, places, and things. The Knowledge Graph is one of those technologies. It is essentially a huge knowledge base of entities and the relationships between them.
Both apple (fruit) and Apple (technology company) are entities in the Knowledge Graph.
Google uses the relationships between entities to better understand page relevance. A matching result for "apple" that talks about oranges and bananas is clearly about the fruit. One that talks about the iPhone, iPad, and iOS is clearly about the tech company.
This is in part thanks to the Knowledge Graph that allows Google to go beyond keyword matching.
Sometimes Google even shows search results that seem to omit important keywords from the query. For example, take the second result for "apple paper app", which doesn't mention the word "apple" anywhere on the page.
Google can tell that this is a relevant result partly because the page mentions entities such as the iPhone and iPad, which are undoubtedly closely related to Apple in the Knowledge Graph.
Interaction data and the Knowledge Graph aren't the only technologies Google uses to understand how relevant a page is to a search query. Much of the work is done by technologies that understand the meaning and intent of the query itself, such as BERT and RankBrain. Google even sometimes rewrites queries behind the scenes to provide more relevant results.
Freshness is a query-dependent ranking factor, which means it is more important for some queries than others.
For a query like "What's new on Amazon Prime?", freshness is important because searchers want to know about recently added movies and TV shows. This is probably why Google ranks newly published or updated pages higher.
For queries like "best headphones", freshness matters, but not as much. Headphone technology moves fast, so a result from 2015 isn't going to be much use. However, a post published two to three months ago will still be useful.
Google knows this and shows results that have been updated or published in the past few months.
There are also queries where the freshness of the results mostly doesn't matter, such as "how to tie a tie". This process has not changed in decades, so it doesn't matter whether the search results are from yesterday or 1998. Google knows this and has no qualms about ranking posts that were published years ago.
Google wants to rank content from websites that are authoritative on the topic. This means that Google may see a website as a good source of results for queries on one topic, but not for queries on another.
Google talks about this in one of their patents:
Whether the search system considers a site authoritative usually depends on the query. (…) The search system may consider the Centers for Disease Control site, "cdc.gov", to be an authoritative site for a query such as "CDC mosquito bites", but may not consider the same site authoritative for the query "restaurant recommendations".
While this is just one of many patents filed by Google, we see indications in the search results for many queries that "topical authority" matters.
Just look at the results for "Sous Vide Vacuum Sealers".
Here we see two small niche sites about sous vide cooking outranking the New York Times.
While other factors undoubtedly play a role here, it is likely that "topical authority" is one of the reasons these sites rank where they do.
This is probably why Google's SEO Starter Guide advises website owners to do the following:
Maintain a reputation for expertise and trustworthiness in a specific area.
Nobody likes to wait for pages to load, and Google knows it. Because of this, they made page speed a ranking factor for desktop searches in 2010 and mobile searches in 2018.
Lots of people obsess over page speed. So it's worth noting that your pages don't have to be lightning fast to rank. According to Google, page speed is only an issue for pages that "give users the slowest experience".
In other words, shaving a few milliseconds off a page that is already fast is unlikely to improve its ranking. It just has to be fast enough that it doesn't negatively affect users.
You can check the speed of any webpage in PageSpeed Insights which will also generate suggestions to make the page faster.
PageSpeed Insights also shows how your page is doing on Core Web Vitals.
Core Web Vitals consist of three metrics that evaluate the loading performance, interactivity and visual stability of your web pages. Google has confirmed that Core Web Vitals will be a ranking signal from June 2021.
You can view the performance of all the pages on your website using the Core Web Vitals report in Google Search Console.
If a lot of URLs are rated as poor or in need of improvement, contact a developer.
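If you want a feel for how a page ends up in one of those buckets, the sketch below rates measurements against the thresholds Google published for the three Core Web Vitals metrics in 2021: Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS). The metric names and thresholds are not spelled out above, so treat them as assumptions and check Google's current documentation. The dictionary keys and function names are our own.

```python
# (good up to and including, poor above) for each metric.
# Thresholds as published by Google for the 2021 Core Web Vitals; verify before relying on them.
THRESHOLDS = {
    "lcp_seconds": (2.5, 4.0),       # loading performance
    "fid_milliseconds": (100, 300),  # interactivity
    "cls": (0.1, 0.25),              # visual stability
}


def rate_metric(name, value):
    """Return 'good', 'needs improvement', or 'poor' for one measurement."""
    good_max, poor_min = THRESHOLDS[name]
    if value <= good_max:
        return "good"
    if value > poor_min:
        return "poor"
    return "needs improvement"


def rate_page(measurements):
    """Rate every metric measured for a page."""
    return {name: rate_metric(name, value) for name, value in measurements.items()}


# Hypothetical measurements for one page.
print(rate_page({"lcp_seconds": 3.1, "fid_milliseconds": 80, "cls": 0.3}))
```

This made-up page loads a little slowly, responds quickly, and shifts around too much while loading, so the layout shift is the first thing to hand to a developer.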
65% of Google searches are done on mobile devices. Because of this, mobile-friendliness has been a ranking factor for mobile searches since 2015.
Since 2019, thanks to Google's switch to mobile-first indexing, mobile-friendliness has also been a ranking factor for desktop searches. This means that Google "predominantly uses the mobile version of the content for indexing and ranking" on all devices.
In other words, a lack of mobile friendliness can affect ranking anywhere.
You can check the mobile-friendliness of any web page using Google's Mobile-Friendly Test tool or the Mobile Usability report in Google Search Console.
Search engines understand that different results appeal to different people. That's why they tailor their results to each user.
If you've ever searched for the same thing on multiple devices or browsers, you've likely seen the effects of this personalization. The same results often show up in different positions depending on various factors.
Because of this personalization, if you do SEO, it is better to use a dedicated tool like Ahrefs' Rank Tracker to keep track of ranking positions. The positions reported by these tools are likely closer to the truth because they browse the web in a way that gives search engines little useful information for personalization.
How do search engines personalize results?
Google states that "Information like your location, past search history, and search settings help (us) tailor your results to what is most useful and relevant to you at the moment."
Let's take a closer look at these three things.
1. Location
If you search for something like "Italian restaurant", all of the results in the map pack are local restaurants.
Google does this because you're unlikely to fly halfway around the world for lunch.
However, Google also uses your location to personalize search results outside of the map pack. When we scroll down the results for "Italian restaurant", even the TripAdvisor results are personalized, and many of the top results are local restaurant websites.
The same applies to a query such as "buy a house". Google returns pages with local rather than national listings, since you probably don't want to move to another country.
Your location affects the results of local queries so dramatically that there is virtually no overlap if you search for the same thing from two different locations.
2. Language
Google knows that there is no point in showing English results to Spanish users. For this reason, Google ranks the English version of our YouTube SEO tutorial for English searches and the Spanish version for Spanish searches.
However, Google relies somewhat on website owners to make this work. If you have pages in multiple languages, Google may not realize it unless you tell it.
You can do this with an HTML attribute called hreflang.
Hreflang is a bit complicated and is way beyond the scope of this guide. Basically, however, it's a little piece of code that shows the relationship between multiple versions of the same page in different languages.
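To give you a rough idea of what that piece of code looks like, the sketch below generates the hreflang link tags for a page that exists in English and Spanish. The URLs are placeholders and the helper function is our own. In practice, every language version should carry the full set of tags, including one pointing to itself.

```python
from html import escape


def hreflang_tags(versions):
    """Build one <link> tag per language version.

    `versions` maps a language code (e.g. "en") to the URL of that version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(versions.items())
    ]


# Hypothetical English and Spanish versions of the same tutorial.
versions = {
    "en": "https://example.com/youtube-seo/",
    "es": "https://example.com/es/youtube-seo/",
}
for tag in hreflang_tags(versions):
    print(tag)
```

The resulting tags go into the head section of both the English and the Spanish page, so that each version points to the other.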
3. Search history
Perhaps the most obvious example of Google using search history to personalize results is when a previously clicked result is ranked higher the next time you run the same search.
It doesn't always happen, but it seems to be quite common – especially if you click or visit the page multiple times in a short amount of time.
Let's sum that up
Understanding how search engines work is the first step in ranking higher and getting more traffic on Google. If search engines can't find, crawl, and index your pages, you're dead in the water before you even start.
If you want to know how to do that and how to start optimizing your website for SEO, read our guide to SEO basics.
Any questions? Let me know in the comments or on Twitter.