In today’s Internet there are many channels for acquiring traffic, but for independent websites, Google is still one of the most important sources. Google has a huge user base, and through Google search, potential users can find your website more easily. In reality, however, some websites are not indexed by Google for a long time, leaving these independent sites with very few visitors. So if many pages of your website are not indexed by Google, or have gone unindexed for a long time, there may be problems with the website itself. This article summarizes common reasons why websites are not indexed by Google, along with solutions, and I hope it will be helpful to you.
How does Google index websites?
Google indexes a website by visiting it automatically with crawlers and adding its content and information to the index database. Google’s crawlers evaluate your website according to algorithmic rules to determine whether it is useful to users, and then decide whether to add it to the index.
Crawling and indexing are the two main processes by which search engines process and organize Internet pages.
- Crawling: Crawling is the process by which search engines automatically visit web pages on the Internet through programs (crawlers) to extract page content and links. A crawler starts from a seed URL, continuously discovers other pages through the links on each page, and stores the content of those pages in the search engine’s database. Crawling is how search engines discover new pages and update the content of existing ones.
- Indexing: Indexing is the process of organizing and storing the crawled page content. The search engine analyzes each crawled page, extracts its key information, such as the title, body, and URL, and adds this information to its index database. The index is a data structure the search engine builds from page content for fast access, to support subsequent search operations.
Through crawling and indexing, search engines can build an index database containing a large amount of web page information. When a user enters a search query, the search engine will match relevant web pages in the index database and display the most relevant search results to the user based on a certain algorithm. Crawling and indexing are the basic steps for search engines to achieve accurate and efficient search.
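To make the two processes more concrete, here is a minimal toy sketch in Python of what a crawler-plus-indexer does: fetch a page, record its title in an “index”, and follow the links it finds. It only illustrates the idea described above, not how Googlebot actually works; the seed URL is a placeholder and the toy does not restrict itself to one domain.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Extracts the <title> text and the outgoing links from one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def crawl_and_index(start_url, max_pages=10):
    """Crawling: follow links from a seed URL. Indexing: store titles keyed by URL."""
    index = {}              # the toy "index database": URL -> page title
    queue = [start_url]     # URLs waiting to be crawled
    while queue and len(index) < max_pages:
        url = queue.pop(0)
        if url in index:
            continue        # already crawled and indexed
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue        # unreachable pages are simply skipped
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.title.strip()                          # indexing step
        queue.extend(urljoin(url, link) for link in parser.links)  # crawling step
    return index

# print(crawl_and_index("https://example.com/"))   # placeholder seed URL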
How to check whether your pages are indexed by Google?
How to calculate the index rate
The index rate of a website refers to the ratio of the number of pages on the website that have been indexed by search engines to the total number of pages on the website. Calculating the index rate of a website can help you understand the extent to which search engines cover your website and whether there are any pages that have not been indexed. Here is how to calculate the index rate of a website:
- Determine the total number of pages on your website: First, you need to determine how many pages your website has. This includes the homepage, subpages, articles, product pages, tag pages, category pages, etc. You can use a sitemap to help you list all the pages, or use a website analysis tool to get this information.
- Determine the number of pages indexed: You can use Google Search Console’s Indexing report to count the number of indexed pages, or use third-party SEO tools.
- To calculate the index rate, divide the number of pages indexed by Google by the total number of pages on your site, then multiply the result by 100 to get the percentage. The formula for the index rate is as follows:
Index rate = (number of pages indexed by Google / total number of pages) * 100
For example, if your website has a total of 100 pages and Google has indexed 80 of them, your website’s index rate is 80%.
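If your site publishes a sitemap, you can compute the totals programmatically. Below is a minimal Python sketch that counts the URLs in a standard sitemap.xml and computes the index rate; the sitemap URL is a placeholder, and the indexed-page count still has to come from Google Search Console (hard-coded here for illustration).

import urllib.request
import xml.etree.ElementTree as ET

def count_sitemap_urls(sitemap_url):
    """Count the <url> entries in a standard sitemap.xml file."""
    xml_data = urllib.request.urlopen(sitemap_url, timeout=10).read()
    root = ET.fromstring(xml_data)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return len(root.findall("sm:url", ns))

total_pages = count_sitemap_urls("https://example.com/sitemap.xml")  # placeholder
indexed_pages = 80   # read manually from Google Search Console's Pages report
print(f"Index rate: {indexed_pages / total_pages * 100:.1f}%")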
How to check whether a webpage is indexed by Google
- Search Google with the site: operator: Type "site:yourwebsite.com" into the Google search bar (replace "yourwebsite.com" with your own domain name) and press Enter. This displays the pages of your website that Google has indexed. If pages appear in the search results, they have been indexed by Google.
- Use Google Search Console: Google Search Console is a powerful tool for managing your website’s performance in Google Search. If you haven’t added your website to Google Search Console yet, you can sign up for a free account and verify your website ownership. For specific operations, please refer to the “Tutorial on How to Quickly Get Indexed by Google”. In Google Search Console, you can view detailed information about your website’s indexing, including which pages are indexed, whether there are indexing errors, and which keywords bring traffic.
- Use website analysis tools: In addition to Google Search Console, there are some third-party website analysis tools, such as SEMrush, Ahrefs, and Moz, that can help you monitor your website’s indexation. These tools provide detailed data about your website’s performance in search engines and which pages are indexed.
- Manually check if a page is indexed: If you want to manually check if a specific page is indexed by Google, you can enter the URL of the page in the Google search bar and check if there are any corresponding search results. If the search results include your page, then it is indexed.
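You can also check index status programmatically through the Search Console URL Inspection API. The sketch below is a minimal example using the google-api-python-client and google-auth libraries; it assumes you have a service account JSON key and have added that service account as a user of the verified property, and the site and page URLs are placeholders.

from google.oauth2 import service_account
from googleapiclient.discovery import build

# The service account must be added as a user of the Search Console property.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",   # placeholder path to your key file
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

body = {
    "inspectionUrl": "https://example.com/some-page/",  # page to check (placeholder)
    "siteUrl": "https://example.com/",                  # verified property (placeholder)
}
result = service.urlInspection().index().inspect(body=body).execute()

# coverageState reads e.g. "Submitted and indexed" or "Discovered - currently not indexed"
print(result["inspectionResult"]["indexStatusResult"]["coverageState"])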
Common reasons and solutions for websites not being included in Google
New site not included
If your website is newly created, it may take some time for it to be indexed by search engines. New websites usually need to wait for search engine crawlers to discover and index their content. Generally speaking, it may take days to weeks to be indexed.
If you want Google to discover and index your website more quickly and comprehensively, you can proactively submit your website’s sitemap to Google Search Console. If you don’t know how, the article “Tutorial on How to Quickly Get Indexed by Google” has a detailed introduction. In highly competitive industries, search engines may be pickier and slower to index new websites, and more effort is needed to improve the quality and relevance of the site.
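If your site does not have a sitemap yet, a minimal sitemap.xml looks like the following (the URLs are placeholders); most CMSs and SEO plugins can generate and update one automatically.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
  </url>
</urlset>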
Robots.txt file blocks indexing
The robots.txt file is a text file located in the root directory of your website (for a WordPress site, the installation root). It guides search engine crawlers as to which pages should and should not be crawled. A properly configured robots.txt file prevents crawlers from wasting time on unnecessary content, and it helps manage your crawl budget so that crawler resources are used sensibly.
If your website’s robots.txt file contains instructions that do not allow search engines to crawl, such as “Disallow: /”, search engines will not index your website content. Make sure the robots.txt file is configured correctly to allow search engines to crawl important pages.
So how do you check your website’s robots.txt file? First, open your browser and type directly in the address bar:
http://yourdomain.com/robots.txt
Allow lists pages that crawlers may crawl, and Disallow lists pages that they may not crawl.
Check your robots.txt settings. If you find the following snippet:
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /
The User-agent line specifies which crawler the rules apply to; an asterisk (*) means the rules apply to all crawlers. The forward slash (/) in the Disallow line tells crawlers that every page of the website is off-limits. In other words, these lines tell Googlebot, and every other crawler, not to crawl any page of your website. The fix is simple: delete these lines.
Next, double-check any other “Disallow” rules in your robots.txt file. If any of them contain pages you want to be indexed, be sure to remove the corresponding “Disallow” rules.
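For reference, here is a sketch of a permissive robots.txt for a typical WordPress site: all crawlers may crawl everything except the admin area, and the sitemap is advertised to them. The paths and sitemap URL are examples and should be adapted to your own site.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml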
Noindex meta tag blocks indexing
Sometimes, the HTML head of a page contains a <meta name="robots" content="noindex"> tag, which tells search engines not to index that page. Check your website’s HTML source to make sure no such tag appears on pages you want indexed.
To check: right-click the page -> View page source -> search for “noindex”.
If you find the following line of code:
<meta name="robots" content="noindex,nofollow" />
Then all you have to do is remove this line of code.
How to find pages with the Noindex tag?
First, open your website property in Google Search Console and select “Indexing” – “Pages”.
Next, you can view the reasons why the webpage is not indexed on this page. If a webpage has a Noindex tag, it will be classified as “Excluded by ‘noindex’ tag”, which means that these pages are excluded from the index.
You can click on this reason and then view the specific list of webpages. If you find that the list includes pages that should not have the Noindex tag, you can go to the backend of your website to edit and modify them.
If you want to speed up the re-indexing process of these pages, you can submit these pages in Google Search Console and let the search engine re-check them.
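If you prefer to check pages in bulk outside of Search Console, here is a minimal Python sketch that fetches each URL from a list and reports pages that carry a robots noindex meta tag; the URLs are placeholders. It is a simplified check: attribute order inside the tag can vary, and noindex can also be sent via the X-Robots-Tag HTTP header, which the sketch also inspects.

import re
import urllib.request

# Simplified pattern: assumes name="robots" appears before content="...".
NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

urls = [
    "https://example.com/",        # placeholder URLs
    "https://example.com/about/",
]

for url in urls:
    try:
        resp = urllib.request.urlopen(url, timeout=10)
        html = resp.read().decode("utf-8", "ignore")
    except Exception as e:
        print(f"{url}: fetch failed ({e})")
        continue
    header = resp.headers.get("X-Robots-Tag", "")  # noindex can be set here too
    if NOINDEX_RE.search(html) or "noindex" in header.lower():
        print(f"{url}: noindex found")
    else:
        print(f"{url}: ok")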
Please note: for a noindex rule to be effective, the page must not be blocked by robots.txt and must be accessible to search engine crawlers. If the page is blocked in robots.txt or is otherwise inaccessible, crawlers will never see the noindex tag, so the page may still appear in search results, especially if other pages link to it.
Internal links with ‘nofollow’
Nofollow links are those with the rel="nofollow" attribute. They exist to prevent link equity from being passed to the target page.
Here’s how Google handles ‘nofollow’:
In practice, Google generally does not follow nofollow links, so a page that is reachable only through them may never be crawled or indexed. However, if other sites link to the target page without nofollow, or if the target page’s URL has been submitted in your sitemap, the page may still appear in Google’s index.
Therefore, if you want a page to be indexed by Google, make sure the internal links pointing to it are normal “follow” links: remove the rel="nofollow" attribute from any internal links to that page.
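As a quick illustration, assuming a hypothetical internal page /pricing/, the first link below withholds crawl signals while the second lets Google follow the link and discover the page:

<!-- Before: Google will generally not follow this internal link -->
<a href="/pricing/" rel="nofollow">Pricing</a>

<!-- After: a normal link that Google can follow and use for discovery -->
<a href="/pricing/">Pricing</a>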
Low quality content
If your website has low-quality content, such as duplicate, thin, or unoriginal content, search engines may reduce or stop indexing it.
You can optimize your website in the following ways:
- Improve the quality of website content: Google pays the most attention to user experience, so the quality of the content on every page is crucial. Make sure the content is substantial, avoid long but useless text, and don’t neglect image SEO: add meaningful Alt text to images.
- Optimize website structure: Keep the website structure flat, make URLs concise and clear, avoid dynamic garbled parameters, and try to include keywords in the URL.
- Improve website loading speed: Make sure the website loads quickly, which is crucial for user experience and search engine rankings.
- Create a custom 404 error page: Configure a custom 404 error page to help users navigate to other helpful content when a page can’t be found.
- Optimize SEO TDK for each page: Set a good SEO title, description, and keywords (TDK) for each page, which helps search engines better understand the content of the page and improve the attractiveness of search results.
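As a sketch, the TDK of a page lives in the HTML head; the values below are placeholders. (Google largely ignores the keywords meta tag nowadays, but it is traditionally the “K” in TDK.)

<head>
  <!-- T: the SEO title shown as the search result headline -->
  <title>Example Product – Example Store</title>
  <!-- D: the meta description often shown as the result snippet -->
  <meta name="description" content="A short, compelling summary of this page.">
  <!-- K: meta keywords; largely ignored by Google today -->
  <meta name="keywords" content="example, product, store">
</head>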
Lack of internal links
Google discovers new content through internal links on pages, so if a web page has no internal links pointing to it, it is hard for search engines to discover it automatically, and visitors cannot reach it through the site’s navigation either. Such pages are called orphan pages. So how do you fix them? (One way to detect orphan pages programmatically is sketched after this list.)
- If the page is not important, you may want to consider deleting it and excluding it from your sitemap. This will ensure that search engines don’t waste time crawling unimportant pages.
- If the page is important, you should add internal links to it from other web pages. This ensures that search engines and visitors can easily find and access this important page, improving the accessibility and search engine indexability of the website.
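Here is a minimal Python sketch of one way to detect orphan pages: take the URL list from your sitemap, fetch those pages, collect the internal links found on them, and report any sitemap URL that no other page links to. The domain is a placeholder, and the sketch ignores links rendered by JavaScript.

import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

SITEMAP = "https://example.com/sitemap.xml"   # placeholder

def sitemap_urls(url):
    """Read all page URLs out of a standard sitemap.xml."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(urllib.request.urlopen(url, timeout=10).read())
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", ns)]

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

pages = sitemap_urls(SITEMAP)
linked = set()   # every internal URL that at least one crawled page links to
for page in pages:
    try:
        html = urllib.request.urlopen(page, timeout=10).read().decode("utf-8", "ignore")
    except Exception:
        continue
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.links:
        absolute = urljoin(page, href)
        if urlparse(absolute).netloc == urlparse(SITEMAP).netloc:
            linked.add(absolute.split("#")[0])

# Orphans: listed in the sitemap but never linked from any crawled page.
for url in pages:
    if url not in linked:
        print("Possible orphan page:", url)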
Non-compliant SEO practices may be penalized
Using manipulative SEO (search engine optimization) techniques, such as keyword stuffing, doorway pages, or other black-hat methods, can cause search engines to exclude your site or even penalize it.
If your site violates the search engine’s rules and policies, it may be penalized, including being excluded from the index. These violations may include malware, spam, copyright-infringing content, etc.
Final Thoughts
There are usually two main reasons why a web page is not indexed by search engines, and sometimes both problems exist at the same time:
- Technical issues affecting indexing: Technical issues include misconfigured robots.txt files, slow page loading, server issues, duplicate content, 404 errors, etc. These issues can make it difficult for search engines to crawl and index your pages.
- Low-quality content or unclear value: If search engines believe that your webpage content is low-quality, unoriginal, or of insufficient value, they may choose not to index it. Search engines aim to provide high-quality, relevant results, so content-quality issues can lead to indexing problems.
In practice, technical issues are usually the main reason pages fail to be indexed, but you also need to pay attention to the quality and relevance of your website content. By taking appropriate measures, such as fixing technical issues, optimizing page speed, configuring robots.txt correctly, and improving the quality and relevance of your content, you can solve indexing problems and improve your website’s performance in search engines.