
The SUSO Method: Server Log File Analysis Checklist

We have created a checklist on how to perform the perfect server log file analysis to help improve your website's crawlability and indexability.

Topic Details
Time: 15 minutes
Difficulty: Hard

Now that you know what a log file is and how a log file analysis can help you avoid wasting your crawl budget and make your website rank better, we would like to present our Server Log File Analysis Checklist, which we have based on a great piece by JetOctopus.

You can use it each time you analyse your website's log files so that you don't overlook any actions that would improve your website's indexability and crawlability.

Orphaned Pages

Orphaned pages are pages that aren't crawled or indexed because not a single internal link points to them.

If you are running an eCommerce website, there are probably pages for products that you no longer sell, in which case it is understandable that no internal links point to them. You can delete these pages or add a noindex tag to them.

However, if there are any orphaned pages with relevant and useful information, you should add internal links pointing to them so that they become crawlable and users can find them.

1. Identify orphaned pages that are of value to your website

2. Organise these pages into relevant categories or directories (this will make it easier to add them to your website’s structure)

3. Add internal links to these pages from other relevant URLs so that they can be discovered by crawl bots.

4. Noindex and delete any orphaned pages that have no value.
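
To illustrate step 1, here is a minimal Python sketch that flags candidate orphaned pages by comparing a list of known URLs against the link targets found by a crawler. The file names and CSV layout (sitemap_urls.txt, and internal_links.csv with source,target columns) are assumptions for this example, not outputs of any particular tool.

```python
# Sketch: find candidate orphaned pages, assuming two hypothetical inputs:
#   sitemap_urls.txt    - one known URL per line (e.g. exported from your XML sitemap)
#   internal_links.csv  - "source,target" rows exported from a site crawler
import csv

def load_known_urls(path):
    """Read one URL per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def load_link_targets(path):
    """Collect every URL that appears as the target of an internal link."""
    targets = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            targets.add(row["target"].strip())
    return targets

if __name__ == "__main__":
    known = load_known_urls("sitemap_urls.txt")
    linked = load_link_targets("internal_links.csv")
    orphans = sorted(known - linked)   # known URLs that no internal link points to
    for url in orphans:
        print(url)
    print(f"{len(orphans)} candidate orphaned pages")
```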

Pages With No Visits

Googlebot will send only a limited number of requests to crawl your website at a time. If the number of pages on your website exceeds the crawl budget, which could happen due to poor site architecture, duplicate content, or infinite spaces, some of the relevant pages won't be crawled. As a result, users won't be able to find them, which will negatively affect both your website's rankings and the user experience.

If you want to prevent this, or at least reduce the likelihood of such issues, there are several factors you should consider.

Distance From Index (DFI)

Even if a particular page has internal links pointing to it, it might not be crawled if it takes too many clicks to reach it from the homepage. In short, the fewer clicks required, the higher the chance that search engine spiders will crawl the page.

Give some serious consideration to how individual pages are interconnected, so that essential pages sit only a short distance away from the homepage; this improves their crawlability. If your most useful pages require many clicks to reach, consider linking to them from the homepage or another page with a low DFI.

1. Ensure pages that are important are no more than 3 clicks away from the homepage.

2. Identify important pages that have not been crawled and add links to them from pages with a smaller DFI.
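
A simple way to measure DFI is a breadth-first search over your internal link graph, starting from the homepage. The sketch below reuses the hypothetical internal_links.csv export from earlier and flags URLs that sit more than three clicks deep; the homepage URL is a placeholder.

```python
# Sketch: compute click depth (DFI) from the homepage with a breadth-first search.
# Assumes a hypothetical "internal_links.csv" crawl export with source,target columns.
import csv
from collections import defaultdict, deque

def load_graph(path):
    """Build an adjacency list of internal links: source URL -> list of target URLs."""
    graph = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            graph[row["source"].strip()].append(row["target"].strip())
    return graph

def click_depths(graph, homepage):
    """Return {url: clicks from homepage} for every URL reachable via internal links."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        for target in graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

if __name__ == "__main__":
    graph = load_graph("internal_links.csv")
    depths = click_depths(graph, "https://www.example.com/")  # placeholder homepage
    for url, depth in sorted(depths.items(), key=lambda x: -x[1]):
        if depth > 3:   # deeper than three clicks: review the internal linking
            print(depth, url)
```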

Inlinks

External and internal links pointing to a particular page signal to search engine crawlers that it might be useful to users. As a result, inlinks have a significant effect on the page's crawlability and rankings.

1. Identify pages of importance that have not been crawled.

2. Increase the number of inlinks pointing towards these pages. Alternatively, you could reach out to other websites, as external links work too.
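
To see which pages are starved of inlinks, you can simply count how often each URL appears as a link target. A minimal sketch, again assuming the hypothetical internal_links.csv export:

```python
# Sketch: count inlinks per URL from a hypothetical "internal_links.csv"
# crawl export with source,target columns.
import csv
from collections import Counter

inlinks = Counter()
with open("internal_links.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        inlinks[row["target"].strip()] += 1

# Pages with the fewest inlinks are the first candidates for extra internal links.
for url, count in sorted(inlinks.items(), key=lambda x: x[1])[:20]:
    print(count, url)
```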

Wordcount

Although pages with more content aren't necessarily better, you should generally avoid pages with fewer than 500 words, as they are less likely to be crawled. Not every page needs a 1,500+ word article, but consider whether your audience would benefit from a longer piece of content. Alternatively, you could combine a few shorter pages into a single one.
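
If you want a rough word count without a dedicated crawler, a small script can strip the HTML and count the remaining words. The sketch below uses only the Python standard library; the URL is a placeholder and the count is approximate, since navigation and boilerplate text are included.

```python
# Sketch: rough word count for a single page, standard library only.
# The URL is a placeholder; real sites may need more careful text extraction.
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.skip = False
        self.words = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False
    def handle_data(self, data):
        if not self.skip:
            self.words += len(data.split())

if __name__ == "__main__":
    html = urlopen("https://www.example.com/").read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    print(f"~{parser.words} words")
    if parser.words < 500:
        print("Thin page: consider expanding it or merging it with related content.")
```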

Title Tags

Titles help both your audience and crawler bots quickly understand what your content is about. Issues with title tags can prevent search engine bots from properly understanding the context of the text, decreasing the page's crawlability, indexability, and rankings.

From an SEO perspective, the worst-case scenario is an empty title tag, which confuses readers and robots alike. Search engine crawlers can also get confused by duplicated titles; although this is a less serious problem, it can harm your website as well.

1. Identify pages with empty title tags and optimise accordingly

2. Identify pages with duplicate title tags and replace them with unique titles
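
Both checks can be run from a crawl export that lists each URL with its title. The sketch below assumes a hypothetical pages.csv file with url and title columns.

```python
# Sketch: find empty and duplicate title tags, assuming a hypothetical
# "pages.csv" crawl export with url,title columns.
import csv
from collections import defaultdict

by_title = defaultdict(list)
empty = []

with open("pages.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        title = (row.get("title") or "").strip()
        if not title:
            empty.append(row["url"])
        else:
            by_title[title].append(row["url"])

print("Pages with empty titles:")
for url in empty:
    print(" ", url)

print("Duplicate titles:")
for title, urls in by_title.items():
    if len(urls) > 1:
        print(f"  {title!r} used on {len(urls)} pages")
```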

Crawl Frequency

If you would like to make changes to the internal linking on your website, we recommend checking which pages are the most and the least frequently visited by crawlers. With the JetOctopus Log Analyzer, you can do so by clicking on charts in the “Pages by Bot Visits” section.

If you discover that bots visit some of your most useful pages very infrequently, it could be a sign of issues that need fixing. As we have mentioned previously, the page might require too many clicks from the homepage, there may not be enough inlinks pointing to it, or the page might lack a title tag.
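
If you prefer to verify crawl frequency directly from the raw log, you can count Googlebot requests per path. The sketch below assumes a hypothetical access.log in the common Apache/Nginx combined format and uses a simple user-agent match; confirming genuine Googlebot requests (for example via reverse DNS) is left out.

```python
# Sketch: count Googlebot visits per path from a hypothetical "access.log"
# in Apache/Nginx combined log format. The user-agent check is a plain
# substring match; verifying real Googlebot IPs is out of scope here.
import re
from collections import Counter

# "METHOD /path HTTP/1.1" inside the quoted request field
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]+"')

visits = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            visits[match.group(1)] += 1

print("Most crawled paths:")
for path, count in visits.most_common(10):
    print(f"{count:6d}  {path}")

print("Least crawled paths:")
for path, count in visits.most_common()[-10:]:
    print(f"{count:6d}  {path}")
```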

Non-Indexable Pages

Even though some pages are non-indexable, they are still visited by search engine crawlers. These include:

  • Pages with non-200 status codes
  • Non-canonical pages
  • Pages blocked by the robots meta tag or the X-Robots-Tag header

Why are they being crawled? Usually, it's because other pages on your website link to them. It is especially common for non-canonical pages to be mistakenly linked to as if they were canonical, and it also happens when there are multiple language versions.

As a result, crawlers will visit non-indexable pages, wasting the crawl budget, which could lead to other important pages not being visited by robots.

1. Remove internal linking to pages that are non-indexable

2. Check if all the links point toward canonical pages or the appropriate language version.
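
One way to act on these steps is to cross-reference your link export with an indexability export. The sketch below assumes two hypothetical files: pages.csv with url and indexable columns, and the internal_links.csv edges used earlier.

```python
# Sketch: list internal links that point at non-indexable URLs.
# Assumes two hypothetical crawl exports:
#   pages.csv          - url,indexable columns ("true"/"false")
#   internal_links.csv - source,target columns
import csv

with open("pages.csv", newline="", encoding="utf-8") as f:
    non_indexable = {
        row["url"].strip()
        for row in csv.DictReader(f)
        if row["indexable"].strip().lower() == "false"
    }

with open("internal_links.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        target = row["target"].strip()
        if target in non_indexable:
            # These links waste crawl budget: update or remove them.
            print(f"{row['source']} -> {target}")
```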

HTTP Response Status Codes

Next, let's focus on HTTP response status codes, which we have already explored in the previous module. We have seen how some of them can have a considerable effect on your website's crawlability and indexability, which is why you should monitor how often they appear if you want to keep your website healthy.

5xx Errors

5xx errors indicate server-side problems, such as the server failing to respond, which is extremely bad from an SEO standpoint. Not only do they make it harder for search engine bots to crawl your website, but they also lead to a poor user experience.

To check if there are any 5xx errors present on your website, go to the Health and Bot Dynamics Dashboards in JetOctopus Log Analyzer. If you discover that your website has a high number of 5xx errors, you might need to invest in better server infrastructure.

3xx Status Codes

3xx status codes indicate that a redirect is required for a user to arrive at the requested destination. If you discover that redirecting URLs are visited by crawlers, we recommend checking whether:

  • Other pages link to them.
  • They have a canonical tag.
  • Other language versions point toward them.
  • They can be found in XML sitemaps.
  • There are issues with redirect chains.

4xx Status Codes

4xx status codes can hurt the website's crawlability, indexability, and rankings, as they indicate that the requested page couldn't be found. Regardless of whether the page is permanently or temporarily missing, it leads to a waste of crawl budget. To fix this issue, check the same things as for 3xx status codes.
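
To monitor these numbers from the raw log, you can bucket the responses Googlebot received by status class and list the most frequent 4xx and 5xx URLs. The sketch below assumes the same hypothetical combined-format access.log as before.

```python
# Sketch: bucket Googlebot responses by status class from a hypothetical
# combined-format "access.log". The status code is the field that follows
# the quoted request, e.g. ... "GET /page HTTP/1.1" 404 1234 ...
import re
from collections import Counter

LINE_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]+" (\d{3})')

classes = Counter()
problem_urls = Counter()

with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        match = LINE_RE.search(line)
        if not match:
            continue
        path, status = match.group(1), match.group(2)
        classes[status[0] + "xx"] += 1
        if status[0] in ("4", "5"):
            problem_urls[(status, path)] += 1

print(dict(classes))   # e.g. {'2xx': 9500, '3xx': 300, '4xx': 120, '5xx': 4}
print("Most frequent 4xx/5xx responses:")
for (status, path), count in problem_urls.most_common(10):
    print(f"{count:5d}  {status}  {path}")
```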

Page Load Time

How long it takes for a page to load has a huge impact on its crawlability, and it also affects user experience. If a page takes too long to load, visitors will simply leave. That's why monitoring the load time of your pages is essential from an SEO perspective.

Refer to our Site Speed Optimisation chapter from earlier in this module to learn how to improve your page load times.
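
Server logs can also give a first approximation of load times if your log format records a response-time field (Apache's %D logs microseconds, Nginx's $request_time logs seconds). The sketch below assumes a hypothetical format where the last field on each line is the response time in milliseconds, so adapt the parsing to your own configuration.

```python
# Sketch: average response time per path, assuming a hypothetical log format
# where the LAST field of each line is the response time in milliseconds.
# Real formats differ (Apache %D is microseconds, Nginx $request_time is seconds),
# so adjust the parsing to match your configuration.
import re
from collections import defaultdict

REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]+"')

totals = defaultdict(lambda: [0.0, 0])   # path -> [total_ms, hits]

with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        match = REQUEST_RE.search(line)
        fields = line.split()
        if not match or not fields:
            continue
        try:
            ms = float(fields[-1])       # assumed response-time field
        except ValueError:
            continue                     # skip lines without a numeric last field
        entry = totals[match.group(1)]
        entry[0] += ms
        entry[1] += 1

slowest = sorted(totals.items(), key=lambda x: x[1][0] / x[1][1], reverse=True)
for path, (total_ms, hits) in slowest[:10]:
    print(f"{total_ms / hits:8.1f} ms  {path}")
```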

Mobile-First Indexing

Mobile devices account for the majority of traffic, which is why Google now uses the mobile version of your pages for indexing. That's why you should ensure that mobile users don't encounter any issues while visiting your website. For a refresher on mobile-first indexing, you can head over to the dedicated chapter in this module.
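
Your logs also show whether Googlebot Smartphone or Googlebot Desktop does most of the crawling, which is a quick sanity check under mobile-first indexing. The sketch below again assumes a hypothetical combined-format access.log and relies on simple user-agent string matching.

```python
# Sketch: compare Googlebot Smartphone vs Desktop crawl volume from a
# hypothetical combined-format "access.log", using simple user-agent matching.
from collections import Counter

counts = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        # Googlebot Smartphone identifies itself with a mobile device string.
        if "Mobile" in line or "Android" in line:
            counts["googlebot-smartphone"] += 1
        else:
            counts["googlebot-desktop"] += 1

print(dict(counts))  # under mobile-first indexing, smartphone visits usually dominate
```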

User Experience

The final aspect that we want to mention is the user experience. You can use log files to discover how users navigate your website and what problems they often encounter.

Consider which pages are most often visited, and which ones are skipped. If some of the pages with excellent content are rarely visited, you might need to rework the internal linking on your website, or there might be some technical issues that would require your attention.

Conclusion

Log file analysis can help you diagnose problems on your website that could be hurting its crawlability and rankings. We hope that this checklist will help you fix any issues you might encounter.