What are sitemaps and what are they used for
Sitemaps are files that provide information about the URLs that make up a website: the URLs of its pages, images, videos, and other files.
Obviously these files are not made for users. Take a look at one: it's a column of code uglier than the back of a refrigerator.
Google and other search engines crawl all the links contained in the sitemaps to reach all those URLs quickly, without having to discover the pages by following internal links across the site.
You can think of it as an index listing all the URLs we consider priorities. Or at least that's how it should be.
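To give you an idea, here is a minimal sitemap.xml following the sitemaps.org protocol (the URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page we want search engines to find -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/my-first-post</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```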
A common practice is to generate separate sitemaps segmented by page type.
For example, if we run an ecommerce site, we can have one sitemap for products, another for category pages, another for the rest of the pages that make up the site... and if we have a blog, it can also make sense to separate pages, posts...
It is even advisable to generate dedicated sitemaps for images, where we can add extra information such as the subject, caption, or license... and the same goes for videos.
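As a sketch, an image sitemap adds Google's image extension namespace to the standard format (placeholder URLs). The extension historically accepted extra tags for title, caption, or license, though Google has since deprecated those and now only reads image:loc:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/products/blue-sneakers</loc>
    <!-- Google accepts up to 1,000 images per page -->
    <image:image>
      <image:loc>https://www.example.com/img/blue-sneakers-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://www.example.com/img/blue-sneakers-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```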
When should we use a sitemap?
My personal recommendation is to have a file of this type on your website whenever it is technically possible.
They are not mandatory; in fact, on well-structured websites with few URLs, Google will have no trouble crawling. But there are situations where having one is 100% recommended.
Large, poorly internally linked sites
When we manage websites with many URLs and many levels of depth, robots can have difficulty finding and crawling every page.
If we have a file listing all the URLs we want indexed, we make the crawl straightforward and save robots from getting lost along the way.
The same applies if you have sections that are poorly linked or even orphaned. In those cases it is essential; otherwise Google would not be able to reach those pages.
When nobody even knows your site exists
If we launch a new website, regardless of the number of pages it contains, we need to give robots a heads-up about its existence.
Since there are no external links pointing to our project yet, Google will have a hard time reaching our website.
The recommendation is to generate this file and submit it to Google so it gets crawled regularly.
And how do we submit it?... We'll get to that, but first we need to know how to generate the file.
How to set up a sitemap
The first thing you should know is that these files need to stay fresh and dynamic.
What do I mean by this? They are living files that must add and remove URLs as the site evolves. Every time you publish a new blog post, it needs to be added to the sitemap. The same goes for products and the rest of your pages.
Obviously, when we delete URLs we also have to remove them from the sitemap; otherwise we are inviting robots to crawl pages that return 404s or redirects. In case you're wondering whether Google appreciates having its time wasted: it does not.
In addition to avoiding URLs with response codes other than 200, there are others that we should not include. Let's see:
- URLs with noindex
- Non-canonical URLs
- URLs blocked by robots.txt
- Password-protected URLs
**How do we generate it?**
Most of the CMSs we use have modules, extensions, or plugins that help us automate the creation and management of this file.
That's great, but these modules often ignore everything we've just covered and dump every single URL of the website into the sitemaps. MISTAKE.
We need them to follow the guidelines above and let us control which URLs should and should not be in the file.
Another option available to us is to generate our own file manually.
We can extract all the URLs we want to include and assemble them in a .xml file like the one shown earlier. This is painstaking work, though, and remember that you would have to update it constantly.
A middle ground is Screaming Frog: it has an option to generate a sitemap after crawling your website. It does everything automatically and produces a file already cleaned of the URLs that don't meet the conditions I mentioned.
Uploading the Sitemap file to Search Console
It is just as important to have this file generated as it is to send it to Google directly.
If you have a Search Console property set up for your website, you'll know there's an option to submit sitemaps for crawling.
You enter the path of the file and Google's robots do the rest of the work.
You will be able to spot errors in the file and cross-reference the rest of the data Search Console offers. For example, you can filter the coverage report by the URLs in your sitemap, which is very valuable for checking how the indexing of those priority URLs is going.
Frequently asked questions about sitemaps
We have covered the general concepts about this type of file; now let's go through some of the most frequent questions:
**SHOULD THE SITEMAP.XML FILE BE IN THE ROOT?**
The answer is no. In fact, many modules will generate these files inside their own folders, so it will not even be possible to create them in the root directory. That's what Search Console is for: we can indicate the exact path there.
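As a side note, you can also declare the sitemap's location in your robots.txt file with the standard Sitemap directive, which the major search engines read (the URL below is a placeholder):

```
# robots.txt: the Sitemap directive takes a full URL and can appear anywhere in the file
Sitemap: https://www.example.com/sitemaps/my-sitemap.xml
```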
**DO THEY HAVE TO HAVE THE EXACT NAME "SITEMAP.XML"?**
Of course not, you can name it whatever suits you best. Otherwise, what would you do if you had 3 or 4 different sitemaps?
**IS THE SITEMAP INDEX HARMFUL?**
Not at all. One thing I haven't mentioned yet is that when we have more than one sitemap, we can create a sitemap index file that lists them all. Search Console understands it perfectly and crawls it without problems.
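For reference, a sitemap index is just another XML file pointing to each individual sitemap (placeholder URLs), and you only need to submit the index in Search Console:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-categories.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
  </sitemap>
</sitemapindex>
```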
**WHAT IS THE LIMIT OF URLS AND WEIGHT OF A SITEMAP.XML FILE?**
Each sitemap file can contain up to 50,000 URLs and must not exceed 50 MB uncompressed. If you go over either limit, split it into several files and group them with a sitemap index.