Don't index page 1, page 2, ..., page n #970
Comments
Hi @ploeh, I don't think it's a good idea to disable indexing in your case, as there's no sitemap available for the search engine. Instead, I suggest adding canonical URLs to the meta information of the relevant pages. That should ideally be enough to address the issue. Here's a helpful link. Best of luck with your article writing!
Maybe a wild idea, but a different angle of approach: as your blog has a pretty sustained rate of new content, why don't you change paging to be time-invariant? E.g. have a page per month. Your homepage could always show posts from the current month plus the last one (to ensure there's always at least a month of content), then a link to "October 2024", etc. Or, maybe even simpler, just count the pages from the end?
Would adding one help? After all, the archive contains almost everything of interest on the site, apart from the About page and perhaps a few other pages. I don't think it'd be hard to get Jekyll to generate a similar sitemap file. Be that as it may, Google's documentation seems to indicate that it's not really required:
It goes on to talk a bit more about large sites, where it can be difficult to ensure that all pages are being linked to, but that's hardly an issue here, as the archive links to everything of interest, again apart from a few special pages. Those, however, are linked from the 'top menu' on each page. And to be clear, the Archive page is automatically generated by Jekyll.
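For what it's worth, should a generated sitemap ever turn out to be desirable after all, the stock jekyll-sitemap plugin emits a sitemap.xml from all non-excluded pages and posts. A minimal sketch of the relevant _config.yml entry (assuming the plugin is acceptable on the hosting setup; GitHub Pages, for instance, whitelists it):

```yaml
# _config.yml — enable the jekyll-sitemap plugin, which generates
# /sitemap.xml automatically from the site's pages and posts.
plugins:
  - jekyll-sitemap
```

Older Jekyll versions use a `gems:` key instead of `plugins:`, so the exact spelling depends on the Jekyll version in use.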
I apologize for being dense, but even after perusing the link you provided, I don't understand how that helps. It's not that I have alternative URLs pointing to the same page... Could I perhaps ask you to elaborate a bit?
Both of these would address the issue, I suppose. Still, Page 18 or Page 44 aren't really useful pages as far as I can tell, even if they were stable. The more I think about it, the more I'm considering entirely getting rid of all of those extra pages...
@ploeh I was wrong, and the site does have a sitemap. I'm sorry about that. I'm afraid that if you start disallowing indexing, it might break something; I've heard of that happening. I think the least invasive option is to add canonical links. If that doesn't work, then you can think further.
The idea of canonical links is that the search engine knows where the original source is. It will then display links to the original in the results. If there are no canonical links, Google chooses the links itself, and in your case it chooses wrong. Of course, search engines can be wrong.
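To illustrate what that suggestion amounts to in practice: a canonical link is a single tag in each page's head. A hypothetical sketch for a Jekyll layout (the actual template file in the blog's repository may look different; `site.url` would need to be set in _config.yml):

```html
<head>
  <!-- Tells search engines which URL is the authoritative one for this
       content. page.url is the page's own path, supplied by Jekyll. -->
  <link rel="canonical" href="{{ site.url }}{{ page.url }}" />
</head>
```

On an article page this points at the article itself, which is what lets the search engine prefer it over an aggregation page that happens to contain the same text.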
As a standard blog site, ploeh blog has a set of pages that a user may navigate using next and previous buttons. I don't think that the average user would use that feature much, but according to Google Analytics, Pages is ranked number 31 on the site.
The top page on the site (ranked 1) is, perhaps not surprisingly, the 'home page' at https://blog.ploeh.dk.
All that said, I sometimes need to find stuff on the site, and while I often search the source code (i.e. the HTML files), I also occasionally use a site-specific web search, and I've noticed that web search results often list, say page 18 or page 58, simply because the crawler found a particular keyword on that page at that time.
These pages are 'aggregation pages', and articles move around on these pages as they get pushed further into the past. Therefore these search results aren't useful.
What's the best way to tell search engines to not index these pages? robots.txt?
I'm not up to date with modern SEO techniques, so would appreciate input if robots.txt isn't the best option.
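As a point of comparison between the two common mechanisms: a robots.txt rule blocks crawling, while a robots meta tag blocks indexing; Google may still index a crawl-blocked URL if other pages link to it. A minimal robots.txt sketch, assuming (hypothetically; the real URL scheme may differ) that the aggregation pages live at paths beginning with /page:

```
# robots.txt — stop crawlers from fetching the aggregation pages.
# Disallow uses prefix matching, so this covers /page2, /page18, etc.
User-agent: *
Disallow: /page
```

The alternative, which actually removes pages from results rather than merely hiding them from the crawler, is a `<meta name="robots" content="noindex">` tag in the head of each aggregation page — but for that tag to be seen, the page must remain crawlable, so the two approaches shouldn't be combined on the same URLs.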