Evaluating link rot 


After reading Brian Suda’s article on link rot, I ran his script on my Pinboard collection, and discovered that around 12% of my bookmarked links are invalid.

[Graph: percentage of successful bookmarks per year, 2006 to 2021]

Here’s the data used to plot the above graph. The total number of bookmarked links varies from year to year, peaking in 2010 and 2011.

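| Year | Successful bookmarks | Bookmarks | Average (%) |
|---|---|---|---|
| 2006 | 80 | 109 | 73.39 |
| 2007 | 132 | 163 | 80.98 |
| 2008 | 185 | 237 | 78.05 |
| 2009 | 268 | 319 | 84.01 |
| 2010 | 757 | 855 | 88.53 |
| 2011 | 794 | 890 | 89.21 |
| 2012 | 202 | 230 | 87.82 |
| 2013 | 143 | 158 | 90.50 |
| 2014 | 25 | 26 | 96.15 |
| 2018 | 51 | 59 | 86.44 |
| 2019 | 56 | 59 | 94.91 |
| 2020 | 49 | 52 | 94.23 |
| 2021 | 37 | 37 | 100 |
| 2022 | 6 | 6 | 100 |
| Totals | 2785 | 3200 | 88.87 |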
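The percentages in the table can be recomputed from the two count columns. Here is a minimal sketch in Python; the variable and function names are mine, and the reading of the Totals average as the plain mean of the yearly percentages is an assumption (the numbers agree to within rounding).

```python
# (year, successful bookmarks, total bookmarks), copied from the table above.
ROWS = [
    (2006, 80, 109), (2007, 132, 163), (2008, 185, 237), (2009, 268, 319),
    (2010, 757, 855), (2011, 794, 890), (2012, 202, 230), (2013, 143, 158),
    (2014, 25, 26), (2018, 51, 59), (2019, 56, 59), (2020, 49, 52),
    (2021, 37, 37), (2022, 6, 6),
]


def success_rate(successful, total):
    """Percentage of bookmarks that still resolve."""
    return 100 * successful / total


# Success rate for each year.
yearly_rates = {year: success_rate(ok, total) for year, ok, total in ROWS}

# Unweighted mean of the yearly percentages (every year counts equally).
mean_of_yearly_rates = sum(yearly_rates.values()) / len(yearly_rates)

# Pooled rate: all successful bookmarks divided by all bookmarks.
total_ok = sum(ok for _, ok, _ in ROWS)
total_bookmarks = sum(total for _, _, total in ROWS)
pooled_rate = success_rate(total_ok, total_bookmarks)

print(f"mean of yearly rates: {mean_of_yearly_rates:.2f}%")
print(f"pooled rate: {pooled_rate:.2f}%")
```

With these counts the pooled rate works out to about 87.03%, a little lower than the 88.87 in the Totals row, because the small recent years with a 100% rate weigh as much as the large 2010 and 2011 years in the unweighted mean.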

Note: I didn’t use Pinboard between 2014 and 2017 😶

Brian’s script works like this:

The code looks through your bookmarks and attempts to fetch each URL. If the HTTP code is less than 400 we mark it as a success. Without manually checking every URL, there might be some false positives: people selling existing domains, hosting provider redirects, etc. If the status code was 400 or higher, we marked it as a failure. After some manual investigation, we realized that some domains were not allowing bots to crawl them. Our code was using cURL, which appears as a bot, so we faked a browser’s user-agent string and decreased our failure rate by ~4%.
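The check described above can be sketched in a few lines of Python. This is not Brian's script (which used cURL), only an illustration of the same rule using the standard library. The names `is_success`, `fetch_status` and `check_links`, the timeout, and the exact user-agent string are all assumptions of mine.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: any browser-like string would do; this value is only illustrative.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def is_success(status_code):
    """A status code below 400 counts as a success, 400 or higher as a failure."""
    return status_code < 400


def fetch_status(url, timeout=10):
    """Return the HTTP status after redirects, or None if no response arrived."""
    request = urllib.request.Request(
        url, headers={"User-Agent": BROWSER_USER_AGENT}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx and 5xx responses as HTTPError; keep the code.
        return error.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed URL, and so on.
        return None


def check_links(urls, fetch=fetch_status):
    """Map each URL to True (success) or False (failure or no response)."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status is not None and is_success(status)
    return results
```

Because `urlopen` follows redirects, a parked domain or a hosting provider's catch-all page still ends in a 200 and is counted as a success, which is exactly the kind of false positive mentioned in the quote.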

Pinboard aka del.icio.us

I started to use del.icio.us back in 2006, when I discovered the service at “The Future of Web Apps London” and somehow forgot about it between 2014 and 2018.

I converted my account last year, when Pinboard’s creator Maciej reached out to ask whether we, the original one-time payment users, would consider switching to a subscription model to help him keep maintaining and developing the service, and make a living out of it.

I was surprised that the number of invalid links wasn’t higher, considering that a vast majority of the links on my blog are now invalid. There is probably a significant number of false positives among the 88.87% of valid bookmarks. Randomly clicking through old links turned up a fair number of them.

I still need to finish my link checking script, which replaces invalid links with links to the Internet Archive’s Wayback Machine.
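The replacement step could look roughly like the sketch below. It is not the unfinished script itself, only an illustration under a few assumptions: the content is HTML with double-quoted `href` attributes, the set of invalid URLs comes from an earlier check, and a Wayback Machine address of the form `https://web.archive.org/web/<timestamp>/<url>` resolves to the capture closest to that timestamp. The function names are hypothetical.

```python
import re

# Assumption: links appear as double-quoted href attributes.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def wayback_url(url, timestamp):
    """Build a Wayback Machine address for url near the given timestamp.

    timestamp is a string such as "2012" or "20120315".
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


def archive_invalid_links(html, invalid_urls, timestamp):
    """Point every href whose URL is in invalid_urls at the Wayback Machine."""

    def swap(match):
        url = match.group(1)
        if url in invalid_urls:
            return f'href="{wayback_url(url, timestamp)}"'
        return match.group(0)  # leave valid links untouched

    return HREF_PATTERN.sub(swap, html)


post = (
    '<a href="http://dead.example/page">old</a> '
    '<a href="http://alive.example/">new</a>'
)
print(archive_invalid_links(post, {"http://dead.example/page"}, "2012"))
```

Matching whole `href` values, rather than searching for the URL text anywhere in the page, avoids rewriting a valid link that merely starts with the same address as an invalid one.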
