The Spam Club

Site Issues - Googlebot

Posted at 09:23 on October 28th, 2019
Admin
Reborn Gumby
Posts: 11126
Hey, thanks, this is some valuable research into what I consider one of the blackest black boxes on the Internet :D

Re. wget: a now needless measure from legacy times for sure. Should be changed, I agree. Though I don't see how this, in any way, should affect Google.

Re. 302: a few years ago, Google announced they would penalize "non mobile friendly" websites in their search results. The contents of both site versions here are the same; the only difference is an extra stylesheet. Hence the redirect. Again, I don't see how that would affect crawling negatively. It should actually have a positive effect, since duplicate content under several URLs also seems not to be well-liked.

Re. Google-Images: maybe also a somewhat archaic decision from a long time ago. Nevertheless, the basic feeling remains. Whenever I tested Google's image search, I found it virtually impossible to find a link to the image source page in their (I believe) intentionally messy layout. I don't see how that could possibly bring any visitors to the website. On top of that, do you really think Google penalizes websites which do this in their textual search results (actually, it wouldn't surprise me, considering how they have taken the definition of "evil" to whole new levels unheard of before)?

Re. linking: sure. I just don't know or use many other websites. Seriously, I'm useless at doing any promotional work. You are spot on with your assessment: the "network" we used to have with other websites simply went away. Nothing replaced it. So far, I'm following. But I'm clueless as to what to do.
-----
Now you see the violence inherent in the system!
Posted at 04:18 on October 27th, 2019
Member
Baby Gumby
Posts: 4
By the way, Google is definitely not delisting goodolddays.net for copyright violation. Google only does that when they are forced to. In the US that requires the copyright holder to file a DMCA request, and Google publishes every single request.

You can search for URLs delisted by Google by going to Google's Transparency Report or the Lumen Database. You'll see that there have been no DMCA requests concerning goodolddays.net.


For a contrasting example, search for abandonwaredos.com and you'll see that there are 17 URLs that Google was told to take down. Of those, only three were actually removed (for example, their CD image of Grand Theft Auto 1 from 1997).

P.S. Yes, Google actually checks and refuses DMCA takedown requests! For example, here's one from Warner Brothers telling Google to remove 41 different URLs, including abandonwaredos.com, debian.org and (bizarrely) gog.com for carrying Angband, an open source game loosely derived from Tolkien's writings. Google ignored them.
-----
Edited by spamolatherobust at 04:28 on October 27th, 2019
Posted at 01:58 on October 27th, 2019
Member
Baby Gumby
Posts: 4
Originally posted by Mr Creosote at 09:16 on October 17th, 2019:
Searching for
Code:
site:goodolddays.net/game
produces ~700 results. Meaning about half of the pages which I consider the main contents are not in their index at all.


Goodolddays.net is rather unfriendly to being indexed by robots. For example, if I try a simple
Code:
wget www.goodolddays.net

I get a 403 FORBIDDEN response. There is no good reason to do that.

I can convince goodolddays.net to allow `wget` access like so:
Code:
wget --user-agent Googlebot www.goodolddays.net

but then, instead of showing me the normal page, I get the response 302 MOVED TEMPORARILY, redirecting Google's search bot to only index the mobile version of the site: https://m.goodolddays.net.
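
If you want to repeat this comparison without wget, here is a minimal Python sketch using only the standard library. The `probe` helper is my own name, and the user-agent strings in the usage comment are assumptions: wget identifies itself as `Wget/<version>`, while the real Googlebot sends a longer string than the bare `Googlebot` token used above. The sketch does not follow redirects, so a 302 and its Location header stay visible.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError instead


def probe(url, user_agent):
    """Send one GET and return (status code, Location header or None)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx responses and all 4xx/5xx responses end up here.
        return err.code, err.headers.get("Location")


# Example usage (needs network access; the results depend on how the
# server is configured at the time you run it):
#   for agent in ("Wget/1.20", "Googlebot", "Mozilla/5.0"):
#       print(agent, probe("http://www.goodolddays.net/", agent))
```

Seeing different status codes for different agents on the same URL would confirm that the server is filtering on the User-Agent header.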

Further, https://goodolddays.net/robots.txt specifies that Google Images is not allowed to crawl the website. I'm not sure if the idea was that Google Images was somehow "stealing" bandwidth by showing people images out of context, but I suspect it doesn't help the PageRank or site reputation.
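
To see how a crawler would interpret such a rule, here is a small sketch with Python's standard `urllib.robotparser`. The robots.txt text below is a made-up example, not a copy of the real file on goodolddays.net, and `allowed` is just a helper name I picked. `Googlebot-Image` is the user-agent token Google's image crawler goes by.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (an assumption, NOT the
# actual file served by goodolddays.net).
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example usage:
#   allowed(SAMPLE_ROBOTS_TXT, "Googlebot-Image", "https://goodolddays.net/game/")
#   allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://goodolddays.net/game/")
```

With rules like these, only the image crawler is shut out; the regular search crawler falls through to the catch-all entry, which disallows nothing.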

Which leads me to perhaps the most important point: Google ranks pages based primarily upon how many other websites link to that page and on the quality of those websites. For example, vice.com, which is highly ranked, links to goodolddays.net's entry for Deluxe Paint 2. That not only increases the PageRank for the Deluxe Paint 2 page, but also improves the reputation of goodolddays.net in general.
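
To make the "links from good sites count more" idea concrete, here is a toy version of the textbook PageRank iteration. This is only a sketch of the originally published idea, not a claim about how Google ranks pages today. The damping factor 0.85 is the commonly quoted default, and every site name in the usage comment is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Example usage with hypothetical sites: "magazine" has three inbound links,
# "small_blog" has none, and each of them links to one game page.
#   web = {
#       "fan1": ["magazine"], "fan2": ["magazine"], "fan3": ["magazine"],
#       "magazine": ["page_a"],
#       "small_blog": ["page_b"],
#   }
#   ranks = pagerank(web)
#   ranks["page_a"] > ranks["page_b"]
```

Both game pages have exactly one inbound link, but the one linked from the well-connected site ends up with the higher score, which is the effect described above.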

The best way for goodolddays.net to be in the top 10 results is to have a healthy community of other websites that link to it. (Note that comments in forums and blogs don't count, as they use rel=nofollow.) If the "abandonware" community is not thriving, that could explain why goodolddays.net is not ranked as highly by Google as it used to be. Still, it should be possible to improve the PageRank. I'd start with fixing the website so that it is friendly to robots.
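
If you want to check whether a page that links to you marks the link as nofollow, a rough sketch with Python's standard `html.parser` could look like this. `LinkCollector` and `split_links` are names I made up, and the HTML in the usage comment is an invented snippet.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort the <a href> links of a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) link lists for an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed, collector.nofollow


# Example usage with an invented snippet:
#   page = ('<a href="https://goodolddays.net/">review</a> '
#           '<a rel="nofollow noopener" href="https://example.org/">comment</a>')
#   split_links(page)
```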
-----
Edited by spamolatherobust at 04:25 on October 27th, 2019