Brian,
Thanks for noticing that block. Unfortunately, something is still blocking Google. Their
“Live Test” still comes back with a crawl anomaly.
In their documentation they say that their bots can come from a wide array of IP addresses
and that they don’t publish them. Is it possible
that another IP or block has been blocked that you might be able to
open, at least long enough to see if that fixes the problem?
Google won’t say what all their googlebot IPs are but I found this:
Known Googlebots:
64.233.160.0 - 64.233.191.255
66.102.0.0 - 66.102.15.255
66.249.64.0 - 66.249.95.255
72.14.192.0 - 72.14.255.255
74.125.0.0 - 74.125.255.255
209.85.128.0 - 209.85.255.255
216.239.32.0 - 216.239.63.255
Google owns these (maybe google bots)
64.18.0.0/20 64.18.0.0 - 64.18.15.255
64.233.160.0/19 64.233.160.0 - 64.233.191.255
66.102.0.0/20 66.102.0.0 - 66.102.15.255
66.249.80.0/20 66.249.80.0 - 66.249.95.255
72.14.192.0/18 72.14.192.0 - 72.14.255.255
74.125.0.0/16 74.125.0.0 - 74.125.255.255
108.177.8.0/21 108.177.8.0 - 108.177.15.255
172.217.0.0/19 172.217.0.0 - 172.217.31.255
173.194.0.0/16 173.194.0.0 - 173.194.255.255
207.126.144.0/20 207.126.144.0 - 207.126.159.255
209.85.128.0/17 209.85.128.0 - 209.85.255.255
216.58.192.0/19 216.58.192.0 - 216.58.223.255
216.239.32.0/19 216.239.32.0 - 216.239.63.255
2001:4860:4000::/36 2001:4860:4000:0:0:0:0:0 - 2001:4860:4fff:ffff:ffff:ffff:ffff:ffff
2404:6800:4000::/36 2404:6800:4000:0:0:0:0:0 - 2404:6800:4fff:ffff:ffff:ffff:ffff:ffff
2607:f8b0:4000::/36 2607:f8b0:4000:0:0:0:0:0 - 2607:f8b0:4fff:ffff:ffff:ffff:ffff:ffff
2800:3f0:4000::/36 2800:3f0:4000:0:0:0:0:0 - 2800:3f0:4fff:ffff:ffff:ffff:ffff:ffff
2a00:1450:4000::/36 2a00:1450:4000:0:0:0:0:0 - 2a00:1450:4fff:ffff:ffff:ffff:ffff:ffff
2c0f:fb50:4000::/36 2c0f:fb50:4000:0:0:0:0:0 - 2c0f:fb50:4fff:ffff:ffff:ffff:ffff:ffff
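To check an address against the "Known Googlebots" list, here is a minimal Python sketch using the standard `ipaddress` module. The ranges are copied from the list above and are not an official Google list. The function names are my own, chosen for illustration.

```python
import ipaddress

# "Known Googlebots" ranges copied from the list above (first, last address).
# These are unofficial; Google does not publish a fixed list in this thread.
KNOWN_GOOGLEBOT_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]


def to_networks(ranges):
    """Convert (first, last) address pairs into CIDR networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


GOOGLEBOT_NETWORKS = to_networks(KNOWN_GOOGLEBOT_RANGES)


def in_known_googlebot_range(address):
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in GOOGLEBOT_NETWORKS)


print(in_known_googlebot_range("66.249.90.10"))  # True
print(in_known_googlebot_range("44.0.0.1"))      # False
```

The same check could be run against the entries of a block list to see whether any of them overlap these ranges.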
I will certainly be very respectful of the bandwidth. As I said before, we really don’t
get a lot of hits and the site is more for our members than anyone else (plus the
occasional new person wanting to join).
Someone mentioned that my page size is a bit large. Yes, I do have some JavaScript, and it
does make the page appear to be about 3 MB, but that figure is misleading.
The 3 MB includes the jQuery library, which most people already have in their cache because so
many sites use jQuery. If jQuery is already cached on your machine (likely), the
actual page size is in the mid kilobytes. A quote from the jQuery docs:
"If you serve jQuery from a popular CDN such as Google's Hosted Libraries or
cdnjs, it won't be redownloaded if your visitor has been on a site that referenced it,
from the same source (as long as the cached version has not expired)."
Thanks for trying to help me resolve this.
Roger
VA7LBB
On May 14, 2019, at 12:00 PM,
44net-request(a)mailman.ampr.org wrote:
Send 44Net mailing list submissions to
44net(a)mailman.ampr.org
To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.ampr.org/mailman/listinfo/44net
or, via email, send a message with subject or body 'help' to
44net-request(a)mailman.ampr.org
You can reach the person managing the list at
44net-owner(a)mailman.ampr.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of 44Net digest..."
Today's Topics:
1. Portal API (Nate Sales)
2. Re: Google indexing (Brian Kantor)
3. Re: Google indexing (Rob Janssen)
From: Nate Sales <nate.wsales(a)gmail.com>
Subject: Portal API
Date: May 13, 2019 at 2:22:55 PM PDT
To: AMPRNet working group <44net(a)mailman.ampr.org>
Hello,
Is there any plan to make the API more complete? It would be really cool to
be able to update gateways and such programmatically.
73,
-Nate
From: Brian Kantor <Brian(a)bkantor.net>
Subject: Re: [44net] Google indexing
Date: May 13, 2019 at 2:25:38 PM PDT
To: AMPRNet working group <44net(a)mailman.ampr.org>
On Mon, May 13, 2019 at 11:58:18AM -0700, Roger wrote:
I wanted to thank everyone for their help with the Google issue I’m having. It is not
resolved but I’ve made some discoveries. It looks like a fair number of the ampr.org
sites that come up on Google may in fact be done via BGP. Rob’s is, and the others that
I did a traceroute on terminate on an address that is not 44.
But that said, I now think this is a 100% Google issue. I don’t know what kind of
stupidity they are up to, but Yandex and Bing have no problems indexing my site. I have
read of others having similar issues. Bing and Yandex actually use Google’s same system
for verification and they crawl just fine.
73
Roger
VA7LBB
After Roger mentioned that AMPRNet BGP-advertised web sites were
getting indexed, but not very many others, and then someone posted
that Google's indexing bots often run in the IP address range
66.249.x.x, I took a look at the ingress filter in amprgw.
66.249.90.x and 66.249.91.x were indeed blocked.
I have unblocked them. Roger, you may see Google crawling your web
site from addresses in those subnets now. If you have some way to
stimulate them to do so, you might want to try that.
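One way to see whether the crawler is now arriving from those subnets is to filter the web server's access log by client address. This is a rough Python sketch, assuming the client IP is the first whitespace-separated field on each line, as in the common Apache and nginx log layouts. The function name and sample lines are made up for illustration.

```python
import ipaddress

# 66.249.90.x and 66.249.91.x together form one /23 network.
UNBLOCKED = ipaddress.ip_network("66.249.90.0/23")


def hits_from_unblocked(log_lines):
    """Return the log lines whose client address is in the unblocked subnets.

    Assumes the client IP is the first whitespace-separated field.
    """
    hits = []
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            client = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if client in UNBLOCKED:
            hits.append(line)
    return hits


# Made-up sample lines for illustration only, not real log data.
sample = [
    '66.249.90.17 - - [14/May/2019:12:00:00 -0700] "GET / HTTP/1.1" 200 512',
    '203.0.113.9 - - [14/May/2019:12:00:05 -0700] "GET / HTTP/1.1" 200 512',
    '66.249.91.200 - - [14/May/2019:12:01:00 -0700] "GET /robots.txt HTTP/1.1" 200 64',
]
print(len(hits_from_unblocked(sample)))  # prints 2
```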
I don't know how among many possible ways that those addresses got
on the blocking list, as it was too long ago for the current logs
to reflect it.
- Brian
From: Rob Janssen <pe1chl(a)amsat.org>
Subject: Re: [44net] Google indexing
Date: May 14, 2019 at 11:01:09 AM PDT
To: "44net(a)mailman.ampr.org" <44net(a)mailman.ampr.org>
66.249.90.x and 66.249.91.x were indeed blocked.
Ahh... that explains a lot!
I don't know how among many possible ways that those addresses got
on the blocking list, as it was too long ago for the current logs
to reflect it.
Maybe there was "a lot" of traffic? Possibly also "a lot" in terms
of those days.
But of course everyone running a website on an IPIP tunneled ampr.org site has some
responsibility in this. Make sure when you have areas with lots of data, those large
files are not indexed. This can be done using robots.txt files, headers in the page
content, etc.
E.g. you run a site with equipment schematics. You have some text pages with indexes
and a lot of huge PDF files with the scanned schematics themselves. It is not difficult
to make Google (and other crawlers) index only the text index files and not the PDFs.
Or you have a local amateur group site and it has lots of photographs and maybe even
video of the field day or other events. It is possible to keep the huge 30-megapixel
photographs and the video from being indexed and index only the text content and maybe
the thumbnails.
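As a rough illustration of the robots.txt approach, the Python sketch below holds a hypothetical robots.txt and checks it with the standard `urllib.robotparser` module. The directory names are placeholders, not taken from any real site. It uses directory prefixes rather than wildcard patterns because `urllib.robotparser` only does simple prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory names are placeholders.
# Large files live in their own directories so they can be excluded by prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /schematics/pdf/
Disallow: /photos/full/
Disallow: /video/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="Googlebot"):
    """Return True if the given agent is allowed to fetch the path."""
    return parser.can_fetch(agent, path)


print(may_crawl("/schematics/index.html"))      # True: text index is crawlable
print(may_crawl("/schematics/pdf/rig123.pdf"))  # False: scanned PDFs are excluded
print(may_crawl("/photos/thumbs/event.jpg"))    # True: thumbnails are crawlable
```

robots.txt is only a request: well-behaved crawlers honor it, but it does nothing against scanners that ignore it.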
When this is done in a responsible manner, indexing the websites that are behind IPIP
tunnels should not cause much more "useless traffic" than there already is due to
jerks like shodan.io, stretchoid.com and the like.
(Those are scanning the entire IP range, not just websites that have been announced
to Google or are linked from other sites.)
Rob