Brian,
Thanks for noticing that block. Unfortunately, something is still blocking Google: their “Live Test” still comes back with a crawl anomaly. In their documentation they say that their bots can come from a wide array of IP addresses and that they don’t publish them. Is it possible that there is another IP address or block on the blocking list that you might be able to open, at least long enough to see if that fixes the problem?
Google won’t say what all their Googlebot IPs are, but I found this:
Known Googlebots:
64.233.160.0 - 64.233.191.255
66.102.0.0 - 66.102.15.255
66.249.64.0 - 66.249.95.255
72.14.192.0 - 72.14.255.255
74.125.0.0 - 74.125.255.255
209.85.128.0 - 209.85.255.255
216.239.32.0 - 216.239.63.255
Google owns these (maybe Googlebots):
64.18.0.0/20 (64.18.0.0 - 64.18.15.255)
64.233.160.0/19 (64.233.160.0 - 64.233.191.255)
66.102.0.0/20 (66.102.0.0 - 66.102.15.255)
66.249.80.0/20 (66.249.80.0 - 66.249.95.255)
72.14.192.0/18 (72.14.192.0 - 72.14.255.255)
74.125.0.0/16 (74.125.0.0 - 74.125.255.255)
108.177.8.0/21 (108.177.8.0 - 108.177.15.255)
172.217.0.0/19 (172.217.0.0 - 172.217.31.255)
173.194.0.0/16 (173.194.0.0 - 173.194.255.255)
207.126.144.0/20 (207.126.144.0 - 207.126.159.255)
209.85.128.0/17 (209.85.128.0 - 209.85.255.255)
216.58.192.0/19 (216.58.192.0 - 216.58.223.255)
216.239.32.0/19 (216.239.32.0 - 216.239.63.255)
2001:4860:4000::/36 (2001:4860:4000:0:0:0:0:0 - 2001:4860:4fff:ffff:ffff:ffff:ffff:ffff)
2404:6800:4000::/36 (2404:6800:4000:0:0:0:0:0 - 2404:6800:4fff:ffff:ffff:ffff:ffff:ffff)
2607:f8b0:4000::/36 (2607:f8b0:4000:0:0:0:0:0 - 2607:f8b0:4fff:ffff:ffff:ffff:ffff:ffff)
2800:3f0:4000::/36 (2800:3f0:4000:0:0:0:0:0 - 2800:3f0:4fff:ffff:ffff:ffff:ffff:ffff)
2a00:1450:4000::/36 (2a00:1450:4000:0:0:0:0:0 - 2a00:1450:4fff:ffff:ffff:ffff:ffff:ffff)
2c0f:fb50:4000::/36 (2c0f:fb50:4000:0:0:0:0:0 - 2c0f:fb50:4fff:ffff:ffff:ffff:ffff:ffff)
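To compare the “Known Googlebots” start/end pairs with a filter that works in CIDR notation, the pairs can be collapsed into CIDR blocks and a client address tested against them. This is a minimal sketch assuming Python 3 and its standard ipaddress module; the names KNOWN_GOOGLEBOT_RANGES, ranges_to_networks, and in_any are illustrative only and do not come from any Google or AMPRNet tool.

```python
import ipaddress

# Start/end pairs copied from the "Known Googlebots" list in Roger's message.
KNOWN_GOOGLEBOT_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]


def ranges_to_networks(ranges):
    """Collapse each (first, last) address pair into the minimal set of CIDR blocks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


def in_any(address, networks):
    """Return True if the address lies inside any of the networks.

    An IPv6 address never matches an IPv4 network, so mixed input is safe.
    """
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = ranges_to_networks(KNOWN_GOOGLEBOT_RANGES)
    for net in nets:
        print(net)
    # 66.249.90.x is one of the subnets reported as blocked in amprgw.
    print(in_any("66.249.90.17", nets))  # True
```

Each of these pairs happens to be a single aligned block, so the script prints one CIDR per range (66.249.64.0 - 66.249.95.255 becomes 66.249.64.0/19, which contains both 66.249.90.x and 66.249.91.x). A range match is only a hint: Google’s own guidance for verifying Googlebot is a reverse DNS lookup on the client address followed by a forward lookup that confirms the name resolves back to it.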
I will certainly be very respectful of the bandwidth. As I said before, we really don’t get a lot of hits, and the site is more for our members than anyone else (plus the occasional new person wanting to join). Someone mentioned that my page size is a bit large. Yes, I do have some JavaScript, and it makes the page size appear to be 3 MB, but that is a misleading figure. That 3 MB includes a jQuery library that most people already have in their cache, since so many sites use jQuery. In fact, if jQuery is already on your machine (likely), the actual page size is in the mid-kilobyte range. A quote from the jQuery docs: “If you serve jQuery from a popular CDN such as Google's Hosted Libraries or cdnjs, it won't be redownloaded if your visitor has been on a site that referenced it, from the same source (as long as the cached version has not expired).”
Thanks for trying to help me resolve this.
Roger VA7LBB
On May 14, 2019, at 12:00 PM, 44net-request@mailman.ampr.org wrote:
Send 44Net mailing list submissions to 44net@mailman.ampr.org
To subscribe or unsubscribe via the World Wide Web, visit https://mailman.ampr.org/mailman/listinfo/44net or, via email, send a message with subject or body 'help' to 44net-request@mailman.ampr.org
You can reach the person managing the list at 44net-owner@mailman.ampr.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of 44Net digest..."
Today's Topics:
- Portal API (Nate Sales)
- Re: Google indexing (Brian Kantor)
- Re: Google indexing (Rob Janssen)
From: Nate Sales <nate.wsales@gmail.com>
Subject: Portal API
Date: May 13, 2019 at 2:22:55 PM PDT
To: AMPRNet working group <44net@mailman.ampr.org>
Hello,
Is there any plan to make the API more complete? It would be really cool to be able to update gateways and such programmatically.
73,
-Nate
From: Brian Kantor <Brian@bkantor.net>
Subject: Re: [44net] Google indexing
Date: May 13, 2019 at 2:25:38 PM PDT
To: AMPRNet working group <44net@mailman.ampr.org>
On Mon, May 13, 2019 at 11:58:18AM -0700, Roger wrote:
> I wanted to thank everyone for their help with the Google issue I’m having. It is not resolved, but I’ve made some discoveries. It looks like a fair number of the ampr.org sites that come up on Google may in fact be BGP-announced. Rob’s is, and the others that I ran a traceroute on terminate on an address that is not in 44. That said, I now think this is 100% a Google issue. I don’t know what kind of stupidity they are up to, but Yandex and Bing have no problems indexing my site. I have read of others having similar issues. Bing and Yandex actually use the same verification system as Google, and they crawl just fine.
> 73 Roger VA7LBB
After Roger mentioned that BGP-advertised AMPRNet web sites were getting indexed but few others were, and then someone posted that Google's indexing bots often run in the IP address range 66.249.x.x, I took a look at the ingress filter in amprgw.
66.249.90.x and 66.249.91.x were indeed blocked.
I have unblocked them. Roger, you may see Google crawling your web site from addresses in those subnets now. If you have some way to stimulate them to do so, you might want to try that.
I don't know which of the many possible ways those addresses got onto the blocking list, as it was too long ago for the current logs to reflect it.
- Brian
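The check Brian describes, looking for Google address space in an ingress filter, can be sketched as a scan of each blocklist entry for overlap with Google-owned blocks. This assumes, hypothetically, that the filter can be exported as a list of CIDR strings; the thread does not describe the actual amprgw format. The Google blocks are the “Known Googlebots” ranges from Roger’s message written as CIDR, 66.249.90.0/24 and 66.249.91.0/24 stand in for the “66.249.90.x and 66.249.91.x” subnets, and the other two entries are documentation-range placeholders.

```python
import ipaddress

# "Known Googlebots" ranges from Roger's message, expressed as CIDR blocks.
GOOGLE_BLOCKS = [
    "64.233.160.0/19",
    "66.102.0.0/20",
    "66.249.64.0/19",
    "72.14.192.0/18",
    "74.125.0.0/16",
    "209.85.128.0/17",
    "216.239.32.0/19",
]

# Hypothetical blocklist export. 192.0.2.0/24 and 198.51.100.0/24 are
# reserved documentation ranges used as stand-ins for unrelated entries.
BLOCKLIST = [
    "192.0.2.0/24",
    "66.249.90.0/24",
    "66.249.91.0/24",
    "198.51.100.0/24",
]


def google_overlaps(blocklist, google_blocks):
    """Yield (blocked, google) string pairs where a blocklist entry overlaps a Google block."""
    google_nets = [ipaddress.ip_network(b) for b in google_blocks]
    for entry in blocklist:
        # strict=False accepts entries written with host bits set, e.g. 66.249.90.1/24.
        blocked = ipaddress.ip_network(entry, strict=False)
        for g in google_nets:
            # Only compare networks of the same IP version.
            if blocked.version == g.version and blocked.overlaps(g):
                yield str(blocked), str(g)


if __name__ == "__main__":
    for blocked, google in google_overlaps(BLOCKLIST, GOOGLE_BLOCKS):
        print(f"{blocked} overlaps Google block {google}")
```

With the sample data, only the two 66.249.x entries are reported, both inside 66.249.64.0/19. Swapping in the CIDR list from Roger’s “Google owns these” message would widen the check to the other Google-owned blocks, including the IPv6 ones.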
From: Rob Janssen <pe1chl@amsat.org>
Subject: Re: [44net] Google indexing
Date: May 14, 2019 at 11:01:09 AM PDT
To: "44net@mailman.ampr.org" <44net@mailman.ampr.org>
> 66.249.90.x and 66.249.91.x were indeed blocked.
Ahh... that explains a lot!
> I don't know which of the many possible ways those addresses got onto the blocking list, as it was too long ago for the current logs to reflect it.
Maybe there was "a lot" of traffic? Possibly "a lot" only by the standards of those days.
But of course everyone running a website on an IPIP-tunneled ampr.org address shares some responsibility for this. When you have areas with lots of data, make sure those large files are not indexed. This can be done using robots.txt files, robots meta tags or headers, etc.
For example, say you run a site with equipment schematics: you have some text pages with indexes and a lot of huge PDF files with the scanned schematics themselves. It is not difficult to make Google (and other crawlers) index only the text index pages and not the PDFs.
Or you have a local amateur group site with lots of photographs and maybe even video of Field Day or other events. It is possible to keep the huge 30-megapixel photographs and the video from being indexed, and have only the text content and maybe the thumbnails indexed.
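Rob’s two examples can be tried out locally before a robots.txt goes live. The sketch below, assuming Python 3’s standard urllib.robotparser, parses a robots.txt that leaves text pages and thumbnails crawlable but disallows the large PDFs, full-size photos, and videos, then asks whether a crawler may fetch each path. The directory layout (/schematics/pdf/, /photos/full/, /video/) and the www.example.com host are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: text index pages and thumbnails stay crawlable,
# the bulky PDFs, full-size photos, and videos do not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /schematics/pdf/
Disallow: /photos/full/
Disallow: /video/
"""


def build_parser(robots_txt):
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() answers False until the parser considers the rules loaded;
    # modified() records the load time explicitly.
    parser.modified()
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "http://www.example.com"
    for path in [
        "/schematics/index.html",
        "/schematics/pdf/r390.pdf",
        "/photos/thumbs/fieldday.jpg",
        "/photos/full/fieldday.jpg",
        "/video/fieldday.mp4",
    ]:
        print(path, rp.can_fetch("Googlebot", base + path))
```

The index page and the thumbnail come back True and the other three paths False. Python’s parser only does prefix matching and applies the first rule that matches, while Googlebot also understands `*` and `$` wildcards and uses the most specific matching rule, so directory-based rules like these are the safest to test this way. Note also that robots.txt is what actually saves bandwidth: a noindex meta tag or X-Robots-Tag header keeps a file out of the index, but the crawler still has to fetch the file to see it.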
When this is done responsibly, indexing the websites that are behind IPIP tunnels should not cause much more "useless traffic" than there already is from jerks like shodan.io, stretchoid.com, and the like (those scan the entire IP range, not just websites that have been announced to Google or are linked from other sites).
Rob
44Net mailing list 44Net@mailman.ampr.org https://mailman.ampr.org/mailman/listinfo/44net