Re: [44net] 44Net Digest, Vol 8, Issue 81

14 May 2019

Brian,
Thanks for noticing that block. Unfortunately, something is still blocking Google: their “Live Test” still comes back with a crawl anomaly.
In their documentation they say that their bots can come from a wide array of IP addresses and that they don’t publish them. Is it possible that another IP address or block has also been blocked, and that you might be able to open it, at least long enough to see if that fixes the problem?
Google won’t say what all their Googlebot IPs are, but I found this:
Known Googlebot ranges (first address, last address):
64.233.160.0	64.233.191.255
66.102.0.0	66.102.15.255
66.249.64.0	66.249.95.255
72.14.192.0	72.14.255.255
74.125.0.0	74.125.255.255
209.85.128.0	209.85.255.255
216.239.32.0	216.239.63.255
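The start/end pairs above can be converted to CIDR notation, the form used in the next list and in most filter configurations, with Python's standard `ipaddress` module. This is a sketch: the pairs are copied from the unofficial list above, not from any list published by Google, and the helper name `ranges_to_cidrs` is made up for the example.

```python
import ipaddress

# Start/end pairs copied from the "Known Googlebot ranges" list (unofficial).
GOOGLEBOT_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]


def ranges_to_cidrs(ranges):
    """Convert (first, last) address pairs into the CIDR blocks covering them exactly."""
    cidrs = []
    for first, last in ranges:
        # summarize_address_range yields the minimal set of networks spanning first..last.
        cidrs.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return cidrs


for net in ranges_to_cidrs(GOOGLEBOT_RANGES):
    print(net)
```

Each pair happens to collapse to a single block. The 66.249.x.x pair works out to 66.249.64.0/19, which is wider than the 66.249.80.0/20 listed among the Google-owned blocks.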
Ranges owned by Google (possibly also used by Googlebot), as CIDR block and address range:
64.18.0.0/20	64.18.0.0 - 64.18.15.255
64.233.160.0/19	64.233.160.0 - 64.233.191.255
66.102.0.0/20	66.102.0.0 - 66.102.15.255
66.249.80.0/20	66.249.80.0 - 66.249.95.255
72.14.192.0/18	72.14.192.0 - 72.14.255.255
74.125.0.0/16	74.125.0.0 - 74.125.255.255
108.177.8.0/21	108.177.8.0 - 108.177.15.255
172.217.0.0/19	172.217.0.0 - 172.217.31.255
173.194.0.0/16	173.194.0.0 - 173.194.255.255
207.126.144.0/20	207.126.144.0 - 207.126.159.255
209.85.128.0/17	209.85.128.0 - 209.85.255.255
216.58.192.0/19	216.58.192.0 - 216.58.223.255
216.239.32.0/19	216.239.32.0 - 216.239.63.255
2001:4860:4000::/36	2001:4860:4000:0:0:0:0:0 - 2001:4860:4fff:ffff:ffff:ffff:ffff:ffff
2404:6800:4000::/36	2404:6800:4000:0:0:0:0:0 - 2404:6800:4fff:ffff:ffff:ffff:ffff:ffff
2607:f8b0:4000::/36	2607:f8b0:4000:0:0:0:0:0 - 2607:f8b0:4fff:ffff:ffff:ffff:ffff:ffff
2800:3f0:4000::/36	2800:3f0:4000:0:0:0:0:0 - 2800:3f0:4fff:ffff:ffff:ffff:ffff:ffff
2a00:1450:4000::/36	2a00:1450:4000:0:0:0:0:0 - 2a00:1450:4fff:ffff:ffff:ffff:ffff:ffff
2c0f:fb50:4000::/36	2c0f:fb50:4000:0:0:0:0:0 - 2c0f:fb50:4fff:ffff:ffff:ffff:ffff:ffff
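To check whether an address seen in a web server log falls inside one of these Google-owned blocks, `ipaddress` can test membership for both IPv4 and IPv6. Range membership alone does not prove that a request came from Googlebot; Google's own documentation recommends a reverse DNS lookup followed by a forward lookup for that. The helper name `google_block_for` is hypothetical.

```python
import ipaddress

# CIDR blocks copied from the "owned by Google" list above.
GOOGLE_BLOCKS = [ipaddress.ip_network(c) for c in (
    "64.18.0.0/20", "64.233.160.0/19", "66.102.0.0/20", "66.249.80.0/20",
    "72.14.192.0/18", "74.125.0.0/16", "108.177.8.0/21", "172.217.0.0/19",
    "173.194.0.0/16", "207.126.144.0/20", "209.85.128.0/17", "216.58.192.0/19",
    "216.239.32.0/19",
    "2001:4860:4000::/36", "2404:6800:4000::/36", "2607:f8b0:4000::/36",
    "2800:3f0:4000::/36", "2a00:1450:4000::/36", "2c0f:fb50:4000::/36",
)]


def google_block_for(addr):
    """Return the first listed Google block containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in GOOGLE_BLOCKS:
        # Mixing IPv4 and IPv6 simply yields False here rather than raising.
        if ip in net:
            return net
    return None


for addr in ("66.249.90.12", "2a00:1450:4001:81c::200e", "44.1.2.3"):
    print(addr, google_block_for(addr))
```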
I will certainly be very respectful of the bandwidth. As I said before, we really don’t get a lot of hits and the site is more for our members than anyone else (plus the occasional new person wanting to join).
Someone mentioned that my page size is a bit large. Yes, I do have some JavaScript, and it does make the page appear to be 3 MB, but that is a misleading assessment. That 3 MB includes the jQuery library, which most people already have in their cache, since so many sites use jQuery. In fact, if jQuery is already on your machine (likely), the actual page size is in the mid-KB range. A quote from the jQuery documentation:
“If you serve jQuery from a popular CDN such as Google's Hosted Libraries or cdnjs, it won't be redownloaded if your visitor has been on a site that referenced it, from the same source (as long as the cached version has not expired).”
Thanks for trying to help me resolve this.
Roger
VA7LBB
...
On May 14, 2019, at 12:00 PM, 44net-request@mailman.ampr.org wrote:
Send 44Net mailing list submissions to
   44net@mailman.ampr.org
To subscribe or unsubscribe via the World Wide Web, visit
   https://mailman.ampr.org/mailman/listinfo/44net
or, via email, send a message with subject or body 'help' to
   44net-request@mailman.ampr.org
You can reach the person managing the list at
   44net-owner@mailman.ampr.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of 44Net digest..."
Today's Topics:

Portal API (Nate Sales)
Re: Google indexing (Brian Kantor)
Re: Google indexing (Rob Janssen)

From: Nate Sales nate.wsales@gmail.com
Subject: Portal API
Date: May 13, 2019 at 2:22:55 PM PDT
To: AMPRNet working group 44net@mailman.ampr.org
Hello,
Is there any plan to make the API more complete? It would be really cool to
be able to update gateways and such programmatically.
73,
-Nate
From: Brian Kantor Brian@bkantor.net
Subject: Re: [44net] Google indexing
Date: May 13, 2019 at 2:25:38 PM PDT
To: AMPRNet working group 44net@mailman.ampr.org
On Mon, May 13, 2019 at 11:58:18AM -0700, Roger wrote:
> ...
> I wanted to thank everyone for their help with the Google issue I’m having. It is not resolved, but I’ve made some discoveries. It looks like a fair number of the ampr.org sites that come up on Google may in fact be reached via BGP. Rob’s is, and the others I ran a traceroute on terminate on a non-44 address.
> That said, I now think this is 100% a Google issue. I don’t know what kind of stupidity they are up to, but Yandex and Bing have no problems indexing my site. I have read of others having similar issues. Bing and Yandex actually use the same verification system as Google, and they crawl just fine.
> 73
> Roger
> VA7LBB
After Roger mentioned that AMPRNet BGP-advertised web sites were
getting indexed, but not very many others, and then someone posted
that Google's indexing bots often run in the IP address range
66.249.x.x, I took a look at the ingress filter in amprgw.
66.249.90.x and 66.249.91.x were indeed blocked.
I have unblocked them.  Roger, you may see Google crawling your web
site from addresses in those subnets now.  If you have some way to
stimulate them to do so, you might want to try that.
I don't know by which of the many possible ways those addresses got
on the blocking list, as it was too long ago for the current logs
to reflect it.

Brian
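The thread does not describe the format of the amprgw ingress filter, so the following is only a hypothetical sketch of how a blocklist could be audited against known crawler ranges. Apart from the two /24s Brian mentions, the entries are invented: 203.0.113.0/24 is a documentation prefix standing in for any unrelated entry. The crawler range is the 66.249.64.0 to 66.249.95.255 span from Roger's list.

```python
import ipaddress

# Hypothetical blocklist: the two /24s found in the filter, plus a documentation
# prefix (TEST-NET-3) standing in for any unrelated entry.
BLOCKLIST = [
    ipaddress.ip_network(n)
    for n in ("66.249.90.0/24", "66.249.91.0/24", "203.0.113.0/24")
]

# Known Googlebot range from Roger's list (66.249.64.0 - 66.249.95.255).
CRAWLER_RANGES = [ipaddress.ip_network("66.249.64.0/19")]


def overlapping_entries(blocklist, protected):
    """Return the blocklist entries that overlap any protected range."""
    return [b for b in blocklist if any(b.overlaps(p) for p in protected)]


hits = overlapping_entries(BLOCKLIST, CRAWLER_RANGES)
# Merge adjacent hits into the fewest covering networks for a shorter report.
merged = list(ipaddress.collapse_addresses(hits))
# The blocklist as it would look after unblocking the overlapping entries.
remaining = [b for b in BLOCKLIST if b not in hits]

print("overlapping:", [str(n) for n in hits])
print("merged:", [str(n) for n in merged])
print("remaining:", [str(n) for n in remaining])
```

Running a check like this whenever an entry is added would flag a crawler range before it silently disappears into the filter.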

From: Rob Janssen pe1chl@amsat.org
Subject: Re: [44net] Google indexing
Date: May 14, 2019 at 11:01:09 AM PDT
To: "44net@mailman.ampr.org" 44net@mailman.ampr.org
> ...
> 66.249.90.x and 66.249.91.x were indeed blocked.
Ahh... that explains a lot!
> ...
> I don't know by which of the many possible ways those addresses got
> on the blocking list, as it was too long ago for the current logs
> to reflect it.
Maybe there was "a lot" of traffic?  Possibly "a lot" by the standards of those days.
But of course everyone running a website on an IPIP-tunneled ampr.org address has some
responsibility in this.  If you have areas with lots of data, make sure those large
files are not indexed.  This can be done using robots.txt files, headers in the page
content, etc.
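For files that have no HTML page to carry a `<meta name="robots" content="noindex">` tag, such as PDFs or videos, the same instruction can be sent as an `X-Robots-Tag` HTTP response header. Below is a sketch for Python's built-in `http.server`; the suffix list and the `robots_headers` and `NoIndexMediaHandler` names are assumptions for illustration, and a real site would more likely set this in its web server configuration. Note that `noindex` keeps a file out of search results, but the crawler still has to fetch the file to see the header, so robots.txt is the better tool when the goal is saving tunnel bandwidth.

```python
from http.server import SimpleHTTPRequestHandler
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical policy: large media types that should stay out of search results.
NOINDEX_SUFFIXES = {".pdf", ".mp4", ".avi", ".mov"}


def robots_headers(request_path):
    """Return extra response headers for a request path (query string ignored)."""
    suffix = PurePosixPath(urlsplit(request_path).path).suffix.lower()
    if suffix in NOINDEX_SUFFIXES:
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexMediaHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to matching responses."""

    def end_headers(self):
        # Add our headers just before the blank line that ends the header block.
        for name, value in robots_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()
```

To try it, serve the current directory with `HTTPServer(("", 8000), NoIndexMediaHandler).serve_forever()`, importing `HTTPServer` from `http.server` as well.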
E.g. you run a site with equipment schematics.  You have some text pages with indexes
and a lot of huge PDF files with the scanned schematics themselves.  It is not difficult
to make Google (and other crawlers) index only the text index files and not the PDFs.
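For the schematics case, a robots.txt along these lines would let crawlers fetch the text index pages but not the PDF directory; the `/schematics/pdf/` layout is hypothetical. Python's `urllib.robotparser` can check the rules before they are published. It only does plain prefix matching, whereas Googlebot also understands `*` and `$` patterns such as `Disallow: /*.pdf$`, so keeping the large files in their own directory makes the rule work for simpler crawlers too.

```python
from urllib import robotparser

# Hypothetical robots.txt for a schematics site: the text index pages stay
# crawlable, the directory holding the large scanned PDFs does not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /schematics/pdf/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(path, agent="Googlebot"):
    """True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, path)


for path in ("/schematics/index.html", "/schematics/pdf/ft-817.pdf"):
    print(path, crawl_allowed(path))
```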
Or you have a local amateur group site with lots of photographs and maybe even
video of Field Day or other events.  It is possible to keep the huge 30-megapixel
photographs and the video out of the index, and have only the text content, and maybe
the thumbnails, indexed.
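The photo-gallery case needs an exception: block the full-size images and video, but allow a thumbnail subdirectory. A sketch, again with a hypothetical layout (`/gallery/thumbs/` inside `/gallery/`, video under `/video/`). Google applies the most specific (longest) matching rule, while Python's `urllib.robotparser` applies the first matching rule in file order, so listing the `Allow` line before the broader `Disallow` gives the same result under both.

```python
from urllib import robotparser

# Hypothetical layout: full-size photos under /gallery/, thumbnails under
# /gallery/thumbs/, video under /video/. Allow comes first for first-match parsers.
ROBOTS_TXT = """\
User-agent: *
Allow: /gallery/thumbs/
Disallow: /gallery/
Disallow: /video/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/fieldday-2019.html",
             "/gallery/thumbs/antenna.jpg",
             "/gallery/antenna-30mp.jpg",
             "/video/fieldday.mp4"):
    print(f"{path:32} {parser.can_fetch('Googlebot', path)}")
```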
When this is done in a responsible manner, indexing the websites that are behind IPIP
tunnels should not cause much more "useless traffic" than there already is due to
jerks like shodan.io, stretchoid.com and the like.
(those are scanning the entire IP range, not just websites that have been announced
to Google or are linked from other sites)
Rob

44Net mailing list
44Net@mailman.ampr.org
https://mailman.ampr.org/mailman/listinfo/44net
