Re: [44net] 44Net Digest, Vol 8, Issue 81

14 May 2019

Brian,
Thanks for noticing that block. Unfortunately, something is still blocking Google. Their
“Live Test” still comes back with a crawl anomaly.
In their documentation they say that their bots can come from a wide array of IP addresses
and that they don’t publish them. Is it possible
that another IP or block has been blocked off that you might be able to
open, at least long enough to see if that fixes the problem?
Google won’t say what all of their Googlebot IPs are, but I found this:
Known Googlebots:
64.233.160.0    64.233.191.255
66.102.0.0      66.102.15.255
66.249.64.0     66.249.95.255
72.14.192.0     72.14.255.255
74.125.0.0      74.125.255.255
209.85.128.0    209.85.255.255
216.239.32.0    216.239.63.255
Google owns these (maybe Googlebots):
64.18.0.0/20    64.18.0.0 - 64.18.15.255
64.233.160.0/19 64.233.160.0 - 64.233.191.255
66.102.0.0/20   66.102.0.0 - 66.102.15.255
66.249.80.0/20  66.249.80.0 - 66.249.95.255
72.14.192.0/18  72.14.192.0 - 72.14.255.255
74.125.0.0/16   74.125.0.0 - 74.125.255.255
108.177.8.0/21  108.177.8.0 - 108.177.15.255
172.217.0.0/19  172.217.0.0 - 172.217.31.255
173.194.0.0/16  173.194.0.0 - 173.194.255.255
207.126.144.0/20        207.126.144.0 - 207.126.159.255
209.85.128.0/17 209.85.128.0 - 209.85.255.255
216.58.192.0/19 216.58.192.0 - 216.58.223.255
216.239.32.0/19 216.239.32.0 - 216.239.63.255
2001:4860:4000::/36     2001:4860:4000:0:0:0:0:0 - 2001:4860:4fff:ffff:ffff:ffff:ffff:ffff
2404:6800:4000::/36     2404:6800:4000:0:0:0:0:0 - 2404:6800:4fff:ffff:ffff:ffff:ffff:ffff
2607:f8b0:4000::/36     2607:f8b0:4000:0:0:0:0:0 - 2607:f8b0:4fff:ffff:ffff:ffff:ffff:ffff
2800:3f0:4000::/36      2800:3f0:4000:0:0:0:0:0 - 2800:3f0:4fff:ffff:ffff:ffff:ffff:ffff
2a00:1450:4000::/36     2a00:1450:4000:0:0:0:0:0 - 2a00:1450:4fff:ffff:ffff:ffff:ffff:ffff
2c0f:fb50:4000::/36     2c0f:fb50:4000:0:0:0:0:0 - 2c0f:fb50:4fff:ffff:ffff:ffff:ffff:ffff
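
For anyone who wants to check an address against the "Known Googlebots" list above, here is a minimal sketch using Python's standard `ipaddress` module. The range pairs are copied from that list; the function and variable names are only illustrative, and the list itself is unofficial, so treat a match as a hint rather than proof that an address belongs to Googlebot.

```python
import ipaddress

# (first, last) pairs copied from the "Known Googlebots" list in this message.
# The list is unofficial; Google does not publish a complete set of addresses.
KNOWN_GOOGLEBOT_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.255.255"),
    ("74.125.0.0", "74.125.255.255"),
    ("209.85.128.0", "209.85.255.255"),
    ("216.239.32.0", "216.239.63.255"),
]


def ranges_to_networks(ranges):
    """Convert (first, last) address pairs into a flat list of CIDR networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


def in_any_network(address, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in networks)


GOOGLEBOT_NETWORKS = ranges_to_networks(KNOWN_GOOGLEBOT_RANGES)

if __name__ == "__main__":
    # 66.249.90.1 is a sample address from the 66.249.90.x subnet
    # discussed in this thread; 44.0.0.1 is just a non-Google example.
    for sample in ("66.249.90.1", "44.0.0.1"):
        print(sample, in_any_network(sample, GOOGLEBOT_NETWORKS))
```

The same check could be run against an ingress filter's block list to spot entries that overlap these ranges.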
I will certainly be very respectful of the bandwidth. As I said before, we really don’t
get a lot of hits and the site is more for our members than anyone else (plus the
occasional new person wanting to join).
Someone mentioned that my page size is a bit large. Yes, I do have some JavaScript and it
does make it appear as though the page size is 3 MB, but that is a deceptive assessment.
That 3 MB includes a jQuery library that most people already have in their cache, since so
many sites use jQuery. In actual fact, if jQuery is already on your machine (likely), the
actual page size is in the mid-kilobyte range. A quote from the jQuery docs:
"If you serve jQuery from a popular CDN such as Google's Hosted Libraries or
cdnjs, it won't be redownloaded if your visitor has been on a site that referenced it,
from the same source (as long as the cached version has not expired)."
Thanks for trying to help me resolve this.
Roger
VA7LBB
...
 On May 14, 2019, at 12:00 PM, 44net-request(a)mailman.ampr.org wrote:
 Send 44Net mailing list submissions to
        44net(a)mailman.ampr.org
 To subscribe or unsubscribe via the World Wide Web, visit
        https://mailman.ampr.org/mailman/listinfo/44net
 or, via email, send a message with subject or body 'help' to
        44net-request(a)mailman.ampr.org
 You can reach the person managing the list at
        44net-owner(a)mailman.ampr.org
 When replying, please edit your Subject line so it is more specific
 than "Re: Contents of 44Net digest..."
 Today's Topics:
   1. Portal API (Nate Sales)
   2. Re: Google indexing (Brian Kantor)
   3. Re: Google indexing (Rob Janssen)
 From: Nate Sales <nate.wsales(a)gmail.com>
 Subject: Portal API
 Date: May 13, 2019 at 2:22:55 PM PDT
 To: AMPRNet working group <44net(a)mailman.ampr.org>
 Hello,
 Is there any plan to make the API more complete? It would be really cool to
 be able to update gateways and such programmatically.
 73,
 -Nate
 From: Brian Kantor <Brian(a)bkantor.net>
 Subject: Re: [44net] Google indexing
 Date: May 13, 2019 at 2:25:38 PM PDT
 To: AMPRNet working group <44net(a)mailman.ampr.org>
 On Mon, May 13, 2019 at 11:58:18AM -0700, Roger wrote:
  I wanted to thank everyone for their help with the Google issue I’m having. It is not
  resolved, but I’ve made some discoveries. It looks like a fair number of the ampr.org
  sites that come up on Google may in fact be done via BGP. Rob’s is, and the others that
  I ran a traceroute on terminate on an address that is not 44.
  But that said, I now think this is a 100% Google issue. I don’t know what kind of
  stupidity they are up to, but Yandex and Bing have no problems indexing my site. I have
  read of others having similar issues. Bing and Yandex actually use Google’s same system
  for verification and they crawl just fine.
 73
 Roger
 VA7LBB  
 After Roger mentioned that AMPRNet BGP-advertised web sites were
 getting indexed, but not very many others, and then someone posted
 that Google's indexing bots often run in the IP address range
 66.249.x.x, I took a look at the ingress filter in amprgw.
 66.249.90.x and 66.249.91.x were indeed blocked.
 I have unblocked them.  Roger, you may see Google crawling your web
 site from addresses in those subnets now.  If you have some way to
 stimulate them to do so, you might want to try that.
 I don't know how among many possible ways that those addresses got
 on the blocking list, as it was too long ago for the current logs
 to reflect it.
        - Brian
 From: Rob Janssen <pe1chl(a)amsat.org>
 Subject: Re: [44net] Google indexing
 Date: May 14, 2019 at 11:01:09 AM PDT
 To: "44net(a)mailman.ampr.org" <44net(a)mailman.ampr.org>
  66.249.90.x and 66.249.91.x were indeed blocked.

 Ahh... that explains a lot!
  I don't know how among many possible ways that those addresses got
  on the blocking list, as it was too long ago for the current logs
  to reflect it.
 Maybe there was "a lot" of traffic?  Possibly also "a lot" in terms of those days.
 But of course everyone running a website on an IPIP tunneled ampr.org site has some
 responsibility in this.  Make sure when you have areas with lots of data, those large
 files are not indexed.  This can be done using robots.txt files, headers in the page
 content, etc.
 E.g. you run a site with equipment schematics.  You have some text pages with indexes
 and a lot of huge PDF files with the scanned schematics themselves.  It is not difficult
 to make Google (and other crawlers) index only the text index files and not the PDFs.
 Or you have a local amateur group site and it has lots of photographs and maybe even
 video of the field day or other events.  It is possible to keep the huge 30-megapixel
 photographs and the video from being indexed and have only the text content and maybe
 the thumbnails indexed.
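
 As a rough illustration of the robots.txt approach described above, here is a small sketch that checks a sample robots.txt with Python's standard `urllib.robotparser`. The directory names (`/schematics/pdf/`, `/photos/full/`, `/video/`) are hypothetical and assume the large files live in their own directories; the sketch uses plain path prefixes because the standard-library parser matches by prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory layout is an assumption for this
# sketch, with large files kept apart from the text index pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /schematics/pdf/
Disallow: /photos/full/
Disallow: /video/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # Paths below are made-up examples matching the hypothetical layout.
    for path in (
        "/schematics/index.html",
        "/schematics/pdf/receiver.pdf",
        "/photos/thumbs/fieldday01.jpg",
        "/photos/full/fieldday01.jpg",
        "/video/fieldday.mp4",
    ):
        print(path, PARSER.can_fetch("Googlebot", path))
```

 Note that robots.txt only asks well-behaved crawlers not to fetch those paths; it does nothing against scanners that ignore it.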
 When this is done in a responsible manner, indexing the websites that are behind IPIP
 tunnels should not cause much more "useless traffic" than there already is due to
 jerks like shodan.io, stretchoid.com and the like.
 (those are scanning the entire IP range, not just websites that have been announced
 to Google or are linked from other sites)
 Rob
