Brian,
My friend at Google isn’t 100% sure what’s happening, just that when the crawler attempts
the crawl, it comes back with the same response code it gets when a robots.txt blocks it,
or something similar. So to clarify: Google claims something at the “top domain” (their
words) has prevented almost all subdomains from being crawled. And whatever that is, it
clearly has an effect, because I really don’t find any results under ampr.org except the
main site, the portal, and the wiki.
Jann:
I realize there are many other reasons and uses for AMPRNet. I was just talking about
serving some sort of web content out to the internet at large. The issue I’m having
affects anyone using 44Net, including those in your examples, who want their content
discoverable by the world at large.
73
Roger
VA7LBB
On May 2, 2019, at 19:31, Brian Kantor <Brian(a)bkantor.net> wrote:
Having the robots.txt file at the site AMPR.ORG apply to pages served by that host
is completely reasonable.

Having the robots.txt file at the site AMPR.ORG apply to pages served from an entirely
separate site called, e.g., VA7LBB.AMPR.ORG is one of the most colossal pieces of
internet engineering stupidity I have ever encountered. Yet it would appear that is
what the person at Google is telling you is happening to your site.
From Wikipedia:

"A robots.txt file covers one origin. For websites with multiple subdomains, each
subdomain must have its own robots.txt file. If example.com had a robots.txt file
but a.example.com did not, the rules that would apply for example.com WOULD NOT
APPLY to a.example.com. In addition, each protocol and port needs its own robots.txt
file; http://example.com/robots.txt does not apply to pages under
http://example.com:8080/ or https://example.com/"
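
For the curious, here is a minimal sketch in Python (standard library only; the
subdomain name is just my example from above) of how a well-behaved crawler is
supposed to scope robots.txt: one lookup per origin, never inherited from a
parent domain.

    # Sketch: a well-behaved crawler fetches robots.txt per origin
    # (scheme + host + port), never from a parent domain.
    from urllib.parse import urlsplit
    from urllib.robotparser import RobotFileParser

    def is_allowed(url, agent="Googlebot"):
        parts = urlsplit(url)
        # Build the robots.txt URL from this URL's own origin only.
        rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()  # fetches that origin's robots.txt over the network
        return rp.can_fetch(agent, url)

    # ampr.org's robots.txt should have no bearing on the subdomain:
    print(is_allowed("http://ampr.org/"))
    print(is_allowed("http://va7lbb.ampr.org/"))

Note that is_allowed() derives the robots.txt location from the target URL's own
scheme, host, and port, which is exactly the per-origin rule the quoted passage
describes.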
If that's not what their crawlers are doing, their crawlers are broken.
They broke it, they get to fix it. It's not our problem.
- Brian
On Thu, May 02, 2019 at 06:13:40PM -0700, Roger Andrews wrote:
Hi,
I recently talked with Brian briefly about this and wanted to throw it out to the
group. It’s incredibly rare to see any of the tunnels that have been created
represented in a Google search. While I understand and agree that any site that will
become a high-volume site has no place on AMPRNet (we have to share resources), it
also seems pointless to create a website that is undiscoverable. After all, isn’t the
primary purpose of a website to share its content with others? I recently created a
website on a 44Net gateway and, after several weeks (and even convincing Brian to add
a TXT record in DNS so I could ask Google to crawl it), I am not seeing any content
on Google. I put in a service request to Google (not the easiest task) and was advised
that robots.txt or some other prevention device is blocking indexing of all the
subdomains on ampr.org; anyone can verify both the record and the block with the
snippet after this paragraph. I was told that the few gateways I do see in the results
were likely crawled before the restriction on ampr.org was applied.
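
If anyone wants to check what I was seeing, here is a small sketch in Python that
fetches the apex robots.txt and (optionally, using the third-party dnspython
package) confirms the verification record is published. What these return today
may of course differ from what I saw.

    # What is the apex domain telling crawlers? (urlopen raises an
    # HTTPError here if no robots.txt is being served at all.)
    import urllib.request
    with urllib.request.urlopen("http://ampr.org/robots.txt") as resp:
        print(resp.read().decode("utf-8", errors="replace"))

    # Optional: confirm the Google verification TXT record is in DNS.
    # Requires dnspython (pip install dnspython).
    import dns.resolver
    for record in dns.resolver.resolve("ampr.org", "TXT"):
        print(record)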
I created the website for our ARES group and placed it on an AMPR gateway because we
don’t have funds and, in reality, see very little traffic. We had a .net site last
year and averaged about 50 visitors a month. My question is: is it really necessary
to prevent the whole of ampr.org from being crawled (except, of course, the top
domain, which does show up)? So many IP addresses, but almost none visible, seems a
real pity.
Thanks for listening. My only hope is that this creates a little
bit of debate around the issue.
73
Roger
VA7LBB
_________________________________________
44Net mailing list
44Net(a)mailman.ampr.org