Hi,

I recently talked with Brian briefly about this and wanted to throw it out to the group. It’s incredibly rare to see any of the tunnels that have been created represented in a Google search. While I understand and agree that any site likely to become high-volume has no place on AMPRNet (we have to share resources), it also seems pointless to create a website that is undiscoverable. After all, isn’t the primary purpose of a website to share its content with others?

I recently created a website on a 44net gateway, and after several weeks (and even convincing Brian to add a meta TXT entry allowing me to ask Google to crawl), I am not seeing any content on Google. I put in a service request to Google (not the easiest task) and was advised that robots.txt or some other prevention device is blocking indexing of all the subdirectories on ampr.org. I was told that the few gateways I see in the results were likely crawled before the restriction on ampr.org was applied.

I created the website for our ARES group and placed it on an ampr gateway because we don’t have funds and, in reality, see very little traffic. We had a .net site last year and averaged about 50 visitors a month.

My question is: is it really necessary to prevent the whole of ampr.org from being crawled (except, of course, the top domain, which does show up)? So many IP addresses, but almost none visible, seems a real pity.

Thanks for listening. My only hope is that this creates a little bit of debate around the issue.
73 Roger VA7LBB
What is your site?
On Thu, May 2, 2019 at 6:16 PM Roger Andrews va7lbb@rezgas.com wrote:
[...]
Having the robots.txt file at the site AMPR.ORG apply to pages served by that host is completely reasonable.
Having the robots.txt file at the site AMPR.ORG apply to pages served from an entirely separate site called, e.g., VA7LBB.AMPR.ORG is one of the most colossal pieces of internet engineering stupidity I have ever encountered.
Yet it would appear that is what the person at Google is telling you is happening to your site.
From Wikipedia:
"A robots.txt file covers one origin. For websites with multiple subdomains, each subdomain must have its own robots.txt file. If example.com had a robots.txt file but a.example.com did not, the rules that would apply for example.com WOULD NOT APPLY to a.example.com. In addition, each protocol and port needs its own robots.txt file; http://example.com/robots.txt does not apply to pages under http://example.com:8080/ or https://example.com/"
If that's not what their crawlers are doing, their crawlers are broken. They broke it, they get to fix it. It's not our problem. - Brian
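The per-origin rule in the Wikipedia excerpt above can be sketched in a few lines of Python. This is only an illustration, not a description of how Google's crawler works: the helper names `robots_url` and `same_robots_scope` are made up for this example, and `va7lbb.ampr.org` is used purely as the hypothetical hostname from the message above. As a simplification, the sketch compares scheme, host, and port literally, so an explicit default port such as `:80` is treated as a different origin.

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin (scheme, host, port) of page_url."""
    parts = urlsplit(page_url)
    # netloc holds the host plus any explicit port, so a subdomain or a
    # different port yields a different robots.txt location.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def same_robots_scope(url_a: str, url_b: str) -> bool:
    """True if both URLs would be governed by the same robots.txt file."""
    return robots_url(url_a) == robots_url(url_b)


if __name__ == "__main__":
    # Hypothetical subdomain: its rules come from its own robots.txt,
    # not from the one served at ampr.org.
    print(robots_url("http://va7lbb.ampr.org/ares/index.html"))
    print(same_robots_scope("http://ampr.org/", "http://va7lbb.ampr.org/"))
```

Under this reading, a Disallow rule in the robots.txt at ampr.org would say nothing about pages served from a separate host under ampr.org.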
On Thu, May 02, 2019 at 06:13:40PM -0700, Roger Andrews wrote:
[...]
I have had google crawl and index my ampr.org site. I have since blocked that though, as I don't really even need what I am running to be accessible outside of the 44 netspace.
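The message above does not say how the block was done. One common approach, assumed here, is a robots.txt file on the site's own host that disallows everything. The sketch below uses Python's standard `urllib.robotparser` to check what such a file would permit; the helper name `allowed` and the host `example.ampr.org` are hypothetical and exist only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits the given user agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.ampr.org/index.html"  # hypothetical host
    print(allowed(BLOCK_ALL, "Googlebot", page))  # blocked by Disallow: /
    print(allowed("", "Googlebot", page))         # an empty file blocks nothing
```

Note that robots.txt is only a request that well-behaved crawlers honor; keeping a service reachable solely from inside the 44 address space would need a firewall or server access rule instead.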
Keep in mind that web servers are just one type of service. My most frequent non-44net traffic is SMTP.
If you are trying to get a feel for what is out there, here is a message from a few years ago. (Note: Most are only accessible from within the network)
---- Begin Forwarded Message ----
Subject: [44net] 44net cool toys
From: Jann Traschewski (jann at gmx.de)
Date: Wed Mar 19 15:21:05 PDT 2014
Ok what are people actually DOING over their Internet-connected 44 radio net?
VOIP? APRS? Video links? Repeater linking? IP cameras? Remote bases?
All of them! You might want to click around:
Web: http://web.db0avh.ampr.org http://db0bi.ampr.org http://db0dah.ampr.org http://db0eeo.ampr.org http://db0end.ampr.org http://db0fhn.ampr.org http://db0res-svr.ampr.org http://db0gos.ampr.org http://db0dz.ampr.org http://db0ii.ampr.org http://db0iuz.ampr.org http://44.225.56.72/cms25 http://db0kwe.ampr.org http://srv.db0lj.ampr.org http://db0nis.ampr.org http://db0ovn.ampr.org http://44.225.60.2 http://db0sda.ampr.org http://db0ach.ampr.org http://db0pra.ampr.org http://websrv.db0pdf.ampr.org http://db0wet.ampr.org http://db0ham.ampr.org http://44.143.10.90 http://web.oe2xzr.at.ampr.org http://dm0ha.ampr.org http://dm0zgw.ampr.org http://db0erf.ampr.org http://web.oe5xbl.at.ampr.org http://web.oe7xci.ampr.at http://44.168.12.11 http://server.db0anf.ampr.org http://www.db0fuz.ampr.org http://linux.db0zeh.ampr.org http://monitor.db0mhb.ampr.org http://db0bul.ampr.org http://rpt.db0pob.ampr.org http://db0kpg.ampr.org http://db0lip.ampr.org http://44.225.76.161 http://db0zdf-srv01.db0zdf.ampr.org http://raspberry.db0abz.ampr.org http://db0fc.ampr.org http://cloud.db0fc.ampr.org http://db0oha.ampr.org http://dk0mav.ampr.org
SDR: http://dstar.db0vox.ampr.org:8901 http://debian.vm.db0tvm.ampr.org:8901 http://websdr.db0iuz.ampr.org:8901 http://websdr.oe4xlc.at.ampr.org http://db0tv.ampr.org/index.php?option=com_content&task=view&id=38&a...
Fotocam: http://db0bi.ampr.org/webcam/bielefeld
Maps: http://osm.oe2xzr.ampr.at http://tileserver.db0fc.ampr.org
Packet Radio: http://xnet.db0eeo.ampr.org http://dlc7.oe7xgr.at.ampr.org http://dlc7.db0kv.ampr.org http://dhcp200.db0wal.ampr.org http://db0res-node2.ampr.org http://pr-srv.db0gis.ampr.org
APRS: http://aprs.db0res.ampr.org
ATV: http://db0ko.ampr.org http://testpc.db0res.ampr.org http://db0ntv.ampr.org http://db0tv.ampr.org/index.php?option=com_content&task=view&id=71&a...
BBS: http://db0ii.ampr.org:8080 http://db0iuz.ampr.org:8080
Convers: http://db0iuz.ampr.org/cgi-bin/dlwwc.php
DX-Cluster: http://dxcluster.oe1xhq.ampr.at http://db0iuz.ampr.org/cgi-bin/spider.cgi
Pager: http://db0ii.ampr.org:9080 http://db0iuz.ampr.org:8081 http://db0fhn.ampr.org:4780
Webcam: http://webcam.oe1xar.at.ampr.org http://webcam.oe2xzr.at.ampr.org http://video.oe5xll.at.ampr.org http://webcam.oe7xzr.at.ampr.org http://webcam.oe8xdr.at.ampr.org http://db0iuz.ampr.org/webcam/webcam_radom.php http://db0iuz.ampr.org/webcam/webcam_techniker.php http://srv.db0lj.ampr.org/spycam.html http://webcam.db0zdf.ampr.org/view/index.shtml http://webcam.db0pob.ampr.org
Active Hosts: http://hamnetdb.net/?m=host&as=0
73, Jann DG8NGN