Hacker News

> (specified by the website owner via a robots.txt like spec).

Nope: if a website wants such a restriction, it has to enforce it itself. robots.txt is a request, not an enforcement mechanism; on its own it's worthless.
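To illustrate why it's only a request: a crawler typically parses robots.txt and then *chooses* whether to honor the answer. A minimal sketch using Python's stdlib `urllib.robotparser` (the rules here are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly, no network fetch needed.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

# The crawler asks whether it may fetch; nothing on the server side
# stops a bot that simply ignores the answer.
print(rp.can_fetch("SomeBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/public/page"))   # True
```

Compliance lives entirely in the client: a well-behaved bot checks `can_fetch` before requesting; a misbehaving one never calls it.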



If a robot misbehaves, it will either be blocked or reported to the network's abuse contact, and the bot will be taken down. Whether the site could in principle have deployed some technical countermeasure doesn't matter.


Precisely. The solution here is for the server to block the robot, assuming it can differentiate it from other traffic. That's all well and good, and it's the solution that should be used here: if you don't want to be archived, block the IP.
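Server-side IP blocking can be sketched in a few lines with Python's stdlib `ipaddress` module. The blocklist below is hypothetical (203.0.113.0/24 is a reserved documentation range), and a real deployment would do this in the web server or firewall rather than application code:

```python
import ipaddress

# Hypothetical blocklist of networks the misbehaving bot crawls from.
BLOCKED_NETS = [ipaddress.ip_network("203.0.113.0/24")]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETS)

print(is_blocked("203.0.113.42"))  # True: inside the blocked range
print(is_blocked("198.51.100.7"))  # False: outside it
```

Unlike robots.txt, this check runs on the server, so the bot has no say in whether it is enforced.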



