
Instead of crawling by item id, crawl new posts from the /newest page. For each post, split it into multiple pages, setting the parent id and title that way. Not that you have to; it's just a suggestion for the future.


I'm guessing the info is currently scraped from the 'threads?id=username' page, but the title of each story is already there, right after the word 'on'.
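
To make that concrete, here is a rough sketch of pulling the story title out of a comment's header line once it has been reduced to plain text. The header shape used here ("user age | parent | context | on: Title") and the function name story_title_from_header are assumptions for illustration, not something confirmed against the actual page markup.

```python
import re
from typing import Optional

# Assumed header shape (hypothetical): "user age | parent | context | on: Title".
# Matches the first "| on" segment and captures everything after it.
_ON_SEGMENT = re.compile(r"\|\s*on:?\s+(.+)$")


def story_title_from_header(header: str) -> Optional[str]:
    """Return the story title that follows 'on' in a comment header, or None."""
    match = _ON_SEGMENT.search(header)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    sample = "alice 3 hours ago | parent | context | on: Show HN: My crawler"
    print(story_title_from_header(sample))
```

Anchoring on the "| on" separator rather than on the bare word keeps an "on" inside a username or timestamp from matching, and a title that itself contains a pipe is still returned whole.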



