You're spot on about the paper I referenced. As you can see in section 4.3, on page 3, the paper describes two algorithms for sentence selection: Summation-Based Selection (SBS) and Density-Based Selection (DBS).
What made you pick this representation in particular?
I'm kind of curious what different kinds of algorithms you might have looked at.
Summarizing only blog posts seems a bit limiting to me. (Btw, I'm not trying to be negative, congrats on your success! TextTeaser looks great!)
I implemented a custom version of TextRank (mainly changing the scoring scheme to use the TF-IDF of words for the initial scores) and loved it.
The main thing I liked about it was how general it was. Sentences are the vertices, and the edges between them are weighted by the words they share. Then you basically use PageRank to rank the sentences according to the graph representation.
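A minimal sketch of that idea, assuming the plain word-overlap similarity from the original TextRank paper (no TF-IDF weighting, so this is not the custom version described above and not TextTeaser's code). The names `tokenize`, `similarity`, and `rank_sentences` are made up for illustration:

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of unique word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(words_a, words_b):
    """Shared-word count normalized by the log of each sentence's size.

    Assumption: sizes are counted over unique words, a small
    simplification of the overlap measure in the TextRank paper.
    """
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:  # both sentences have a single word
        return 0.0
    return len(words_a & words_b) / denom


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    # weights[i][j] is the edge weight between sentence i and sentence j.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return scores


if __name__ == "__main__":
    text = [
        "The cat sat on the mat.",
        "The cat ate the fish on the mat.",
        "Stock markets fell sharply today.",
    ]
    for score, sentence in sorted(zip(rank_sentences(text), text), reverse=True):
        print(round(score, 3), sentence)
```

Swapping `similarity` for something TF-IDF based would be one way to get closer to the custom scoring mentioned above; the ranking loop itself would not need to change.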
Hi, I focus on blog posts because I didn't want the scope to be too broad. TextTeaser is my research for my graduate studies, and a broader scope would have been harder to accomplish. But that doesn't mean it can't be used on other types of text. It still can. It's just optimized for news.
I'm a little bit familiar with TextRank because I stumbled upon it while I was doing my research. I also read about several other algorithms but forgot what they are called.
Ahh very cool! Thank you for the insight. I could see where that would be applicable then. Using comments as features is a very neat concept.
News is the most broadly applicable use for this, so leveraging that isn't a bad thing. There's always a trade-off between broad applicability and overfitting to a particular case to get better results.