You are spot on about the paper that I referenced. As you can see in 4.3 which is i...

agibsonccc · on Oct 12, 2013

What made you pick this representation in particular? I'm kind of curious what different kinds of algorithms you might have looked at.

Summarizing only blog posts seems a bit limiting to me. (Btw, I'm not trying to be negative, congrats on your success! TextTeaser looks great!)

I implemented a custom version of TextRank [1] (mainly changed the scoring scheme to use the TF/IDF of words for the initial scores) and loved it.

The main thing I liked about it was how general it was. Sentences are the nodes of a graph, and the edges between them are weighted by how many words they share. Then you basically use PageRank to rank the sentences according to that graph representation.

[1] http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Mihalcea.pdf
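
A minimal sketch of TextRank-style sentence ranking along the lines of [1], not my actual implementation. It assumes sentences are the graph nodes, edge weights are the word overlap normalized by the log of the sentence lengths, and unique lowercase tokens stand in for the word counts. The names `tokenize`, `similarity`, `textrank` and `summarize` are made up for illustration, and the TF/IDF initialization I mentioned is left out.

```python
import math
import re


def tokenize(sentence):
    # Assumption: unique lowercase word tokens are enough for overlap counting.
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    # Shared words, normalized by the log of each sentence's size so that
    # long sentences are not favored just for being long.
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(a & b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Weighted, undirected graph stored as a dense matrix.
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two highest-ranked sentences in document order. A sentence that shares no words with the rest ends up with the minimum score of `1 - damping`.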

MojoJolo · on Oct 12, 2013

Hi, I focus on blog posts because I don't want the scope to be too broad. TextTeaser is my research for my graduate studies, and broader research is harder to accomplish. But that doesn't mean it can't be used on other types of text. It can still be used. It's just optimized for news.

I'm a little bit familiar with TextRank because I stumbled upon it while I was doing my research. I also read about several other algorithms but forgot what they are called.

agibsonccc · on Oct 12, 2013

Ahh very cool! Thank you for the insight. I could see where that would be applicable then. Using comments as features is a very neat concept.

News is the most broadly applicable use for this, so leveraging that isn't a bad thing. There's always a trade-off between broad applicability and overfitting to a particular case to get better results.

Thanks for the insight! Again great work.

aswanson · on Oct 12, 2013

Any suggestions on papers for Luhn's abstract algorithm? I hadn't heard of it before.

BjoernKW · on Oct 12, 2013

There you go:

https://text-analysis.googlecode.com/files/luhn58.pdf

http://dl.acm.org/citation.cfm?id=1662360