BlogSum: Helping Researchers Make Sense of Social Media

Author: Katie Campbell
Institution: University of Florida
Date: October 2012

Last month, computer scientists at Concordia University unveiled BlogSum, a new system that aims to bring computers one step closer to understanding informal material such as online blogs. With so much conversation now happening on blogs and social media, the technology could help organizations learn how consumers view them more quickly and easily than ever before.

Users submit a question, and BlogSum searches websites for real-life expressions and conversations related to it, which can reveal, for example, consumer preferences or voter intentions. It then generates a summary focused specifically on the question asked.

The result gives users a clear picture of how real people are responding online.

Leila Kosseim, one of the lead researchers at Concordia's Computational Linguistics Laboratory and one of BlogSum’s developers, explained why such a tool is needed in the age of the blog: “Huge quantities of electronic texts have become easily available on the Internet, but people can be overwhelmed, and they need help to find the real content hiding in the mass of information.”

Unlike factual news articles, blogs and other informal writing are full of emotions, opinions and ideas that are not necessarily facts, and summarization tools like BlogSum must account for them. Nonstandard spelling and grammar can also be obstacles.

To succeed, BlogSum must handle two problems: question irrelevance (sentences that do not directly concern the question) and discourse incoherence (sentences whose meaning is not immediately clear in context).

BlogSum handled both challenges better than older systems. It outscored them in rankings, and human judges, who rated the readability of summaries the system produced from large amounts of online text, also preferred BlogSum's summaries.

BlogSum analyzes language using “discourse relations”: the logical links between sentences or parts of sentences, such as contrast, cause or elaboration. BlogSum uses these relations to filter and order the sentences it finds in web content into understandable summaries. In this way it addresses question irrelevance and discourse incoherence, the two major problems that similar summarization tools run into when analyzing informal language like a blog.
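To make the filter-then-order idea concrete, here is a toy sketch in Python. It is not BlogSum's actual method. It assumes each sentence has already been tagged with a discourse-relation label by some upstream parser, uses simple word overlap with the question as a stand-in for relevance, and the names `Sentence`, `USEFUL_RELATIONS`, `relevance` and `summarize` are all hypothetical.

```python
# Toy sketch only: NOT BlogSum's actual algorithm.
# Assumes sentences arrive already tagged with a discourse-relation label
# by some hypothetical upstream parser.
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    relation: str  # hypothetical label such as "contrast" or "elaboration"


# Hypothetical choice of relations treated as useful for opinion questions.
USEFUL_RELATIONS = {"contrast", "elaboration", "attribution", "comparison"}


def words(text: str) -> set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,!?\"'").lower() for w in text.split()}


def relevance(question: str, sentence: Sentence) -> float:
    """Fraction of the question's words that also appear in the sentence."""
    q = words(question)
    return len(q & words(sentence.text)) / len(q) if q else 0.0


def summarize(question: str, sentences: list[Sentence],
              min_relevance: float = 0.2, max_len: int = 3) -> list[str]:
    # Filter: drop off-topic sentences (question irrelevance) and sentences
    # whose discourse relation is not in the useful set.
    kept = [s for s in sentences
            if relevance(question, s) >= min_relevance
            and s.relation in USEFUL_RELATIONS]
    # Order: most relevant first; Python's sort is stable, so ties keep
    # their original order.
    kept.sort(key=lambda s: relevance(question, s), reverse=True)
    return [s.text for s in kept[:max_len]]


if __name__ == "__main__":
    question = "What do people think of the new phone?"
    posts = [
        Sentence("I love the new phone, but the battery is weak.", "contrast"),
        Sentence("My cat is asleep.", "elaboration"),
        Sentence("Most people think the new phone is overpriced.", "attribution"),
        Sentence("The new phone launched today.", "background"),
    ]
    for line in summarize(question, posts):
        print(line)
```

Here the discourse labels only decide which sentences survive the filter; word overlap is a crude relevance measure, and a real system would also use the relations to decide how sentences connect to one another in the final summary.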

BlogSum is an example of natural language processing (NLP), a field in which Concordia is a leader.

“The field of natural language processing is starting to become fundamental to computer science, with many everyday applications – making search engines find more relevant documents or making smart phones even smarter,” Kosseim said.

This science feature article was written under the guidance of JYI Science Writing Mentor Robert Aboukhalil.