A conversation with Galahad’s Rishi Sharma, Data Scientist and Bob MacDonald, CMO.

Congratulations to Rishi Sharma, first author of the research paper “Tackling the Story Ending Biases in the Story Cloze Test,” which was published in the proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Rishi attended the conference in Melbourne, Australia, and presented his research there.

So, Rishi can you explain what the ACL conference is?

Sure, it’s the annual conference of the Association for Computational Linguistics, which publishes what is widely regarded as the premier journal for NLP research. It’s been around for over half a century and is a worldwide journal. Some of the biggest contributions in the history of NLP, machine learning, and AI have come through ACL.

That’s great, congratulations! So, explain a little bit about your research group.

Well, thank you. In addition to the work I do with Galahad, I work as a researcher with the University of Rochester’s ROCNLP group. We do a lot of NLP research focused on event relationships. That means building systems that can grasp the temporal relationships, the entities, and the sense of what is happening in some source text. It’s in vogue now since so many companies are building products that interact using language. Our dataset, ROCStories, and our evaluation metric, the Story Cloze Test, are really popular among research groups in academia and industry.

Can you comment on what someone with NLP knowledge can offer businesses now?

Sure. A lot of the easier tasks within NLP have great solutions. For example, for understanding the topics found in a set of documents, there are a variety of tools that make that easy to implement. Understanding the sentiment of user messages is also pretty well solved. Even translation or building pretty good chatbots is something that open source and even some enterprise tools have packaged up fairly well.

And how does what your group does fit into different products or business solutions?

We want to get beyond simple classification or regression on text and reach a deeper level of understanding. With that we can create more sophisticated chatbots or virtual assistants. We can analyze customer dissatisfaction better. We can read through legacy or legal documents and find the parts relevant to a query with more context. We can summarize larger volumes of text to make them more useful. A lot of that needs a deeper understanding of how and in what order things happened, and to which entities.

Let’s get into the specifics of your publication. What is your paper about?

Well, our group wants to formalize common sense understanding so that we can build it into NLP systems. We published a dataset called ROCStories and an evaluation metric called the Story Cloze Test, which we hope capture that understanding so we can test models’ ability to use common sense reasoning.

So how does it do that exactly?

Well, we have a training dataset of 100,000 five-sentence stories. They follow characters through a simple set of events and are all conclusive. They come from all sorts of everyday situations. We also have two evaluation sets where the system is given a four-sentence story context and a choice of two endings. The task is to choose the better ending.
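The evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the group's actual evaluation code: the `ClozeItem` class, the `word_overlap_chooser` baseline, and the two stories are all made up for this example. Each item holds a four-sentence context, two candidate endings, and the index of the right ending, and accuracy is the share of items where the system picks that index.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClozeItem:
    """One hypothetical test item: four context sentences, two endings."""
    context: List[str]
    endings: List[str]
    label: int  # index (0 or 1) of the right ending


def tokens(text: str) -> set:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z']+", text.lower()))


def word_overlap_chooser(item: ClozeItem) -> int:
    """Toy baseline (an assumption, not a published model):
    pick the ending that shares the most words with the context."""
    context_words = tokens(" ".join(item.context))
    overlaps = [len(context_words & tokens(e)) for e in item.endings]
    return 0 if overlaps[0] >= overlaps[1] else 1


def accuracy(items: List[ClozeItem], chooser: Callable[[ClozeItem], int]) -> float:
    """Fraction of items where the chooser selects the right ending."""
    correct = sum(chooser(item) == item.label for item in items)
    return correct / len(items)


# Invented stories, not taken from ROCStories.
items = [
    ClozeItem(
        context=[
            "Anna wanted to bake a cake.",
            "She went to the store for flour.",
            "She mixed the batter at home.",
            "She put the pan in the oven.",
        ],
        endings=["Anna enjoyed a slice of her cake.", "Anna sold her bicycle."],
        label=0,
    ),
    ClozeItem(
        context=[
            "Tom's car would not start.",
            "He called a mechanic.",
            "The mechanic replaced the battery.",
            "Tom paid the bill.",
        ],
        endings=["Tom lost his hat.", "Tom drove his car to work."],
        label=1,
    ),
]

print(f"word-overlap baseline accuracy: {accuracy(items, word_overlap_chooser):.2f}")
```

Because there are always exactly two endings, a system that guesses blindly lands at about 50 percent, which is the floor any real model has to beat.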

Interesting, and where did you get this data?

We crowdsourced the sentences using Amazon Mechanical Turk. We had a rigorous vetting process to make sure the data was a clean, clear and representative sample of good language used to describe common everyday events. We also wanted to make sure the right ending was clearly distinguishable from the wrong one. This took some time. We also found linguistic and stylistic biases between right and wrong endings, which was particularly interesting, because people were consistently writing right and wrong endings differently.
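One way to picture that kind of bias is to score endings without showing the system the story at all. If a chooser that never sees the context still beats the 50 percent chance level, the endings themselves are giving the answer away. The sketch below is a hypothetical check, not the method from the paper: the length heuristic is an assumption chosen only for illustration (the interview does not say which stylistic features differ), and the four ending pairs are invented, so the number it prints says nothing about the real dataset.

```python
from typing import Callable, List, Tuple

# (ending_a, ending_b, index_of_right_ending).
# The story context is deliberately left out.
EndingPair = Tuple[str, str, int]


def longer_ending(ending_a: str, ending_b: str) -> int:
    """Hypothetical style-only heuristic: guess that the longer ending is right."""
    return 0 if len(ending_a.split()) >= len(ending_b.split()) else 1


def ending_only_accuracy(
    pairs: List[EndingPair], chooser: Callable[[str, str], int]
) -> float:
    """Accuracy of a chooser that only ever sees the two endings."""
    correct = sum(chooser(a, b) == label for a, b, label in pairs)
    return correct / len(pairs)


# Invented ending pairs, not taken from the Story Cloze Test.
pairs: List[EndingPair] = [
    ("Anna enjoyed a slice of her cake.", "Anna sold her bicycle.", 0),
    ("Tom lost his hat.", "Tom drove his car to work.", 1),
    ("Mia quit.", "Mia was thrilled with her new job.", 1),
    ("Sam felt proud of his garden.", "Sam threw away every plant he owned.", 0),
]

score = ending_only_accuracy(pairs, longer_ending)
print(f"ending-only accuracy: {score:.2f} (chance is 0.50)")
```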

And what is the result of your research?

A lot of other premier research groups in academia and industry are using our dataset to test their models. We get many downloads every day. It’s worth noting that some of the most advanced systems in NLP still perform nowhere near human level at this task, so I think it will serve as a goal for many premier NLP researchers for a while.

That sounds great, Rishi. So, how does the business community start to use research like this?

Well, I think that the business community might not directly use something like this, but they will likely want to get involved with NLP solutions that do better on this task than others. Like I said, it’s not going to be long before all businesses, big, medium, or even small, might want to interpret their text information with more than just a surface-level look.

Completely agree. So, what’s new for your research endeavors?

Well, there’s a lot to do when you’re trying to formalize something as broad as common sense for AI. Of course, our research is really only focusing on North American English based examples. We probably want to work with other researchers to expand that to other cultures and languages. Also, we are looking to see what areas of everyday life we are missing and want to put into our training and evaluation sets. And of course, we want to build better systems. We consistently work with other universities, and with businesses like OpenAI or Google’s teams, to do so.

So what sorts of interesting research did you see at the ACL conference?

Really amazing stuff. Our ability to automatically translate from language to language has greatly improved. We are also seeing huge advancements in question answering for NLP and in querying data from natural language. The Salesforce AI team released some incredibly powerful solutions for general purpose NLP too.

Last question: what are some of the groups you saw there?

Well, a lot of the schools that are traditionally known for NLP, like Stanford, Carnegie Mellon, UPenn, and U Washington. There are research groups like the Allen Institute for AI. There are also companies like Google, Facebook, Apple, Amazon, Microsoft and IBM. Also, non-US companies like Airbus, Baidu and Huawei. Really a lot of people!

Link to the research paper:
http://www.aclweb.org/anthology/P18-2119