MT4M - Machine Translation For Microblogs

"The MT4M project is an exciting opportunity to advance the state of the art not only in machine translation but also in the general automated processing of online text. It is a perfect project for our transnational team, who are reminded on a daily basis of the importance of effective communication across linguistic and cultural divides. The technology we are developing will both support commercial endeavors and be a vehicle for scientific understanding of the creative use of language." Isabel Trancoso, Alan Black and Chris Dyer.
Portuguese PI
Isabel Trancoso (IST-UL/INESC ID)

Alan W. Black

Chris Dyer


Research teams: INESC ID/IST-ID; CMU
Organizations: Unbabel
Funding Reference: FCT CMUP-EPB/TIC/0026/2013
Duration: 12 months
Keywords: Machine Translation; Normalization; Microblogs; Twitter 

The MT4M project develops machine translation systems for microblog content, such as posts on Twitter. This domain is characterized by creative use of language, dialectal lexemes, and informal register, all of which challenge traditional systems. For example, Google's English-Portuguese translation system translates the English sentence "ill cook it brotha!" into the unintelligible "doente cozinhar brotha!" (roughly: "sick to cook brotha!"), although the same system translates the standard variant "I'll cook it, brother!" effectively. The work on this project involves developing a tweet normalizer that converts non-standard text into standard text while preserving the meaning of the original tweet.
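To make the idea of normalization concrete, the following is a minimal sketch of a lookup-based normalizer. It is not the project's method: the `LEXICON` table, the `TOKEN_RE` pattern, and the `normalize` function are all hypothetical names invented for illustration, and a plain dictionary lookup is assumed in place of whatever model the project actually uses.

```python
import re

# Hypothetical lookup table, invented for illustration only.
# It is not the lexicon or model used by the MT4M project.
LEXICON = {
    "ill": "I'll",
    "brotha": "brother",
    "u": "you",
}

# Split the text into word tokens and runs of everything else,
# so that spacing and punctuation are carried over unchanged.
TOKEN_RE = re.compile(r"[A-Za-z']+|[^A-Za-z']+")


def normalize(tweet: str) -> str:
    """Replace known non-standard tokens with their standard forms."""
    tokens = TOKEN_RE.findall(tweet)
    # Look up each token case-insensitively; unknown tokens pass through.
    return "".join(LEXICON.get(tok.lower(), tok) for tok in tokens)


print(normalize("ill cook it brotha!"))  # prints: I'll cook it brother!
```

A context-free lookup like this has obvious limits: it would also rewrite "ill" where the word really means "sick", and it does not restore missing punctuation such as the comma in "I'll cook it, brother!". Resolving such cases requires using the surrounding context, which is part of what makes meaning-preserving normalization difficult.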


Phase II of the Carnegie Mellon Portugal Program emphasizes advanced education and research that can lead to significant entrepreneurial impact. The Early Bird Projects are designed to help small teams of researchers from Portuguese institutions, Carnegie Mellon University, and industry partners jumpstart activities with high-impact potential that are of strategic relevance for the Program.
