CMU Portugal student André Duarte speaks at Priberam Lab Seminar

André Duarte, a Dual Degree Ph.D. student in Language Technologies at Instituto Superior Técnico / INESC-ID and Carnegie Mellon University, was the invited speaker at the latest Priberam Lab Machine Learning Lunch Seminar, held on March 11th. The seminar is part of a series of biweekly informal meetings hosted at Instituto Superior Técnico, in Lisbon.

Credits: Priberam

At the event, André talked about DE-COP, a method designed to detect whether copyrighted text has been included in a language model’s training data. André shared that “by leveraging multiple-choice questions that contrast verbatim text with its paraphrases, DE-COP effectively exposes memorization, significantly outperforming prior methods”.  

The CMU Portugal student also discussed how he extended this investigation to vision-language models (VLMs) with DIS-CO, a new approach for identifying copyrighted visual content in training data. “Using our MovieTection benchmark, built from 14,000 frames across various films, we find that many popular VLMs display clear signs of memorization, raising broader concerns about AI training practices and copyright compliance,” explains André.

“I was really happy to receive this invitation. As a Ph.D. student, I see these opportunities to share our work as truly valuable. So, I can only say thanks for being the one chosen this time,” shared André, who was quite impressed with the turnout of approximately 60 attendees.

André Duarte’s research focuses on the security and privacy of Generative AI models, with a particular emphasis on Membership Inference Attacks. He is supervised by Arlindo Oliveira, at Instituto Superior Técnico and INESC-ID, and Lei Li, at the Language Technologies Institute of Carnegie Mellon University.

In November 2024, André Duarte won the SPARK award for best student article at the Center for Responsible AI Forum 2024 for his article “DE-COP: Detecting Copyrighted Content in Language Models Training Data”, selected from among 44 submitted academic projects.