Priberam Machine Learning Lunch Seminar: “Multimodal pattern matching algorithms and applications”
Speaker: Xavier Anguera Miro (http://www.icsi.berkeley.edu/~xanguera/)
Venue: IST Alameda, Sala EA4 (Torre Norte)
Date: Friday, May 14th, 2010
Time: 13:00
Lunch will be provided
Abstract:
After introducing myself and where I come from, I will focus in this talk on 3 projects I have been working on over the last year. The first one is a novel pattern matching algorithm, based on the well-known Dynamic Time Warping (DTW). The presented algorithm can be used to find real-valued subsequences within a longer sequence, without prior knowledge of their start and end points. I have applied the algorithm to the task of acoustic matching, for which I will show some preliminary results. I will then explain a second DTW-based algorithm, this one able to perform an online alignment of two musical pieces. One of the music pieces can be input live or retrieved from an audio file, while the second one is extracted from an online music video. The online alignment allows the music video to be played in total synchrony with the corresponding ambient/recorded audio. Finally, I will talk about video copy detection, which is the task of finding duplicate video segments within a large database. I will explain our multimodal approach, based on audio-visual change-based features.
Bio:
Xavier Anguera Miro: Ing. [MS] 2001 from UPC (Barcelona, Spain), [MS] 2001 European Masters in Language and Speech, Dr. [PhD] 2006 from UPC, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked for Panasonic Speech Technology Lab in Santa Barbara, CA. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been with Telefónica Research in Barcelona, Spain, working as a research scientist in the multimedia research group led by Dr. Nuria Oliver. Although his background is in acoustic analysis, over the last 3 years he has been very interested in the area of multimodal algorithms and applications.