Conference Papers

Macedo M., Candeias A., Marques M.
IEEE 16th International Conference on Rehabilitation Robotics (ICORR)
2019
Abstract:
We propose a methodology to classify the motion of subjects with cerebral palsy from RGB image sequences, and present a new dataset of 2D facial landmark trajectories extracted from RGB images of people with and without disabilities while performing specific types of movements. Depending on the movement, parts of the face can be occluded, but we are able to recover the 3D shape of the face and its motion using the Structure from Motion framework. From the 3D structure and the motion, we propose two different motion descriptors, one focused on describing the spatial distribution of the motion and the other the temporal distribution. Finally, we discuss the physical meaning of these descriptors and show that they are highly informative about the degree of the subjects' disabilities. Our descriptors can distinguish people with and without cerebral palsy from 2D image sequences.
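As a rough illustration of the two descriptors (a sketch under assumptions, not the paper's exact formulation: the array layout and the function name are hypothetical), the spatial descriptor can be read as the distribution of frame-to-frame landmark displacement over landmarks, and the temporal descriptor as its distribution over frames:

```python
import numpy as np

def motion_descriptors(trajectories):
    """Illustrative sketch: `trajectories` has shape (frames, landmarks, 2),
    holding 2D facial landmark positions over time. Frame-to-frame
    displacement magnitudes are summarized per landmark (spatial
    distribution of motion) and per frame (temporal distribution)."""
    disp = np.linalg.norm(np.diff(trajectories, axis=0), axis=2)  # (F-1, L)
    spatial = disp.sum(axis=0)    # motion energy per landmark
    temporal = disp.sum(axis=1)   # motion energy per frame
    # normalize so each descriptor is a distribution
    return spatial / spatial.sum(), temporal / temporal.sum()
```

A landmark that moves a lot (e.g. the jaw during involuntary motion) dominates the spatial descriptor, while bursts of motion concentrate mass in the temporal one.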
Vongkulbhisal J., Cabral R., De La Torre F., Costeira J.P.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
2016
Abstract:
Object detection has been a long-standing problem in computer vision, and state-of-the-art approaches rely on the use of sophisticated features and/or classifiers. However, these learning-based approaches heavily depend on the quality and quantity of labeled data, and do not generalize well to extreme poses or textureless objects. In this work, we explore the use of 3D shape models to detect objects in videos in an unsupervised manner. We call this problem Motion from Structure (MfS): given a set of point trajectories and a 3D model of the object of interest, find a subset of trajectories that correspond to the 3D model and estimate its alignment (i.e., compute the motion matrix). MfS is related to Structure from Motion (SfM) and motion segmentation: unlike SfM, the structure of the object is known, but the correspondence between the trajectories and the object is unknown; unlike motion segmentation, MfS incorporates 3D structure, providing robustness to tracking mismatches and outliers. Experiments illustrate how our MfS algorithm outperforms alternative approaches on both synthetic data and real videos extracted from YouTube.
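The hard part of MfS is searching over trajectory-to-model correspondences, which is not shown here. As a hedged sketch of the alignment step alone, given one candidate correspondence the affine motion matrix can be recovered by least squares and the reprojection residuals used to score which trajectories fit the model (function names are illustrative, not the authors' code):

```python
import numpy as np

def estimate_motion(model_3d, points_2d):
    """Least-squares affine alignment for one frame: given a candidate
    correspondence between 3D model points (n, 3) and tracked 2D points
    (n, 2), recover the 2x4 motion matrix M minimizing ||X_h M^T - W||."""
    n = model_3d.shape[0]
    X_h = np.hstack([model_3d, np.ones((n, 1))])         # homogeneous (n, 4)
    M, *_ = np.linalg.lstsq(X_h, points_2d, rcond=None)  # (4, 2)
    residuals = np.linalg.norm(X_h @ M - points_2d, axis=1)
    return M.T, residuals  # motion matrix (2, 4), per-point residuals

def select_inliers(model_3d, points_2d, threshold=1.0):
    """Keep trajectories whose reprojection residual is small, mimicking
    the subset-selection flavor of MfS (a robust fit would re-estimate
    M on the inliers; omitted for brevity)."""
    _, residuals = estimate_motion(model_3d, points_2d)
    return residuals < threshold
```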
Hauptmann A., Magalhães J., Sousa R., Costeira J.P.
ACM Multimedia 2020
2020
Abstract:
Recently, conversational systems have seen a significant rise in demand, driven by commercial applications such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, and Google Assistant. Conversational agents are becoming a commodity as a number of companies push for this technology, yet research on multimodal chatbots, where users and the conversational agent communicate through natural language and visual data, remains largely underexplored. The wide use of these conversational agents exposes the many challenges in achieving more natural, human-like, and engaging conversational agents. The research community is actively addressing several of these challenges: how are visual and text data related in user utterances? How should user intent be interpreted? How should multimodal dialog state be encoded? What are the ethical and legal aspects of conversational AI? The Multimodal Conversational AI workshop will be a forum where researchers and practitioners share their experiences and brainstorm about successes and failures in the topic. It will also promote collaboration to strengthen the conversational AI community at ACM Multimedia.
Magalhães J., Hauptmann A., Sousa R., Santiago C.
Proceedings of the 29th ACM International Conference on Multimedia
2021
Abstract:
The second edition of the International Workshop on Multimodal Conversational AI puts forward a diverse set of contributions that aim to explore this new field. Conversational agents are now becoming a commodity as this technology is being applied to a wide range of domains. Healthcare, assistive technologies, e-commerce, and information seeking are some of the domains where multimodal conversational AI is being explored. The wide use of multimodal conversational agents exposes the many challenges in achieving more natural, human-like, and engaging conversational agents. The research contributions of the workshop actively address several relevant challenges: How can assistive technologies be included in dialog systems? How can agents engage in negotiation in dialogs? How should the embodiment of conversational agents be handled? Keynote speakers, both with real-world experience in conversational AI, will share their most recent and exciting work. The panel will address the technological, ethical, legal, and social aspects of conversational search. Finally, invited contributions from research projects will showcase how the different domains can benefit from conversational technology.
Mota P., Eskenazi M., Coheur L.
Proceedings - 2016 IEEE 10th International Conference on Semantic Computing, ICSC 2016
2016
Abstract:
This paper proposes the use of lexical similarity across different documents to improve a topic segmentation task. Given a set of topically related documents, segmentation is carried out in a Bayesian framework; by drawing on similar sentences from different documents, more accurate segment likelihood estimates are obtained. The proposed approach was tested in an educational domain where a set of learning materials from different media sources needed to be segmented so that students could browse them more efficiently. Initial results show that the proposed method yields better segmentation than a state-of-the-art Bayesian baseline that segments each document individually.
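As a simplified stand-in for the paper's Bayesian segment likelihood (an assumption-laden sketch: the cohesion score and function names are illustrative), segmentation can be posed as a dynamic program that places boundaries to maximize within-segment lexical cohesion; in the cross-document setting, the sentence vectors could additionally pool counts from similar sentences in related documents to sharpen the estimates:

```python
import numpy as np
from itertools import combinations

def cohesion(vectors, i, j):
    """Mean pairwise cosine similarity of the sentence vectors in [i, j)."""
    if j - i < 2:
        return 0.0
    sims = []
    for a, b in combinations(range(i, j), 2):
        va, vb = vectors[a], vectors[b]
        denom = np.linalg.norm(va) * np.linalg.norm(vb)
        sims.append(va @ vb / denom if denom else 0.0)
    return float(np.mean(sims))

def segment(vectors, k):
    """Place k segments over n sentences to maximize summed cohesion,
    via dynamic programming. Returns the end index of each segment."""
    n = len(vectors)
    best = np.full((k + 1, n + 1), -np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                score = best[s - 1, i] + cohesion(vectors, i, j)
                if score > best[s, j]:
                    best[s, j] = score
                    back[s, j] = i
    bounds, j = [], n
    for s in range(k, 0, -1):
        bounds.append(j)
        j = back[s, j]
    return sorted(bounds)
```

On two clearly separated topics, the program recovers the topic boundary between them.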
Mosabbeb E.A., Cabral R., De la Torre F., Fathy M.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
2015
Abstract:
Activity recognition in video has become increasingly important due to its many applications, ranging from in-home elder care, surveillance, and human-computer interaction to automatic sports commentary. To date, most approaches to video activity recognition rely on fully supervised settings that require time-consuming and error-prone manual labeling. Moreover, existing supervised approaches are typically tailored to classification, not detection, problems (where the spatial and temporal support of the action must be detected). Recently, weakly-supervised learning (WSL) approaches have been able to learn discriminative classifiers while localizing the action in space and/or time using weak labels. However, existing WSL approaches provide only coarse localization in terms of spatial regions or spatio-temporal volumes, and it is unclear how to extend them to the multi-label case that is common in practical applications. This paper proposes a matrix completion approach to WSL for multi-label learning in video. Our approach localizes non-rectangular spatio-temporal discriminative regions, inferred by clustering regions with common texture and motion features. We illustrate how our approach improves on existing WSL and supervised learning techniques on three standard databases: Hollywood, UCF Sports, and MSR-II.
Lopes J., Trancoso I., Correia R., Pellegrini T., Meinedo H., Mamede N., Eskenazi M.
2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings
2010
Abstract:
This paper describes the integration of multimedia documents into the Portuguese version of REAP, a tutoring system for vocabulary learning. The documents result from a pipeline that processes Broadcast News videos: it automatically segments the audio files, transcribes them, adds punctuation and capitalization, and breaks them into stories classified by topic. The integration of these materials into REAP was designed to reduce the impact of potential errors of the automatic chain on the learning process.
Tonguz O.K., Boban M.
Ad Hoc Networks
2010
Abstract:
In this paper, we investigate the possibility of a new type of application, namely multiplayer games, in a Vehicular Ad Hoc Network (VANET) environment. First, we analyze the available empirical data on travel and traffic volume in the United States, and point out the most important challenges that must be met in order to enable multiplayer games over VANET. We then propose a new paradigm for multiplayer games over VANET, one that exploits the interactive and dynamic VANET environment while adapting to its inherent constraints.
Brandao S., Veloso M., Costeira J.P.
Proceedings - 2014 International Conference on 3D Vision, 3DV 2014
2015
Abstract:
The current paper addresses the problem of object identification from multiple 3D partial views, collected from different view angles, with the objective of disambiguating between similar objects. We assume a mobile robot equipped with a depth sensor that autonomously collects observations of an object from different positions, following no previously known pattern. The challenge is to efficiently combine the set of observations into a single classification. We approach the problem with a multiple hypothesis filter that combines information from a sequence of observations given the robot's movement. We further innovate by learning, offline, neighborhoods between possible hypotheses based on the similarity of observations. Such neighborhoods directly capture the ambiguity between objects and allow knowledge of one object to be transferred to another. In this paper we introduce our algorithm, Multiple Hypothesis for Object Class Disambiguation from Multiple Observations, and evaluate its accuracy and efficiency.
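The core idea of accumulating evidence over partial views can be sketched as a sequential Bayesian update of a belief over object classes (an illustrative simplification: the authors' filter also models robot motion and learned hypothesis neighborhoods, and all names below are assumptions):

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One Bayesian update of the class belief, given the likelihood of
    the newest partial-view observation under each object hypothesis."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def classify(observation_likelihoods, n_classes):
    """Fuse a sequence of per-view likelihood vectors into one decision,
    starting from a uniform prior over object classes."""
    belief = np.full(n_classes, 1.0 / n_classes)
    for lik in observation_likelihoods:
        belief = update_belief(belief, np.asarray(lik, dtype=float))
    return int(belief.argmax()), belief
```

Ambiguous views leave the belief nearly flat; a single discriminative view (e.g. one revealing a distinctive part) can then resolve the class.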