Sousa R., Ferreira P., Costa P., Azevedo P., Costeira J.P., Santiago C., Magalhães J., Semedo D., Ferreira R., Rudnicky A., Hauptmann A.

Proceedings of the 2nd ACM Multimedia Workshop on Multimodal Conversational AI, MuCAI'21

pp. 25



Most of the interaction between large organizations and their users will soon be mediated by AI agents. This view is becoming undisputed as online shopping dominates entire market segments and the new "digitally native" generations become consumers. iFetch is a new generation of task-oriented conversational agents that interact with users seamlessly through both verbal and visual information. Throughout the conversation, iFetch provides targeted advice and a "physical store-like" experience while maintaining user engagement. This context entails the following vital components: 1) highly complex memory models that keep track of the conversation, 2) extraction of key semantic features from language and images that reveal user intent, 3) generation of multimodal responses that keep users engaged in the conversation, and 4) an interrelated knowledge base of products from which to extract relevant product lists.