Recently, conversational systems have seen a significant rise in demand due to modern commercial applications using systems such as Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana and Google Assistant. The research on multimodal chatbots is a widely underexplored area, where users and the conversational agent communicate by natural language and visual data. Conversational agents are now becoming a commodity as a number of companies push for this technology. The wide use of these conversational agents exposes the many challenges in achieving more natural, human-like, and engaging conversational agents. The research community is actively addressing several of these challenges: how are visual and text data related in user utterances? How to interpret the user intent? How to encode multimodal dialog status? What are the ethical and legal aspects of conversational AI? The Multimodal Conversational AI workshop will be a forum where researchers and practitioners share their experiences and brainstorm about success and failures in the topic. It will also promote collaboration to strengthen the conversational AI community at ACM Multimedia.