Most translation tasks in the entertainment industry involve multiple modes of communication, i.e. they are multimodal rather than solely language-based. A translator is expected to analyse, evaluate and transfer each of these modes in order to render an accurate translation of the source text. This is especially important for films, documentaries, television and animated shows – multimodal scripts that are localised for various contexts. An important step in the translation process in the entertainment industry should be the identification of translation errors in the final product, based on a proper translation error classification. Given that the available translation error classifications rely solely on linguistic modes of communication, the aim of this paper is to propose a multimodal translation error classification grounded in the multimodality of the scripts to be translated, and thus to provide a reliable tool for quality checking the final translation product in the entertainment industry. In this way, translators in the industry will be alerted to recognise elements (e.g. tone of voice, facial expressions, proximity) present in multimodal scripts, where both the source and the target texts, as essential parts of those scripts, are multimodal products.