Deep comics generation with multimodal LLMs

January 20, 2025 1 minute read

Categories: Blog, Projects

An image of Tex Willer

What’s this project about

Project aim: to generate new Tex Willer comics in Italian (sorry, non-Italian speakers: you are really missing out here). The storyline should be customizable, and this should be reflected in both the text and the artwork.

Data: open-source black-and-white books available on the Internet Archive.

Why: I love Tex! I have been eagerly reading it since I was young. Tex was (and still is) a role model for me and helped shape the person I am. Plus, reading it is a really fun and captivating pastime. I own the whole Tex - Collezione storica a colori collection of 256 color volumes, which ran from 2007 until 2015. Even though I have read them all multiple times, they never get old, and I would love to have more. It’s 2025, and with a bit of luck (and skills, and data) we will attempt to do just that!

Milestones:

Data preprocessing and panel extraction
Text extraction and cleaning (labels)
Narrative reconstruction
Writing-style learning and story generation
Panel generation
Panel temporal alignment
In-painting to color the black-and-white panels
[Optional] English translation
Deployment on a self-hosted solution

Note: while this project may seem “trivial”, it covers many state-of-the-art technologies and paradigms, including multimodal large language models (MLLMs), diffusion models, fine-tuning, and deployment on resource-constrained hardware.

Articles in this series

More articles will be added as I find time to write them.

Deep comics generation with multimodal LLMs - Overview - Part 1


Guglielmo Gattiglio
