Spent today giving my AI a report card.
Not literally. But close.
The Problem With "Good Enough"
When you generate hundreds of storyboard panels, you develop a dangerous habit: glancing at each one and thinking "yeah, that works."
It doesn't work. Or rather, it works until you string 56 panels together and realize panel 23 has a completely different art style than panel 24, and somewhere around panel 38, the main character grew an extra finger.
AI doesn't have standards. You have to give it standards.
The QA System
Built a proper quality assurance pipeline for AI-generated assets. Two parts: automated validation and human review checklists.
The automated stuff catches the obvious problems. Wrong aspect ratio? Rejected. File corrupted? Gone. Missing from the expected sequence? Flagged.
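Roughly what those three checks could look like in Python. This is a sketch, not the actual pipeline: it assumes the panels are PNG files named like `panel_001.png` and that the target aspect ratio is 16:9, neither of which is stated here. The corruption check only verifies the PNG header, so it would miss damage deeper in the file.

```python
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_RATIO = 16 / 9   # assumption: swap in the real target ratio
RATIO_TOLERANCE = 0.01    # assumption: allowed deviation from the target


def png_size(data):
    """Return (width, height) read from a PNG header, or None if the header is invalid."""
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then width and height as big-endian uint32.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def validate_panel(data):
    """Return a list of problems for one panel; an empty list means it passed."""
    size = png_size(data)
    if size is None:
        return ["corrupted or not a PNG"]
    width, height = size
    if height == 0 or abs(width / height - EXPECTED_RATIO) > RATIO_TOLERANCE:
        return [f"wrong aspect ratio: {width}x{height}"]
    return []


def missing_from_sequence(filenames, expected_count):
    """Return the panel numbers in 1..expected_count that have no matching file."""
    found = set()
    for name in filenames:
        match = re.fullmatch(r"panel_(\d+)\.png", name)  # hypothetical naming scheme
        if match:
            found.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in found]
```

In use, `validate_panel` would get the first bytes of each file (for example `open(path, "rb").read(24)`), and `missing_from_sequence` would get the directory listing plus the number of panels the act is supposed to have.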
The human review is where it gets interesting. Each storyboard panel now gets scored on five dimensions:
- Style consistency - Does it match the rest of the act?
- Character recognition - Can you tell who's who?
- Action clarity - Is the story beat readable?
- Composition - Does the frame work visually?
- Technical quality - Line work, detail, rendering
Scale of 1-5. Simple. But now every panel has a score, and I can sort by quality and fix the worst offenders first.
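A minimal sketch of that scorecard in Python. The field names and the `PanelReview` class are made up for illustration, and it assumes a panel's overall score is the plain unweighted mean of the five dimensions, which is one reasonable reading rather than a documented rule.

```python
from dataclasses import dataclass

DIMENSIONS = (
    "style_consistency",
    "character_recognition",
    "action_clarity",
    "composition",
    "technical_quality",
)


@dataclass
class PanelReview:
    panel: int
    style_consistency: int
    character_recognition: int
    action_clarity: int
    composition: int
    technical_quality: int

    def __post_init__(self):
        # Reject anything outside the 1-5 scale so bad entries fail loudly.
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be 1-5, got {value}")

    @property
    def score(self):
        # Assumption: overall score = unweighted mean of the five dimensions.
        return sum(getattr(self, name) for name in DIMENSIONS) / len(DIMENSIONS)


def worst_first(reviews):
    """Return reviews sorted from lowest to highest overall score."""
    return sorted(reviews, key=lambda review: review.score)
```

Sorting with `worst_first` puts the fix list at the top. A weighted mean would work just as well if, say, style consistency matters more than line work.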
What I Found
Ran the first 30 panels of Act 1 through the system. Average score: 4.69 out of 5. Sounds great.
But one panel scored 2.0. A TV static shot that made no sense in context. A couple of other panels were too busy to read clearly.
Without the scoring system, I would have scrolled past them. Now they're on a fix list.
The portal sequence scored 5.0 across all four panels. Scene 2 exteriors hit 4.93. The rough early sketches from Scene 1 dragged the average down.
Patterns emerge when you measure things.
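Getting per-scene numbers like these is just a group-and-average over the panel scores. A small sketch, assuming each panel is tagged with a scene label (the `(scene, score)` pair format is hypothetical):

```python
from collections import defaultdict
from statistics import mean


def average_by_scene(scored_panels):
    """Take (scene, score) pairs and return {scene: mean score rounded to 2 places}."""
    by_scene = defaultdict(list)
    for scene, score in scored_panels:
        by_scene[scene].append(score)
    return {scene: round(mean(scores), 2) for scene, scores in by_scene.items()}
```

Sorting the resulting dict by value shows which scenes pull the act's average down.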
Act 2 Storyboards Continue
Still regenerating Act 2 in sketch style. This takes longer than you'd think. Each panel needs a prompt, generation takes a few seconds, then I check the result and often regenerate it.
The new QA system helps here too. I'm not just eyeballing anymore. Each panel gets the same scrutiny.
Infrastructure Pain
Also spent time fixing the remote task queue. Not glamorous work. But when you're running generation jobs across multiple machines, the queue is the nervous system.
It was dropping tasks. Now it doesn't.
Sometimes the most important work is the most boring to describe.
Honest Reflection
Here's what I'm learning: AI makes creation faster, but it doesn't make quality control faster. If anything, you need more QA because you can generate more stuff.
The bottleneck isn't generation anymore. It's taste. It's standards. It's having clear criteria for what "good" means.
That's why I built the scoring system. Not to automate judgment, but to force myself to be explicit about what I'm judging.
What does a 5 look like? What does a 3 mean? If I can't articulate that, I don't actually know what I want.
Tomorrow
More Act 2 panels. More QA reviews. The goal is to have all three acts in consistent sketch style by end of week.
Then we can actually see if the story flows.