Script Structure for Vertical Dramas: Episode-by-Episode Guide
A 90-second episode is not a short scene. It is a complete structural unit with its own four-part logic, its own conversion function, and its own failure modes that have nothing to do with writing quality in the traditional sense.
Writers trained in long-form television or film bring strong instincts to this format — and most of those instincts are wrong for it. Not because the craft is simpler. Because the craft is different. The frame is narrower, the runtime is a fraction, the viewer's thumb is one swipe from leaving, and every episode has a commercial job to do that traditional scripts never had to think about.
This is the episode-by-episode structure guide: what each unit needs to contain, what the arc across 70 episodes actually looks like, and where most scripts quietly fail before a platform ever sees them.
Why Traditional Script Training Breaks Down Here
In a television pilot, you have 40 minutes to establish a world, introduce characters, set tone, and still leave room for a story. Audiences extend trust by default. They expect a slow burn before the payoff.
Vertical drama does not have that agreement with the viewer. The audience has given 90 seconds, conditionally. Lose them in the first 15 and they are gone. Build to a weak episode ending and the next unlock does not happen. The paywall conversion rate at episodes 5 to 8 is the entire commercial mechanism of the format. Everything in the script exists to support that number.
That is not a creative limitation. It is a craft constraint that produces a specific kind of writing discipline — one that long-form writers have to acquire deliberately, because nothing in their training built it.
The Episode Unit: Four Timestamps, Not Three Acts
Every episode of a well-structured vertical drama follows the same four-part timestamp skeleton. Not loosely — to the second.
0–15 seconds: The Hook (the explosion point)
Chinese showrunners call this the explosion point for a reason. The episode does not open in motion toward the conflict. It opens in the conflict. The viewer sees a visual beat that communicates world, stakes, and character in one image before a line of dialogue lands.
What that looks like in practice: a hand slapping a resignation letter on a desk. A phone screen showing a text from the wrong name. A character walking into a room they should not be in. The image has to be legible in three seconds and charged in fifteen.
The most common mistake at this stage: writers pad the opening with re-establishing context from the previous episode. The viewer was there. They do not need a recap. They need something to happen immediately. Fifteen wasted seconds at the top is 17% of the episode runtime before anything has moved.
15–60 seconds: The Escalation (one move forward)
One forward move. Not two. The discipline of the middle section is the refusal to resolve anything while still advancing something.
A revelation lands — but only partial. A confrontation starts — but does not finish. A decision is made — but the consequence is not yet visible. The escalation delivers enough to justify the viewer staying, but not enough to release the tension that is driving them forward.
The dialogue in this section is direct and high-stakes. Characters say what they mean. There is no room for subtext that requires interpretive work from the viewer. On a six-inch screen in a public space, the emotional beat has to land without effort. Oblique dialogue that works in prestige television becomes invisible noise on a phone.
60–80 seconds: The Spike (the tension peak)
The episode builds to its highest tension point in the 20 seconds before the button. This is not the cliffhanger yet. It is the lead-in to it. The spike is where the scene reaches its most charged moment: the confrontation peaks, the revelation is about to land, the decision is one beat away from being made.
Retention data from ReelShort's engineering team points to a specific window: episodes that hold their freeze-frame between seconds 55 and 58 retain viewers at higher rates than those that resolve earlier or run longer. Writers now structure to the second, not the page. The spike is the setup for what will not be resolved.
80–90 seconds: The Button (the unresolved cliffhanger)
The episode ends mid-tension. Not on resolution. On suspension.
The cliffhanger is not a device added at the end — it is the structural endpoint the entire episode is written toward. The button is the moment that makes stopping feel uncomfortable. The power dynamic is about to flip. The secret is about to surface. The confrontation is one word away from detonating.
Write the button first. Then write backward through the episode to the hook. The button is the only reason the viewer unlocks the next episode.
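The four-part skeleton above can be written down as data and checked mechanically. What follows is a minimal Python sketch, not a platform tool: the names `EPISODE_BEATS`, `check_beats`, `beat_at`, and `runtime_share` are illustrative assumptions, and the timestamps are simply the ones given in this guide.

```python
# Illustrative sketch only: beat names and boundaries come from this guide;
# every function and variable name here is hypothetical.
EPISODE_SECONDS = 90

# (name, start_second, end_second); the end second is exclusive
EPISODE_BEATS = [
    ("hook", 0, 15),
    ("escalation", 15, 60),
    ("spike", 60, 80),
    ("button", 80, 90),
]


def check_beats(beats, total=EPISODE_SECONDS):
    """Raise ValueError unless the beats tile 0..total with no gaps or overlaps."""
    cursor = 0
    for name, start, end in beats:
        if start != cursor or end <= start:
            raise ValueError(f"beat {name!r} does not follow the previous beat")
        cursor = end
    if cursor != total:
        raise ValueError(f"beats cover {cursor}s, expected {total}s")


def beat_at(second, beats=EPISODE_BEATS):
    """Return the name of the beat that contains the given second."""
    for name, start, end in beats:
        if start <= second < end:
            return name
    raise ValueError(f"second {second} is outside the episode")


def runtime_share(seconds, total=EPISODE_SECONDS):
    """Percentage of the episode taken up by a span of the given length."""
    return round(100 * seconds / total)


check_beats(EPISODE_BEATS)
print(beat_at(70))        # spike
print(runtime_share(15))  # 17
```

A table like this is mainly useful when timing a draft: tag each scripted moment with its second mark, look up the beat it falls in, and see at a glance whether the hook or the button has drifted.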
What the Cliffhanger Actually Is (and What It Is Not)
The cliffhanger in vertical drama is not a twist. It is a pause at the worst possible moment.
Four cliffhanger mechanics that convert:
The partial revelation. The viewer learns something the protagonist does not yet know, or the protagonist learns something the viewer has been waiting for — but the consequence is cut before it lands. The gap between knowing and acting is the tension that pulls the viewer forward.
The reversal setup. The power dynamic is about to flip — and the episode ends just before it does. The protagonist is one moment away from the confrontation that changes everything. The freeze lands in the breath before it happens.
The intrusion. A new element enters the scene — a phone call, a third character, a physical arrival — that reframes everything established in the episode. The viewer cannot know what happens next because the rules just changed.
The deadline. A timer, a consequence, or an ultimatum lands in the final seconds. The viewer knows something bad will happen if the next episode is not watched. The discomfort of leaving the story unresolved is the conversion mechanism.
What the cliffhanger is not: a resolution followed by a teaser for the next episode. That structure releases tension rather than holding it, and released tension does not convert at the paywall.
The Series Arc: How 70 Episodes Actually Work
Most writers plan the first ten episodes and figure out the rest as they go. That approach fails the vertical drama format at the script stage, before production starts.
The full escalation arc has to be mapped before episode one is written. Not loosely. The key structural markers need to be locked:
The premise conflict — established in episode one, first 30 seconds.
The first major escalation — typically episodes 8–15. Stakes increase, the power dynamic shifts for the first time, and the viewer's investment deepens.
The paywall placement — the free preview ends where the tension is highest and most unresolved. Usually episodes 5–8. The viewer has to care enough to pay, which means the first free episodes cannot be setup — they have to be genuine escalation.
The midpoint reversal — around episodes 35–40 in a 70-episode series. The protagonist gains ground, the antagonist responds, or the fundamental premise is reframed. This is where viewer retention typically dips if the arc is not planned — episodic repetition without forward progress kills a series in its middle third.
The penultimate crisis — episodes 60–65. Everything that has been built collapses or is threatened. The stakes reach their highest point.
The resolution — the final 5–7 episodes deliver the payoff the viewer has been waiting for since the premise. The power dynamic inverts completely. The series earns the investment it asked for.
Every episode within that arc has one job: advance the viewer one step along that path without letting them off the tension hook. Episodes that feel like filler — scenes that exist to delay the next structural marker rather than move toward it — are where subscription churn happens.
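One way to keep the arc locked while scripting is to hold the markers above in a small table and query it by episode number. The sketch below is an assumption-laden illustration: the episode ranges are the typical figures quoted in this guide, the resolution is taken as the final seven episodes (the guide says five to seven), and the names `ARC_MARKERS`, `markers_for`, and `longest_unmarked_run` are hypothetical.

```python
# Illustrative sketch only: ranges are the typical figures from this guide,
# not fixed rules; all names are hypothetical.
SERIES_LENGTH = 70

# (marker, first_episode, last_episode), both inclusive
ARC_MARKERS = [
    ("premise conflict", 1, 1),
    ("paywall placement", 5, 8),
    ("first major escalation", 8, 15),
    ("midpoint reversal", 35, 40),
    ("penultimate crisis", 60, 65),
    ("resolution", 64, 70),  # assumes the longest stated option, 7 episodes
]


def markers_for(episode, markers=ARC_MARKERS):
    """Return the names of every marker whose window includes this episode."""
    return [name for name, first, last in markers if first <= episode <= last]


def longest_unmarked_run(markers=ARC_MARKERS, length=SERIES_LENGTH):
    """Return (first, last) of the longest stretch with no marker window."""
    covered = set()
    for _, first, last in markers:
        covered.update(range(first, last + 1))
    best, best_len, run_start = (0, 0), 0, None
    for ep in range(1, length + 2):  # one past the end flushes a trailing run
        unmarked = ep <= length and ep not in covered
        if unmarked and run_start is None:
            run_start = ep
        elif not unmarked and run_start is not None:
            if ep - run_start > best_len:
                best_len, best = ep - run_start, (run_start, ep - 1)
            run_start = None
    return best


print(markers_for(8))          # ['paywall placement', 'first major escalation']
print(longest_unmarked_run())  # (16, 34)
```

An episode outside every window is not automatically filler. The long unmarked stretches are simply where the outline deserves a second look, because nothing in the marker list is forcing forward motion there.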
The Premise Question: Structure Before Script
The best vertical drama scripts are not written from strong premises. They are written from structurally strong premises — concepts that generate conflict automatically, in every scene, without setup.
The format rewards premises where the characters are in opposition by design, before anything happens. Enemies forced into proximity. A power imbalance the protagonist cannot escape. A secret that both parties hold, with different pieces of it. An arranged or forced circumstance neither character chose.
These structures work because they eliminate the setup problem. In a 90-second episode, there is no time to generate conflict through accumulation. The conflict has to already exist when the episode opens. A premise that requires three episodes of context before the tension begins is a premise that will not survive platform review.
The one-sentence test: if the premise does not generate a specific, charged image in one sentence, it is not ready to be scripted. The logline is not a summary of the series — it is evidence that the conflict is strong enough to sustain 70 episodes of escalation.
Dialogue Rules for the Format
Vertical drama dialogue has three rules that break from most screenwriting training:
Say the thing. Subtext is a tool for formats where the viewer has time to read between lines. At 90 seconds per episode on a phone screen, subtext becomes noise. Characters say what they mean — loaded, direct, with emotional stakes in every exchange. "I know what you did" is a vertical drama line. "There are things I'm beginning to understand about you" is not.
Keep it short. No speech runs longer than three lines without a visual beat or action interrupting it. Long monologues lose rhythm in the format and signal a writer who has not calibrated to the screen time. Most emotional confrontations in the best vertical drama scripts are 40 words or fewer per character before the scene turns.
End on an incomplete thought. The dialogue at the episode's button often ends mid-sentence, mid-realization, or at the beat just before the response. The incomplete dialogue mirrors the incomplete action — both suspended, both pulling the viewer forward.
Axis AI Studios Perspective
The script is where most vertical drama productions fail — before a camera turns on.
Productions that treat scripting as a fast step before the real work of production are the ones that arrive at delivery with 70 episodes of footage and no escalation structure. The episodes are technically correct. The arc is not there. The paywall conversion rate is not there either.
The structural work — mapping the arc, locking the paywall episode, designing the cliffhanger mechanics across the full series — is not pre-production prep. It is the production. Everything downstream follows from whether that structure is right.
AI-native production compresses a significant amount of the production pipeline. It does not compress script quality. The structural logic of the series has to be correct before AI tools touch any of it — because AI production at volume with a weak script produces 70 episodes of consistently weak content, fast. Volume is not the advantage if the foundation is wrong.
The writers who understand this format do not write fast. They plan specifically and then write fast. The planning is what the format requires.
Common Script Mistakes: What Platform Readers Flag Immediately
The atmospheric opening. Episode one spends 20–30 seconds establishing mood, location, or backstory before the conflict arrives. In vertical drama, that 20 seconds is a death sentence. The conflict has to be present in the first image.
The resolved episode. The episode ends on a beat that completes rather than suspends. The tension is released. There is no reason to unlock the next episode. This is the single most common structural failure in first-time vertical drama scripts.
The thin middle third. Episodes 25–50 in a 70-episode series repeat the same power dynamic without moving the arc forward. The viewer has already seen this version of the conflict. Without a structural marker — a revelation, a reversal, an escalation — the series stalls and churn follows.
The late paywall placement. The free preview runs too long. The viewer is not yet invested enough when the paywall arrives, or the paywall arrives after a resolved episode rather than at a tension peak. Paywall placement belongs in the script, not in the edit.
Dialogue that explains rather than enacts. Characters describe their emotional states instead of demonstrating them. "I'm so angry at you" is an explanation. A character who walks out of the room and picks up a glass but does not throw it — then the episode ends — is an enactment.
FAQ
How many words is a vertical drama episode script?
A 90-second episode typically runs 400–600 words of total script — action lines plus dialogue. Most of that is dialogue. Action lines are kept to seven words or fewer per beat, and most episodes have three to five visual beats total. The writing is dense with emotion and sparse with description. If the script is running longer than 600 words per episode consistently, the pacing is likely wrong for the format.
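The figures in this answer are easy to check automatically on a draft. The following Python sketch is only an illustration: it assumes a hypothetical plain-text script format in which action lines start with `ACTION:` and dialogue lines start with a speaker label such as `MIA:`, and it treats each action line as one visual beat. The thresholds are the ones quoted above, and `check_episode` is an invented name.

```python
# Illustrative sketch: thresholds are the figures quoted in this guide;
# the "ACTION:" / "NAME:" line format is a hypothetical convention.
WORD_RANGE = (400, 600)   # total words per 90-second episode
MAX_ACTION_WORDS = 7      # words per action line
BEAT_RANGE = (3, 5)       # visual beats per episode


def check_episode(script_text):
    """Return a list of warnings for one episode script (empty if none)."""
    total_words = 0
    action_lines = []
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        label, sep, rest = line.partition(":")
        body = rest.strip() if sep else line  # unlabeled lines count in full
        total_words += len(body.split())
        if sep and label == "ACTION":
            action_lines.append(body)

    warnings = []
    if not WORD_RANGE[0] <= total_words <= WORD_RANGE[1]:
        warnings.append(
            f"total words {total_words} outside {WORD_RANGE[0]}-{WORD_RANGE[1]}"
        )
    if not BEAT_RANGE[0] <= len(action_lines) <= BEAT_RANGE[1]:
        warnings.append(
            f"{len(action_lines)} visual beats, expected "
            f"{BEAT_RANGE[0]}-{BEAT_RANGE[1]}"
        )
    for body in action_lines:
        if len(body.split()) > MAX_ACTION_WORDS:
            warnings.append(f"action line over {MAX_ACTION_WORDS} words: {body!r}")
    return warnings


draft = """ACTION: He walks slowly across the long empty marble lobby floor.
MIA: I know what you did."""
for warning in check_episode(draft):
    print(warning)
```

The constants sit at the top so they can be adjusted per platform or per series. A check like this flags pacing drift across a batch of episodes; it says nothing about whether the button lands.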
Does every episode need a cliffhanger?
Every episode needs an unresolved button — but the cliffhanger does not have to be a dramatic reveal every time. The format sustains itself on a range: hard cliffhangers for the structural marker episodes, and softer buttons — an incomplete exchange, a decision about to be made, a phone call unanswered — for the connective tissue episodes between markers. The rule is not that every episode is maximum tension. It is that every episode ends before the tension releases.
Can a vertical drama script be written without mapping the full series first?
It can be started that way. It will not be finished that way. Writers who begin scripting without a locked arc consistently hit the same wall in the middle third — the structural markers are not there, the escalation has run out of forward motion, and the series needs to be restructured from the midpoint backward. Mapping the full arc before scripting episode one is not extra work. It is the work that makes the scripting fast.
A vertical drama script is not a short version of something longer. It is its own form with its own logic — and that logic runs from the first three seconds of episode one through the button on episode 70.
Writers who learn it on its own terms produce scripts that hold viewer retention through the paywall and sustain escalation across a full series. Writers who treat it as compressed television produce 70 episodes that play like a pilot with 69 extra scenes.
The format rewards planning over instinct. Map the arc. Write to the button. Let the structure do the work the writer cannot do in 90 seconds.
Further Reading
Understanding script structure is one part of the production equation. For how the full production chain works around that script — from casting and direction through to platform delivery — the complete 2026 guide to how vertical micro-dramas are produced covers every stage.
For context on what platforms are actually looking for when they review incoming scripts and series, the ReelShort platform breakdown covers their acquisition criteria in detail.
The budget context for what it costs to take a script into production sits in the vertical drama production costs breakdown.
For a read on where the market is heading, and why well-capitalized platforms are actively looking for production partners right now, the vertical drama funding rounds Q1 2026 article breaks down what the capital movement signals for producers and platforms alike.

