Making a 'fast film'

April 16, 2024

My brother and I teamed up last weekend to make a short film in 72 hours for a competition, using only AI-generated content.

I really enjoy the process of making short films: last year I had a lot of fun making a goofy puppet short (it's mostly in-jokes, sorry). This was a perfect excuse to do another one.

The brief:

The competition was a joint effort by Pika.ai and ElevenLabs:

Theme: 'Post-Reality'
Constraints:
- 1-3 minutes
- All video must be generated using pika.ai (text-to-video and image-to-video) and dialogue using ElevenLabs (text-to-voice)
- Additional music + sound effects are allowed

In theory, we had from Friday to Sunday night to work, but due to unavoidable logistical issues, we actually had only a few hours to put it all together. I do best with tight deadlines, so this was right up my alley.

The process:

To start, I poked around Pika to see what it was capable of. It seemed similar to Runway, a tool I've used before. You enter a text prompt and get a 3-second video clip:

While you can specify styles and negative prompts, the default outputs skew heavily towards a Pixar/Disney look. There's also no guarantee that the style will stay consistent across multiple videos.

Luckily, there's an image-to-video mode that lets you upload an image along with your text prompt to generate the video. That sounded better suited to this project.

But first, I needed a story.

The narrative

I was keen to work with my brother on this one, as he's a good writer and video editor. Given the short timeframe, I was happy to roll with his initial idea. He wrote a 2-minute dialogue between two characters and gave me a few suggestions for images that might go with it. This was enough to get me started!

The dialogue

B (fading in): What are you doing?

A: Models.

(zoomed out fire/explosion, more context, you can tell what it is)

B: Models of what?

(smash cut to mushroom cloud, implied to be the full zoom out of earlier imagery)

A: Predictive models.

(some kind of ‘title card’, whatever we decide to call it, with some supporting imagery)

B: These look pretty grim.

A: That seems to be the trend.

(cut to slowly rotating earth, seen as if from a space station in orbit)

A: According to my models, humanity will be extinct in the next few hundred years.

B: Extinct?

A: Or reduced to near obscurity, at least.

(image: dustbowl, wasteland, nuclear winter earth)

A: Unlivable earth, scattered remnants counting time until there isn’t any time left.

B: By what cause?

A: Well, the most obvious is nuclear Armageddon.

(tiny half second smash cut of nuke again)

B: Unlikely. We can be destructive, sure, but we know our own mortality. Even the worst of us fears death.

(images of grass growing over a closed silo)

A: That’s not the only model. More recently, I’ve been running scenarios for artificial intelligence.

(scenes of brains lighting up, technology lighting up, abstracts representing expanding consciousness)

B: AI?

A: You’ve got the classic scenarios, of course. A new dominant species.

(images of drones, biowarfare, cyberwarfare)

A: But others, too. AI replacing every job, removing all material concerns. Humans rendered obsolete by our own inefficiency.

(images of assembly lines, utopian cities, endless crop fields worked by machines)

B: That doesn’t sound so bad.

A: We’re not made to be idle. With no purpose we would tear ourselves apart.

(images of empty streets, offices, cafe tables in disrepair)

B: Who’s to say we could not find new purpose?

A: Such as?

B: If earthly concerns are handled, we could push outward.

(images of space travel, rockets)

B: Or inward. We could explore the depths of consciousness, using technology to aid or augment. We could come to understand ourselves and find new purpose.

(zoom in on meditating monk type with closed eyes, opens eyes, they’re augmented blue)

A: You’re optimistic.

B: Why not? The future is not fixed. Humans have faced existential crises before, and we will do so again. Why not believe in our capacity to overcome the next hurdle?

(images of cavemen and fire, agriculture, cities, then a shot of earth with cities lighting up)

A: You could be right. I’m only running models. I have the time.

(image of brain suspended in liquid, entire brain never in view, suggestive)

B: I know. It’s simply in my nature to be optimistic.

(image of two brains in liquid within machines)

A: We have time for that too.

(zoom out to find a row of machines)

(zoom out to an entire field of brain machines in a bunker, lights go out)

I pulled the image suggestions out of the dialogue and used an LLM to turn them into image prompts:

The prompts

prompts = [
"A caveman kindling a fire under the shelter of a rough stone overhang, with sparks dramatically highlighted against the surrounding dark, in high contrast black and white, capturing the moment as a symbol of early technological discovery.",
"A medieval scene showing farmers using early agricultural tools in a field, the sun setting in the background casting long shadows, styled in a stark black and white film noir look to emphasize the transition from survival to sustenance.",
"A bustling industrial city at the turn of the 20th century, viewed from a high angle showing steam engines and busy streets, the buildings and machines outlined sharply against a smog-filled sky, rendered in high contrast black and white to highlight the explosion of industrial technology.",
"A contemporary view of Earth from space at night, the continents aglow with lights, cities connected by glowing networks of communication and energy, portrayed in high contrast black and white, symbolizing the global impact of modern technology.",
"A vintage-style title card with ornate typography and a shadowy figure looming in the background, styled in high contrast black and white film noir, with the figure's silhouette suggesting a sense of mystery and danger.",
"A close-up of Earth as viewed from a space station window, the planet bathed in stark moonlight, emphasizing isolation in high contrast black and white, with the window frame and interior of the space station visible to provide context.",
"A lone figure stands overlooking a desolate nuclear winter landscape, the ground shrouded in shadows and the ruins of a once-thriving city visible in the distance, in stark black and white film noir style.",
"A close-up of a mushroom cloud from a nuclear explosion, the cloud dramatically lit against a dark sky, with the shockwave and debris visible in the foreground, in high contrast black and white.",
"A single green shoot sprouting from a rusty missile silo hatch, the surrounding ground dark and textured with cracks and debris, in high contrast black and white, symbolizing hope amidst destruction.",
"A human brain glowing as if lit from within, placed centrally with abstract technology symbols and circuit board patterns faded in the background, in high contrast black and white, suggesting the fusion of biology and technology.",
"A single sleek, futuristic drone flying above a 1940s-era cityscape at night, its lights casting sharp shadows on the art deco buildings below, styled in black and white high contrast film noir.",
"A futuristic city skyline seen from a high viewpoint, the foreground featuring a robot tending to a lush, vibrant garden, rendered in black and white high contrast, with the garden providing the only source of light.",
"An empty street in a once-busy 1950s city, focusing on a single abandoned cafe table with a still-steaming cup of coffee, the surrounding buildings fading into a thick, eerie fog, in high contrast black and white.",
"A rocket launching pad at night, the rocket brightly illuminated and ready to launch, set against a stark black background with stars visible in the sky, in high contrast, capturing the anticipation of the moment.",
"A meditating monk sitting under a single beam of light, his face serene as he opens his eyes, which emit a bright, otherworldly glow, rendered in high contrast black and white, with the light and glow emphasizing his enlightenment.",
"A caveman creating fire, the flames casting dramatic shadows on his face and the rough stone walls of the cave, with the evolution of modern cities blurred in the background, in black and white, showcasing the warmth and power of the fire.",
"A human brain suspended in a clear liquid within a glass container, the container just partially visible and cast in shadows, with a single light source illuminating the brain's intricate texture and emphasizing mystery in high contrast black and white.",
"Two brains in separate glass containers, focused on one with light highlighting its intricate texture and neural connections, the other in shadow, in high contrast black and white, suggesting a comparison or choice between the two.",
"Inside a dimly lit room filled with rows of brain-machine interfaces, focusing on a single machine with a glowing brain, the rest of the machines fading into darkness, in black and white high contrast, hinting at the scale and significance of the operation.",
"Inside a bunker, focusing on a single machine with a pulsating brain, the rest of the room falling into darkness as lights begin to flicker and sparks fly from the surrounding equipment, in black and white high contrast, conveying a sense of danger and urgency."
]
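The post doesn't say which LLM produced these prompts or exactly how the directions were fed to it. As a hypothetical sketch of the collection step, assuming the stage directions are the lines wrapped entirely in parentheses (as in the script above), they can be pulled out with a regular expression and joined into a single numbered request for whichever LLM you use. The function names and the instruction wording are illustrative, not the author's.

```python
import re

# Hypothetical helper: a stage direction is assumed to be a line wrapped
# entirely in parentheses, e.g. "(cut to slowly rotating earth ...)".
# Inline cues such as "B (fading in): ..." do not match.
DIRECTION = re.compile(r"^\((.+)\)$")


def extract_directions(script: str) -> list[str]:
    """Return the text of each whole-line parenthesised direction, in order."""
    directions = []
    for line in script.splitlines():
        match = DIRECTION.match(line.strip())
        if match:
            directions.append(match.group(1).strip())
    return directions


def build_llm_request(directions: list[str]) -> str:
    """Assemble one instruction asking an LLM for an image prompt per direction."""
    numbered = "\n".join(f"{i}. {d}" for i, d in enumerate(directions, start=1))
    return (
        "Turn each of these film directions into a detailed, self-contained "
        "text-to-image prompt, one per line:\n" + numbered
    )


script = """B (fading in): What are you doing?

A: Models.

(zoomed out fire/explosion, more context, you can tell what it is)

B: Models of what?

(smash cut to mushroom cloud, implied to be the full zoom out of earlier imagery)"""

print(build_llm_request(extract_directions(script)))
```

Numbering the directions makes it easier to match each returned prompt back to its place in the script.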

I had to decide on a consistent style for the images. I played around with a few:

My partner P had the idea for 'film noir comic' (on the left), which I loved for its distinctive and stark contrast.

I used a small Python script to append these style guidelines ("film noir comic style, black and white, high contrast") to each prompt and then request the images from the OpenAI DALL-E API (which, at the time of writing, cost $0.04 an image).

from io import BytesIO
import os

import requests
from openai import OpenAI
from PIL import Image

# The client reads the key from the OPENAI_API_KEY environment variable,
# which keeps it out of the script itself
client = OpenAI()

STYLE = "film noir comic style, black and white, high contrast"

def generate_images(prompts, save_path="./"):
    os.makedirs(save_path, exist_ok=True)

    for index, prompt in enumerate(prompts):
        # Append the style to each prompt for consistency
        full_prompt = f"{prompt}, {STYLE}"

        try:
            # Generate one image per prompt with the openai Python client
            response = client.images.generate(
                model="dall-e-3",
                prompt=full_prompt,
                size="1024x1024",
                quality="standard",
                n=1,
            )

            # Extract the URL of the generated image
            image_url = response.data[0].url
            print(f"Image generated for prompt {index}: {image_url}")

            # Download the image, raising on HTTP errors
            image_data = requests.get(image_url, timeout=60)
            image_data.raise_for_status()
            image = Image.open(BytesIO(image_data.content))

            # Name files by index: the full prompts are too long to be valid filenames
            image_save_path = os.path.join(save_path, f"{index:02d}.png")
            image.save(image_save_path)
            print(f"Image saved for prompt: '{full_prompt}' at {image_save_path}")
        except Exception as e:
            print(f"Failed to generate or save image for prompt: '{full_prompt}'. Error: {e}")

# List of prompts to generate images for
prompts = [

# Prompts go here

]

generate_images(prompts)
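At the quoted $0.04 per image, it's easy to sanity-check the budget before running the script. A minimal sketch, assuming that flat per-image price (DALL-E 3, standard quality, 1024x1024, at the time of writing) and a hypothetical number of attempts per prompt for retries or style experiments:

```python
from decimal import Decimal

# Price quoted in the post for one DALL-E 3 image at the time of writing;
# check current pricing before relying on it.
PRICE_PER_IMAGE = Decimal("0.04")


def estimate_cost(num_prompts: int, attempts_per_prompt: int = 1) -> Decimal:
    """Total cost if every prompt is generated `attempts_per_prompt` times."""
    return PRICE_PER_IMAGE * num_prompts * attempts_per_prompt


print(estimate_cost(20))     # one pass over the 20 prompts
print(estimate_cost(20, 3))  # e.g. three attempts per prompt while iterating
```

One pass over all 20 prompts comes to $0.80; using `Decimal` avoids floating-point rounding noise in currency sums.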

The images

That gave me a bunch of images to play with. I love the high-contrast, comic-book style:

From there, I uploaded each image into Pika along with its text prompt to generate a 3-second clip. Mostly Pika added subtle motion such as zooming and panning; occasionally it animated specific elements, like billowing smoke or people walking. With more time, I could have fine-tuned the results to get more interesting motion.

The images above are square; Pika let me turn them into 5:2 cinematic videos:
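The post doesn't say how Pika got from a square source to 5:2 (cropping, extending the frame, or both). If you wanted to do the conversion yourself, the geometry is simple arithmetic. A hypothetical sketch: the largest centred 5:2 crop of a 1024x1024 image keeps the full width and 409 rows, while extending the frame without losing anything needs a 2560x1024 canvas whose new area would then have to be filled in (outpainting). The function names are illustrative.

```python
import math
from fractions import Fraction


def centre_crop_box(width: int, height: int, ratio: Fraction) -> tuple[int, int, int, int]:
    """Largest centred (left, top, right, bottom) box with width/height == ratio.

    The box format matches what Pillow's Image.crop() expects.
    """
    if Fraction(width, height) > ratio:
        # Source is wider than the target: keep full height, trim the sides
        new_w = math.floor(height * ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller than (or equal to) the target: keep full width, trim top and bottom
    new_h = math.floor(width / ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def expanded_canvas(width: int, height: int, ratio: Fraction) -> tuple[int, int]:
    """Smallest (width, height) canvas with the target ratio that contains the whole image."""
    if Fraction(width, height) > ratio:
        # Source is wider: extend vertically
        return (width, math.ceil(width / ratio))
    # Source is taller: extend horizontally
    return (math.ceil(height * ratio), height)


CINEMATIC = Fraction(5, 2)
print(centre_crop_box(1024, 1024, CINEMATIC))
print(expanded_canvas(1024, 1024, CINEMATIC))
```

With Pillow, `image.crop(centre_crop_box(*image.size, CINEMATIC))` would produce the cropped frame. Cropping a square to 5:2 discards 60% of the image, which is why extending the frame is usually the more attractive option.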

The voices

This part was surprisingly easy using ElevenLabs. I simply uploaded the dialogue text for each character, chose a voice, and hit generate.
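The post doesn't say how the script was split into text for each character. As a hypothetical sketch, assuming every spoken line starts with the speaker's letter, an optional parenthesised cue, and a colon (as in "B (fading in): What are you doing?"), the lines can be grouped per speaker, ready to paste into ElevenLabs with the chosen voice:

```python
import re
from collections import defaultdict

# Assumed line format: speaker letter, optional "(cue)", colon, then the spoken text.
# Parenthesised stage directions on their own line do not match and are skipped.
SPOKEN_LINE = re.compile(r"^([A-Z])\s*(?:\([^)]*\))?\s*:\s*(.+)$")


def lines_by_speaker(script: str) -> dict[str, list[str]]:
    """Group spoken lines by speaker, preserving script order within each speaker."""
    voices: dict[str, list[str]] = defaultdict(list)
    for raw in script.splitlines():
        match = SPOKEN_LINE.match(raw.strip())
        if match:
            speaker, text = match.groups()
            voices[speaker].append(text)
    return dict(voices)


script = """B (fading in): What are you doing?

A: Models.

(zoomed out fire/explosion, more context, you can tell what it is)

B: Models of what?"""

for speaker, spoken in lines_by_speaker(script).items():
    print(f"--- {speaker} ---")
    print("\n".join(spoken))
```

Joining each list gives one block of text per character; keeping the lines separate instead would make it easier to regenerate a single line that doesn't sound right.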

I also generated a few sound effects in Pika, but the quality is still a bit lacking. Most of the sounds came royalty-free from Pixabay.

The editing

We now had all the pieces:

- dialogue voiced with realistic-sounding voices (well, at least the female voice, as you'll hear below)
- ~20 video clips of 3-7 seconds in length
- sound effects
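A quick arithmetic check shows why that footage fits the brief: ~20 clips of 3 to 7 seconds each add up to somewhere between 60 and 140 seconds, inside the 1-3 minute limit before any trimming. A minimal sketch of that check, assuming a list of clip durations in seconds (the durations below are hypothetical):

```python
# Competition limits from the brief: 1-3 minutes
MIN_SECONDS, MAX_SECONDS = 60, 180


def runtime_range(num_clips: int, shortest: float, longest: float) -> tuple[float, float]:
    """Bounds on total footage if every clip lasts between `shortest` and `longest` seconds."""
    return (num_clips * shortest, num_clips * longest)


def fits_brief(durations: list[float]) -> bool:
    """True if the summed clip durations fall within the competition's limits."""
    total = sum(durations)
    return MIN_SECONDS <= total <= MAX_SECONDS


print(runtime_range(20, 3, 7))
clips = [3, 4, 5, 7, 3] * 4  # hypothetical durations for 20 clips
print(sum(clips), fits_brief(clips))
```

The final cut can run longer or shorter than the raw footage (held shots, trims, title cards), so this is only a first sanity check.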

My brother used his editing skills to cut it all together in record time. After a round of feedback and another burst of edits, we finished the final cut 20 minutes before the deadline.

Going from idea to finished short in such a short timeframe was a great experience. This is one case where I see AI tools removing barriers to creativity, and it's one of the reasons I'm optimistic about the future.

Here it is!