27 Apr 2024
D&D Art with DALL·E 3
In which I describe my attempts to beat a robot into submission.
A Rambling Introduction That Nobody Asked For
Generative AI is super trendy right now. Everyone is talking about it, and every single app or service has insisted on cramming some version of these tools into their platform. I’m not really sold on the idea that generative AI is the future of humanity. But, as annoyed as I am about them worming their way into every nook and cranny of the internet, I figure it’s at least worth learning how to use them properly, lest I one day turn into that guy who’s still flipping through encyclopedias at their local library while everyone else has moved on to significantly better options.
So I’ve decided it’s worth putting some effort into experimenting with these tools. My husband is doing it too - his experiments have mostly involved aggravating and ultimately fruitless arguments about the finer details of physics and mathematics. Unfortunately, generative AI doesn’t seem to handle nuance very well, and when you try to explain where it went wrong, it seems to listen to about ten words before it decides what its new answer is going to be.
My AI-related goals are a little less lofty. I’m not trying to explore the mysteries of the universe; I just want to play pretend with my friends.
Of course, once you jump into the tabletop gaming hobby and start exploring online communities, it looks like everyone is waaaaay better at it than you. There are all these stories about the crazy shit their characters did, the snazzy puzzles their GMs created, and the cool encounters that inspired creative strategies at the table. There are professional actors playing the game publicly and of course their characters are all well-developed and well-acted. And then there’s the art. So. Much. Art. And it’s all beautiful and amazing.
How can a mere human like me hope to compete with all that and give her players the game they deserve?[1] Enter generative AI. It’s nowhere near good enough to generate a whole high quality adventure for you on the fly. But it can be a pretty good sounding board when you’re stuck, or write you a pretty good paragraph of flavor-text if you’re not too picky about the details. And if you’re like me and have no artistic talent, AI generated images finally seem like they might be good enough to help you illustrate your world in those moments when your words just aren’t quite cutting it.
I feel obligated to point out that AI generated art is a little bit controversial right now. And I get it – there’s definitely something dystopian about having your entire body of work, which took time, blood, sweat, and tears to create, automatically vacuumed up without your permission and turned into a tool that, if Silicon Valley has its way, will put you out of business.
That being said, the cat’s kind of out of the bag. And, in my opinion anyway, a home D&D game is exactly the type of thing that benefits from these tools – it’s not mission critical and it’s not something I could afford to commission a real artist for anyway.
So, without further ado, here it is: my attempts at using DALL·E 3 to generate some D&D artwork.
Using DALL·E
A Simple Landscape
I started with something easy – a simple landscape without any crazy fantasy elements. I wanted a drawing of a single mountain in the middle of an open grassland.
In this case, I had already been talking to ChatGPT about this mountain and what kinds of things my players might find there.
My d&d players are about to go to a single mountain in the the middle of the mountains called the Wolf’s Fang. Can you help me brainstorm what they might find there?
So after some discussion about possible treasures, old dead adventurers, war prophecies, and ancient forgotten rituals, I simply asked it:
Can you make a picture of the mountain?
And it did! It wasn’t what I had in mind, but that one is on me since I initially described it as a “single mountain in the the middle of the mountains”, which was definitely a typo on my part. So I corrected myself:
The mountain is a single mountain in the middle of the grasslands
This is where ChatGPT showed me one of its quirks: it’s not always sure when to have DALL·E create an image and when you are just expecting a text response. In this case it decided to take the middle ground: it implied that it was going to make me a picture without actually making one.
Got it! Let’s create a new image based on your description:
Description: A solitary mystical mountain, known as the Wolf’s Fang, rises dramatically from the middle of expansive grasslands in a fantasy setting. The mountain features sharp, jagged peaks that resemble the teeth of a giant wolf, distinct against a vast, open sky. Around the base of the mountain, the lush green grass provides a stark contrast to the rugged, rocky terrain of the peaks. This setting is designed to convey a sense of isolation and significance, making the mountain a central, mystical landmark in a wide-open landscape.
Now, in this case, it was probably a good thing that DALL·E didn’t spend the compute power trying to create another image. I could immediately tell from the text description that it was on the right track, but still didn’t quite understand exactly what I was going for. I didn’t need a dark, scary, jagged mountain range. I just needed a single, cheerful peak in the middle of an otherwise flat landscape. So I told ChatGPT:
The mountain doesn’t need to have sharp jagged peaks.
Somehow this was enough to kick off the image generator, which produced a pretty solid picture. It was probably usable as is, but since I was experimenting anyway, it made sense to try to get as close as possible to my mental image. ChatGPT offers a handy little feature where you can highlight a part of the image that you want to change and then describe what updates to make. So I highlighted most of the grass and the river in the foreground and asked ChatGPT:
Can you add some small hills and remove the river?
And this is where it had some problems. The portion of the image that I did not highlight stayed exactly the same, but the portion that I did highlight was completely changed, and no longer felt like it belonged in the picture. And the seam between the different portions of the image was pretty clearly visible.
I made a couple more attempts to correct this version of the image, but eventually gave up – none of the changes that were being made really corrected the problems and the seam was clearly visible the whole time. I eventually went back to the original image and just asked it to remove the river and make the clouds a little less foreboding. The final result was definitely good enough for me. But then again, the concept of a mountain in the middle of the grasslands is probably simple enough that verbal descriptions are enough to get the idea across even without the artwork, so it’s unclear that the effort spent trying to get the image right was worth it. Overall I would call simple landscape generation a mild success.
A Fun In-Game Moment
With landscapes out of the way, I decided to take it up a notch and try to capture something highly specific. Something that could really benefit from the thousand words a picture is allegedly worth. In this case, I settled on trying to get DALL·E to draw a funny little scene that came in the middle of our game.
A couple of sessions ago, the party was trying to figure out how to swim through the reservoir of a well and find a secret tunnel that eventually led to a hidden underground monastery. Being a relatively low level party, they were running into some logistical constraints involving underwater light sources, so the druid decided to turn into an anglerfish and become her own little swimming light source. The party was finally able to find the tunnel and found themselves, dripping wet, in front of a pair of double doors that were dimly illuminated by this tiny little fish.
So I tried to describe this scene to ChatGPT.
Can you draw the following picture: A githyanki wizard, tiefling barbarian, tiefling ranger, and human cleric, and dragonborn paladin all stand in front of a large pair of underground stone double doors, dripping wet. The wizard is holding up a metal pot containing an anglerfish, which is providing light. To the left of the double doors is a large statue of a fox holding a pen as a sword and a book as a shield. To the right is a large statue of a phoenix holding a pen as a sword and a book as a sheild.
The result was an okay start, although it did completely ignore a large portion of my instructions. For starters, none of the player races were correct. There were some strange light sources being held by the players, but none of them were specifically the anglerfish in a bowl that I asked for. There were two statues, one of which was kind of fox-like, and the other of which was some kind of winged creature, but they were not in the correct positions or holding the correct things.
I went through several iterations of trying to get this picture to align more closely to what I described. I pretty quickly gave up on the player races since I wanted the players to be facing the door anyway, so fudging those a little bit wasn’t such a big deal. I mostly focussed on trying to get it so that the characters were facing the door, one character was holding an anglerfish, and the statues were correct.
To generate each image, I asked for a list of corrections to the image that came before it. I did not use the feature for highlighting specific parts of the picture, since in every case I wanted major changes to the whole picture.
Getting the players to face the door was surprisingly difficult. I had to be super explicit for it to understand what I actually wanted:
Can you update the picture so the adventurers are fully facing the door, so that their backs are to the viewer of the picture.
It also really struggled with the anglerfish. The best result came from giving the AI some kind of definition to work with:
Also the wizard is still not holding an anglerfish. An anglerfish is a fish with an antennae that has a small light at the end.
That seemed to get the point across once, but then DALL·E lost the thread again in later versions of the picture.
The statues never came out as described either – DALL·E did not seem to understand what a phoenix was, and the idea of using a pen as a sword and a book as a shield was maybe not quite literal enough for it to understand. Some of the problems seemed to come from an inability to consistently match descriptions up with what they were describing. Multiple images contained a party member that was a fox rather than a statue that was a fox.
A final issue is that even once DALL·E kind of understood the concept of an anglerfish, it refused to draw one in someone’s hands. My husband finally suggested that I give up trying to get it to draw the whole scene and just ask it to draw a single person holding up an anglerfish in a bowl. So in a separate chat window, I asked it:
Can you draw a picture of a person holding up a bowl containing an anglerfish as a light source.
Character Art
At this point I’m pretty convinced that DALL·E can draw generic landscapes pretty well, but that it can’t yet handle highly specific and detailed scenes with multiple characters and props. The obvious next question is how well it can handle something in between. So I jumped into something that every D&D player with no artistic talent has grappled with at one time or another: character art!
Character art is definitely simpler than the scene described above: there’s only one person that it needs to get right, and while you often have some ideas about how the character is posing and what is going on in the background, the specifics are usually less important. But it can still be a challenge to find the right art to showcase your character from existing sources. Even with the vast amounts of character art available online, many people have a very specific idea about a couple of unique aspects of their character’s appearance and personality that are not quite captured by existing art.
So I gave it a try with a character that my players were about to meet.
Can you draw me an elf with antlers sleeping besides an easel where a magical paintbrush is making a painting completely on its own? A cat is watching the paintbrush
I really like the picture I got out of this, and will almost certainly use her as inspiration for a different character in my game. But it was not what I was looking for here. My main objection is that DALL·E completely failed to follow my central instruction of having the character sleeping while a paintbrush paints on its own. I also had a male character in mind, although I can’t blame DALL·E for that since I didn’t specify.
So I tried again:
That’s pretty good! Can you make the following changes? The setting should be outside, looking out at a flat prairie. The elf should be a man, and should be sleeping. The paintbrush should be floating in the air and painting on it’s own. The cat should be sleeping.
The resulting image was almost exactly what I asked for, with the notable exception of the floating paintbrush. There was a floating something, but it didn’t really look like a paintbrush. In addition, the character was still not quite what I had in mind, so I went through a few more iterations of this image, mostly using the highlighting tool to highlight some specific aspect that I wanted to change (e.g. just the character’s face or just the painting).
At this point it was looking pretty good, although the floating paintbrush was still giving it a hard time. However, I decided I wasn’t really vibing with the “photorealistic” style, and asked ChatGPT to give it more of a cartoony feel. But once I changed the whole image, I had an entirely new set of minor complaints and had to go through the process of iterating on that image all over again.
Overall I’m pretty impressed. Sure, there are definitely still mistakes. In the second image, I asked DALL·E to “add a floating paintbrush” and it completely messed up the easel. When I asked DALL·E to change out the greenish object the elf was leaning on for a tree, it took a couple of tries before it looked correct. And about halfway through, when I decided the subject of the painting should be a mountain rather than open prairie, it was pretty easy to get the painting updated but pretty difficult to get the mountain to show up in the background of the landscape. But the end result was a pretty good portrait of the character that closely matched the overall vibe I was going for.
I do want to highlight a couple of particularly funny quirks though:
In one iteration I highlighted the little mountains showing up way back in the background of the image and asked if they could be removed. However, since the mountains were behind the elf, I also included a portion of his face in the highlighted region. DALL·E did remove the mountains as requested, but the replacement was not very good. And because the face was highlighted, DALL·E decided that I wanted it changed even though I had not mentioned anything about it in the text. And the change it landed on was giving the man eyes that were red and looked pretty angry.
I also tried several times to get DALL·E to include a floating paintbrush. But whatever systems are in place to help the AI generate more realistic images also seem to prevent it from generating a magical paintbrush that paints all by itself. In an attempt to get it to ignore some of those systems, I finally said:
There’s still no paintbrush there. There should be a flying paintbrush that is painting the picture on it’s own. I know that is unrealistic, but this is a fantasy setting.
I finally got my floating paintbrush, but in removing some of those “realism checks”, I also ended up with a misshapen cat, a two-sided paintbrush, and a painting that was a little bit odd. I ended up giving up on the paintbrush and manually editing one in after the fact.
Some Final Thoughts
Overall, I’d say the experiment was moderately successful. Assuming that your picture doesn’t have too many interacting characters and objects, it seems that DALL·E can usually follow instructions well enough to generate a picture that more or less matches what is in your head, and probably does so better than a random image you pulled from a Google Images search.
It’s definitely not a quick fix though, and only works when you have a general idea with only a few specific details. For something generic like a landscape, you can probably get what you want pretty quickly, but with character art, it might take several iterations to get what you’re looking for, especially if you’re picky. And you will probably find yourself repeating the same instructions over and over again to no avail. There are certain things, especially unusual fantasy elements like a floating paintbrush or an anglerfish being held in a fishbowl, that it just cannot seem to figure out.
-
[1] For anyone identifying just a little bit too much with this, try to remember that everyone posting on social media is only talking about their best moments. It’s easy to forget when you’re scrolling, but ninety percent of most people’s sessions are normal, boring-ass sessions just like yours and mine. If you and your friends are having fun, you’re doing a good job.